Apache SystemML Roadmap

Planned for Future SystemML 1.2

  • Algorithms & builtin functions
    • NN layers-based factorization machines with regression & classification capabilities
    • NN optimization test suite with well known optimization test functions
    • Model selection & hyperparameter tuning
    • Additional distribution functions, e.g. Weibull, gamma
    • Generalization of operations such as xor
  • Enhanced Deep Learning support
    • Coherent sparse operations on CPU/GPU
    • Coherent single-precision support on CPU/GPU
    • Distributed DL operations
  • GPU Support
    • Full compiler integration (cost-based, automatic placement)
    • Multi GPUs
    • Distributed GPUs
  • Code generation
    • Deep learning operations
    • Heterogeneous HW, incl. GPUs
  • Compressed Linear Algebra
    • Deep learning operations
    • Ultra-sparse datasets
  • Misc Runtime
    • NUMA-awareness (thread pools, matrix partitioning)
    • Unified memory management (ops, bufferpool, RDDs/broadcasts)
    • Support additional external formats such as feather format for matrices and frames
    • Parfor support for broadcasts
  • Misc Compiler
    • Consolidate replicated compilation chain (e.g., diff APIs)
    • Holistic sum-product optimization and operator fusion
    • Extended sparsity estimators
    • Rewrites and compiler improvements for mini-batching including prefetching
    • Parfor optimizer support for shared reads
    • SPOOF compiler improvements
  • APIs
    • Python Binding for JMLC API

Feature Candidates for Future Releases

  • Completion of Prior Experimental Features
  • Python DSL
  • New Algorithms -- e.g. Decomposition Algorithms
  • Common DSL Architecture
  • R Interfaces: R DSL and R Wrappers
  • Native Zeppelin Notebook Support
  • Sum Product Optimizations
  • Tree-based Data Structures
  • Global Dataflow Optimizations
  • Tooling

Current Release

  • SystemML 1.1.0 (released in March, 2018) details
    • New Builtin Functions: `ifelse`, `assert`, `eval`, `avg_pool`, `avg_pool_backward`
    • Additional Layers in NN library: average pooling, upsampling, low-rank fully connected
    • New Capabilities/Features such as dense matrix blocks >16GB, additional ParFor result aggregation operations, UDFs callable in expressions, zero rows/columns matrices, matrix-matrix multiplication over compressed matrices
    • Extended Caffe2DML and Keras2DML APIs
    • Compiler & Runtime enhancements
    • Performance improvements

Prior Releases

  • SystemML 1.0.0 (released in December, 2017) details
    • Enhanced Deep Learning support with enhanced NN layers and functions, Caffe2DML, and operator implementation
    • Native BLAS support
    • Additional algorithms: autoencoder, enhanced PCA
    • Enhanced rewrites, IPA, vectorization, and instruction generation
    • Enhanced JMLC API, e.g. prepared scripts with thread affinity for outputs and configs, script cloning, configuration management
    • SystemML Lite artifact
    • Compression on by default
  • Experimental Features
    • Keras2DML
    • Enhanced code generation, code gen optimizer, and multi-threaded codegen operators
    • Enhanced GPU support
  • Removals
    • Dropped JDK 7 support
  • SystemML 0.15.0 (released in September, 2017) details
    • Added several new 2D convolution layers
    • Graduated `nn` library from staging to `scripts/nn`
  • Experimental Features
    • Expanded Code Generation for broader performance improvements
    • Enhanced GPU support and scalability
  • Removals
    • Removed file-based transform
    • Removed original MLContext API
  • SystemML 0.14.0-incubating (released in May, 2017) details
    • Runtime feature extensions (new libsvm-binary data converters, parfor spark buffer pool handling, parfor block partitioning of fixed size batches of rows or columns, native dataset support in parfor spark datapartition-execute)
    • Compiler feature extensions (improved parfor execution type selection, improved literal replacement for nrow/ncol, simplified instruction generation across back-ends, consolidated static/dynamic rewrite utilities)
  • Experimental Features
    • New Code Generation capabilities for automatic operator fusion (basic code generator, compiler integration, runtime integration, in-memory source code compilation, extended explain tool, support for right indexing and replace in cellwise and row aggregate templates, support for row, column, or no aggregation in rowwise template). Code generation provides significant performance gains through fewer read/write intermediates, reduced scans of inputs and intermediates, and enhanced sparsity exploitation. To enable this feature, set the `codegen.enabled` property to true in the `SystemML-config.xml` file.
    • New instructions and operators for GPU support (relu_maxpooling, conv2d_bias_add, bias_multiply)
  • Removals
    • Removed support for Java 6 and Java 7
    • Removed parfor perftesttool and cost estimator
  • SystemML 0.13.0-incubating (released in March, 2017) details
    • Updated build for Spark 2.1.0
    • New simplification rewrites for stratstats
    • New fused operator tack+* in CP and Spark
    • New dmlFromResource capability in Python (equivalent to Scala)
    • Add input float support to MLContext
  • Documentation Enhancements
    • Deploy versioned documentation to main project website
    • Add Python MLContext example to engine dev guide
    • Add MLContext info functionality to docs
    • Update DML Language Reference for write description parameter
  • Deprecations, Removals
    • Deprecate old MLContext API
    • Deprecate parfor perftesttool
    • Deprecate SQLContext methods
    • Replace deprecated Accumulator with AccumulatorV2
    • Replace append with cbind for matrices
    • Migrate Vector and LabeledPoint classes from mllib to ml
  • Experimental Features / Algorithms
    • Compressed Linear Algebra v2 (new DDC encoding format, hardened sample-based estimators, debugging tools, new column grouping algorithm, additional operations)
  • SystemML 0.12.0-incubating (released in February, 2017) details
    • Support pip install of new Python package
    • Allow NumPy arrays, Pandas DataFrames, and SciPy matrices as input to MLContext
    • Improve SystemML Python DSL for NumPy
    • Updated build for Spark 1.6.0
    • DML utility script to shuffle input dataset
  • Experimental Features / Algorithms
    • GPU Enhancements
  • SystemML 0.11.0-incubating (released in November, 2016) details
    • SystemML frames
    • New MLContext API
    • Transform functions based on SystemML frames
  • Experimental Features / Algorithms
    • New built-in functions for deep learning (convolution and pooling)
    • Deep learning library (DML bodied functions)
    • Python DSL Integration
    • GPU Support
    • Compressed Linear Algebra
  • New Algorithms
    • Lasso
    • kNN
    • Lanczos
    • PPCA
  • Deep Learning Algorithms
    • CNN (Lenet)
    • RBM
  • SystemML 0.10.0-incubating (released in June, 2016) details
    • Different types of Spark Matrix Blocks: MCSR, CSR, COO
    • SystemML Frame support in JMLC/CP
    • Initial Deep Learning support
    • API/Scripts: parser error handling, SystemML configuration handling, include algorithms in SystemML jar, print matrix
    • New fused operator: wdivmm with variations
    • Performance Features: cache-conscious operations, more multithreaded operations, new simplification rewrites
    • New Algorithms: kNN
    • Documentation: javadocs, Jupyter/Zeppelin notebook examples
  • SystemML 0.9.0-incubating (released in January, 2016) details
    • Improvements to MLContext and MLPipeline wrappers
    • New converter utilities for RDDs and DataFrames
    • New Optimizations for Spark Backend, e.g. eager RDD caching and repartitioning, RDD checkpointing, on-demand creation of SparkContext
    • New Runtime Operators for mmult, multithreaded readers and operators
    • New Algorithms: ALS, Cubic Splines
    • Online Documentation