org.apache.sysml.runtime.instructions.spark.utils

Class RDDAggregateUtils

  • java.lang.Object
    • org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils


  • public class RDDAggregateUtils
    extends Object
    Collection of utility methods for aggregating binary block RDDs. As a general policy, these methods always use stable algorithms that maintain corrections over blocks per key. The performance overhead compared to a simple reduceByKey is roughly 7-10%, which is acceptable.
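    The corrections mentioned above are compensation terms in the style of Kahan summation. The following Spark-free sketch (hypothetical, not SystemML code) illustrates why correction-based aggregation matters: a running correction term recovers the low-order bits that naive floating-point summation drops.

    ```java
    // Minimal illustration of correction-based ("stable") summation:
    // a Kahan-style correction term tracks the low-order bits lost in
    // each floating-point add and folds them back in.
    public class KahanDemo {
        public static double naiveSum(double[] xs) {
            double s = 0.0;
            for (double x : xs) s += x;          // rounding error accumulates
            return s;
        }

        public static double kahanSum(double[] xs) {
            double sum = 0.0, corr = 0.0;        // running value + correction
            for (double x : xs) {
                double y = x - corr;             // re-apply previously lost bits
                double t = sum + y;
                corr = (t - sum) - y;            // bits of y lost in this add
                sum = t;
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] xs = new double[10_000_000];
            java.util.Arrays.fill(xs, 0.1);
            System.out.println("naive: " + naiveSum(xs));
            System.out.println("kahan: " + kahanSum(xs));
        }
    }
    ```

    Summing ten million copies of 0.1 this way, the compensated result lands much closer to 1,000,000 than the naive one, which is the error the stable aggregates guard against across distributed blocks.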
    • Constructor Detail

      • RDDAggregateUtils

        public RDDAggregateUtils()
    • Method Detail

      • aggStable

        public static MatrixBlock aggStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in,
                                            org.apache.sysml.runtime.matrix.operators.AggregateOperator aop)
        Single block aggregation over pair RDDs with corrections for numerical stability.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
        aop - aggregate operator
        Returns:
        matrix block
      • aggStable

        public static MatrixBlock aggStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in,
                                            org.apache.sysml.runtime.matrix.operators.AggregateOperator aop)
        Single block aggregation over RDDs with corrections for numerical stability.
        Parameters:
        in - matrix as JavaRDD<MatrixBlock>
        aop - aggregate operator
        Returns:
        matrix block
      • aggByKeyStable

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in,
                                                                                                      org.apache.sysml.runtime.matrix.operators.AggregateOperator aop)
        Aggregation of blocks per key with corrections for numerical stability.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
        aop - aggregate operator
        Returns:
        matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
      • aggByKeyStable

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in,
                                                                                                      org.apache.sysml.runtime.matrix.operators.AggregateOperator aop,
                                                                                                      boolean deepCopyCombiner)
        Aggregation of blocks per key with corrections for numerical stability.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
        aop - aggregate operator
        deepCopyCombiner - indicator of whether the createCombiner function needs to deep-copy the input block
        Returns:
        matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
      • aggByKeyStable

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in,
                                                                                                      org.apache.sysml.runtime.matrix.operators.AggregateOperator aop,
                                                                                                      int numPartitions,
                                                                                                      boolean deepCopyCombiner)
        Aggregation of blocks per key with corrections for numerical stability.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
        aop - aggregate operator
        numPartitions - number of output partitions
        deepCopyCombiner - indicator of whether the createCombiner function needs to deep-copy the input block
        Returns:
        matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
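      Conceptually, the aggByKeyStable variants apply the same correction-based aggregation independently per key. This hypothetical plain-Java sketch (a HashMap stand-in, not the actual Spark combineByKey pipeline) shows the per-key (sum, correction) state such an aggregation maintains:

      ```java
      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical per-key stable aggregation: each key keeps a
      // (sum, correction) pair, mirroring what correction-carrying
      // combiners do over distributed blocks.
      public class AggByKeyDemo {
          // running sum plus its Kahan-style correction
          static final class KahanCell { double sum, corr; }

          public static Map<Long, Double> aggByKey(long[] keys, double[] vals) {
              Map<Long, KahanCell> acc = new HashMap<>();
              for (int i = 0; i < keys.length; i++) {
                  KahanCell c = acc.computeIfAbsent(keys[i], k -> new KahanCell());
                  double y = vals[i] - c.corr;   // compensated add for this key
                  double t = c.sum + y;
                  c.corr = (t - c.sum) - y;      // bits lost in the add
                  c.sum = t;
              }
              Map<Long, Double> out = new HashMap<>();
              acc.forEach((k, c) -> out.put(k, c.sum));
              return out;
          }
      }
      ```

      In the real API the keys are MatrixIndexes and the values are MatrixBlocks, with the correction maintained per cell rather than per scalar.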
      • mergeByKey

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
        Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
        Returns:
        matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
      • mergeByKey

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in,
                                                                                                  boolean deepCopyCombiner)
        Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
        deepCopyCombiner - indicator of whether the createCombiner function needs to deep-copy the input block
        Returns:
        matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
      • mergeByKey

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in,
                                                                                                  int numPartitions,
                                                                                                  boolean deepCopyCombiner)
        Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
        numPartitions - number of output partitions
        deepCopyCombiner - indicator of whether the createCombiner function needs to deep-copy the input block
        Returns:
        matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
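      The disjoint-data assumption of mergeByKey means every cell is nonzero in at most one input block per key, so merging reduces to copying nonzeros. A hypothetical sketch on dense arrays (not the actual MatrixBlock merge) makes the contract concrete:

      ```java
      // Hypothetical sketch of the disjoint-merge contract: each cell is
      // nonzero in at most one input per key, so merging is a plain copy
      // of nonzeros. If two inputs carry a nonzero in the same cell, the
      // result becomes order-dependent, i.e. undefined, as documented.
      public class MergeDemo {
          public static double[] mergeDisjoint(double[] a, double[] b) {
              double[] out = new double[a.length];
              for (int i = 0; i < a.length; i++) {
                  // assumes a[i] == 0 or b[i] == 0 (disjoint data)
                  out[i] = (a[i] != 0) ? a[i] : b[i];
              }
              return out;
          }
      }
      ```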
      • mergeRowsByKey

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeRowsByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,org.apache.sysml.runtime.instructions.spark.data.RowMatrixBlock> in)
        Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes, RowMatrixBlock>
        Returns:
        matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>

Copyright © 2017 The Apache Software Foundation. All rights reserved.