org.apache.sysml.runtime.instructions.spark.utils

Class FrameRDDConverterUtils

  • java.lang.Object
    • org.apache.sysml.runtime.instructions.spark.utils.FrameRDDConverterUtils


  • public class FrameRDDConverterUtils
    extends Object
    • Constructor Detail

      • FrameRDDConverterUtils

        public FrameRDDConverterUtils()
    • Method Detail

      • csvToBinaryBlock

        public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> csvToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                              org.apache.spark.api.java.JavaPairRDD<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> input,
                                                                                              MatrixCharacteristics mc,
                                                                                              org.apache.sysml.parser.Expression.ValueType[] schema,
                                                                                              boolean hasHeader,
                                                                                              String delim,
                                                                                              boolean fill,
                                                                                              double fillValue)
                                                                                       throws DMLRuntimeException
        Throws:
        DMLRuntimeException
      • textCellToBinaryBlock

        public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> textCellToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                                   org.apache.spark.api.java.JavaPairRDD<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> in,
                                                                                                   MatrixCharacteristics mcOut,
                                                                                                   org.apache.sysml.parser.Expression.ValueType[] schema)
                                                                                            throws DMLRuntimeException
        Throws:
        DMLRuntimeException
      • textCellToBinaryBlockLongIndex

        public static org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> textCellToBinaryBlockLongIndex(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                                            org.apache.spark.api.java.JavaPairRDD<Long,org.apache.hadoop.io.Text> input,
                                                                                                            MatrixCharacteristics mc,
                                                                                                            org.apache.sysml.parser.Expression.ValueType[] schema)
                                                                                                     throws DMLRuntimeException
        Throws:
        DMLRuntimeException
      • binaryBlockToDataFrame

        public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> binaryBlockToDataFrame(org.apache.spark.sql.SparkSession sparkSession,
                                                                                                    org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> in,
                                                                                                    MatrixCharacteristics mc,
                                                                                                    org.apache.sysml.parser.Expression.ValueType[] schema)
      • binaryBlockToDataFrame

        @Deprecated
        public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> binaryBlockToDataFrame(org.apache.spark.sql.SQLContext sqlContext,
                                                                                                    org.apache.spark.api.java.JavaPairRDD<Long,FrameBlock> in,
                                                                                                    MatrixCharacteristics mc,
                                                                                                    org.apache.sysml.parser.Expression.ValueType[] schema)
        Deprecated. Prefer the overload that takes a SparkSession.
      • convertFrameSchemaToDFSchema

        public static org.apache.spark.sql.types.StructType convertFrameSchemaToDFSchema(org.apache.sysml.parser.Expression.ValueType[] fschema,
                                                                                         boolean containsID)
        Converts a frame schema into a DataFrame schema.
        Parameters:
        fschema - frame schema
        containsID - true if the schema contains an ID column
        Returns:
        Spark StructType whose StructFields represent the schema
      • convertDFSchemaToFrameSchema

        public static int convertDFSchemaToFrameSchema(org.apache.spark.sql.types.StructType dfschema,
                                                       String[] colnames,
                                                       org.apache.sysml.parser.Expression.ValueType[] fschema,
                                                       boolean containsID)
        NOTE: To support vector columns, the schema is restricted to a single vector column. This restriction allows the vector length to be inferred without data access and covers the common case.
        Parameters:
        dfschema - schema as StructType
        colnames - column names
        fschema - array of SystemML ValueTypes
        containsID - true if the schema contains an ID column
        Returns:
        0-based column index of the vector column, or -1 if there is no vector column.
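        The single-vector restriction in the note above can be illustrated with a small, self-contained sketch. It does not call SystemML or Spark and should not be read as the library's implementation: the ColKind enum and both helper methods are hypothetical names introduced only for illustration. It assumes that every non-vector DataFrame column maps to exactly one frame column and that an ID column, if present, comes first. Under those assumptions, the vector length follows from column counts alone, and the returned index is the frame column at which the vector starts.

```java
class VectorColumnSketch {
    // Hypothetical stand-in for a DataFrame column kind; not a Spark or SystemML type.
    enum ColKind { ID, SCALAR, VECTOR }

    // Returns the 0-based frame column index at which the vector column starts,
    // or -1 if the schema has no vector column. Rejects a second vector column,
    // mirroring the single-vector restriction described in the note.
    static int vectorColumnIndex(ColKind[] dfCols, boolean containsID) {
        int off = containsID ? 1 : 0; // assumption: the ID column is the first column
        int colVect = -1;
        for (int i = off; i < dfCols.length; i++) {
            if (dfCols[i] == ColKind.VECTOR) {
                if (colVect >= 0)
                    throw new IllegalArgumentException("more than one vector column");
                colVect = i - off; // all earlier non-ID columns are scalars
            }
        }
        return colVect;
    }

    // Infers the vector length from column counts only (no data access).
    // Only meaningful when the schema actually has exactly one vector column:
    // the other (dfCols - off - 1) columns each occupy one frame column,
    // so the vector must fill the remaining frame columns.
    static int vectorLength(int frameCols, int dfCols, boolean containsID) {
        int off = containsID ? 1 : 0;
        return frameCols - (dfCols - off) + 1;
    }

    public static void main(String[] args) {
        ColKind[] cols = { ColKind.ID, ColKind.SCALAR, ColKind.VECTOR, ColKind.SCALAR };
        // The frame is assumed to have 7 columns: 1 scalar, 5 vector entries, 1 scalar.
        System.out.println(vectorColumnIndex(cols, true));      // prints 1
        System.out.println(vectorLength(7, cols.length, true)); // prints 5
    }
}
```

        With two or more vector columns, the same column counts would be consistent with several different length splits, which is why the length could no longer be inferred from the schema alone.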
      • csvToRowRDD

        public static org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> csvToRowRDD(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                              String fnameIn,
                                                                                              String delim,
                                                                                              org.apache.sysml.parser.Expression.ValueType[] schema)
      • csvToRowRDD

        public static org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> csvToRowRDD(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                              org.apache.spark.api.java.JavaRDD<String> dataRdd,
                                                                                              String delim,
                                                                                              org.apache.sysml.parser.Expression.ValueType[] schema)

Copyright © 2017 The Apache Software Foundation. All rights reserved.