public class RDDConverterUtilsExt extends Object
Modifier and Type | Class and Description |
---|---|
static class |
RDDConverterUtilsExt.AddRowID |
static class |
RDDConverterUtilsExt.RDDConverterTypes |
Constructor and Description |
---|
RDDConverterUtilsExt() |
Modifier and Type | Method and Description |
---|---|
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
addIDToDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df,
org.apache.spark.sql.SparkSession sparkSession,
String nameOfCol)
Add element indices as new column to DataFrame
|
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
addIDToDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df,
org.apache.spark.sql.SQLContext sqlContext,
String nameOfCol)
Deprecated.
This will be removed in SystemML 1.0.
|
static byte[] |
convertMBtoPy4JDenseArr(MatrixBlock mb) |
static MatrixBlock |
convertPy4JArrayToMB(byte[] data,
int rlen,
int clen) |
static MatrixBlock |
convertPy4JArrayToMB(byte[] data,
int rlen,
int clen,
boolean isSparse) |
static MatrixBlock |
convertPy4JArrayToMB(byte[] data,
long rlen,
long clen) |
static MatrixBlock |
convertPy4JArrayToMB(byte[] data,
long rlen,
long clen,
boolean isSparse) |
static MatrixBlock |
convertSciPyCOOToMB(byte[] data,
byte[] row,
byte[] col,
int rlen,
int clen,
int nnz) |
static MatrixBlock |
convertSciPyCOOToMB(byte[] data,
byte[] row,
byte[] col,
long rlen,
long clen,
long nnz) |
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> |
coordinateMatrixToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input,
MatrixCharacteristics mcIn,
boolean outputEmptyBlocks)
Example usage:
|
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> |
coordinateMatrixToBinaryBlock(org.apache.spark.SparkContext sc,
org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input,
MatrixCharacteristics mcIn,
boolean outputEmptyBlocks) |
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
projectColumns(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df,
ArrayList<String> columns) |
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
stringDataFrameToVectorDataFrame(org.apache.spark.sql.SparkSession sparkSession,
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> inputDF)
Convert a dataframe of comma-separated string rows to a dataframe of
ml.linalg.Vector rows.
|
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> |
stringDataFrameToVectorDataFrame(org.apache.spark.sql.SQLContext sqlContext,
org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> inputDF)
Deprecated.
This will be removed in SystemML 1.0. Please migrate to
RDDConverterUtilsExt.stringDataFrameToVectorDataFrame(SparkSession, Dataset<Row>) |
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> coordinateMatrixToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input, MatrixCharacteristics mcIn, boolean outputEmptyBlocks) throws DMLRuntimeException
import org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt
import org.apache.sysml.runtime.matrix.MatrixCharacteristics
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.mllib.linalg.distributed.MatrixEntry
import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix

// Parse the space-separated ratings file into non-zero matrix entries and cache them,
// since the RDD is traversed several times below.
val entries = sc.textFile("ratings.text")
  .map(_.split(" "))
  .map(fields => new MatrixEntry(fields(0).toLong, fields(1).toLong, fields(2).toDouble))
  .filter(_.value != 0)
  .cache
// Entries must use 1-based row/column indices; fail fast otherwise.
require(entries.filter(e => e.i == 0 || e.j == 0).count == 0, "Expected 1-based ratings file")

// Derive dimensions and the non-zero count directly from the data.
val nnz = entries.count
val numRows = entries.map(_.i).max
val numCols = entries.map(_.j).max

// Describe the matrix (1000x1000 blocking) and convert to SystemML's
// binary-block format, injecting empty blocks where needed.
val coordinateMatrix = new CoordinateMatrix(entries, numRows, numCols)
val mc = new MatrixCharacteristics(numRows, numCols, 1000, 1000, nnz)
val binBlocks = RDDConverterUtilsExt.coordinateMatrixToBinaryBlock(new JavaSparkContext(sc), coordinateMatrix, mc, true)
Parameters:
sc - java spark context
input - coordinate matrix
mcIn - matrix characteristics
outputEmptyBlocks - if true, inject empty blocks if necessary
Returns:
JavaPairRDD<MatrixIndexes, MatrixBlock>
Throws:
DMLRuntimeException - if DMLRuntimeException occurs

public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> coordinateMatrixToBinaryBlock(org.apache.spark.SparkContext sc, org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input, MatrixCharacteristics mcIn, boolean outputEmptyBlocks) throws DMLRuntimeException
DMLRuntimeException
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> projectColumns(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df, ArrayList<String> columns) throws DMLRuntimeException
DMLRuntimeException
public static MatrixBlock convertPy4JArrayToMB(byte[] data, long rlen, long clen) throws DMLRuntimeException
DMLRuntimeException
public static MatrixBlock convertPy4JArrayToMB(byte[] data, int rlen, int clen) throws DMLRuntimeException
DMLRuntimeException
public static MatrixBlock convertSciPyCOOToMB(byte[] data, byte[] row, byte[] col, long rlen, long clen, long nnz) throws DMLRuntimeException
DMLRuntimeException
public static MatrixBlock convertSciPyCOOToMB(byte[] data, byte[] row, byte[] col, int rlen, int clen, int nnz) throws DMLRuntimeException
DMLRuntimeException
public static MatrixBlock convertPy4JArrayToMB(byte[] data, long rlen, long clen, boolean isSparse) throws DMLRuntimeException
DMLRuntimeException
public static MatrixBlock convertPy4JArrayToMB(byte[] data, int rlen, int clen, boolean isSparse) throws DMLRuntimeException
DMLRuntimeException
public static byte[] convertMBtoPy4JDenseArr(MatrixBlock mb) throws DMLRuntimeException
DMLRuntimeException
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> addIDToDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df, org.apache.spark.sql.SparkSession sparkSession, String nameOfCol)
Parameters:
df - input data frame
sparkSession - the Spark Session
nameOfCol - name of index column

@Deprecated
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> addIDToDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df, org.apache.spark.sql.SQLContext sqlContext, String nameOfCol)
Parameters:
df - input data frame
sqlContext - the SQL Context
nameOfCol - name of index column

@Deprecated
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> stringDataFrameToVectorDataFrame(org.apache.spark.sql.SQLContext sqlContext, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> inputDF) throws DMLRuntimeException
RDDConverterUtilsExt.stringDataFrameToVectorDataFrame(SparkSession, Dataset<Row>)
Example input rows:
((1.2, 4.3, 3.4))
(1.2, 3.4, 2.2)
[[1.2, 34.3, 1.2, 1.25]]
[1.2, 3.4]
Parameters:
sqlContext - Spark SQL Context
inputDF - dataframe of comma-separated row strings to convert to dataframe of ml.linalg.Vector rows
Throws:
DMLRuntimeException - if DMLRuntimeException occurs

public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> stringDataFrameToVectorDataFrame(org.apache.spark.sql.SparkSession sparkSession, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> inputDF) throws DMLRuntimeException
Example input rows:
((1.2, 4.3, 3.4))
(1.2, 3.4, 2.2)
[[1.2, 34.3, 1.2, 1.25]]
[1.2, 3.4]
Parameters:
sparkSession - Spark Session
inputDF - dataframe of comma-separated row strings to convert to dataframe of ml.linalg.Vector rows
Throws:
DMLRuntimeException - if DMLRuntimeException occurs

Copyright © 2017 The Apache Software Foundation. All rights reserved.