Package | Description |
---|---|
org.apache.sysml.runtime.instructions.gpu.context | |
org.apache.sysml.runtime.matrix.data | |
Modifier and Type | Method and Description |
---|---|
static List<GPUContext> |
GPUContextPool.reserveAllGPUContexts()
Reserves and gets an initialized list of GPUContexts
|
Modifier and Type | Method and Description |
---|---|
static CSRPointer |
CSRPointer.allocateEmpty(GPUContext gCtx,
long nnz2,
long rows)
Factory method to allocate an empty CSR sparse matrix on the GPU
|
static CSRPointer |
CSRPointer.allocateForDgeam(GPUContext gCtx,
jcuda.jcusparse.cusparseHandle handle,
CSRPointer A,
CSRPointer B,
int m,
int n)
Estimates the number of non-zero elements in the result of a sparse cusparseDgeam operation
C = a op(A) + b op(B)
|
static CSRPointer |
CSRPointer.allocateForMatrixMultiply(GPUContext gCtx,
jcuda.jcusparse.cusparseHandle handle,
CSRPointer A,
int transA,
CSRPointer B,
int transB,
int m,
int n,
int k)
Estimates the number of non-zero elements in the result of a sparse matrix multiplication C = A * B
and returns the CSRPointer to C with the appropriate GPU memory allocated.
|
static CSRPointer |
GPUObject.columnMajorDenseToRowMajorSparse(GPUContext gCtx,
jcuda.jcusparse.cusparseHandle cusparseHandle,
jcuda.Pointer densePtr,
int rows,
int cols)
Convenience method to convert a column-major dense matrix to a row-major CSR sparse matrix on the GPU.
Since the allocated matrix is temporary, bookkeeping is not updated.
|
static jcuda.Pointer |
GPUObject.transpose(GPUContext gCtx,
jcuda.Pointer densePtr,
int m,
int n,
int lda,
int ldc)
Transposes a dense matrix on the GPU by calling the cublasDgeam operation
|
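The nnz estimate behind CSRPointer.allocateForDgeam can be read as the symbolic phase of a sparse addition. Column j of row i of C is structurally non-zero if it is non-zero in row i of A or in row i of B. The following is a minimal CPU sketch in plain Java, not the SystemML code. It assumes non-transposed m x n CSR inputs with sorted column indices in every row, and the class and method names are hypothetical. On the GPU, cuSPARSE's legacy API offers cusparseXcsrgeamNnz for this step.

```java
/**
 * Hypothetical CPU sketch of the symbolic (nnz-counting) phase of C = a*op(A) + b*op(B).
 * Assumptions: op is the identity (no transposes) and column indices are sorted per row.
 */
final class CsrAddNnzSketch {
    /** Returns the CSR row pointer of C; its last entry is nnz(C). */
    static int[] rowPtrOfSum(int m, int[] aRowPtr, int[] aColInd, int[] bRowPtr, int[] bColInd) {
        int[] cRowPtr = new int[m + 1];
        for (int i = 0; i < m; i++) {
            int p = aRowPtr[i], pEnd = aRowPtr[i + 1];
            int q = bRowPtr[i], qEnd = bRowPtr[i + 1];
            int count = 0;
            // Merge the two sorted column lists; a column present in both is counted once.
            while (p < pEnd && q < qEnd) {
                if (aColInd[p] < bColInd[q]) {
                    p++;
                } else if (aColInd[p] > bColInd[q]) {
                    q++;
                } else {
                    p++;
                    q++;
                }
                count++;
            }
            // Whatever is left in either list is also part of the union.
            count += (pEnd - p) + (qEnd - q);
            cRowPtr[i + 1] = cRowPtr[i] + count;
        }
        return cRowPtr;
    }
}
```

Numerical cancellation (a*A(i,j) + b*B(i,j) == 0) is deliberately ignored, which is why such a count is an upper bound that is safe for allocation.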
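CSRPointer.allocateForMatrixMultiply plays the same role for C = A * B. Row i of C can only have non-zeros in the columns reached through the rows of B selected by the non-zeros of row i of A. Below is a hypothetical CPU sketch of that symbolic count using a per-column marker array. It assumes transA and transB are both "no transpose", which the real method does not require. cuSPARSE's legacy API exposes the GPU equivalent as cusparseXcsrgemmNnz.

```java
import java.util.Arrays;

/**
 * Hypothetical CPU sketch of the symbolic phase of sparse C = A * B (A: m x k, B: k x n, CSR).
 * Assumption: no transposes; cancellation in the numeric phase is ignored.
 */
final class CsrMultiplyNnzSketch {
    static int[] rowPtrOfProduct(int m, int n,
                                 int[] aRowPtr, int[] aColInd,
                                 int[] bRowPtr, int[] bColInd) {
        int[] cRowPtr = new int[m + 1];
        // lastSeenInRow[j] == i means column j was already counted for row i of C.
        int[] lastSeenInRow = new int[n];
        Arrays.fill(lastSeenInRow, -1);
        for (int i = 0; i < m; i++) {
            int count = 0;
            for (int p = aRowPtr[i]; p < aRowPtr[i + 1]; p++) {
                int kk = aColInd[p]; // A(i, kk) != 0 pulls in row kk of B
                for (int q = bRowPtr[kk]; q < bRowPtr[kk + 1]; q++) {
                    int j = bColInd[q];
                    if (lastSeenInRow[j] != i) {
                        lastSeenInRow[j] = i;
                        count++;
                    }
                }
            }
            cRowPtr[i + 1] = cRowPtr[i] + count;
        }
        return cRowPtr;
    }
}
```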
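To make the layout change in GPUObject.columnMajorDenseToRowMajorSparse concrete, here is a CPU sketch of the same transformation. It reads a column-major dense array (the layout cuSPARSE uses for dense operands) and builds the three CSR arrays in two passes: count the non-zeros per row, then fill column indices and values. It also shows the array sizes a CSR matrix needs (rows + 1 row pointers, nnz column indices and nnz values). The class names are hypothetical, and this is not the JCuda code path.

```java
/**
 * Hypothetical CPU sketch: column-major dense (rows x cols) to row-major CSR.
 * Element (i, j) of the dense input is stored at dense[j * rows + i].
 */
final class DenseToCsrSketch {
    /** Minimal CSR holder: rowPtr has rows + 1 entries; colInd and values have nnz entries. */
    static final class Csr {
        final int[] rowPtr;
        final int[] colInd;
        final double[] values;

        Csr(int[] rowPtr, int[] colInd, double[] values) {
            this.rowPtr = rowPtr;
            this.colInd = colInd;
            this.values = values;
        }
    }

    static Csr columnMajorDenseToCsr(double[] dense, int rows, int cols) {
        int[] rowPtr = new int[rows + 1];
        // Pass 1: count non-zeros per row and build the prefix sums.
        for (int i = 0; i < rows; i++) {
            int count = 0;
            for (int j = 0; j < cols; j++) {
                if (dense[j * rows + i] != 0.0) count++;
            }
            rowPtr[i + 1] = rowPtr[i] + count;
        }
        int nnz = rowPtr[rows];
        int[] colInd = new int[nnz];
        double[] values = new double[nnz];
        // Pass 2: emit (column, value) pairs row by row, columns in ascending order.
        int k = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double v = dense[j * rows + i];
                if (v != 0.0) {
                    colInd[k] = j;
                    values[k] = v;
                    k++;
                }
            }
        }
        return new Csr(rowPtr, colInd, values);
    }
}
```

For example, the 2 x 3 matrix [[1, 0, 2], [0, 3, 0]] is stored column-major as {1, 0, 0, 3, 2, 0}. It becomes rowPtr {0, 2, 3}, colInd {0, 2, 1} and values {1, 2, 3}.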
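GPUObject.transpose delegates to cublasDgeam. With alpha = 1, op(A) = A^T and beta = 0, geam computes C = A^T. The sketch below shows the index arithmetic that the lda and ldc arguments control in column-major storage. How SystemML maps its m, n, lda and ldc onto the geam call is not stated in this listing, so the parameter naming here is an assumption.

```java
/**
 * Hypothetical CPU sketch of a column-major transpose with explicit leading dimensions,
 * i.e. what geam computes with alpha = 1, op(A) = A^T and beta = 0.
 */
final class ColumnMajorTransposeSketch {
    /**
     * in:  rows x cols, column-major, element (i, j) at in[i + j * lda], lda >= rows.
     * out: cols x rows, column-major, element (j, i) at out[j + i * ldc], ldc >= cols.
     */
    static double[] transpose(double[] in, int rows, int cols, int lda, int ldc) {
        if (lda < rows || ldc < cols) {
            throw new IllegalArgumentException("leading dimension too small");
        }
        double[] out = new double[ldc * rows];
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < rows; i++) {
                out[j + i * ldc] = in[i + j * lda];
            }
        }
        return out;
    }
}
```

A leading dimension larger than the logical row count lets the routine operate on a sub-block of a larger allocation without copying it first.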
Modifier and Type | Method and Description |
---|---|
static void |
LibMatrixCUDA.abs(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs an "abs" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.acos(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs an "acos" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.asin(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs an "asin" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.atan(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs an "atan" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.axpy(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in2,
String outputName,
double constant)
Performs a daxpy operation
|
static void |
LibMatrixCUDA.batchNormalizationBackward(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject image,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject dout,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject scale,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject ret,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject retScale,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject retBias,
double epsilon)
This method computes the backpropagation errors for the image, scale and bias of the batch normalization layer
|
static void |
LibMatrixCUDA.batchNormalizationForwardInference(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject image,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject scale,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject bias,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject runningMean,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject runningVar,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject ret,
double epsilon)
Performs the forward BatchNormalization layer computation for inference
|
static void |
LibMatrixCUDA.batchNormalizationForwardTraining(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject image,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject scale,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject bias,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject runningMean,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject runningVar,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject ret,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject retRunningMean,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject retRunningVar,
double epsilon,
double exponentialAverageFactor)
Performs the forward BatchNormalization layer computation for training
|
static void |
LibMatrixCUDA.biasAdd(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject input,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject bias,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject outputBlock)
Performs the operation corresponding to the DML script:
ones = matrix(1, rows=1, cols=Hout*Wout)
output = input + matrix(bias %*% ones, rows=1, cols=F*Hout*Wout)
This operation often follows conv2d, which is why the bias_add(input, bias) built-in function was introduced.
|
static void |
LibMatrixCUDA.biasMultiply(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject input,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject bias,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject outputBlock)
Performs the operation corresponding to the DML script:
ones = matrix(1, rows=1, cols=Hout*Wout)
output = input * matrix(bias %*% ones, rows=1, cols=F*Hout*Wout)
This operation often follows conv2d, which is why the bias_multiply(input, bias) built-in function was introduced.
|
static void |
LibMatrixCUDA.cbind(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in2,
String outputName) |
static void |
LibMatrixCUDA.ceil(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs a "ceil" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.conv2d(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject image,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject filter,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject outputBlock,
int N,
int C,
int H,
int W,
int K,
int R,
int S,
int pad_h,
int pad_w,
int stride_h,
int stride_w,
int P,
int Q) |
static void |
LibMatrixCUDA.conv2d(GPUContext gCtx,
String instName,
jcuda.Pointer image,
jcuda.Pointer filter,
jcuda.Pointer output,
int N,
int C,
int H,
int W,
int K,
int R,
int S,
int pad_h,
int pad_w,
int stride_h,
int stride_w,
int P,
int Q)
Performs 2D convolution.
Takes up an insignificant amount of intermediate space when CONVOLUTION_PREFERENCE is set to CUDNN_CONVOLUTION_FWD_NO_WORKSPACE.
Intermediate space is required by the filter descriptor and the convolution descriptor, which are metadata structures that do not scale with the size of the input.
|
static void |
LibMatrixCUDA.conv2dBackwardData(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject filter,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject dout,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject output,
int N,
int C,
int H,
int W,
int K,
int R,
int S,
int pad_h,
int pad_w,
int stride_h,
int stride_w,
int P,
int Q)
This method computes the backpropagation errors for the previous layer of a convolution operation
|
static void |
LibMatrixCUDA.conv2dBackwardFilter(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject image,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject dout,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject outputBlock,
int N,
int C,
int H,
int W,
int K,
int R,
int S,
int pad_h,
int pad_w,
int stride_h,
int stride_w,
int P,
int Q)
This method computes the backpropagation errors for the filter of a convolution operation
|
static void |
LibMatrixCUDA.conv2dBiasAdd(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject image,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject bias,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject filter,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject output,
int N,
int C,
int H,
int W,
int K,
int R,
int S,
int pad_h,
int pad_w,
int stride_h,
int stride_w,
int P,
int Q)
Does a 2D convolution followed by a bias_add
|
static void |
LibMatrixCUDA.cos(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs a "cos" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.denseDenseMatmult(GPUContext gCtx,
String instName,
jcuda.Pointer output,
int leftRows1,
int leftCols1,
int rightRows1,
int rightCols1,
boolean isLeftTransposed1,
boolean isRightTransposed1,
jcuda.Pointer leftPtr,
jcuda.Pointer rightPtr)
Dense-dense matrix multiply
C = op(A) * op(B), where A and B are dense matrices.
On the host, the matrices are in row-major format; cuBLAS expects them in column-major format.
|
static void |
LibMatrixCUDA.exp(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs an "exp" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.floor(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs a "floor" operation on a matrix on the GPU
|
static boolean |
LibMatrixCUDA.isInSparseFormat(GPUContext gCtx,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject mo) |
static void |
LibMatrixCUDA.log(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs a "log" operation on a matrix on the GPU
|
static org.apache.sysml.runtime.controlprogram.caching.MatrixObject |
LibMatrixCUDA.matmult(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject left,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject right,
String outputName,
boolean isLeftTransposed,
boolean isRightTransposed)
Matrix multiply on the GPU.
Examines sparsity and shapes and routes the call to the appropriate method
from cuBLAS or cuSPARSE:
C = op(A) x op(B)
|
static void |
LibMatrixCUDA.matmultTSMM(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject left,
String outputName,
boolean isLeftTransposed)
Performs tsmm, A %*% A' or A' %*% A, on the GPU by exploiting cublasDsyrk(...)
|
static void |
LibMatrixCUDA.matrixMatrixArithmetic(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in2,
String outputName,
boolean isLeftTransposed,
boolean isRightTransposed,
org.apache.sysml.runtime.matrix.operators.BinaryOperator op)
Performs the elementwise arithmetic operation specified by op on two input matrices, in1 and in2
|
static void |
LibMatrixCUDA.matrixMatrixRelational(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in2,
String outputName,
org.apache.sysml.runtime.matrix.operators.BinaryOperator op)
Performs the elementwise relational operation specified by op on two input matrices, in1 and in2
|
static void |
LibMatrixCUDA.matrixScalarArithmetic(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in,
String outputName,
boolean isInputTransposed,
org.apache.sysml.runtime.matrix.operators.ScalarOperator op)
Entry point to perform the elementwise matrix-scalar arithmetic operation specified by op
|
static void |
LibMatrixCUDA.matrixScalarRelational(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in,
String outputName,
org.apache.sysml.runtime.matrix.operators.ScalarOperator op)
Entry point to perform the elementwise matrix-scalar relational operation specified by op
|
static void |
LibMatrixCUDA.maxpooling(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject image,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject outputBlock,
int N,
int C,
int H,
int W,
int K,
int R,
int S,
int pad_h,
int pad_w,
int stride_h,
int stride_w,
int P,
int Q)
Performs max pooling on the GPU by exploiting cudnnPoolingForward(...)
|
static void |
LibMatrixCUDA.maxpoolingBackward(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject image,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject dout,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject outputBlock,
int N,
int C,
int H,
int W,
int K,
int R,
int S,
int pad_h,
int pad_w,
int stride_h,
int stride_w,
int P,
int Q)
Performs maxpoolingBackward on the GPU by exploiting cudnnPoolingBackward(...)
This method computes the backpropagation errors for the previous layer of a maxpooling operation
|
static void |
LibMatrixCUDA.performMaxpooling(GPUContext gCtx,
String instName,
jcuda.Pointer x,
jcuda.jcudnn.cudnnTensorDescriptor xDesc,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject outputBlock,
int N,
int C,
int H,
int W,
int K,
int R,
int S,
int pad_h,
int pad_w,
int stride_h,
int stride_w,
int P,
int Q) |
static void |
LibMatrixCUDA.rbind(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in2,
String outputName) |
static void |
LibMatrixCUDA.relu(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in,
String outputName)
Performs the relu operation on the GPU.
|
static void |
LibMatrixCUDA.reluBackward(GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject input,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject dout,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject outputBlock)
This method computes the backpropagation errors for the previous layer of a relu operation
|
static void |
LibMatrixCUDA.round(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs a "round" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.sign(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs a "sign" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.sin(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs a "sin" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.sliceOperations(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
org.apache.sysml.runtime.util.IndexRange ixrange,
String outputName)
Performs a rangeReIndex operation for the given lower and upper bounds in the row and column dimensions.
|
static void |
LibMatrixCUDA.solve(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in2,
String outputName)
Implements the "solve" function for SystemML: Ax = B (A is of size m*n, B is of size m*1, x is of size n*1)
|
static void |
LibMatrixCUDA.sqrt(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs an "sqrt" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.tan(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String outputName)
Performs a "tan" operation on a matrix on the GPU
|
static void |
LibMatrixCUDA.transpose(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in,
String outputName)
Transposes the input matrix using cublasDgeam
|
static void |
LibMatrixCUDA.unaryAggregate(org.apache.sysml.runtime.controlprogram.context.ExecutionContext ec,
GPUContext gCtx,
String instName,
org.apache.sysml.runtime.controlprogram.caching.MatrixObject in1,
String output,
org.apache.sysml.runtime.matrix.operators.AggregateUnaryOperator op)
Entry point to perform unary aggregate operations on the GPU.
|
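The axpy entry above is described only as "a daxpy operation". BLAS daxpy computes y := alpha * x + y. The sketch below assumes that LibMatrixCUDA.axpy applies it as out = in1 + constant * in2 to two equally sized matrices, with y seeded from a copy of in1. That mapping is an assumption and is not taken from this listing. The class name is hypothetical.

```java
/**
 * Hypothetical CPU sketch of an axpy-style update on flat matrix buffers.
 * Assumption: out = in1 + constant * in2, element-wise, for equally sized inputs.
 */
final class AxpySketch {
    static double[] axpy(double[] in1, double[] in2, double constant) {
        if (in1.length != in2.length) throw new IllegalArgumentException("size mismatch");
        double[] out = in1.clone(); // y := copy of in1, so in1 is left untouched
        for (int i = 0; i < out.length; i++) {
            out[i] += constant * in2[i]; // y := alpha * x + y
        }
        return out;
    }
}
```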
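For batchNormalizationForwardInference, spatial batch normalization at inference time (as documented for cuDNN's cudnnBatchNormalizationForwardInference) normalizes each channel with its running statistics:

y = scale[c] * (x - runningMean[c]) / sqrt(runningVar[c] + epsilon) + bias[c].

The CPU sketch below assumes an N x (C*H*W) row-major image in NCHW order and per-channel vectors of length C. The explicit C and HW arguments are a simplification; the listed method derives the shapes from its MatrixObject inputs. The class name is hypothetical.

```java
/**
 * Hypothetical CPU sketch of spatial batch normalization for inference.
 * Assumptions: image is N x (C*HW) row-major (NCHW), HW = H*W, per-channel vectors of length C.
 */
final class BatchNormInferenceSketch {
    static double[] forwardInference(double[] image, int N, int C, int HW,
                                     double[] scale, double[] bias,
                                     double[] runningMean, double[] runningVar,
                                     double epsilon) {
        double[] ret = new double[image.length];
        for (int n = 0; n < N; n++) {
            for (int c = 0; c < C; c++) {
                double invStd = 1.0 / Math.sqrt(runningVar[c] + epsilon);
                int base = (n * C + c) * HW; // first element of channel c in image n
                for (int s = 0; s < HW; s++) {
                    ret[base + s] = scale[c] * (image[base + s] - runningMean[c]) * invStd + bias[c];
                }
            }
        }
        return ret;
    }
}
```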
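The DML scripts listed for biasAdd and biasMultiply reshape bias %*% ones into a single row of length F*Hout*Wout and broadcast it over every row of input. Column col of each row therefore receives bias[col / (Hout*Wout)]. The sketch below performs that broadcast directly, without materializing the repeated row. The class name and the operator parameter are hypothetical.

```java
import java.util.function.DoubleBinaryOperator;

/**
 * Hypothetical CPU sketch of the per-channel broadcast behind bias_add / bias_multiply.
 * input is N x (F*HW) row-major with HW = Hout*Wout; bias has F entries.
 */
final class BiasBroadcastSketch {
    static double[] applyBias(double[] input, int N, int F, int HW,
                              double[] bias, DoubleBinaryOperator op) {
        if (input.length != N * F * HW || bias.length != F) {
            throw new IllegalArgumentException("shape mismatch");
        }
        double[] output = new double[input.length];
        for (int idx = 0; idx < input.length; idx++) {
            // idx / HW = n * F + f, so taking it modulo F yields the channel f.
            int f = (idx / HW) % F;
            output[idx] = op.applyAsDouble(input[idx], bias[f]);
        }
        return output;
    }
}
```

Passing `Double::sum` gives the bias_add behavior, and `(x, y) -> x * y` gives the bias_multiply behavior.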
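The conv2d family takes P and Q as the output height and width. The listing does not define them. Under the usual convolution arithmetic (assumed here), P = (H + 2*pad_h - R) / stride_h + 1 and Q = (W + 2*pad_w - S) / stride_w + 1, using integer division.

The sketch below computes these dimensions and runs a naive direct convolution on the CPU as a semantic reference. It is not the cuDNN path. It uses cross-correlation (no filter flip) and assumes these layouts, all row-major: N x (C*H*W) images, K x (C*R*S) filters and N x (K*P*Q) output. The class name is hypothetical.

```java
/**
 * Hypothetical CPU reference for 2D convolution (cross-correlation) with zero padding.
 */
final class Conv2dSketch {
    /** Output size along one spatial dimension. */
    static int outputDim(int in, int filter, int pad, int stride) {
        return (in + 2 * pad - filter) / stride + 1;
    }

    static double[] conv2d(double[] image, double[] filter,
                           int N, int C, int H, int W, int K, int R, int S,
                           int padH, int padW, int strideH, int strideW) {
        int P = outputDim(H, R, padH, strideH);
        int Q = outputDim(W, S, padW, strideW);
        double[] out = new double[N * K * P * Q];
        for (int n = 0; n < N; n++) {
            for (int k = 0; k < K; k++) {
                for (int p = 0; p < P; p++) {
                    for (int q = 0; q < Q; q++) {
                        double sum = 0.0;
                        for (int c = 0; c < C; c++) {
                            for (int r = 0; r < R; r++) {
                                for (int s = 0; s < S; s++) {
                                    int h = p * strideH - padH + r;
                                    int w = q * strideW - padW + s;
                                    if (h < 0 || h >= H || w < 0 || w >= W) continue; // zero padding
                                    sum += image[((n * C + c) * H + h) * W + w]
                                         * filter[((k * C + c) * R + r) * S + s];
                                }
                            }
                        }
                        out[((n * K + k) * P + p) * Q + q] = sum;
                    }
                }
            }
        }
        return out;
    }
}
```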
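The denseDenseMatmult note says the host matrices are row-major while cuBLAS expects column-major. A common way to bridge the two without copies relies on one fact: a row-major buffer read as column-major is the transpose. So C^T = B^T * A^T can be computed by a column-major GEMM with the operands swapped. The result, read back as row-major, is C. The sketch below shows that trick with a plain column-major GEMM standing in for cuBLAS dgemm. Whether SystemML uses exactly this formulation is not stated here, and the class name is hypothetical.

```java
/**
 * Hypothetical sketch of computing a row-major product with a column-major GEMM.
 */
final class RowMajorViaColumnMajorSketch {
    /** Column-major c (m x n, ld m) = a (m x k, ld m) * b (k x n, ld k), no transposes. */
    static double[] gemmColumnMajor(double[] a, double[] b, int m, int n, int k) {
        double[] c = new double[m * n];
        for (int j = 0; j < n; j++) {
            for (int p = 0; p < k; p++) {
                double bpj = b[p + j * k];
                for (int i = 0; i < m; i++) {
                    c[i + j * m] += a[i + p * m] * bpj;
                }
            }
        }
        return c;
    }

    /** Row-major C (m x n) = A (m x k) * B (k x n). */
    static double[] gemmRowMajor(double[] aRowMajor, double[] bRowMajor, int m, int n, int k) {
        // Read as column-major, bRowMajor is B^T (n x k) and aRowMajor is A^T (k x m).
        // Their product is C^T (n x m) in column-major order, i.e. C in row-major order.
        return gemmColumnMajor(bRowMajor, aRowMajor, n, m, k);
    }
}
```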
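matmultTSMM relies on cublasDsyrk, which computes only one triangle of the symmetric product. The other triangle must be filled in afterwards if a full matrix is needed. The CPU sketch below mirrors that strategy for t(A) %*% A on a row-major m x n matrix. It computes the upper triangle and then copies it into the lower one. It is a hypothetical illustration, not the SystemML code, and it covers only the A' %*% A case.

```java
/**
 * Hypothetical CPU sketch of tsmm: C = t(A) %*% A for a row-major m x n matrix A.
 */
final class TsmmSketch {
    static double[] leftTsmm(double[] a, int m, int n) {
        double[] c = new double[n * n];
        // Upper triangle only (j >= i), like a syrk call with an "upper" fill mode.
        for (int r = 0; r < m; r++) {
            for (int i = 0; i < n; i++) {
                double ari = a[r * n + i];
                if (ari == 0.0) continue; // skip zero contributions
                for (int j = i; j < n; j++) {
                    c[i * n + j] += ari * a[r * n + j];
                }
            }
        }
        // Fill the lower triangle by symmetry.
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < i; j++) {
                c[i * n + j] = c[j * n + i];
            }
        }
        return c;
    }
}
```

Computing one triangle roughly halves the multiply-add work compared with a general matrix multiply of t(A) by A.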
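To make the maxpooling and maxpoolingBackward arguments concrete, here is a CPU reference for the forward pass. It takes the maximum over each R x S window, moving by stride_h and stride_w, and produces P x Q outputs per channel. The sketch assumes an N x (C*H*W) row-major input and excludes padded positions from the max. It ignores the separate K argument in the listed signatures and uses C output channels. All of these are assumptions, and the class name is hypothetical.

```java
/**
 * Hypothetical CPU reference for max pooling.
 * Assumptions: NCHW row-major input, N x (C*P*Q) output, padding excluded from the max.
 */
final class MaxPoolSketch {
    static double[] maxpool(double[] image, int N, int C, int H, int W,
                            int R, int S, int padH, int padW, int strideH, int strideW) {
        int P = (H + 2 * padH - R) / strideH + 1;
        int Q = (W + 2 * padW - S) / strideW + 1;
        double[] out = new double[N * C * P * Q];
        for (int n = 0; n < N; n++) {
            for (int c = 0; c < C; c++) {
                for (int p = 0; p < P; p++) {
                    for (int q = 0; q < Q; q++) {
                        double max = Double.NEGATIVE_INFINITY;
                        for (int r = 0; r < R; r++) {
                            for (int s = 0; s < S; s++) {
                                int h = p * strideH - padH + r;
                                int w = q * strideW - padW + s;
                                if (h < 0 || h >= H || w < 0 || w >= W) continue; // padded position
                                max = Math.max(max, image[((n * C + c) * H + h) * W + w]);
                            }
                        }
                        out[((n * C + c) * P + p) * Q + q] = max;
                    }
                }
            }
        }
        return out;
    }
}
```

In the backward pass, each dout value is routed back to the input position that produced the corresponding window maximum.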
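For reluBackward, relu(x) = max(0, x) has derivative 1 for x > 0 and 0 for x < 0. The backpropagated error therefore passes dout through wherever the forward input was positive. The sketch below treats x == 0 as not positive. That subgradient choice is an assumption of this sketch, and the class name is hypothetical.

```java
/**
 * Hypothetical CPU sketch of the relu gradient: dx = dout where input > 0, else 0.
 */
final class ReluBackwardSketch {
    static double[] reluBackward(double[] input, double[] dout) {
        if (input.length != dout.length) throw new IllegalArgumentException("size mismatch");
        double[] dx = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            dx[i] = input[i] > 0 ? dout[i] : 0.0; // gradient is blocked where relu clipped
        }
        return dx;
    }
}
```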
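sliceOperations takes an IndexRange of lower and upper row and column bounds. The sketch below assumes 0-based, inclusive bounds, which is not confirmed by this listing. It copies the selected block out of a row-major matrix, one contiguous row segment at a time. The class and method names are hypothetical.

```java
/**
 * Hypothetical CPU sketch of range re-indexing on a row-major rows x cols matrix.
 * Assumption: bounds rl..ru and cl..cu are 0-based and inclusive.
 */
final class SliceSketch {
    static double[] slice(double[] in, int rows, int cols, int rl, int ru, int cl, int cu) {
        if (rl < 0 || ru >= rows || rl > ru || cl < 0 || cu >= cols || cl > cu) {
            throw new IndexOutOfBoundsException("invalid index range");
        }
        int outCols = cu - cl + 1;
        double[] out = new double[(ru - rl + 1) * outCols];
        for (int i = rl; i <= ru; i++) {
            // Each output row is one contiguous run of the corresponding input row.
            System.arraycopy(in, i * cols + cl, out, (i - rl) * outCols, outCols);
        }
        return out;
    }
}
```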
Copyright © 2017 The Apache Software Foundation. All rights reserved.