JCudaObject (SystemML 0.14.0 API)

java.lang.Object
- org.apache.sysml.runtime.instructions.gpu.context.GPUObject
- - org.apache.sysml.runtime.instructions.gpu.context.JCudaObject

public class JCudaObject
extends GPUObject

Handle to a matrix block on the GPU

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`JCudaObject.CSRPointer` Compressed Sparse Row (CSR) format for CUDA. Generalized matrix multiply, among other operations, is implemented for the CSR format in the cuSPARSE library.

Nested classes/interfaces inherited from class org.apache.sysml.runtime.instructions.gpu.context.GPUObject
GPUObject.EvictionPolicy

Field Summary

Fields
Modifier and Type	Field and Description
`jcuda.Pointer`	`jcudaDenseMatrixPtr` Pointer to dense matrix
`JCudaObject.CSRPointer`	`jcudaSparseMatrixPtr` Pointer to sparse matrix
`long`	`numBytes`

Fields inherited from class org.apache.sysml.runtime.instructions.gpu.context.GPUObject
evictionPolicy, isDeviceCopyModified, isInSparseFormat, mat, numLocks

Method Summary

Modifier and Type	Method and Description
`boolean`	`acquireDeviceModifyDense()` To signal intent that a matrix block will be written to on the GPU
`boolean`	`acquireDeviceModifySparse()` To signal intent that a sparse matrix block will be written to on the GPU
`boolean`	`acquireDeviceRead()` Signal intent that a matrix block will be read (as input) on the GPU
`boolean`	`acquireHostRead()` Signal intent that a block needs to be read on the host
`static jcuda.Pointer`	`allocate(long size)` Convenience method for `allocate(String, long, int)`, defaults statsCount to 1.
`static jcuda.Pointer`	`allocate(String instructionName, long size)` Convenience method for `allocate(String, long, int)`, defaults statsCount to 1.
`static jcuda.Pointer`	`allocate(String instructionName, long size, int statsCount)` Allocates temporary space on the device.
`void`	`allocateAndFillDense(double v)` Allocates a dense matrix of size obtained from the attached matrix metadata and fills it up with a single value
`void`	`allocateSparseAndEmpty()` Allocates a sparse and empty `JCudaObject`. This is the result of operations that are both non-zero matrices.
`jcuda.jcudnn.cudnnTensorDescriptor`	`allocateTensorDescriptor(int N, int C, int H, int W)` Returns a previously allocated or allocates and returns a tensor descriptor
`static JCudaObject.CSRPointer`	`columnMajorDenseToRowMajorSparse(jcuda.jcusparse.cusparseHandle cusparseHandle, int rows, int cols, jcuda.Pointer densePtr)` Convenience method to convert a dense matrix to a CSR matrix on the GPU. Since the allocated matrix is temporary, bookkeeping is not updated.
`protected void`	`copyFromDeviceToHost()` Copies a matrix block (dense or sparse) from GPU Memory to Host memory.
`static void`	`cudaFreeHelper(jcuda.Pointer toFree)` Does lazy cudaFree calls
`static void`	`cudaFreeHelper(jcuda.Pointer toFree, boolean eager)` Does lazy or eager cudaFree calls
`static void`	`cudaFreeHelper(String instructionName, jcuda.Pointer toFree)` Does lazy cudaFree calls
`static void`	`cudaFreeHelper(String instructionName, jcuda.Pointer toFree, boolean eager)` Does lazy or eager cudaFree calls
`static String`	`debugString(jcuda.Pointer A, long rows, long cols)` Gets the double array from GPU memory onto host memory and returns string.
`void`	`denseToSparse()` Converts this JCudaObject from dense to sparse format.
`protected long`	`getSizeOnDevice()`
`JCudaObject.CSRPointer`	`getSparseMatrixCudaPointer()` Convenience method to directly examine the Sparse matrix on GPU
`jcuda.jcudnn.cudnnTensorDescriptor`	`getTensorDescriptor()` Returns a previously allocated tensor descriptor or null
`int[]`	`getTensorShape()` Returns a previously allocated tensor shape or null
`boolean`	`isAllocated()`
`boolean`	`isSparseAndEmpty()` Returns whether this `JCudaObject` is sparse and empty. Being allocated is a prerequisite to being sparse and empty.
`void`	`releaseInput()` Releases input allocated on GPU
`void`	`releaseOutput()` Releases output allocated on GPU
`void`	`setDenseMatrixCudaPointer(jcuda.Pointer densePtr)` Convenience method to directly set the dense matrix pointer on GPU. Make sure to call `setDeviceModify(long)` after this to set appropriate state, if you are not sure what you are doing.
`void`	`setDeviceModify(long numBytes)` If memory on GPU has been allocated from elsewhere, this method updates the internal bookkeeping
`void`	`setSparseMatrixCudaPointer(JCudaObject.CSRPointer sparseMatrixPtr)` Convenience method to directly set the sparse matrix on GPU. Make sure to call `setDeviceModify(long)` after this to set appropriate state, if you are not sure what you are doing.
`void`	`sparseToColumnMajorDense()` More efficient method to convert sparse to dense but returns dense in column major format
`void`	`sparseToDense()` Convert sparse to dense (Performs transpose, use sparseToColumnMajorDense if the kernel can deal with column major format)
`void`	`sparseToDense(String instructionName)` Convert sparse to dense (performs a transpose; use sparseToColumnMajorDense if the kernel can deal with column major format). Also records the per-instruction invocation of sparseToDense.
`static int`	`toIntExact(long l)`
`static jcuda.Pointer`	`transpose(jcuda.Pointer densePtr, int m, int n, int lda, int ldc)` Transposes a dense matrix on the GPU by calling the cublasDgeam operation

Methods inherited from class org.apache.sysml.runtime.instructions.gpu.context.GPUObject
clearData, clearData, evict, evict, getAvailableMemory, isInSparseFormat

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - jcudaDenseMatrixPtr
```
public jcuda.Pointer jcudaDenseMatrixPtr
```
    Pointer to dense matrix
  - jcudaSparseMatrixPtr
```
public JCudaObject.CSRPointer jcudaSparseMatrixPtr
```
    Pointer to sparse matrix
  - numBytes
```
public long numBytes
```
- Method Detail
  - getTensorShape
```
public int[] getTensorShape()
```
    Returns a previously allocated tensor shape or null
    
    Returns:
    
    int array of four elements or null
  - getTensorDescriptor
```
public jcuda.jcudnn.cudnnTensorDescriptor getTensorDescriptor()
```
    Returns a previously allocated tensor descriptor or null
    
    Returns:
    
    cudnn tensor descriptor
  - allocateTensorDescriptor
```
public jcuda.jcudnn.cudnnTensorDescriptor allocateTensorDescriptor(int N,
                                                                   int C,
                                                                   int H,
                                                                   int W)
```
    Returns a previously allocated or allocates and returns a tensor descriptor
    
    Parameters:
    
    N - number of images
    
    C - number of channels
    
    H - height
    
    W - width
    
    Returns:
    
    cudnn tensor descriptor
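    The "return the existing descriptor, otherwise allocate one" behaviour described above can be sketched in plain Java. This is only an illustration of the get-or-create pattern: the class `TensorDescriptorCacheSketch` and its nested `Descriptor` type are hypothetical stand-ins (the real method returns a `jcuda.jcudnn.cudnnTensorDescriptor`), and it is an assumption that a later call with a different shape simply returns the descriptor already cached.

```java
/** Hypothetical sketch of a lazily created, cached tensor descriptor. */
final class TensorDescriptorCacheSketch {
    /** Stand-in for jcuda.jcudnn.cudnnTensorDescriptor (assumption). */
    static final class Descriptor {}

    private Descriptor tensorDescriptor; // null until the first allocation
    private int[] tensorShape;           // {N, C, H, W}, null until the first allocation

    /** Returns the cached shape, or null if nothing was allocated yet. */
    int[] getTensorShape() {
        return tensorShape;
    }

    /** Returns the cached descriptor, or null if nothing was allocated yet. */
    Descriptor getTensorDescriptor() {
        return tensorDescriptor;
    }

    /** Creates the descriptor on first use and returns the cached one afterwards. */
    Descriptor allocateTensorDescriptor(int n, int c, int h, int w) {
        if (tensorDescriptor == null) {
            tensorDescriptor = new Descriptor();
            tensorShape = new int[] {n, c, h, w};
        }
        return tensorDescriptor;
    }
}
```

    In this sketch the two getters return null until `allocateTensorDescriptor` has been called once, which mirrors the "or null" wording of `getTensorShape` and `getTensorDescriptor`.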
  - isAllocated
```
public boolean isAllocated()
```
    Specified by:
    
    isAllocated in class GPUObject
  - allocateSparseAndEmpty
```
public void allocateSparseAndEmpty()
                            throws DMLRuntimeException
```
    Allocates a sparse and empty JCudaObject. This is the result of operations that are both non-zero matrices.
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - allocateAndFillDense
```
public void allocateAndFillDense(double v)
                          throws DMLRuntimeException
```
    Allocates a dense matrix of size obtained from the attached matrix metadata and fills it up with a single value
    
    Parameters:
    
    v - value to fill up the dense matrix
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - isSparseAndEmpty
```
public boolean isSparseAndEmpty()
```
    Returns whether this JCudaObject is sparse and empty. Being allocated is a prerequisite to being sparse and empty.
    
    Returns:
    
    true if sparse and empty
  - acquireDeviceRead
```
public boolean acquireDeviceRead()
                          throws DMLRuntimeException
```
    Description copied from class: GPUObject
    
    Signal intent that a matrix block will be read (as input) on the GPU
    
    Specified by:
    
    acquireDeviceRead in class GPUObject
    
    Returns:
    
    true if a host memory to device memory transfer happened
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - acquireDeviceModifyDense
```
public boolean acquireDeviceModifyDense()
                                 throws DMLRuntimeException
```
    Description copied from class: GPUObject
    
    To signal intent that a matrix block will be written to on the GPU
    
    Specified by:
    
    acquireDeviceModifyDense in class GPUObject
    
    Returns:
    
    true if memory was allocated on the GPU as a result of this call
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - acquireDeviceModifySparse
```
public boolean acquireDeviceModifySparse()
                                  throws DMLRuntimeException
```
    Description copied from class: GPUObject
    
    To signal intent that a sparse matrix block will be written to on the GPU
    
    Specified by:
    
    acquireDeviceModifySparse in class GPUObject
    
    Returns:
    
    true if memory was allocated on the GPU as a result of this call
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - acquireHostRead
```
public boolean acquireHostRead()
                        throws org.apache.sysml.runtime.controlprogram.caching.CacheException
```
    Description copied from class: GPUObject
    
    Signal intent that a block needs to be read on the host
    
    Specified by:
    
    acquireHostRead in class GPUObject
    
    Returns:
    
    true if copied from device to host
    
    Throws:
    
    org.apache.sysml.runtime.controlprogram.caching.CacheException - if CacheException occurs
  - releaseInput
```
public void releaseInput()
                  throws org.apache.sysml.runtime.controlprogram.caching.CacheException
```
    Releases input allocated on GPU
    
    Specified by:
    
    releaseInput in class GPUObject
    
    Throws:
    
    org.apache.sysml.runtime.controlprogram.caching.CacheException - if data is not allocated
  - releaseOutput
```
public void releaseOutput()
                   throws org.apache.sysml.runtime.controlprogram.caching.CacheException
```
    Releases output allocated on GPU
    
    Specified by:
    
    releaseOutput in class GPUObject
    
    Throws:
    
    org.apache.sysml.runtime.controlprogram.caching.CacheException - if data is not allocated
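    The return values of the acquire methods and the failure mode of the release methods can be modelled with a small host-only state machine. This is a sketch under stated assumptions, not the SystemML implementation: the class `GpuHandleSketch` is hypothetical, the lock counting is a guess based on the inherited `numLocks` field, `IllegalStateException` stands in for `CacheException`, and no GPU memory is touched.

```java
/** Hypothetical host-only model of the acquire/release bookkeeping. */
final class GpuHandleSketch {
    private boolean allocatedOnDevice;   // stands in for "device memory exists"
    private boolean deviceCopyModified;  // device holds data newer than the host
    private int numLocks;                // assumed: one lock per outstanding acquire

    /** Returns true if a (simulated) host-to-device transfer happened. */
    boolean acquireDeviceRead() {
        boolean transferred = false;
        if (!allocatedOnDevice) {
            allocatedOnDevice = true; // a real implementation would copy the block here
            transferred = true;
        }
        numLocks++;
        return transferred;
    }

    /** Returns true if (simulated) device memory was allocated by this call. */
    boolean acquireDeviceModifyDense() {
        boolean allocated = false;
        if (!allocatedOnDevice) {
            allocatedOnDevice = true;
            allocated = true;
        }
        deviceCopyModified = true; // the caller intends to write on the device
        numLocks++;
        return allocated;
    }

    /** Returns true if a (simulated) device-to-host copy happened. */
    boolean acquireHostRead() {
        if (allocatedOnDevice && deviceCopyModified) {
            deviceCopyModified = false; // host copy is up to date again
            return true;
        }
        return false;
    }

    void releaseInput() {
        release();
    }

    void releaseOutput() {
        release();
    }

    int numLocks() {
        return numLocks;
    }

    private void release() {
        if (!allocatedOnDevice) {
            throw new IllegalStateException("data is not allocated");
        }
        numLocks--;
    }
}
```

    Under these assumptions, each acquire call is paired with exactly one `releaseInput` or `releaseOutput`, and a host read only triggers a copy after a device-side modification.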
  - setDeviceModify
```
public void setDeviceModify(long numBytes)
```
    Description copied from class: GPUObject
    
    If memory on GPU has been allocated from elsewhere, this method updates the internal bookkeeping
    
    Specified by:
    
    setDeviceModify in class GPUObject
    
    Parameters:
    
    numBytes - number of bytes
  - toIntExact
```
public static int toIntExact(long l)
                      throws DMLRuntimeException
```
    Throws:
    
    DMLRuntimeException
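    No description is given for this method. Its name and signature suggest a checked narrowing conversion in the spirit of `java.lang.Math.toIntExact`, failing instead of silently truncating. The following is a minimal sketch of that assumed behaviour; the class `ToIntExactSketch` is hypothetical and `IllegalArgumentException` stands in for `DMLRuntimeException`.

```java
/** Hypothetical sketch of a checked long-to-int conversion. */
final class ToIntExactSketch {
    /** Returns l as an int, or throws if l does not fit into 32 bits. */
    static int toIntExact(long l) {
        if (l < Integer.MIN_VALUE || l > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Value does not fit into an int: " + l);
        }
        return (int) l;
    }

    public static void main(String[] args) {
        System.out.println(toIntExact(42L));
    }
}
```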
  - copyFromDeviceToHost
```
protected void copyFromDeviceToHost()
                             throws DMLRuntimeException
```
    Description copied from class: GPUObject
    
    Copies a matrix block (dense or sparse) from GPU memory to host memory. A MatrixBlock instance is allocated and the data from the GPU is copied into it. The current block in host memory is then deallocated by calling MatrixObject's acquireHostModify(MatrixBlock) (note: this method does not exist) and overwritten with the newly allocated instance. TODO: re-examine this to avoid spurious memory allocations.
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - getSizeOnDevice
```
protected long getSizeOnDevice()
                        throws DMLRuntimeException
```
    Throws:
    
    DMLRuntimeException
  - getSparseMatrixCudaPointer
```
public JCudaObject.CSRPointer getSparseMatrixCudaPointer()
```
    Convenience method to directly examine the Sparse matrix on GPU
    
    Returns:
    
    CSR (compressed sparse row) pointer
  - setSparseMatrixCudaPointer
```
public void setSparseMatrixCudaPointer(JCudaObject.CSRPointer sparseMatrixPtr)
```
    Convenience method to directly set the sparse matrix on GPU. Make sure to call setDeviceModify(long) after this to set appropriate state, if you are not sure what you are doing. Needed for operations like JCusparse.cusparseDcsrgemm(cusparseHandle, int, int, int, int, int, cusparseMatDescr, int, Pointer, Pointer, Pointer, cusparseMatDescr, int, Pointer, Pointer, Pointer, cusparseMatDescr, Pointer, Pointer, Pointer).
    
    Parameters:
    
    sparseMatrixPtr - CSR (compressed sparse row) pointer
  - setDenseMatrixCudaPointer
```
public void setDenseMatrixCudaPointer(jcuda.Pointer densePtr)
```
    Convenience method to directly set the dense matrix pointer on GPU. Make sure to call setDeviceModify(long) after this to set appropriate state, if you are not sure what you are doing.
    
    Parameters:
    
    densePtr - dense pointer
  - denseToSparse
```
public void denseToSparse()
                   throws DMLRuntimeException
```
    Converts this JCudaObject from dense to sparse format.
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - transpose
```
public static jcuda.Pointer transpose(jcuda.Pointer densePtr,
                                      int m,
                                      int n,
                                      int lda,
                                      int ldc)
                               throws DMLRuntimeException
```
    Transposes a dense matrix on the GPU by calling the cublasDgeam operation
    
    Parameters:
    
    densePtr - Pointer to dense matrix on the GPU
    
    m - rows in output matrix
    
    n - columns in output matrix
    
    lda - rows in input matrix
    
    ldc - columns in output matrix
    
    Returns:
    
    transposed matrix
    
    Throws:
    
    DMLRuntimeException - if operation failed
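    To make the index arithmetic of a dense transpose concrete, here is a host-side sketch on plain Java arrays. It assumes tightly packed column-major storage and uses its own parameter meaning (m and n are the rows and columns of the output); the class `TransposeSketch` is hypothetical, and the real method runs on device memory through cublasDgeam rather than in a Java loop.

```java
/** Hypothetical host-side transpose of a column-major matrix. */
final class TransposeSketch {
    /**
     * Transposes an n x m column-major input into an m x n column-major output.
     * Element (i, j) of the output is element (j, i) of the input.
     */
    static double[] transpose(double[] in, int m, int n) {
        double[] out = new double[m * n];
        for (int j = 0; j < n; j++) {       // column of the output
            for (int i = 0; i < m; i++) {   // row of the output
                out[i + j * m] = in[j + i * n];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 2 x 3 input [[1, 2, 3], [4, 5, 6]] stored column by column.
        double[] in = {1, 4, 2, 5, 3, 6};
        System.out.println(java.util.Arrays.toString(transpose(in, 3, 2)));
    }
}
```

    The same arithmetic explains why reading a row-major buffer as column-major yields the transposed matrix, which is the reason sparseToDense needs a transpose step while sparseToColumnMajorDense does not.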
  - sparseToDense
```
public void sparseToDense()
                   throws DMLRuntimeException
```
    Convert sparse to dense (Performs transpose, use sparseToColumnMajorDense if the kernel can deal with column major format)
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - sparseToDense
```
public void sparseToDense(String instructionName)
                   throws DMLRuntimeException
```
    Convert sparse to dense (performs a transpose; use sparseToColumnMajorDense if the kernel can deal with column major format). Also records the per-instruction invocation of sparseToDense.
    
    Parameters:
    
    instructionName - Name of the instruction for which statistics are recorded in GPUStatistics
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - sparseToColumnMajorDense
```
public void sparseToColumnMajorDense()
                              throws DMLRuntimeException
```
    More efficient method to convert sparse to dense but returns dense in column major format
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - columnMajorDenseToRowMajorSparse
```
public static JCudaObject.CSRPointer columnMajorDenseToRowMajorSparse(jcuda.jcusparse.cusparseHandle cusparseHandle,
                                                                      int rows,
                                                                      int cols,
                                                                      jcuda.Pointer densePtr)
                                                               throws DMLRuntimeException
```
    Convenience method to convert a dense matrix to a CSR matrix on the GPU. Since the allocated matrix is temporary, bookkeeping is not updated. Note that the input dense matrix is expected to be in column-major format. The caller is responsible for deallocating the memory on the GPU.
    
    Parameters:
    
    cusparseHandle - handle to cusparse library
    
    rows - number of rows
    
    cols - number of columns
    
    densePtr - [in] dense matrix pointer on the GPU in column major
    
    Returns:
    
    CSR (compressed sparse row) pointer
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
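    As a rough picture of what this conversion produces, the sketch below builds the three CSR arrays (row pointers, column indices, values) on the host from a column-major dense array, as the description above expects. The class `ColumnMajorToCsrSketch` and its field names are hypothetical; the real method operates on device pointers via cuSPARSE and returns a `JCudaObject.CSRPointer`.

```java
/** Hypothetical host-side illustration of column-major dense to CSR. */
final class ColumnMajorToCsrSketch {
    final int[] rowPtr;  // rowPtr[i]..rowPtr[i + 1] delimits the entries of row i
    final int[] colInd;  // column index of each stored value
    final double[] val;  // non-zero values, stored row by row

    private ColumnMajorToCsrSketch(int[] rowPtr, int[] colInd, double[] val) {
        this.rowPtr = rowPtr;
        this.colInd = colInd;
        this.val = val;
    }

    /** Builds CSR arrays from a rows x cols matrix stored column by column. */
    static ColumnMajorToCsrSketch convert(double[] dense, int rows, int cols) {
        int nnz = 0;
        for (double d : dense) {
            if (d != 0.0) {
                nnz++;
            }
        }
        int[] rowPtr = new int[rows + 1];
        int[] colInd = new int[nnz];
        double[] val = new double[nnz];
        int k = 0;
        for (int i = 0; i < rows; i++) {
            rowPtr[i] = k;
            for (int j = 0; j < cols; j++) {
                double d = dense[j * rows + i]; // column-major: element (i, j)
                if (d != 0.0) {
                    colInd[k] = j;
                    val[k] = d;
                    k++;
                }
            }
        }
        rowPtr[rows] = k;
        return new ColumnMajorToCsrSketch(rowPtr, colInd, val);
    }
}
```

    Note the `dense[j * rows + i]` lookup: passing a row-major buffer to this sketch would silently produce the CSR form of a differently shaped matrix, which is why the storage order of `densePtr` matters.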
  - allocate
```
public static jcuda.Pointer allocate(long size)
                              throws DMLRuntimeException
```
    Convenience method for allocate(String, long, int), defaults statsCount to 1.
    
    Parameters:
    
    size - size of data (in bytes) to allocate
    
    Returns:
    
    jcuda pointer
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - allocate
```
public static jcuda.Pointer allocate(String instructionName,
                                     long size)
                              throws DMLRuntimeException
```
    Convenience method for allocate(String, long, int), defaults statsCount to 1.
    
    Parameters:
    
    instructionName - name of instruction for which to record per instruction performance statistics, null if don't want to record
    
    size - size of data (in bytes) to allocate
    
    Returns:
    
    jcuda pointer
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
  - allocate
```
public static jcuda.Pointer allocate(String instructionName,
                                     long size,
                                     int statsCount)
                              throws DMLRuntimeException
```
    Allocates temporary space on the device. Does not update bookkeeping. The caller is responsible for freeing up after usage.
    
    Parameters:
    
    instructionName - name of instruction for which to record per instruction performance statistics, null if don't want to record
    
    size - Size of data (in bytes) to allocate
    
    statsCount - amount to increment the cudaAllocCount by
    
    Returns:
    
    jcuda Pointer
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
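    The relationship between the three `allocate` overloads is plain delegation with a default statsCount of 1. The sketch below shows that pattern on the host only. The class `AllocateSketch` is hypothetical, a `byte[]` stands in for a `jcuda.Pointer`, the counter stands in for the cudaAllocCount statistic, and passing null as the instruction name from the single-argument overload is an assumption.

```java
/** Hypothetical sketch of the overload delegation used by allocate. */
final class AllocateSketch {
    /** Stand-in for the cudaAllocCount statistic. */
    static long allocCount = 0;

    /** Delegates with no instruction name (assumed) and statsCount = 1. */
    static byte[] allocate(long size) {
        return allocate(null, size, 1);
    }

    /** Delegates with statsCount = 1. */
    static byte[] allocate(String instructionName, long size) {
        return allocate(instructionName, size, 1);
    }

    /** Allocates a buffer and bumps the counter by statsCount. */
    static byte[] allocate(String instructionName, long size, int statsCount) {
        allocCount += statsCount;
        // instructionName would select per-instruction statistics; unused here.
        return new byte[Math.toIntExact(size)];
    }
}
```

    As with the real method, nothing in this sketch tracks the returned buffer afterwards, so releasing it remains the caller's job.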
  - cudaFreeHelper
```
public static void cudaFreeHelper(jcuda.Pointer toFree)
```
    Does lazy cudaFree calls
    
    Parameters:
    
    toFree - Pointer instance to be freed
  - cudaFreeHelper
```
public static void cudaFreeHelper(jcuda.Pointer toFree,
                                  boolean eager)
```
    Does lazy or eager cudaFree calls
    
    Parameters:
    
    toFree - Pointer instance to be freed
    
    eager - true if to be done eagerly
  - cudaFreeHelper
```
public static void cudaFreeHelper(String instructionName,
                                  jcuda.Pointer toFree)
```
    Does lazy cudaFree calls
    
    Parameters:
    
    instructionName - name of the instruction for which to record per instruction free time, null if do not want to record
    
    toFree - Pointer instance to be freed
  - cudaFreeHelper
```
public static void cudaFreeHelper(String instructionName,
                                  jcuda.Pointer toFree,
                                  boolean eager)
```
    Does lazy or eager cudaFree calls
    
    Parameters:
    
    instructionName - name of the instruction for which to record per instruction free time, null if do not want to record
    
    toFree - Pointer instance to be freed
    
    eager - true if to be done eagerly
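    The documentation does not say what "lazy" means here. One possible reading is that a lazy free is deferred and carried out later, while an eager free happens immediately. The sketch below illustrates only that reading: the class `DeferredFreeSketch` and its `flush` method are hypothetical, and a `Runnable` stands in for the actual cudaFree call.

```java
/** Hypothetical sketch of eager versus deferred (lazy) release. */
final class DeferredFreeSketch {
    private final java.util.Deque<Runnable> pending = new java.util.ArrayDeque<>();

    /** Lazy by default, mirroring the overloads without an eager flag. */
    void free(Runnable releaseAction) {
        free(releaseAction, false);
    }

    /** Runs the release now if eager, otherwise queues it for later. */
    void free(Runnable releaseAction, boolean eager) {
        if (eager) {
            releaseAction.run();
        } else {
            pending.addLast(releaseAction);
        }
    }

    /** Runs all queued releases in submission order. */
    void flush() {
        while (!pending.isEmpty()) {
            pending.removeFirst().run();
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```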
  - debugString
```
public static String debugString(jcuda.Pointer A,
                                 long rows,
                                 long cols)
                          throws DMLRuntimeException
```
    Gets the double array from GPU memory onto host memory and returns string.
    
    Parameters:
    
    A - Pointer to memory on device (GPU), assumed to point to a double array
    
    rows - rows in matrix A
    
    cols - columns in matrix A
    
    Returns:
    
    the debug string
    
    Throws:
    
    DMLRuntimeException - if DMLRuntimeException occurs
