public class JCudaObject extends GPUObject
Modifier and Type | Class and Description |
---|---|
static class | JCudaObject.CSRPointer: Compressed Sparse Row (CSR) format for CUDA. Generalized matrix multiply is implemented for CSR format in the cuSparse library, among other operations. |

Nested classes inherited from class GPUObject: GPUObject.EvictionPolicy

Modifier and Type | Field and Description |
---|---|
jcuda.Pointer | jcudaDenseMatrixPtr: Pointer to dense matrix |
JCudaObject.CSRPointer | jcudaSparseMatrixPtr: Pointer to sparse matrix |
long | numBytes |

Fields inherited from class GPUObject: evictionPolicy, isDeviceCopyModified, isInSparseFormat, mat, numLocks

Modifier and Type | Method and Description |
---|---|
boolean | acquireDeviceModifyDense(): Signals intent that a matrix block will be written to on the GPU. |
boolean | acquireDeviceModifySparse(): Signals intent that a sparse matrix block will be written to on the GPU. |
boolean | acquireDeviceRead(): Signals intent that a matrix block will be read (as input) on the GPU. |
boolean | acquireHostRead(): Signals intent that a block needs to be read on the host. |
static jcuda.Pointer | allocate(long size): Convenience method for allocate(String, long, int); defaults statsCount to 1. |
static jcuda.Pointer | allocate(String instructionName, long size): Convenience method for allocate(String, long, int); defaults statsCount to 1. |
static jcuda.Pointer | allocate(String instructionName, long size, int statsCount): Allocates temporary space on the device. |
void | allocateAndFillDense(double v): Allocates a dense matrix of the size obtained from the attached matrix metadata and fills it with a single value. |
void | allocateSparseAndEmpty(): Allocates a sparse and empty JCudaObject. This is the result of operations that are both non zero matrices. |
jcuda.jcudnn.cudnnTensorDescriptor | allocateTensorDescriptor(int N, int C, int H, int W): Returns a previously allocated tensor descriptor, or allocates and returns a new one. |
static JCudaObject.CSRPointer | columnMajorDenseToRowMajorSparse(jcuda.jcusparse.cusparseHandle cusparseHandle, int rows, int cols, jcuda.Pointer densePtr): Convenience method to convert a dense matrix to a CSR matrix on the GPU. Since the allocated matrix is temporary, bookkeeping is not updated. |
protected void | copyFromDeviceToHost(): Copies a matrix block (dense or sparse) from GPU memory to host memory. |
static void | cudaFreeHelper(jcuda.Pointer toFree): Does lazy cudaFree calls. |
static void | cudaFreeHelper(jcuda.Pointer toFree, boolean eager): Does lazy or eager cudaFree calls. |
static void | cudaFreeHelper(String instructionName, jcuda.Pointer toFree): Does lazy cudaFree calls. |
static void | cudaFreeHelper(String instructionName, jcuda.Pointer toFree, boolean eager): Does lazy or eager cudaFree calls. |
static String | debugString(jcuda.Pointer A, long rows, long cols): Copies the double array from GPU memory to host memory and returns it as a string. |
void | denseToSparse(): Converts this JCudaObject from dense to sparse format. |
protected long | getSizeOnDevice() |
JCudaObject.CSRPointer | getSparseMatrixCudaPointer(): Convenience method to directly examine the sparse matrix on the GPU. |
jcuda.jcudnn.cudnnTensorDescriptor | getTensorDescriptor(): Returns a previously allocated tensor descriptor, or null. |
int[] | getTensorShape(): Returns a previously allocated tensor shape, or null. |
boolean | isAllocated() |
boolean | isSparseAndEmpty(): Whether this JCudaObject is sparse and empty. Being allocated is a prerequisite to being sparse and empty. |
void | releaseInput(): Releases input allocated on the GPU. |
void | releaseOutput(): Releases output allocated on the GPU. |
void | setDenseMatrixCudaPointer(jcuda.Pointer densePtr): Convenience method to directly set the dense matrix pointer on the GPU. Make sure to call setDeviceModify(long) after this to set the appropriate state, if you are not sure what you are doing. |
void | setDeviceModify(long numBytes): If memory on the GPU has been allocated from elsewhere, this method updates the internal bookkeeping. |
void | setSparseMatrixCudaPointer(JCudaObject.CSRPointer sparseMatrixPtr): Convenience method to directly set the sparse matrix on the GPU. Make sure to call setDeviceModify(long) after this to set the appropriate state, if you are not sure what you are doing. |
void | sparseToColumnMajorDense(): More efficient method to convert sparse to dense, but returns the dense matrix in column-major format. |
void | sparseToDense(): Converts sparse to dense. Performs a transpose; use sparseToColumnMajorDense if the kernel can deal with column-major format. |
void | sparseToDense(String instructionName): Converts sparse to dense. Performs a transpose; use sparseToColumnMajorDense if the kernel can deal with column-major format. Also records the per-instruction invocation of sparseToDense. |
static int | toIntExact(long l) |
static jcuda.Pointer | transpose(jcuda.Pointer densePtr, int m, int n, int lda, int ldc): Transposes a dense matrix on the GPU by calling the cublasDgeam operation. |

Methods inherited from class GPUObject: clearData, clearData, evict, evict, getAvailableMemory, isInSparseFormat

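
The descriptions of sparseToDense() and sparseToColumnMajorDense() turn on the difference between row-major and column-major storage. As a host-side illustration only (plain Java, no JCuda), the sketch below lays out the same 2 x 3 matrix in both orders. The class LayoutSketch and its method are hypothetical names introduced here and are not part of this API; the real class performs the conversion on the device.

```java
import java.util.Arrays;

/** Hypothetical host-side illustration; not part of JCudaObject. */
class LayoutSketch {

    /**
     * Copies a rows x cols matrix stored row by row into an array
     * that stores the same matrix column by column.
     */
    static double[] rowMajorToColumnMajor(double[] a, int rows, int cols) {
        double[] out = new double[a.length];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // element (i, j): row-major index i*cols + j, column-major index j*rows + i
                out[j * rows + i] = a[i * cols + j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] in row-major order.
        double[] rowMajor = {1, 2, 3, 4, 5, 6};
        double[] columnMajor = rowMajorToColumnMajor(rowMajor, 2, 3);
        System.out.println(Arrays.toString(columnMajor)); // [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
    }
}
```

Moving between the two layouts costs a full pass over the data, which is presumably why a kernel that accepts column-major input is pointed at sparseToColumnMajorDense() instead.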
public jcuda.Pointer jcudaDenseMatrixPtr
public JCudaObject.CSRPointer jcudaSparseMatrixPtr
public long numBytes
public int[] getTensorShape()
public jcuda.jcudnn.cudnnTensorDescriptor getTensorDescriptor()
public jcuda.jcudnn.cudnnTensorDescriptor allocateTensorDescriptor(int N, int C, int H, int W)
N - number of images
C - number of channels
H - height
W - width
public boolean isAllocated()
isAllocated in class GPUObject
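
The three tensor methods above (getTensorShape(), getTensorDescriptor(), allocateTensorDescriptor(int, int, int, int)) describe a get-or-create pattern: the getters return null until a descriptor has been allocated. The following is a minimal sketch of that pattern in plain Java. Descriptor is a hypothetical stand-in for jcuda.jcudnn.cudnnTensorDescriptor, and reusing the first descriptor when a later call passes a different shape is an assumption of the sketch; this page does not say what the real class does in that case.

```java
import java.util.Arrays;

/** Hypothetical illustration of get-or-create caching; not the JCudaObject implementation. */
class DescriptorCacheSketch {

    /** Stand-in for jcuda.jcudnn.cudnnTensorDescriptor (assumption for this sketch). */
    static class Descriptor {
        final int[] shape;
        Descriptor(int n, int c, int h, int w) {
            this.shape = new int[] {n, c, h, w};
        }
    }

    private Descriptor descriptor; // null until the first allocation
    private int[] shape;           // null until the first allocation

    /** Returns the previously allocated descriptor, or null. */
    Descriptor getTensorDescriptor() {
        return descriptor;
    }

    /** Returns the previously recorded shape, or null. */
    int[] getTensorShape() {
        return shape;
    }

    /** Returns the cached descriptor, allocating it on first use. */
    Descriptor allocateTensorDescriptor(int n, int c, int h, int w) {
        if (descriptor == null) {
            descriptor = new Descriptor(n, c, h, w);
            shape = new int[] {n, c, h, w};
        }
        return descriptor;
    }

    public static void main(String[] args) {
        DescriptorCacheSketch obj = new DescriptorCacheSketch();
        System.out.println(obj.getTensorDescriptor());            // null
        Descriptor first = obj.allocateTensorDescriptor(1, 3, 224, 224);
        Descriptor second = obj.allocateTensorDescriptor(1, 3, 224, 224);
        System.out.println(first == second);                      // true
        System.out.println(Arrays.toString(obj.getTensorShape())); // [1, 3, 224, 224]
    }
}
```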
public void allocateSparseAndEmpty() throws DMLRuntimeException
Allocates a sparse and empty JCudaObject. This is the result of operations that are both non zero matrices.
DMLRuntimeException - if DMLRuntimeException occurs
public void allocateAndFillDense(double v) throws DMLRuntimeException
v - value to fill up the dense matrix
DMLRuntimeException - if DMLRuntimeException occurs
public boolean isSparseAndEmpty()
Whether this JCudaObject is sparse and empty. Being allocated is a prerequisite to being sparse and empty.
public boolean acquireDeviceRead() throws DMLRuntimeException
acquireDeviceRead in class GPUObject
DMLRuntimeException - ?
public boolean acquireDeviceModifyDense() throws DMLRuntimeException
acquireDeviceModifyDense in class GPUObject
DMLRuntimeException - if DMLRuntimeException occurs
public boolean acquireDeviceModifySparse() throws DMLRuntimeException
acquireDeviceModifySparse in class GPUObject
DMLRuntimeException - if DMLRuntimeException occurs
public boolean acquireHostRead() throws org.apache.sysml.runtime.controlprogram.caching.CacheException
acquireHostRead in class GPUObject
org.apache.sysml.runtime.controlprogram.caching.CacheException - ?
public void releaseInput() throws org.apache.sysml.runtime.controlprogram.caching.CacheException
releaseInput in class GPUObject
org.apache.sysml.runtime.controlprogram.caching.CacheException - if data is not allocated
public void releaseOutput() throws org.apache.sysml.runtime.controlprogram.caching.CacheException
releaseOutput in class GPUObject
org.apache.sysml.runtime.controlprogram.caching.CacheException - if data is not allocated
public void setDeviceModify(long numBytes)
setDeviceModify in class GPUObject
numBytes - number of bytes
public static int toIntExact(long l) throws DMLRuntimeException
DMLRuntimeException
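
toIntExact(long) carries no description here, but its signature suggests a checked narrowing conversion from long to int that fails rather than silently truncating. The following is a minimal sketch of that idea under that assumption. SketchRuntimeException is a hypothetical placeholder for DMLRuntimeException, and the error message is invented for illustration.

```java
/** Hypothetical illustration of a checked long-to-int conversion; not the JCudaObject source. */
class NarrowingSketch {

    /** Placeholder for DMLRuntimeException, which is not available in this sketch. */
    static class SketchRuntimeException extends Exception {
        SketchRuntimeException(String message) {
            super(message);
        }
    }

    /** Returns l as an int, or throws if l does not fit into 32 bits. */
    static int toIntExact(long l) throws SketchRuntimeException {
        if (l < Integer.MIN_VALUE || l > Integer.MAX_VALUE) {
            throw new SketchRuntimeException("Value does not fit into an int: " + l);
        }
        return (int) l;
    }

    public static void main(String[] args) throws SketchRuntimeException {
        System.out.println(toIntExact(1024L)); // 1024
        try {
            toIntExact(1L << 31); // 2147483648 is one past Integer.MAX_VALUE
        } catch (SketchRuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A guard like this matters for GPU code because many native calls take int sizes and dimensions, while matrix dimensions on the host side are often held as long.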
protected void copyFromDeviceToHost() throws DMLRuntimeException
Copies a matrix block (dense or sparse) from GPU memory to host memory. A MatrixBlock instance is allocated, data from the GPU is copied in, the current one in host memory is deallocated by calling MatrixObject's acquireHostModify(MatrixBlock) (??? does not exist) and overwritten with the newly allocated instance.
TODO: re-examine this to avoid spurious allocations of memory for optimizations
DMLRuntimeException - if DMLRuntimeException occurs
protected long getSizeOnDevice() throws DMLRuntimeException
DMLRuntimeException
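
getSizeOnDevice() is undocumented on this page. As a rough guide to how a device footprint could be estimated for the two formats this class handles, the sketch below assumes 8-byte double values and 4-byte int indices, with CSR holding one value and one column index per non-zero plus rows + 1 row pointers. The class and method names are hypothetical, and the real method may compute the size differently.

```java
/** Hypothetical size estimates for dense and CSR storage; not the JCudaObject implementation. */
class DeviceSizeSketch {

    static final long DOUBLE_BYTES = Double.BYTES; // 8
    static final long INT_BYTES = Integer.BYTES;   // 4

    /** Bytes for a dense rows x cols matrix of doubles. */
    static long denseBytes(long rows, long cols) {
        return Math.multiplyExact(Math.multiplyExact(rows, cols), DOUBLE_BYTES);
    }

    /** Bytes for a CSR matrix: values, column indices, and rows + 1 row pointers. */
    static long csrBytes(long rows, long nnz) {
        long values = Math.multiplyExact(nnz, DOUBLE_BYTES);
        long columnIndices = Math.multiplyExact(nnz, INT_BYTES);
        long rowPointers = Math.multiplyExact(rows + 1, INT_BYTES);
        return Math.addExact(values, Math.addExact(columnIndices, rowPointers));
    }

    public static void main(String[] args) {
        System.out.println(denseBytes(1000, 1000)); // 8000000
        System.out.println(csrBytes(1000, 5000));   // 64004
    }
}
```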
public JCudaObject.CSRPointer getSparseMatrixCudaPointer()
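
For readers unfamiliar with the CSR layout that a JCudaObject.CSRPointer refers to, the following host-side sketch builds the three CSR arrays (values, row pointers, column indices) from a small dense matrix. It is plain Java with hypothetical names (CsrSketch, val, rowPtr, colInd) and does not use or mirror the actual CSRPointer fields, which are not listed on this page.

```java
import java.util.Arrays;

/** Hypothetical host-side illustration of the CSR layout; not JCudaObject.CSRPointer. */
class CsrSketch {
    final double[] val;  // non-zero values, stored row by row
    final int[] rowPtr;  // row i occupies positions rowPtr[i] .. rowPtr[i + 1] - 1
    final int[] colInd;  // column index of each non-zero value

    CsrSketch(double[] val, int[] rowPtr, int[] colInd) {
        this.val = val;
        this.rowPtr = rowPtr;
        this.colInd = colInd;
    }

    /** Builds the CSR arrays from a dense matrix given as an array of rows. */
    static CsrSketch fromDense(double[][] dense) {
        int rows = dense.length;
        int nnz = 0;
        for (double[] row : dense) {
            for (double v : row) {
                if (v != 0.0) nnz++;
            }
        }
        double[] val = new double[nnz];
        int[] colInd = new int[nnz];
        int[] rowPtr = new int[rows + 1];
        int k = 0;
        for (int i = 0; i < rows; i++) {
            rowPtr[i] = k; // first non-zero of row i
            for (int j = 0; j < dense[i].length; j++) {
                if (dense[i][j] != 0.0) {
                    val[k] = dense[i][j];
                    colInd[k] = j;
                    k++;
                }
            }
        }
        rowPtr[rows] = k; // total number of non-zeros
        return new CsrSketch(val, rowPtr, colInd);
    }

    public static void main(String[] args) {
        double[][] dense = { {1, 0, 2}, {0, 0, 0}, {0, 3, 0} };
        CsrSketch csr = fromDense(dense);
        System.out.println(Arrays.toString(csr.val));    // [1.0, 2.0, 3.0]
        System.out.println(Arrays.toString(csr.rowPtr)); // [0, 2, 2, 3]
        System.out.println(Arrays.toString(csr.colInd)); // [0, 2, 1]
    }
}
```

An empty row shows up as two equal consecutive entries in rowPtr, as with the middle row in the example.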
public void setSparseMatrixCudaPointer(JCudaObject.CSRPointer sparseMatrixPtr)
Convenience method to directly set the sparse matrix on the GPU. Make sure to call setDeviceModify(long) after this to set the appropriate state, if you are not sure what you are doing.
Needed for operations like JCusparse.cusparseDcsrgemm(cusparseHandle, int, int, int, int, int, cusparseMatDescr, int, Pointer, Pointer, Pointer, cusparseMatDescr, int, Pointer, Pointer, Pointer, cusparseMatDescr, Pointer, Pointer, Pointer)
sparseMatrixPtr - CSR (compressed sparse row) pointer
public void setDenseMatrixCudaPointer(jcuda.Pointer densePtr)
Convenience method to directly set the dense matrix pointer on the GPU. Make sure to call setDeviceModify(long) after this to set the appropriate state, if you are not sure what you are doing.
densePtr - dense pointer
public void denseToSparse() throws DMLRuntimeException
DMLRuntimeException - if DMLRuntimeException occurs
public static jcuda.Pointer transpose(jcuda.Pointer densePtr, int m, int n, int lda, int ldc) throws DMLRuntimeException
densePtr - Pointer to dense matrix on the GPU
m - rows in output matrix
n - columns in output matrix
lda - rows in input matrix
ldc - columns in output matrix
DMLRuntimeException - if operation failed
public void sparseToDense() throws DMLRuntimeException
DMLRuntimeException - if DMLRuntimeException occurs
public void sparseToDense(String instructionName) throws DMLRuntimeException
instructionName - Name of the instruction for which statistics are recorded in GPUStatistics
DMLRuntimeException - ?
public void sparseToColumnMajorDense() throws DMLRuntimeException
DMLRuntimeException - if DMLRuntimeException occurs
public static JCudaObject.CSRPointer columnMajorDenseToRowMajorSparse(jcuda.jcusparse.cusparseHandle cusparseHandle, int rows, int cols, jcuda.Pointer densePtr) throws DMLRuntimeException
cusparseHandle - handle to cusparse library
rows - number of rows
cols - number of columns
densePtr - [in] dense matrix pointer on the GPU in row major
DMLRuntimeException - if DMLRuntimeException occurs
public static jcuda.Pointer allocate(long size) throws DMLRuntimeException
Convenience method for allocate(String, long, int); defaults statsCount to 1.
size - size of data (in bytes) to allocate
DMLRuntimeException - if DMLRuntimeException occurs
public static jcuda.Pointer allocate(String instructionName, long size) throws DMLRuntimeException
Convenience method for allocate(String, long, int); defaults statsCount to 1.
instructionName - name of the instruction for which to record per-instruction performance statistics, or null to skip recording
size - size of data (in bytes) to allocate
DMLRuntimeException - if DMLRuntimeException occurs
public static jcuda.Pointer allocate(String instructionName, long size, int statsCount) throws DMLRuntimeException
instructionName - name of the instruction for which to record per-instruction performance statistics, or null to skip recording
size - size of data (in bytes) to allocate
statsCount - amount to increment the cudaAllocCount by
DMLRuntimeException - if DMLRuntimeException occurs
public static void cudaFreeHelper(jcuda.Pointer toFree)
toFree - Pointer instance to be freed
public static void cudaFreeHelper(jcuda.Pointer toFree, boolean eager)
toFree - Pointer instance to be freed
eager - true if the free is to be done eagerly
public static void cudaFreeHelper(String instructionName, jcuda.Pointer toFree)
instructionName - name of the instruction for which to record per-instruction free time, or null to skip recording
toFree - Pointer instance to be freed
public static void cudaFreeHelper(String instructionName, jcuda.Pointer toFree, boolean eager)
instructionName - name of the instruction for which to record per-instruction free time, or null to skip recording
toFree - Pointer instance to be freed
eager - true if the free is to be done eagerly
public static String debugString(jcuda.Pointer A, long rows, long cols) throws DMLRuntimeException
A - Pointer to memory on device (GPU), assumed to point to a double array
rows - rows in matrix A
cols - columns in matrix A
DMLRuntimeException - if DMLRuntimeException occurs
Copyright © 2017 The Apache Software Foundation. All rights reserved.