org.apache.sysml.runtime.instructions.gpu.context

Class GPUContext



  • public class GPUContext
    extends Object
    Represents a context per GPU accessible through the same JVM. Each context holds cublas, cusparse, cudnn, and other handles, which are separate for each GPU.
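    A minimal usage sketch in Java follows; the ExecutionContext variable ec and the device number 0 are assumptions for illustration (see initializeThread() and the handle getters below).

      GPUContext gCtx = ec.getGPUContext(0);                      // obtain the context assigned to device 0 (ec: an ExecutionContext, assumed; throws DMLRuntimeException)
      gCtx.initializeThread();                                    // bind the device to the calling thread before issuing CUDA calls
      jcuda.jcublas.cublasHandle cublas = gCtx.getCublasHandle(); // per-GPU handle used for BLAS operations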
    • Field Detail

      • LOG

        protected static final org.apache.commons.logging.Log LOG
      • GPU_MEMORY_UTILIZATION_FACTOR

        public double GPU_MEMORY_UTILIZATION_FACTOR
    • Method Detail

      • cudaGetDevice

        public static int cudaGetDevice()
        Returns which device is currently being used.
        Returns:
        the current device for the calling host thread
      • getDeviceNum

        public int getDeviceNum()
        Returns which device is assigned to this GPUContext instance.
        Returns:
        active device assigned to this GPUContext instance
      • initializeThread

        public void initializeThread()
                              throws DMLRuntimeException
        Sets the device for the calling thread. This method must be called after ExecutionContext.getGPUContext(int). In a multi-threaded environment such as parfor, this method must be called from the appropriate thread.
        Throws:
        DMLRuntimeException - if DMLRuntimeException occurs
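        For example, in a multi-threaded setting such as parfor, a per-worker initialization sketch might look like the following; the ExecutionContext variable ec, the device number 0, and the surrounding worker setup are assumptions for illustration.

          Runnable gpuWorker = () -> {
            try {
              GPUContext gCtx = ec.getGPUContext(0); // obtain the context first ...
              gCtx.initializeThread();               // ... then bind the device to this worker thread
              // issue GPU instructions from this thread only
            } catch (DMLRuntimeException e) {
              throw new RuntimeException(e);
            }
          };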
      • allocate

        public jcuda.Pointer allocate(String instructionName,
                                      long size)
                               throws DMLRuntimeException
        Convenience method for allocate(String, long, int); defaults statsCount to 1.
        Parameters:
        instructionName - name of the instruction for which to record per-instruction performance statistics, or null to skip recording
        size - size of data (in bytes) to allocate
        Returns:
        jcuda pointer
        Throws:
        DMLRuntimeException - if DMLRuntimeException occurs
      • allocate

        public jcuda.Pointer allocate(String instructionName,
                                      long size,
                                      int statsCount)
                               throws DMLRuntimeException
        Allocates temporary space on the device. Does not update bookkeeping. The caller is responsible for freeing the memory after use.
        Parameters:
        instructionName - name of the instruction for which to record per-instruction performance statistics, or null to skip recording
        size - Size of data (in bytes) to allocate
        statsCount - amount to increment the cudaAllocCount by
        Returns:
        jcuda Pointer
        Throws:
        DMLRuntimeException - if DMLRuntimeException occurs
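        A sketch of a typical allocate/free pairing for temporary device memory; the size and instruction name are illustrative, and cudaFreeHelper(jcuda.Pointer) is documented below.

          long sizeInBytes = 1000L * jcuda.Sizeof.DOUBLE;                 // scratch space for 1000 doubles
          jcuda.Pointer tmp = gCtx.allocate("exampleInst", sizeInBytes);  // throws DMLRuntimeException
          try {
            // ... use tmp as temporary device memory ...
          } finally {
            gCtx.cudaFreeHelper(tmp);                                     // caller must free temporary space
          }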
      • cudaFreeHelper

        public void cudaFreeHelper(jcuda.Pointer toFree)
        Performs a lazy cudaFree call.
        Parameters:
        toFree - Pointer instance to be freed
      • cudaFreeHelper

        public void cudaFreeHelper(jcuda.Pointer toFree,
                                   boolean eager)
        Performs a lazy or eager cudaFree call, depending on the eager flag.
        Parameters:
        toFree - Pointer instance to be freed
        eager - true if the free should be performed eagerly
      • cudaFreeHelper

        public void cudaFreeHelper(String instructionName,
                                   jcuda.Pointer toFree)
        Performs a lazy cudaFree call.
        Parameters:
        instructionName - name of the instruction for which to record per-instruction free time, or null to skip recording
        toFree - Pointer instance to be freed
      • cudaFreeHelper

        public void cudaFreeHelper(String instructionName,
                                   jcuda.Pointer toFree,
                                   boolean eager)
        Performs a lazy or eager cudaFree call, depending on the eager flag.
        Parameters:
        instructionName - name of the instruction for which to record per-instruction free time, or null to skip recording
        toFree - Pointer instance to be freed
        eager - true if the free should be performed eagerly
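        For instance, a caller might free lazily by default and eagerly only when device memory must be returned immediately (the pointers and instruction name are illustrative):

          gCtx.cudaFreeHelper("exampleInst", tmpA, false); // lazy: the pointer may be recycled by a later allocate
          gCtx.cudaFreeHelper("exampleInst", tmpB, true);  // eager: cudaFree is issued right away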
      • evict

        protected void evict(String instructionName,
                             long neededSize)
                      throws DMLRuntimeException
        Attempts to free up GPU memory until a chunk of the needed size is available, or fails if it cannot. First, the set of reusable blocks is freed. If that is not enough, the set of allocated matrix blocks with zero locks on them is freed. The process cycles through the list of allocated GPUObject instances, sorted in reverse order by the number of (read) locks obtained on them, and repeatedly frees blocks with zero locks until the required size has been freed up. (TODO: update with a hybrid policy.)
        Parameters:
        instructionName - name of the instruction for which performance measurements are made
        neededSize - desired size to be freed up on the GPU
        Throws:
        DMLRuntimeException - if there are no reusable memory blocks to free up, or not enough matrix blocks with zero locks on them
      • isBlockRecorded

        public boolean isBlockRecorded(GPUObject o)
        Whether the GPU associated with this GPUContext has recorded the usage of a certain block.
        Parameters:
        o - the block
        Returns:
        true if present, false otherwise
      • getAvailableMemory

        public long getAvailableMemory()
        Gets the available memory on the GPU that SystemML can use.
        Returns:
        the available memory in bytes
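        As an illustration, the estimate can be used to guard a large temporary allocation before falling back to a CPU code path; the threshold and the fallback are assumptions.

          long needed = 64L * 1024 * 1024;                             // 64 MB, illustrative
          if (gCtx.getAvailableMemory() >= needed) {
            jcuda.Pointer scratch = gCtx.allocate("exampleInst", needed);
            // ... use scratch on the device, then free it via cudaFreeHelper ...
          } else {
            // fall back to a CPU implementation (illustrative)
          }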
      • ensureComputeCapability

        public void ensureComputeCapability()
                                     throws DMLRuntimeException
        Ensures that the GPU SystemML is trying to use has the minimum required compute capability.
        Throws:
        DMLRuntimeException - if the compute capability is less than what is required
      • createGPUObject

        public GPUObject createGPUObject(org.apache.sysml.runtime.controlprogram.caching.MatrixObject mo)
        Instantiates a new GPUObject initialized with the given MatrixObject.
        Parameters:
        mo - a MatrixObject that represents a matrix
        Returns:
        a new GPUObject instance
      • getGPUProperties

        public jcuda.runtime.cudaDeviceProp getGPUProperties()
                                                      throws DMLRuntimeException
        Gets the device properties for the active GPU (set with cudaSetDevice()).
        Returns:
        the device properties
        Throws:
        DMLRuntimeException - if the device properties could not be obtained
      • getMaxThreadsPerBlock

        public int getMaxThreadsPerBlock()
                                  throws DMLRuntimeException
        Gets the maximum number of threads per block for the active GPU.
        Returns:
        the maximum number of threads per block
        Throws:
        DMLRuntimeException - if the device properties could not be obtained
      • getMaxBlocks

        public int getMaxBlocks()
                         throws DMLRuntimeException
        Gets the maximum number of blocks supported by the active CUDA device.
        Returns:
        the maximum number of blocks supported
        Throws:
        DMLRuntimeException - if the device properties could not be obtained
      • getMaxSharedMemory

        public long getMaxSharedMemory()
                                throws DMLRuntimeException
        Gets the shared memory per block supported by the active CUDA device.
        Returns:
        the shared memory per block
        Throws:
        DMLRuntimeException - if the device properties could not be obtained
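        As an illustration, these limits can be combined to derive a launch configuration for a custom kernel; the element count and the clamping policy below are assumptions, not SystemML's actual heuristic.

          int n = 1_000_000;                                    // number of elements to process (illustrative)
          int threadsPerBlock = gCtx.getMaxThreadsPerBlock();   // e.g. 1024 on recent devices
          int blocks = (int) Math.min((long) gCtx.getMaxBlocks(),
                                      ((long) n + threadsPerBlock - 1) / threadsPerBlock);
          long maxSharedMem = gCtx.getMaxSharedMemory();        // upper bound for dynamic shared memory per block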
      • getCudnnHandle

        public jcuda.jcudnn.cudnnHandle getCudnnHandle()
        Returns the cudnnHandle for Deep Neural Network operations on the GPU.
        Returns:
        cudnnHandle for current thread
      • getCublasHandle

        public jcuda.jcublas.cublasHandle getCublasHandle()
        Returns the cublasHandle for BLAS operations on the GPU.
        Returns:
        cublasHandle for current thread
      • getCusparseHandle

        public jcuda.jcusparse.cusparseHandle getCusparseHandle()
        Returns the cusparseHandle for certain sparse BLAS operations on the GPU.
        Returns:
        cusparseHandle for current thread
      • getCusolverDnHandle

        public jcuda.jcusolver.cusolverDnHandle getCusolverDnHandle()
        Returns the cusolverDnHandle for invoking the solve() function on dense matrices on the GPU.
        Returns:
        cusolverDnHandle for current thread
      • getCusolverSpHandle

        public jcuda.jcusolver.cusolverSpHandle getCusolverSpHandle()
        Returns the cusolverSpHandle for invoking the solve() function on sparse matrices on the GPU.
        Returns:
        cusolverSpHandle for current thread
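        A hedged sketch of using the cuBLAS handle with JCuda's bindings for a dense matrix multiply; the device pointers dA, dB, dC and the dimensions m, n, k are assumptions (column-major matrices already allocated on this context's device).

          jcuda.jcublas.cublasHandle handle = gCtx.getCublasHandle();
          jcuda.Pointer alpha = jcuda.Pointer.to(new double[]{ 1.0 });
          jcuda.Pointer beta  = jcuda.Pointer.to(new double[]{ 0.0 });
          // C (m x n) = alpha * A (m x k) * B (k x n) + beta * C, all column-major
          jcuda.jcublas.JCublas2.cublasDgemm(handle,
              jcuda.jcublas.cublasOperation.CUBLAS_OP_N,
              jcuda.jcublas.cublasOperation.CUBLAS_OP_N,
              m, n, k, alpha, dA, m, dB, k, beta, dC, m);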
      • getKernels

        public JCudaKernels getKernels()
        Returns the utility class used to launch custom CUDA kernels, specific to the active GPU for this GPUContext.
        Returns:
        JCudaKernels for current thread
      • clearMemory

        public void clearMemory()
                         throws DMLRuntimeException
        Clears all memory used by this GPUContext. Be careful to ensure that no temporary memory is still in use before invoking this. If memory is being used between MLContext invocations, it is pointed to by a GPUObject instance, which is part of the MatrixObject; the cleanup of that MatrixObject instance frees the memory associated with that block on the GPU.
        Throws:
        DMLRuntimeException - if an error occurs while freeing GPU memory
      • clearTemporaryMemory

        public void clearTemporaryMemory()
        Clears up the memory used to optimize cudaMalloc/cudaFree calls.
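        One plausible cleanup sequence at the end of a script, assuming no GPUObject still references device memory, is sketched below.

          gCtx.clearTemporaryMemory(); // drop the pools used to optimize cudaMalloc/cudaFree
          gCtx.clearMemory();          // then release everything still held by this context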

Copyright © 2017 The Apache Software Foundation. All rights reserved.