NifTK
16.4.1 - 0798f20
CMIC's Translational Medical Imaging Platform
Public Member Functions | |
ScopedCUDADevice | ActivateDevice (int dev) |
cudaStream_t | GetStream (const std::string &name) |
ReadAccessor | RequestReadAccess (const LightweightCUDAImage &lwci) |
WriteAccessor | RequestOutputImage (unsigned int width, unsigned int height, int FIXME_pixeltype) |
LightweightCUDAImage | Finalise (WriteAccessor &writeAccessor, cudaStream_t stream) |
LightweightCUDAImage | FinaliseAndAutorelease (WriteAccessor &writeAccessor, ReadAccessor &readAccessor, cudaStream_t stream) |
void | Autorelease (ReadAccessor &readAccessor, cudaStream_t stream) |
void | Autorelease (WriteAccessor &writeAccessor, cudaStream_t stream) |
Static Public Member Functions | |
static CUDAManager * | GetInstance () |
Protected Member Functions | |
CUDAManager () | |
virtual | ~CUDAManager () |
void | AllRefsDropped (LightweightCUDAImage &lwci) |
Friends | |
class | LightweightCUDAImage |
struct | impldetail::ModuleCleanup |
Singleton that owns all CUDA resources. It manages images in a copy-on-write-like fashion: you cannot write into an existing CUDA image; you can only read from it and write data into a newly allocated one.
To get access to an image living on the card, call DataNode::GetData() as usual and cast the result to CUDAImage. Then call CUDAImage::GetLightweightCUDAImage() to retrieve a handle to the actual bits in CUDA memory. Side note: even though LightweightCUDAImage has members, you should consider it opaque.
Pass this LightweightCUDAImage instance to RequestReadAccess() to obtain a device pointer that you can read from in your kernel. RequestReadAccess() increments a reference count for that image so that CUDAManager will not recycle it too early.
Then call RequestOutputImage() to get a device pointer to where you can write your kernel's output. From an API point of view, RequestOutputImage() will always give you a new memory block so that you never overwrite an existing image.
Call GetStream() with a name of your choice, or create your own stream, to synchronise and coarsely parallelise CUDA tasks.
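The idea of looking up a stream by name can be illustrated with a small registry. This is only a sketch, not NifTK's implementation: NamedStreamRegistry, FakeStream and Count() are hypothetical names, an integer handle stands in for cudaStream_t so that the code builds without CUDA, and it assumes that asking for the same name twice yields the same stream, which this page does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for cudaStream_t so the sketch builds without CUDA.
using FakeStream = int;

// Toy registry (not a NifTK class): one stream handle per name, created on first use.
class NamedStreamRegistry
{
public:
  FakeStream GetStream(const std::string& name)
  {
    assert(!name.empty());  // the sketch requires a non-empty name
    std::lock_guard<std::mutex> lock(m_Mutex);
    auto it = m_Streams.find(name);
    if (it == m_Streams.end())
    {
      // A CUDA-backed version would create a real stream at this point.
      it = m_Streams.emplace(name, m_NextHandle++).first;
    }
    return it->second;
  }

  std::size_t Count() const
  {
    std::lock_guard<std::mutex> lock(m_Mutex);
    return m_Streams.size();
  }

private:
  mutable std::mutex m_Mutex;                   // guards the map against concurrent callers
  std::map<std::string, FakeStream> m_Streams;  // name -> handle
  FakeStream m_NextHandle = 1;
};
```

Under that assumption, two processing steps that use the same name end up ordered on one stream, while steps that use different names may overlap.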
Run your kernel on your stream. But do not synchronise on its completion!
Once all your work has been submitted to the driver, call FinaliseAndAutorelease() to turn the output device pointer into a proper LightweightCUDAImage that you can attach to a DataNode. This function also releases your read request on the input image at the right time so that the image can eventually be returned to the memory pool. In addition, the Finalise*() functions queue a "ready" event that you can use on your stream to synchronise, on the GPU, on completion of a previous processing step.
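The read, write and finalise cycle described above can be modelled on the CPU with reference-counted handles. The following is a rough sketch under stated assumptions, not the CUDAManager code: ToyBuffer, ToyImage, ToyPool and Invert are hypothetical names, host memory stands in for device memory, and a plain loop stands in for the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are NifTK types.
struct ToyBuffer
{
  std::vector<unsigned char> bytes;  // stands in for device memory
};

// Read-only, reference-counted handle to a finalised buffer.
using ToyImage = std::shared_ptr<const ToyBuffer>;

class ToyPool
{
public:
  // Hands out a buffer that no image refers to: recycled if one fits, else new.
  std::unique_ptr<ToyBuffer> RequestOutput(std::size_t size)
  {
    for (auto it = m_Free.begin(); it != m_Free.end(); ++it)
    {
      if ((*it)->bytes.size() == size)
      {
        std::unique_ptr<ToyBuffer> reused = std::move(*it);
        m_Free.erase(it);
        return reused;
      }
    }
    auto fresh = std::make_unique<ToyBuffer>();
    fresh->bytes.resize(size);
    return fresh;
  }

  // Freezes a written buffer into a shared image. When the last reference is
  // dropped, the buffer goes back into the pool instead of being freed.
  ToyImage Finalise(std::unique_ptr<ToyBuffer> written)
  {
    return ToyImage(written.release(),
                    [this](ToyBuffer* buffer) { m_Free.emplace_back(buffer); });
  }

  std::size_t FreeCount() const { return m_Free.size(); }

private:
  std::vector<std::unique_ptr<ToyBuffer>> m_Free;
};

// One pass through the cycle, with a CPU loop standing in for the kernel.
ToyImage Invert(ToyPool& pool, const ToyImage& input)
{
  assert(input != nullptr);
  ToyImage readAccess = input;  // extra reference keeps the input out of the pool
  std::unique_ptr<ToyBuffer> output = pool.RequestOutput(readAccess->bytes.size());
  for (std::size_t i = 0; i < readAccess->bytes.size(); ++i)
  {
    output->bytes[i] = static_cast<unsigned char>(255 - readAccess->bytes[i]);
  }
  return pool.Finalise(std::move(output));  // readAccess is dropped on return
}
```

In this model the pool must outlive every image it has handed out. The input is never modified, and its buffer becomes reusable only after the last handle to it has gone away.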
CUDAManager is thread-safe: all public methods can be called from any thread at any time.
niftk::CUDAManager::CUDAManager | ( | ) |
protected |
virtual niftk::CUDAManager::~CUDAManager | ( | ) |
protected virtual |
ScopedCUDADevice niftk::CUDAManager::ActivateDevice | ( | int | dev | ) |
void niftk::CUDAManager::AllRefsDropped | ( | LightweightCUDAImage & | lwci | ) |
protected |
Used by LightweightCUDAImage to notify us that all references to it have been dropped, and that it can be placed back onto m_AvailableImagePool for later re-use.
void niftk::CUDAManager::Autorelease | ( | ReadAccessor & | readAccessor, |
cudaStream_t | stream | ||
) |
Releases the read request on an image once processing on stream has finished. This method does not block; it returns immediately. Make sure you call this method after Finalise(), or use FinaliseAndAutorelease() instead.
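A non-blocking release of this kind can be pictured as handing the reference over to the stream, so that it is dropped only after all previously queued work has run. The sketch below is a CPU-only illustration under that assumption: ToyStream, Drain() and ToyAutorelease are hypothetical names, and how CUDAManager actually defers the release is not described on this page.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for a CUDA stream: work items run in submission order,
// but only when Drain() is called, which models the GPU catching up later.
class ToyStream
{
public:
  void Enqueue(std::function<void()> work) { m_Pending.push(std::move(work)); }

  void Drain()
  {
    while (!m_Pending.empty())
    {
      std::function<void()> work = std::move(m_Pending.front());
      m_Pending.pop();
      work();
    }
  }

private:
  std::queue<std::function<void()>> m_Pending;
};

// Non-blocking release: the caller's reference moves into the stream and is
// dropped only after everything queued before it has run.
template <typename T>
void ToyAutorelease(std::shared_ptr<T>& access, ToyStream& stream)
{
  assert(access != nullptr);
  std::shared_ptr<T> held = std::move(access);  // the caller's handle becomes empty
  stream.Enqueue([held]() mutable { held.reset(); });
}
```

The call returns straight away, yet the image stays referenced until the stream has processed the release item. This is why the order relative to earlier submissions on the same stream matters.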
void niftk::CUDAManager::Autorelease | ( | WriteAccessor & | writeAccessor, |
cudaStream_t | stream | ||
) |
LightweightCUDAImage niftk::CUDAManager::Finalise | ( | WriteAccessor & | writeAccessor, |
cudaStream_t | stream | ||
) |
LightweightCUDAImage niftk::CUDAManager::FinaliseAndAutorelease | ( | WriteAccessor & | writeAccessor, |
ReadAccessor & | readAccessor, | ||
cudaStream_t | stream | ||
) |
Combines Finalise() and Autorelease() into a single call.
static CUDAManager * niftk::CUDAManager::GetInstance | ( | ) |
static |
std::runtime_error | if CUDA is not available on the system. |
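Because the singleton accessor can throw, callers that also have a CPU fallback may want to guard the first call. The following is a minimal sketch of that pattern with a stand-in class: ToyManager and CanUseGpuPath are hypothetical names, and the deviceCount parameter is an assumption that replaces a real availability check so that the example builds without CUDA.

```cpp
#include <cassert>
#include <stdexcept>

// Toy singleton (not the NifTK class) whose accessor throws when no device exists.
class ToyManager
{
public:
  // deviceCount is a hypothetical parameter standing in for a real device query.
  static ToyManager* GetInstance(int deviceCount)
  {
    if (deviceCount <= 0)
    {
      throw std::runtime_error("CUDA is not available on this system");
    }
    static ToyManager instance;  // initialised once, thread-safe since C++11
    return &instance;
  }

  ToyManager(const ToyManager&) = delete;
  ToyManager& operator=(const ToyManager&) = delete;

private:
  ToyManager() = default;
};

// Returns true if the GPU code path can be used, false if the caller should fall back.
bool CanUseGpuPath(int deviceCount)
{
  try
  {
    ToyManager* manager = ToyManager::GetInstance(deviceCount);
    assert(manager != nullptr);
    (void)manager;  // silences the unused-variable warning when asserts are disabled
    return true;
  }
  catch (const std::runtime_error&)
  {
    return false;
  }
}
```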
cudaStream_t niftk::CUDAManager::GetStream | ( | const std::string & | name | ) |
WriteAccessor niftk::CUDAManager::RequestOutputImage | ( | unsigned int | width, |
unsigned int | height, | ||
int | FIXME_pixeltype | ||
) |
ReadAccessor niftk::CUDAManager::RequestReadAccess | ( | const LightweightCUDAImage & | lwci | ) |
std::runtime_error | if lwci is not valid. |