StackOverflow2013

CUDA: Wrapping device memory allocation in C++
<p>I'm starting to use CUDA at the moment and have to admit that I'm a bit disappointed with the C API. I understand the reasons for choosing C, but had the language been based on C++ instead, several aspects would have been a lot simpler, e.g. device memory allocation (via <code>cudaMalloc</code>).</p>

<p>My plan was to do this myself, using either an overloaded <code>operator new</code> with placement <code>new</code> or RAII (two alternatives). I'm wondering if there are any caveats that I haven't noticed so far. The code <em>seems</em> to work but I'm still wondering about potential memory leaks.</p>

<p>The usage of the <strong>RAII</strong> code would be as follows:</p>

<pre><code>CudaArray<float> device_data(SIZE);
// Use `device_data` as if it were a raw pointer.
</code></pre>

<p>Perhaps a class is overkill in this context (especially since you'd still have to use <code>cudaMemcpy</code>, the class only encapsulating RAII), so the other approach would be <strong>placement <code>new</code></strong>:</p>

<pre><code>float* device_data = new (cudaDevice) float[SIZE];
// Use `device_data` …
operator delete [](device_data, cudaDevice);
</code></pre>

<p>Here, <code>cudaDevice</code> merely acts as a tag to trigger the overload. However, since in normal placement <code>new</code> this would indicate the placement, I find the syntax oddly consistent and perhaps even preferable to using a class.</p>

<p>I'd appreciate criticism of every kind. Does somebody perhaps know if something in this direction is planned for the next version of CUDA (which, as I've heard, will improve its C++ support, whatever they mean by that)?</p>

<p>So, my question is actually threefold:</p>

<ol>
<li>Is my placement <code>new</code> overload semantically correct? Does it leak memory?</li>
<li>Does anybody have information about future CUDA developments that go in this general direction (let's face it: C interfaces in C++ s*ck)?</li>
<li>How can I take this further in a consistent manner (there are other APIs to consider, e.g. there's not only device memory but also a constant memory store and texture memory)?</li>
</ol>

<hr>

<pre><code>// Singleton tag for CUDA device memory placement.
struct CudaDevice {
    static CudaDevice const& get() { return instance; }
private:
    static CudaDevice const instance;
    CudaDevice() { }
    // Non-copyable: declared but not defined.
    CudaDevice(CudaDevice const&);
    CudaDevice& operator =(CudaDevice const&);
} const& cudaDevice = CudaDevice::get();

CudaDevice const CudaDevice::instance;

// Array allocation on the device, selected by the `cudaDevice` tag.
inline void* operator new [](std::size_t nbytes, CudaDevice const&) {
    void* ret;
    cudaMalloc(&ret, nbytes);
    return ret;
}

// Matching deallocation for the tagged overload above.
inline void operator delete [](void* p, CudaDevice const&) throw() {
    cudaFree(p);
}

// RAII wrapper around a device array.
template <typename T>
class CudaArray {
public:
    explicit
    CudaArray(std::size_t size) : size(size), data(new (cudaDevice) T[size]) { }

    operator T* () { return data; }

    ~CudaArray() {
        operator delete [](data, cudaDevice);
    }

private:
    std::size_t const size;
    T* const data;

    // Non-copyable: declared but not defined.
    CudaArray(CudaArray const&);
    CudaArray& operator =(CudaArray const&);
};
</code></pre>

<p>About the singleton employed here: Yes, I'm aware of its drawbacks. However, these aren't relevant in this context. All I needed here was a small type tag that wasn't copyable. Everything else (i.e. multithreading considerations, time of initialization) doesn't apply.</p>
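<p>For comparison, here is a minimal sketch of the RAII alternative that requests raw bytes instead of going through <code>new T[size]</code> and checks the status returned by the allocation call. It is an illustration under assumptions, not the code from the question: <code>DeviceBuffer</code>, <code>HostStandIn</code> and <code>liveAfterScope</code> are hypothetical names, it uses C++11, and the stand-in policy is backed by <code>std::malloc</code>/<code>std::free</code> so that it compiles without the CUDA toolkit. A real policy would presumably forward to <code>cudaMalloc</code> and <code>cudaFree</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in policy so the sketch compiles without CUDA.
// A real policy would forward to cudaMalloc / cudaFree instead.
struct HostStandIn {
    static int live;  // outstanding allocations, used to check for leaks

    // Same shape as cudaMalloc: out-parameter plus status (0 = success).
    static int allocate(void** out, std::size_t nbytes) {
        *out = std::malloc(nbytes == 0 ? 1 : nbytes);
        if (*out == nullptr) return 1;
        ++live;
        return 0;
    }

    static void release(void* p) {
        if (p != nullptr) {
            std::free(p);
            --live;
        }
    }
};
int HostStandIn::live = 0;

// RAII owner of a raw buffer holding `size` elements of T.
// It asks for bytes directly, so no T constructor runs on the host.
template <typename T, typename Alloc = HostStandIn>
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::size_t size) : size_(size), data_(nullptr) {
        void* raw = nullptr;
        if (Alloc::allocate(&raw, size * sizeof(T)) != 0)
            throw std::bad_alloc();  // surface a failed allocation
        data_ = static_cast<T*>(raw);
    }

    ~DeviceBuffer() { Alloc::release(data_); }

    DeviceBuffer(DeviceBuffer const&) = delete;
    DeviceBuffer& operator=(DeviceBuffer const&) = delete;

    T* get() const { return data_; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    T* data_;
};

// Usage: the buffer is released when `buf` goes out of scope.
inline int liveAfterScope() {
    {
        DeviceBuffer<float> buf(16);
        assert(buf.get() != nullptr);
        assert(HostStandIn::live == 1);
    }
    return HostStandIn::live;  // allocations still outstanding
}
```

<p>Allocating bytes directly sidesteps the question of whether an array <code>new</code>-expression runs constructors on the host or may ask the allocation function for extra bookkeeping space. Whether that is worth giving up the <code>new</code> syntax is a matter of taste.</p>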
