HLSL buffer stride and threading - what is happening here?
<p>I'm really new to DirectCompute technologies, and have been attempting to learn from the documentation on the MSDN website, which is... dense, to say the least.</p>
<p>I'd like to make a basic HLSL file that takes in a 4x4 matrix and a 4xN matrix and returns the multiplied result. But after spending some time playing with the code, I've found some weird behavior I don't understand, mainly in how the threads I dispatch process the buffers and write the output data.</p>
<p>In all of these examples, I pass in two 16-float buffers, get out a 16-float buffer, and Dispatch with a 4x1x1 grouping. I can show you more code, but I honestly don't yet know what would help you help me. Let me know if there's a section of my C++ code you want to see.</p>
<p>With the following code:</p>
<pre><code>StructuredBuffer&lt;float4x4&gt; base_matrix : register(t0); // byteWidth = 64
StructuredBuffer&lt;float4&gt; extended_matrix : register(t1); // byteWidth = 64
RWStructuredBuffer&lt;float4&gt; BufferOut : register(u0); // byteWidth = 64, zeroed out before reading from the GPU

[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
    BufferOut[DTid.x].x = 1;
}
</code></pre>
<p>I get the following values out:</p>
<pre><code>1.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
</code></pre>
<p>This makes sense to me: the buffer is processed by 4 threads, each writing to one float4 element.</p>
<p>With the following code:</p>
<pre><code>StructuredBuffer&lt;float4x4&gt; base_matrix : register(t0); // byteWidth = 64
StructuredBuffer&lt;float4&gt; extended_matrix : register(t1); // byteWidth = 64
RWStructuredBuffer&lt;float4&gt; BufferOut : register(u0); // byteWidth = 64, zeroed out before reading from the GPU

[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
    BufferOut[DTid.x].x = 1;
    BufferOut[DTid.x].y = 2;
    BufferOut[DTid.x].z = 3;
    BufferOut[DTid.x].w = 4;
}
</code></pre>
<p>I get the following values out:</p>
<pre><code>1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
</code></pre>
<p>And with the actual code I want to run:</p>
<pre><code>StructuredBuffer&lt;float4x4&gt; base_matrix : register(t0);
StructuredBuffer&lt;float4&gt; extended_matrix : register(t1);
RWStructuredBuffer&lt;float4&gt; BufferOut : register(u0);

[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
    BufferOut[DTid.x] = mul(base_matrix[0],extended_matrix[DTid.x])
}
</code></pre>
<p>I get the following values out:</p>
<pre><code>0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
</code></pre>
<p>I can tell I'm missing something critical here, but for the life of me I can't find the documentation that explains how these work. Could someone help me understand what's going on in this code?</p>
<p>Thanks for your time,</p>
<p>Zach</p>
<p>As another note, this code was cribbed together from the BasicCompute11 sample in Microsoft DirectX SDK (June 2010)\Samples\C++\Direct3D11\BasicCompute11. If I'm doing something terribly wrong, feel free to let me know. I'm REALLY new at HLSL.</p>
<p>Edit: My buffer creation code.</p>
<pre><code>CreateStructuredBuffer( g_pDevice, sizeof(float)*16, 1, g_matrix, &amp;g_pBuf0 );
CreateStructuredBuffer( g_pDevice, sizeof(float)*4, NUM_ELEMENTS, g_extended_matrix, &amp;g_pBuf1 );
CreateStructuredBuffer( g_pDevice, sizeof(float)*4, NUM_ELEMENTS, NULL, &amp;g_pBufResult );

//--------------------------------------------------------------------------------------
// Create Structured Buffer
//--------------------------------------------------------------------------------------
HRESULT CreateStructuredBuffer( ID3D11Device* pDevice, UINT uElementSize, UINT uCount, VOID* pInitData, ID3D11Buffer** ppBufOut )
{
    *ppBufOut = NULL;

    D3D11_BUFFER_DESC desc;
    ZeroMemory( &amp;desc, sizeof(desc) );
    desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
    desc.ByteWidth = uElementSize * uCount;
    desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = uElementSize;

    if ( pInitData )
    {
        D3D11_SUBRESOURCE_DATA InitData;
        InitData.pSysMem = pInitData;
        return pDevice-&gt;CreateBuffer( &amp;desc, &amp;InitData, ppBufOut );
    }
    else
        return pDevice-&gt;CreateBuffer( &amp;desc, NULL, ppBufOut );
}
</code></pre>
<p>Trying .1, .2, .3, .4 ...</p>
<pre><code>StructuredBuffer&lt;float4x4&gt; base_matrix : register(t0);
StructuredBuffer&lt;float4&gt; extended_matrix : register(t1);
StructuredBuffer&lt;uint&gt; loop_multiplier : register(t2);
RWStructuredBuffer&lt;float4&gt; BufferOut : register(u0);

[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
    BufferOut[DTid.x].x = .1;
    BufferOut[DTid.x].y = .2;
    BufferOut[DTid.x].z = .3;
    BufferOut[DTid.x].w = .4;
}
</code></pre>
<p>I got this out:</p>
<pre><code>0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100 0.100
</code></pre>
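<p>As a point of comparison for the GPU output, the intended 4x4 by 4xN product can be computed on the CPU. The following is only a minimal sketch under stated assumptions: the names <code>Float4</code>, <code>Float4x4</code>, <code>MulMatVec</code> and <code>MulMatColumns</code> are hypothetical and not part of the SDK sample, the matrix is assumed to be stored row-major as <code>m[row][col]</code>, and each float4 is treated as a column vector. The memory layout that HLSL actually uses for a float4x4 in a structured buffer may differ, so the result may need a transpose before it matches the shader.</p>

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side types for illustration only (not from the SDK sample).
using Float4 = std::array<float, 4>;
using Float4x4 = std::array<Float4, 4>;  // assumed layout: m[row][col]

// Computes M * v, treating v as a column vector:
// out[r] = sum over c of m[r][c] * v[c].
Float4 MulMatVec(const Float4x4& m, const Float4& v) {
    Float4 out{};  // zero-initialized accumulator
    for (std::size_t r = 0; r < 4; ++r) {
        for (std::size_t c = 0; c < 4; ++c) {
            out[r] += m[r][c] * v[c];
        }
    }
    return out;
}

// Applies the same 4x4 matrix to each of the N float4 columns,
// mirroring one compute thread per float4 element.
std::vector<Float4> MulMatColumns(const Float4x4& m,
                                  const std::vector<Float4>& columns) {
    std::vector<Float4> result;
    result.reserve(columns.size());
    for (const Float4& v : columns) {
        result.push_back(MulMatVec(m, v));
    }
    return result;
}
```

<p>Comparing the values read back from <code>BufferOut</code> element by element against <code>MulMatColumns</code> would show whether a mismatch comes from the arithmetic or from how the buffers are laid out and bound.</p>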