
1. How can I improve the performance of my custom OpenGL ES 2.0 depth texture generation?
<p>I have an open source iOS application that uses custom OpenGL ES 2.0 shaders to display 3-D representations of molecular structures. It does this by using procedurally generated sphere and cylinder impostors drawn over rectangles, instead of these same shapes built using lots of vertices. The downside to this approach is that the depth values for each fragment of these impostor objects need to be calculated in a fragment shader, to be used when objects overlap.</p> <p>Unfortunately, OpenGL ES 2.0 <a href="https://stackoverflow.com/questions/4534467/writing-texture-data-onto-depth-buffer/4596314#4596314">does not let you write to gl_FragDepth</a>, so I've needed to output these values to a custom depth texture. I do a pass over my scene using a framebuffer object (FBO), rendering out only a color that corresponds to a depth value, with the results being stored into a texture. This texture is then loaded into the second half of my rendering process, where the actual screen image is generated. If a fragment at that stage is at the depth level stored in the depth texture for that point on the screen, it is displayed. If not, it is discarded. More about the process, including diagrams, can be found in my post <a href="http://www.sunsetlakesoftware.com/2011/05/08/enhancing-molecules-using-opengl-es-20" rel="nofollow noreferrer">here</a>.</p> <p>The generation of this depth texture is a bottleneck in my rendering process and I'm looking for a way to make it faster. It seems slower than it should be, but I can't figure out why. In order to achieve the proper generation of this depth texture, <code>GL_DEPTH_TEST</code> is disabled, <code>GL_BLEND</code> is enabled with <code>glBlendFunc(GL_ONE, GL_ONE)</code>, and <code>glBlendEquation()</code> is set to <code>GL_MIN_EXT</code>.
I know that a scene output in this manner isn't the fastest on a tile-based deferred renderer like the PowerVR series in iOS devices, but I can't think of a better way to do this.</p> <p>My depth fragment shader for spheres (the most common display element) looks to be at the heart of this bottleneck (Renderer Utilization in Instruments is pegged at 99%, indicating that I'm limited by fragment processing). It currently looks like the following:</p> <pre><code>precision mediump float;

varying mediump vec2 impostorSpaceCoordinate;
varying mediump float normalizedDepth;
varying mediump float adjustedSphereRadius;

const vec3 stepValues = vec3(2.0, 1.0, 0.0);
const float scaleDownFactor = 1.0 / 255.0;

void main()
{
    float distanceFromCenter = length(impostorSpaceCoordinate);

    if (distanceFromCenter &gt; 1.0)
    {
        gl_FragColor = vec4(1.0);
    }
    else
    {
        float calculatedDepth = sqrt(1.0 - distanceFromCenter * distanceFromCenter);
        mediump float currentDepthValue = normalizedDepth - adjustedSphereRadius * calculatedDepth;

        // Inlined color encoding for the depth values
        float ceiledValue = ceil(currentDepthValue * 765.0);
        vec3 intDepthValue = (vec3(ceiledValue) * scaleDownFactor) - stepValues;

        gl_FragColor = vec4(intDepthValue, 1.0);
    }
}
</code></pre> <p>On an iPad 1, this takes 35 - 68 ms to render a frame of a DNA spacefilling model using a passthrough shader for display (18 to 35 ms on iPhone 4). According to the PowerVR PVRUniSCo compiler (part of <a href="http://www.imgtec.com/powervr/insider/sdkdownloads/index.asp#GLES2" rel="nofollow noreferrer">their SDK</a>), this shader uses 11 GPU cycles at best, 16 cycles at worst.
I'm aware that you're advised not to use branching in a shader, but in this case that led to better performance than otherwise.</p> <p>When I simplify it to</p> <pre><code>precision mediump float;

varying mediump vec2 impostorSpaceCoordinate;
varying mediump float normalizedDepth;
varying mediump float adjustedSphereRadius;

void main()
{
    gl_FragColor = vec4(adjustedSphereRadius * normalizedDepth * (impostorSpaceCoordinate + 1.0) / 2.0, normalizedDepth, 1.0);
}
</code></pre> <p>it takes 18 - 35 ms on iPad 1, but only 1.7 - 2.4 ms on iPhone 4. The estimated GPU cycle count for this shader is 8 cycles. The change in render time based on cycle count doesn't seem linear.</p> <p>Finally, if I just output a constant color:</p> <pre><code>precision mediump float;

void main()
{
    gl_FragColor = vec4(0.5, 0.5, 0.5, 1.0);
}
</code></pre> <p>the rendering time drops to 1.1 - 2.3 ms on iPad 1 (1.3 ms on iPhone 4).</p> <p>The nonlinear scaling in rendering time, and the sudden change between iPad and iPhone 4 for the second shader, make me think that there's something I'm missing here. A full source project containing these three shader variants (look in the SphereDepth.fsh file and comment out the appropriate sections) and a test model can be downloaded from <a href="http://www.sunsetlakesoftware.com/sites/default/files/Molecules-DepthShaderProfiling.zip" rel="nofollow noreferrer">here</a>, if you wish to try this out yourself.</p> <p>If you've read this far, my question is: based on this profiling information, how can I improve the rendering performance of my custom depth shader on iOS devices?</p>
 
