Unconventional and dodgy Android crash during JNI/OpenGL ES loading code
<h2>Bounty</h2>
<p>Since this is an important problem to me I've stuck a bounty on it. I'm not looking for the exact answer -- whatever answer leads me to fix this problem gets the bounty. Please make sure you've seen the edit just below.</p>
<p>Edit: I've since managed to catch the crash in GDB just as it dies (via "adb shell setprop debug.db.uid 32767") and noticed this is the exact same problem as is mentioned in <a href="http://groups.google.com/group/android-ndk/browse_thread/thread/3742f5a15f39df28" rel="nofollow">this post</a> on Google Groups. The backtrace shown is the same (except for precise addresses) as my crashing thread. I'll admit, I'm no debugging tool wizard, so if you've any ideas of what I should be looking for please let me know.</p>
<h2>The quick and dirty rundown</h2>
<p>I've whittled away most of my reasonably large application's code so that the app does the following: Loads in a bunch of textures via JNI'd wrappers (from C++ --&gt; Java) so that the Java libraries handle the decoding for me, makes OpenGL textures out of them, and clears the screen to a rather pretty but mocking dark blue color. It's dying in libc, but only about one run in every ten.</p>
<p>To make matters worse, the crash doesn't even look related to any of the code I've written -- it seems to happen in a delayed fashion, but it doesn't seem to be related to something as convenient to blame as the garbage collector. There is no specific point in my own code that the crash occurs at -- it seems to shift around on a per-run basis.</p>
<h2>The longer story</h2>
<p>I'm ending up with a standard crash dump with a stack that tells me just about nothing because it's got two entries, one to libc and one to what looks like an invalid or null stack frame. The resolved symbol in libc is pthread_mutex_unlock. I no longer even use this function myself since I've stripped out the need for multi-threading.
(The native code is called in a surface view and just renders.)</p> <p>pthread_mutex_unlock is resulting in a segmentation fault, usually at address 0 but sometimes a small value (less than 0x200) instead of 0. The default (and most common) mutex in Bionic only has one pointer it can segfault on, and that's the pointer to the pthread_mutex_t structure itself. However, a more complex mutex (there's several options) may use additional pointers. So, chances are libc is fine and libdvm is having the issue (assuming I can trust my stack trace even that far).</p> <p>Let me note this problem only seems to be reproducible if I do one of these two things: disable loading in the data portion of images (but still reading format/dimension information) and leaving the buffer which I use for loading textures into OpenGL uninitialized, or disabling the creation of the OpenGL texture via disabling only the final glTexImage2D call.</p> <p>Note that the aforementioned buffer for loading textures into OpenGL is only created once and destroyed once. I've tried enlarging it and determined that I'm not troubled by a buffer overrun issue specific to that buffer.</p> <p>The main culprits I can think of are:</p> <ul> <li>I'm not using JNI right and it's doing something nasty to the stack.</li> <li>I have an off-by-one error someplace that's corrupting a stack frame.</li> <li>I'm passing OpenGL ES something bad and it's doing something equally bad(tm).</li> <li>My custom-rolled memory allocator isn't functioning properly.</li> </ul> <p>I've been combing my code for such culprits (and more!) for days. I'm hesitant to use a debugger because this crash seems to be timing-sensitive. However, I can still get the crash with my own native code entirely unoptimized with debug options enabled. 
(gdb itself runs at a crawl and so does the app when it's connected)</p>
<h2>Things I've done</h2>
<ul>
<li>Used CheckJNI.</li>
<li>Stripped down as much of the code as I possibly can until it stops crashing.</li>
<li>Written a signal handler and coded a small logging system to dump out the last things done before the signal was thrown.</li>
<li>Tried (and failed) to exacerbate the problem.</li>
<li>Padded native heap arrays on both ends with canaries. They never changed.</li>
<li>Audited 100% of the code in the code path. (I'm just not seeing the issue.)</li>
<li>Thought the problem magically disappeared when I fixed a minor error, ran the code fifty times to make sure this was so, and then crashed the next day the first time I ran. (Ooh, I've never been so angry at a bug before!)</li>
</ul>
<p>Here's a snippet of the usual native crash info from LogCat:</p>
<pre><code>I/DEBUG ( 5818): signal 11 (SIGSEGV), fault addr 00000000
I/DEBUG ( 5818): r0 0000006e r1 00000080 r2 fffffc5e r3 100ffe58
I/DEBUG ( 5818): r4 00000000 r5 00000000 r6 00000000 r7 00000000
I/DEBUG ( 5818): r8 00000000 r9 8054f999 10 10000000 fp 0013e768
I/DEBUG ( 5818): ip 3b9aca00 sp 100ffe58 lr afd10640 pc 00000000 cpsr 60000010
I/DEBUG ( 5818): d0 643a64696f72646e d1 6472656767756265
I/DEBUG ( 5818): d2 8083297880832965 d3 8083298880832973
I/DEBUG ( 5818): d4 8083291080832908 d5 8083292080832918
I/DEBUG ( 5818): d6 8083293080832928 d7 8083294880832938
I/DEBUG ( 5818): d8 0000000000000000 d9 0000000000000000
I/DEBUG ( 5818): d10 0000000000000000 d11 0000000000000000
I/DEBUG ( 5818): d12 0000000000000000 d13 0000000000000000
I/DEBUG ( 5818): d14 0000000000000000 d15 0000000000000000
I/DEBUG ( 5818): d16 0000000000000000 d17 3fe999999999999a
I/DEBUG ( 5818): d18 42eccefa43de3400 d19 3fe00000000000b4
I/DEBUG ( 5818): d20 4008000000000000 d21 3fd99a27ad32ddf5
I/DEBUG ( 5818): d22 3fd24998d6307188 d23 3fcc7288e957b53b
I/DEBUG ( 5818): d24 3fc74721cad6b0ed d25 3fc39a09d078c69f
I/DEBUG ( 5818): d26 0000000000000000 d27 0000000000000000
I/DEBUG ( 5818): d28 0000000000000000 d29 0000000000000000
I/DEBUG ( 5818): d30 0000000000000000 d31 0000000000000000
I/DEBUG ( 5818): scr 80000012
I/DEBUG ( 5818):
I/DEBUG ( 5818): #00 pc 00000000
I/DEBUG ( 5818): #01 pc 0001063c /system/lib/libc.so
I/DEBUG ( 5818):
I/DEBUG ( 5818): code around pc:
I/DEBUG ( 5818):
I/DEBUG ( 5818): code around lr:
I/DEBUG ( 5818): afd10620 e1a01008 e1a02007 e1a03006 e1a00005
I/DEBUG ( 5818): afd10630 ebfff95d e1a05000 e1a00004 ebffff46
I/DEBUG ( 5818): afd10640 e375006e 03a0006e 13a00000 e8bd81f0
I/DEBUG ( 5818): afd10650 e304cdd3 e3043240 e92d4010 e341c062
I/DEBUG ( 5818): afd10660 e1a0e002 e24dd008 e340300f e1a0200d
I/DEBUG ( 5818):
I/DEBUG ( 5818): stack:
I/DEBUG ( 5818): 100ffe18 00000000
I/DEBUG ( 5818): 100ffe1c 00000000
I/DEBUG ( 5818): 100ffe20 00000000
I/DEBUG ( 5818): 100ffe24 ffffff92
I/DEBUG ( 5818): 100ffe28 100ffe58
I/DEBUG ( 5818): 100ffe2c 00000000
I/DEBUG ( 5818): 100ffe30 00000080
I/DEBUG ( 5818): 100ffe34 8054f999 /system/lib/libdvm.so
I/DEBUG ( 5818): 100ffe38 10000000
I/DEBUG ( 5818): 100ffe3c afd10640 /system/lib/libc.so
I/DEBUG ( 5818): 100ffe40 00000000
I/DEBUG ( 5818): 100ffe44 00000000
I/DEBUG ( 5818): 100ffe48 00000000
I/DEBUG ( 5818): 100ffe4c 00000000
I/DEBUG ( 5818): 100ffe50 e3a07077
I/DEBUG ( 5818): 100ffe54 ef900077
I/DEBUG ( 5818): #01 100ffe58 00000000
I/DEBUG ( 5818): 100ffe5c 00000000
I/DEBUG ( 5818): 100ffe60 00000000
I/DEBUG ( 5818): 100ffe64 00000000
I/DEBUG ( 5818): 100ffe68 00000000
I/DEBUG ( 5818): 100ffe6c 00000000
I/DEBUG ( 5818): 100ffe70 00000000
I/DEBUG ( 5818): 100ffe74 00000000
I/DEBUG ( 5818): 100ffe78 00000000
I/DEBUG ( 5818): 100ffe7c 00000000
I/DEBUG ( 5818): 100ffe80 00000000
I/DEBUG ( 5818): 100ffe84 00000000
I/DEBUG ( 5818): 100ffe88 00000000
I/DEBUG ( 5818): 100ffe8c 00000000
I/DEBUG ( 5818): 100ffe90 00000000
I/DEBUG ( 5818): 100ffe94 00000000
I/DEBUG ( 5818): 100ffe98 00000000
I/DEBUG ( 5818): 100ffe9c 00000000
</code></pre>
<p>Using
NDK r6, Android platform 2.2 (API level 8), compiling with -Wall -Werror, ARM mode only.</p>
<p>I'm looking for any ideas, especially those which are verifiable in a deterministic way. If more information would help, just leave a comment (or if you can't, an answer) and I'll update my question ASAP. Thanks for reading this far!</p>
<h2>JNI Interface</h2>
<p>There are both j2n and n2j calls. The only j2n calls right now are here:</p>
<pre><code>private static class Renderer implements GLSurfaceView.Renderer {
    public void onDrawFrame(GL10 gl) {
        GraphicsLib.graphicsStep();
    }

    public void onSurfaceChanged(GL10 gl, int width, int height) {
        GraphicsLib.graphicsInit(width, height);
    }

    public void onSurfaceCreated(GL10 gl, EGLConfig config) {
        // Do nothing.
    }
}
</code></pre>
<p>This code goes through this interface:</p>
<pre><code>public class GraphicsLib {
    static {
        System.loadLibrary("graphicslib");
    }

    public static native void graphicsInit(int width, int height);
    public static native void graphicsStep();
}
</code></pre>
<p>Which on the native side looks like:</p>
<pre><code>extern "C" {
    JNIEXPORT void JNICALL FN(graphicsInit)(JNIEnv* env, jobject obj, jint width, jint height);
    JNIEXPORT void JNICALL FN(graphicsStep)(JNIEnv* env, jobject obj);
};
</code></pre>
<p>The function definitions themselves begin with a copy of the prototypes.</p>
<p>graphicsInit just stores away the dimensions it was passed and sets up OpenGL a bit without anything particularly interesting. graphicsStep clears the screen to a nice color and calls <code>LoadSprites(env)</code>.</p>
<p>The more complex side consists of n2j calls used in LoadSprites(), which loads in a sprite every frame. Not an elegant solution, but it's been working with the exception of this crash.</p>
<p>LoadSprites works like this:</p>
<pre><code>GameAssetsInfo gai;

void LoadSprites(JNIEnv* env) {
    InitGameAssets(gai, env);
    CatchJNIException(env, "j0");
    ...
    static int z = 0;
    if (z &lt; numSprites) {
        CatchJNIException(env, "j1");
        OpenGameImage(gai, SpriteIDFromNumber(z));
        CatchJNIException(env, "j2");
        unsigned int actualWidth = GetGameImageWidth(gai);
        CatchJNIException(env, "j3");
        unsigned int actualHeight = GetGameImageHeight(gai);
        CatchJNIException(env, "j4");
        ...
        jint i;
        int r = 0;
        CatchJNIException(env, "j5");
        do {
            CatchJNIException(env, "j6");
            i = ReadGameImage(gai);
            CatchJNIException(env, "j7");
            if (i &gt; 0) {
                // Deal with the pure data chunk -- One line at a time.
                CatchJNIException(env, "j8");
                StoreGameImageChunk(gai, (int*)sprites[z].data + r, 0, i);
                ...
                r += sprites[z].width;
                CatchJNIException(env, "j9");
                UnreadGameImage(gai);
                CatchJNIException(env, "j10");
            } else {
                break;
            }
        } while (true);
        CatchJNIException(env, "j11");
        CloseGameImage(gai);
        CatchJNIException(env, "j12");
        ... OpenGL ES calls ...
        glTexImage2D( ... );
        z++;
    }
    CatchJNIException(env, "j13");
}
</code></pre>
<p>Where CatchJNIException is this (and <strong>never</strong> prints anything for me):</p>
<pre><code>void CatchJNIException(JNIEnv* env, const char* str) {
    jthrowable exc = env-&gt;ExceptionOccurred();
    if (exc) {
        jclass newExcCls;
        env-&gt;ExceptionDescribe();
        env-&gt;ExceptionClear();
        newExcCls = env-&gt;FindClass( "java/lang/IllegalArgumentException");
        if (newExcCls == NULL) {
            // Couldn't find the exception class.. Uuh..
            LOGE("Failed to catch JNI exception entirely -- could not find exception class.");
            return;
            abort();
        }
        LOGE("Caught JNI exception. (%s)", str);
        env-&gt;ThrowNew( newExcCls, "thrown from C code");
        // abort();
    }
}
</code></pre>
<p>And the relevant part of GameAssetsInfo and associated code is only called from native code and works like this:</p>
<pre><code>void InitGameAssets(GameAssetsInfo&amp; gameasset, JNIEnv* env) {
    CatchJNIException(env, "jS0");
    FST;
    char str[64];
    sprintf(str, "%s/GameAssets", ROOTSTR);
    gameasset.env = env;
    CatchJNIException(gameasset.env, "jS1");
    gameasset.cls = gameasset.env-&gt;FindClass(str);
    CatchJNIException(gameasset.env, "jS2");
    gameasset.openAsset = gameasset.env-&gt;GetStaticMethodID(gameasset.cls, "OpenAsset", "(I)V");
    CatchJNIException(gameasset.env, "jS3");
    gameasset.readAsset = gameasset.env-&gt;GetStaticMethodID(gameasset.cls, "ReadAsset", "()I");
    CatchJNIException(gameasset.env, "jS4");
    gameasset.closeAsset = gameasset.env-&gt;GetStaticMethodID(gameasset.cls, "CloseAsset", "()V");
    CatchJNIException(gameasset.env, "jS5");
    gameasset.buffID = gameasset.env-&gt;GetStaticFieldID(gameasset.cls, "buff", "[B");
    CatchJNIException(gameasset.env, "jS6");
    gameasset.openImage = gameasset.env-&gt;GetStaticMethodID(gameasset.cls, "OpenImage", "(I)V");
    CatchJNIException(gameasset.env, "jS7");
    gameasset.readImage = gameasset.env-&gt;GetStaticMethodID(gameasset.cls, "ReadImage", "()I");
    CatchJNIException(gameasset.env, "jS8");
    gameasset.closeImage = gameasset.env-&gt;GetStaticMethodID(gameasset.cls, "CloseImage", "()V");
    CatchJNIException(gameasset.env, "jS9");
    gameasset.buffIntID = gameasset.env-&gt;GetStaticFieldID(gameasset.cls, "buffInt", "[I");
    CatchJNIException(gameasset.env, "jS10");
    gameasset.imageWidth = gameasset.env-&gt;GetStaticFieldID(gameasset.cls, "imageWidth", "I");
    CatchJNIException(gameasset.env, "jS11");
    gameasset.imageHeight = gameasset.env-&gt;GetStaticFieldID(gameasset.cls, "imageHeight", "I");
    CatchJNIException(gameasset.env, "jS12");
    gameasset.imageHasAlpha = gameasset.env-&gt;GetStaticFieldID(gameasset.cls, "imageHasAlpha", "I");
    CatchJNIException(gameasset.env, "jS13");
}

void OpenGameAsset(GameAssetsInfo&amp; gameasset, int rsc) {
    FST;
    CatchJNIException(gameasset.env, "jS14");
    gameasset.env-&gt;CallStaticVoidMethod(gameasset.cls, gameasset.openAsset, rsc);
    CatchJNIException(gameasset.env, "jS15");
}

void CloseGameAsset(GameAssetsInfo&amp; gameasset) {
    FST;
    CatchJNIException(gameasset.env, "jS16");
    gameasset.env-&gt;CallStaticVoidMethod(gameasset.cls, gameasset.closeAsset);
    CatchJNIException(gameasset.env, "jS17");
}

int ReadGameAsset(GameAssetsInfo&amp; gameasset) {
    FST;
    CatchJNIException(gameasset.env, "jS18");
    int ret = gameasset.env-&gt;CallStaticIntMethod(gameasset.cls, gameasset.readAsset);
    CatchJNIException(gameasset.env, "jS19");
    if (ret &gt; 0) {
        CatchJNIException(gameasset.env, "jS20");
        gameasset.obj = gameasset.env-&gt;GetStaticObjectField(gameasset.cls, gameasset.buffID);
        CatchJNIException(gameasset.env, "jS21");
        gameasset.arr = reinterpret_cast&lt;jbyteArray*&gt;(&amp;gameasset.obj);
    }
    return ret;
}

void UnreadGameAsset(GameAssetsInfo&amp; gameasset) {
    FST;
    CatchJNIException(gameasset.env, "jS22");
    gameasset.env-&gt;DeleteLocalRef(gameasset.obj);
    CatchJNIException(gameasset.env, "jS23");
}

void StoreGameAssetChunk(GameAssetsInfo&amp; gameasset, void* store, int offset, int length) {
    FST;
    CatchJNIException(gameasset.env, "jS24");
    gameasset.env-&gt;GetByteArrayRegion(*gameasset.arr, offset, length, (jbyte*)store);
    CatchJNIException(gameasset.env, "jS25");
}

void OpenGameImage(GameAssetsInfo&amp; gameasset, int rsc) {
    FST;
    CatchJNIException(gameasset.env, "jS26");
    gameasset.env-&gt;CallStaticVoidMethod(gameasset.cls, gameasset.openImage, rsc);
    CatchJNIException(gameasset.env, "jS27");
    gameasset.l_imageWidth = (int)gameasset.env-&gt;GetStaticIntField(gameasset.cls, gameasset.imageWidth);
    CatchJNIException(gameasset.env, "jS28");
    gameasset.l_imageHeight = (int)gameasset.env-&gt;GetStaticIntField(gameasset.cls, gameasset.imageHeight);
    CatchJNIException(gameasset.env, "jS29");
    gameasset.l_imageHasAlpha = (int)gameasset.env-&gt;GetStaticIntField(gameasset.cls, gameasset.imageHasAlpha);
    CatchJNIException(gameasset.env, "jS30");
}

void CloseGameImage(GameAssetsInfo&amp; gameasset) {
    FST;
    CatchJNIException(gameasset.env, "jS31");
    gameasset.env-&gt;CallStaticVoidMethod(gameasset.cls, gameasset.closeImage);
    CatchJNIException(gameasset.env, "jS32");
}

int ReadGameImage(GameAssetsInfo&amp; gameasset) {
    FST;
    CatchJNIException(gameasset.env, "jS33");
    int ret = gameasset.env-&gt;CallStaticIntMethod(gameasset.cls, gameasset.readImage);
    CatchJNIException(gameasset.env, "jS34");
    if ( ret &gt; 0 ) {
        CatchJNIException(gameasset.env, "jS35");
        gameasset.obj = gameasset.env-&gt;GetStaticObjectField(gameasset.cls, gameasset.buffIntID);
        CatchJNIException(gameasset.env, "jS36");
        gameasset.arrInt = reinterpret_cast&lt;jintArray*&gt;(&amp;gameasset.obj);
    }
    return ret;
}

void UnreadGameImage(GameAssetsInfo&amp; gameasset) {
    FST;
    CatchJNIException(gameasset.env, "jS37");
    gameasset.env-&gt;DeleteLocalRef(gameasset.obj);
    CatchJNIException(gameasset.env, "jS38");
}

void StoreGameImageChunk(GameAssetsInfo&amp; gameasset, void* store, int offset, int length) {
    FST;
    CatchJNIException(gameasset.env, "jS39");
    gameasset.env-&gt;GetIntArrayRegion(*gameasset.arrInt, offset, length, (jint*)store);
    CatchJNIException(gameasset.env, "jS40");
}

int GetGameImageWidth(GameAssetsInfo&amp; gameasset) {
    return gameasset.l_imageWidth;
}

int GetGameImageHeight(GameAssetsInfo&amp; gameasset) {
    return gameasset.l_imageHeight;
}

int GetGameImageHasAlpha(GameAssetsInfo&amp; gameasset) {
    return gameasset.l_imageHasAlpha;
}
</code></pre>
<p>And it's backed by this on the Java side:</p>
<pre><code>public class GameAssets {
    static public Resources res = null;
    static public InputStream is = null;
    static public byte buff[];
    static public int buffInt[];
    static public final int buffSize = 1024;
    static public final int buffIntSize = 2048;
    static public int imageWidth;
    static public int imageHeight;
    static public int imageHasAlpha;
    static public int imageLocX;
    static public int imageLocY;
    static public Bitmap mBitmap;
    static public BitmapFactory.Options decodeResourceOptions = new BitmapFactory.Options();

    public GameAssets(Resources r) {
        res = r;
        buff = new byte[buffSize];
        buffInt = new int[buffIntSize];
        decodeResourceOptions.inScaled = false;
    }

    public static final void OpenAsset(int id) {
        is = res.openRawResource(id);
    }

    public static final int ReadAsset() {
        int num = 0;
        try {
            num = is.read(buff);
        } catch (Exception e) {
            ;
        }
        return num;
    }

    public static final void CloseAsset() {
        try {
            is.close();
        } catch (Exception e) {
            ;
        }
        is = null;
    }

    // We want all the advantages that BitmapFactory can provide -- reading
    // images of compressed image formats -- so we provide our own interface
    // for it.
    public static final void OpenImage(int id) {
        mBitmap = BitmapFactory.decodeResource(res, id, decodeResourceOptions);
        imageWidth = mBitmap.getWidth();
        imageHeight = mBitmap.getHeight();
        imageHasAlpha = mBitmap.hasAlpha() ? 1 : 0;
        imageLocX = 0;
        imageLocY = 0;
    }

    public static final int ReadImage() {
        if (imageLocY &gt;= imageHeight) return 0;
        int numReadPixels = buffIntSize;
        if (imageLocX + buffIntSize &gt;= imageWidth) {
            numReadPixels = imageWidth - imageLocX;
            mBitmap.getPixels(buffInt, 0, imageWidth, imageLocX, imageLocY, numReadPixels, 1);
            imageLocY++;
        } else {
            mBitmap.getPixels(buffInt, 0, imageWidth, imageLocX, imageLocY, numReadPixels, 1);
            imageLocX += numReadPixels;
        }
        return numReadPixels;
    }

    public static final void CloseImage() {
    }
}
</code></pre>
<p>Please note the distinct lack of thread safety in the game asset code.</p>
<p>Let me know if more information would be useful.</p>
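<p>For reference, the canary padding mentioned under "Things I've done" can be sketched roughly as below. This is a minimal standalone illustration, not the code from the app: <code>GuardedBuffer</code>, <code>kGuardBytes</code>, <code>kCanary</code> and <code>detectsOverrun</code> are hypothetical names, and plain <code>malloc</code> stands in for the custom allocator.</p>

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical constants: none of these names come from the application.
const std::size_t kGuardBytes = 16;     // size of each guard zone
const unsigned char kCanary = 0xA5;     // fill pattern for the guard zones

// A heap buffer with a canary-filled guard zone on both ends.
struct GuardedBuffer {
    unsigned char* raw;   // start of the whole allocation (front guard)
    std::size_t size;     // usable payload size in bytes

    explicit GuardedBuffer(std::size_t n)
        : raw(static_cast<unsigned char*>(std::malloc(n + 2 * kGuardBytes))),
          size(n) {
        if (raw == nullptr) std::abort();
        std::memset(raw, kCanary, kGuardBytes);                       // front guard
        std::memset(raw + kGuardBytes + size, kCanary, kGuardBytes);  // back guard
    }
    ~GuardedBuffer() { std::free(raw); }

    // Non-copyable, so the allocation is never freed twice.
    GuardedBuffer(const GuardedBuffer&) = delete;
    GuardedBuffer& operator=(const GuardedBuffer&) = delete;

    // Pointer to the payload handed out to the rest of the code.
    unsigned char* data() { return raw + kGuardBytes; }

    // True while both guard zones still hold the canary pattern.
    bool intact() const {
        const unsigned char* back = raw + kGuardBytes + size;
        for (std::size_t i = 0; i < kGuardBytes; ++i) {
            if (raw[i] != kCanary || back[i] != kCanary) return false;
        }
        return true;
    }
};

// Usage sketch: a one-byte overrun flips intact() to false.
bool detectsOverrun() {
    GuardedBuffer buf(32);
    std::memset(buf.data(), 0, buf.size);   // in-bounds write, guards untouched
    bool before = buf.intact();
    buf.data()[buf.size] = 0;               // one byte past the payload (lands in back guard)
    return before && !buf.intact();
}
```

<p>A check like this only catches writes that land directly next to the guarded buffer. Canaries that never change rule out a simple linear overrun of those arrays, but not a stray write through an unrelated pointer.</p>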
 
