<p>There's too little information in your post to give a definitive answer. It may also be that nothing you currently know would settle the question, in which case you need to debug the problem more carefully. Here's what I'd do.</p>
<p>To debug, you want repeatability. But… you say that you're using random numbers. It seems, though, that your program does some scientific-ish computation. In most such cases you don't actually need “true” randomness, only “repeatable” randomness: numbers that pass statistical tests, but where you keep enough data to reset the random number generator so that it produces exactly the same results as in a previous run. For that, you can simply record the current RNG state (e.g. the seed) every time you start a new block of computation.</p>
<p>Next, write some code that stores all the state needed to restart the computation (including the RNG state) every few minutes, and run the program. This way, if your code crashes, you can restart the computation from exactly the same state and reach the point where it crashed without waiting through millions of iterations. I am making a strong assumption here: apart from the RNG, your code does not depend on any other external state (network activity, I/O, the process scheduler making particular choices when scheduling your threads…).</p>
<p>With this kind of data it becomes easier to test whether the problem is caused by a machine fault (overheating, bad memory, etc.). Simply restart the computation from the last state saved before the crash, preferably after letting the machine cool down, and perhaps after rebooting it. If you hit the crash again (and it happens every time you restart from that state), it's almost certainly a bug in your code.</p>
<p>If not, we still cannot conclude that it's a machine fault: through a plain mistake, your code might trigger undefined behavior whose outcome depends on factors outside your control. One example is using an uninitialized pointer in a rarely taken code path: it might sometimes cause a bad-access crash, and go unnoticed when, by pure luck, the pointer happens to point into memory you allocated. Try <a href="http://valgrind.org/" rel="nofollow">valgrind</a>; it's probably the best tool for finding memory problems… except that it slows execution down so much that you'll again prefer to rerun the computation from a state known to be suspicious (the last state before the crash) rather than wait for millions of iterations. I've seen slowdowns of 5x to 100x.</p>
<p>In the meantime, try running your code on another machine. If it also crashes after a similar number of iterations (to be sure, let it run for at least three times as many iterations as it took to crash on the original machine), then it's quite probably a bug in your code.</p>
<p>Happy hacking!</p>
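<p>The checkpoint-and-resume approach described above might look roughly like the following minimal sketch. It is written in Python only for brevity; the question's actual language isn't shown, so that choice is an assumption. Rather than recording only a seed, it saves the generator's full state with <code>random.Random.getstate()</code> together with the loop counter and partial results, and restores it with <code>setstate()</code>. The file name, the helpers <code>run</code>, <code>save_checkpoint</code> and <code>load_checkpoint</code>, and the toy accumulator standing in for the real computation are all hypothetical. As in the answer, it assumes the RNG is the only external state the computation depends on.</p>

```python
import os
import pickle
import random
import time

CHECKPOINT_PATH = "checkpoint.pkl"  # hypothetical file name
CHECKPOINT_INTERVAL_S = 300.0       # "every few minutes"


def save_checkpoint(path, iteration, accumulator, rng):
    """Write everything needed to resume: loop counter, partial result, full RNG state."""
    state = {
        "iteration": iteration,
        "accumulator": accumulator,
        "rng_state": rng.getstate(),
    }
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    # Replace the old checkpoint in one step (atomic on POSIX), so a crash
    # during the write cannot leave a half-written checkpoint behind.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def run(total_iterations, seed=12345, path=CHECKPOINT_PATH, resume=False,
        interval_s=CHECKPOINT_INTERVAL_S):
    rng = random.Random()
    if resume and os.path.exists(path):
        # Restart from the last saved state instead of from scratch.
        state = load_checkpoint(path)
        start, acc = state["iteration"], state["accumulator"]
        rng.setstate(state["rng_state"])
    else:
        rng.seed(seed)
        start, acc = 0, 0.0

    last_save = time.monotonic()
    for i in range(start, total_iterations):
        # Placeholder for one step of the real (hypothetical) computation.
        acc += rng.random()
        if time.monotonic() - last_save >= interval_s:
            save_checkpoint(path, i + 1, acc, rng)
            last_save = time.monotonic()
    return acc
```

<p>After a crash, calling <code>run(..., resume=True)</code> continues from the last checkpoint and reproduces the original run's random draws exactly, which is what makes it practical to rerun just the suspicious stretch under valgrind or on another machine. If the real program is C++, <code>std::mt19937</code> supports the same idea: its state can be written to a stream with <code>operator&lt;&lt;</code> and read back with <code>operator&gt;&gt;</code>.</p>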