> The tree is massive

By far the most expensive thing a processor ever does is not executing instructions, it is accessing memory. The execution core of a modern [CPU](http://en.wikipedia.org/wiki/Central_processing_unit) is *many* times faster than the memory bus. The problem is one of *distance*: the further an electrical signal has to travel, the harder it gets to deliver that signal to the other end of the wire without it being corrupted. The only cure for that problem is to make it go slower. That is a big problem with the wires that connect the CPU to the RAM in your machine; you can pop the case and *see* the wires.

Processors have a counter-measure for this problem: they use *caches*, buffers that store a copy of the bytes in RAM. An important one is the [L1 cache](http://en.wikipedia.org/wiki/CPU_cache#Multi-level_caches), typically 16 kilobytes for data and 16 kilobytes for instructions. It is small so it can be close to the execution engine. Reading bytes from the L1 cache typically takes 2 or 3 CPU cycles. Next up is the L2 cache, bigger and slower. Upscale processors also have an L3 cache, bigger and slower yet. As process technology improves, those buffers take less space and automatically become faster as they get closer to the core, a big reason why newer processors are better and how they manage to use an ever increasing number of transistors.

Those caches are however not a perfect solution. The processor will still stall on a memory access if the data is not available in one of the caches. It cannot continue until the very slow memory bus has supplied the data. Losing a fat hundred CPU cycles on a single instruction is possible.

Tree structures are a problem: they are **not** cache friendly. Their nodes tend to be scattered throughout the address space. The fastest way to access memory is by reading from sequential addresses. The unit of storage for the L1 cache is 64 bytes. In other words, once the processor reads *one* byte, the next 63 are very fast since they'll be present in the cache.

Which makes an **array** by far the most efficient data structure. It is also the reason that the .NET `List<>` class isn't a list at all: it uses an array for storage. The same goes for other collection types like `Dictionary`, structurally not remotely similar to an array, but internally implemented with arrays.

So your `while()` statement is very likely suffering from CPU stalls, because it dereferences a pointer to access the `BranchData` field; a sketch of this kind of loop follows at the end of this answer. The next statement is very cheap because the `while()` statement already did the heavy lifting of retrieving the value from memory. Assigning the local variable is cheap; a processor uses a buffer for writes.

This is otherwise not a simple problem to solve; flattening your tree into arrays is very likely to be impractical, not least because you typically cannot predict in which order the nodes of the tree will be visited. A red-black tree might help; it isn't clear from the question. The simple conclusion to draw is that it is already running as fast as you can hope for, and if you need it to go faster then you'll need better hardware with a faster memory bus. [DDR4](http://en.wikipedia.org/wiki/DDR4_SDRAM) is going mainstream this year.
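
For concreteness, here is a minimal sketch of the kind of pointer-chasing loop being discussed. The type and field names (`Node`, `Left`, `Right`) are assumptions rather than code from the question; the point is where the memory access happens.

```csharp
// Hypothetical reconstruction of the loop under discussion; the type and
// field names are assumptions, not code taken from the question.
class Node
{
    public int BranchData;
    public Node Left, Right;
}

class TreeSearch
{
    static Node Find(Node node, int key)
    {
        // The while() condition dereferences 'node' to read BranchData. If
        // that node's cache line is not in L1/L2/L3, the CPU stalls here
        // for the full round trip to RAM. The statements that follow are
        // cheap: the value is already in the cache, and the store to the
        // local variable goes through the write buffer.
        while (node != null && node.BranchData != key)
        {
            node = key < node.BranchData ? node.Left : node.Right;
        }
        return node;
    }
}
```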
 
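The cache-line effect itself is easy to observe. Below is a rough benchmark sketch, not taken from the answer: it sums the same number of integers once through a contiguous array and once by chasing pointers through nodes linked in deliberately shuffled order. Absolute timings vary per machine (and depend on the runtime's allocation order); the gap between the two numbers is the point.

```csharp
using System;
using System.Diagnostics;

class CacheDemo
{
    class Cell { public int Value; public Cell Next; }

    static void Main()
    {
        const int n = 1 << 22;  // ~4 million elements

        // Sequential: one array, read in consecutive 64-byte cache lines.
        var array = new int[n];
        var sw = Stopwatch.StartNew();
        long sum = 0;
        for (int i = 0; i < n; i++) sum += array[i];
        Console.WriteLine($"array walk:   {sw.ElapsedMilliseconds} ms (sum={sum})");

        // Scattered: link the cells in shuffled order, so following Next
        // jumps around the heap instead of reading sequentially.
        var cells = new Cell[n];
        for (int i = 0; i < n; i++) cells[i] = new Cell();
        var rng = new Random(1);
        for (int i = n - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (cells[i], cells[j]) = (cells[j], cells[i]);
        }
        for (int i = 0; i < n - 1; i++) cells[i].Next = cells[i + 1];

        sw.Restart();
        sum = 0;
        for (var c = cells[0]; c != null; c = c.Next) sum += c.Value;
        Console.WriteLine($"pointer walk: {sw.ElapsedMilliseconds} ms (sum={sum})");
    }
}
```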

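And a sketch of the "flatten the tree into an array" idea, under the assumption that the tree is static: node `i`'s children live at `2*i+1` and `2*i+2` (an implicit, Eytzinger-style layout), so the whole tree becomes one contiguous allocation instead of scattered heap nodes.

```csharp
using System;

class FlatTree
{
    // Returns the index of 'key' in the implicit tree, or -1 if absent.
    static int FindFlat(int[] tree, int key)
    {
        int i = 0;
        while (i < tree.Length)
        {
            if (tree[i] == key) return i;
            i = key < tree[i] ? 2 * i + 1 : 2 * i + 2;  // descend to a child
        }
        return -1;
    }

    static void Main()
    {
        // Keys 1..7 stored in implicit binary-search-tree order.
        int[] flat = { 4, 2, 6, 1, 3, 5, 7 };
        Console.WriteLine(FindFlat(flat, 5));  // prints 5
    }
}
```

The catch, and part of why the answer calls flattening impractical in general, is that the layout is fixed: inserting or deleting a node means rebuilding the array.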