StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>There are absolutely times where your intimate understanding of your data can result in much, much more efficient sorting algorithms than any general purpose algorithm available. I shared an example of such a situation in another post at SO, but I'll share it hear just to provide a case-in-point:</p> <p>Back in the days of COBOL, FORTRAN, etc... a developer working for a phone company had to take a relatively large chunk of data that consisted of active phone numbers (I believe it was in the New York City area), and sort that list. The original implementation used a heap sort (these were 7 digit phone numbers, and a lot of disk swapping was taking place during the sort, so heap sort made sense).</p> <p>Eventually, the developer stumbled on a different approach: By realizing that one, and only one of each phone number could exist in his data set, he realized that he didn't have to store the actual phone numbers themselves in memory. Instead, he treated the entire 7 digit phone number space as a very long bit array (at 8 phone numbers per byte, 10 million phone numbers requires just over a meg to capture the entire space). He then did a single pass through his source data, and set the bit for each phone number he found to 1. He then did a final pass through the bit array looking for high bits and output the sorted list of phone numbers.</p> <p>This new algorithm was much, much faster (at least 1000x faster) than the heap sort algorithm, and consumed about the same amount of memory.</p> <p>I would say that, in this case, it absolutely made sense for the developer to develop his own sorting algorithm.</p> <p>If your application is all about sorting, and you really know your problem space, then it's quite possible for you to come up with an application specific algorithm that beats any general purpose algorithm.</p> <p>However, if sorting is an ancillary part of your application, or you are just implementing a general purpose algorithm, chances are very, very good that some extremely smart university types have already provided an algorithm that is better than anything you will be able to come up with. Quick Sort is really hard to beat if you can hold things in memory, and heap sort is quite effective for massive data set ordering (although I personally prefer to use B+Tree type implementations for the heap b/c they are tuned to disk paging performance).</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload