Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>Let's consider the tradeoffs before any suggestions.</p> <p>You say you need to store "millions" of paths. I'll assume one million, because it makes the calculations easier (and even on servers, I haven't seen more than a million directories).</p> <p>How long are those paths? You've shown an example with very short paths, so we're looking at maybe a hundred megabytes to store those million paths. I don't have a reference for maximum path lengths, but 256 characters sticks in my mind. So your paths will take a maximum of 512 Mb of memory. Do you have that much memory?</p> <p>How evenly distributed are the pathnames? In other words, do you follow an 80:20 rule where 80% of the paths are found in 20% of the directories? The reason that I ask is because a trie structure needs some form of index between levels. If you have a lot of directories with only a few paths under them, you're going to have a lot of overhead to maintain a trie.</p> <p>The recommendations: if I had enough memory, I'd use a <code>HashSet&lt;String&gt;</code> and be done with it.</p> <p>If I didn't have a lot of memory, and a directory structure that followed the 80:20 rule (or, more likely, 95:5), I'd think of a <code>HashMap&lt;String,Set&lt;String&gt;&gt;</code>. The key of this map would be the longest leading path string that had a "reasonable" amount of duplication, and the values would be the remaining strings. You'd probe this map with progressively shorter leading components until you found a match, then probe the set for the remainder.</p> <p>That leaves open the question of "reasonable" duplication. It's the amount of duplication where the overhead of a two-piece data structure is overcome by the reduction in duplication. For example, <code>/usr/bin/</code> might be valid (because it holds thousands of files and you save 9 characters or 18 bytes from each), but <code>/usr/local/bin/</code> probably wouldn't be (at least on my system, it only holds a single file).</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload