<p>Assuming that you can load your data into Mathematica in the form you outlined, one very simple thing to do is to create a hash-table in which your trigrams are the (compound) keys. Here is the part of your sample that you gave:</p>

<pre><code>trigrams = {{{"wa", "wa", "wa"}, 66}, {{"i", "love", "you"}, 62},
   {{"la", "la", "la"}, 50}, {{"meaning", "of", "life"}, 42},
   {{"on", "come", "on"}, 40}, {{"come", "on", "come"}, 40},
   {{"yeah", "yeah", "yeah"}, 38}, {{"no", "no", "no"}, 36},
   {{"we", "re", "gonna"}, 36}, {{"you", "love", "me"}, 35},
   {{"in", "love", "with"}, 32}, {{"the", "way", "you"}, 30},
   {{"i", "want", "to"}, 30}, {{"back", "to", "me"}, 29},
   {{"of", "an", "xke"}, 1}};
</code></pre>

<p>Here is one possible way to create a hash-table:</p>

<pre><code>Clear[trigramHash];
(* each {trigram, count} pair becomes the definition trigramHash[w1, w2, w3] = count *)
(trigramHash[Sequence @@ #1] = #2) &amp; @@@ trigrams;
</code></pre>

<p>Now we use it like this:</p>

<pre><code>In[16]:= trigramHash["meaning","of","life"]
Out[16]= 42
</code></pre>

<p>This approach pays off if you perform many searches, of course.</p>

<p><strong>EDIT</strong></p>

<p>If you have many files and want to search them efficiently in Mathematica, one thing you could do is to use the above hashing mechanism to convert all your files to <code>.mx</code> binary Mathematica files. These files are optimized for fast loading, and serve as a persistence mechanism for definitions you want to store. Here is how it may work:</p>

<pre><code>In[20]:= DumpSave["C:\\Temp\\trigrams.mx",trigramHash]
Out[20]= {trigramHash}

In[21]:= Quit[]

In[1]:= Get["C:\\Temp\\trigrams.mx"]

In[2]:= trigramHash["meaning","of","life"]
Out[2]= 42
</code></pre>

<p>You use <code>DumpSave</code> to create an <code>.mx</code> file and <code>Get</code> to load it back. So, the suggested procedure is to load your data into Mathematica file by file, create hashes (you could use <code>SubValues</code> to index a particular hash-table with the index of your file), and then save those definitions into <code>.mx</code> files. In this way you get fast loading and fast search, and you have the freedom to decide which part of your data to keep loaded in Mathematica at any given time, pretty much without the performance hit normally associated with file loading.</p>
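<p>As a rough sketch of the <code>SubValues</code> idea: the helper name <code>addTrigrams</code>, the file indices 1 and 2, and the data for the second file are all hypothetical, made up for illustration. The per-file hashes could then be built along these lines:</p>

<pre><code>Clear[trigramHash, addTrigrams];

(* hypothetical helper: stores each trigram of one file as
   trigramHash[fileIndex][w1, w2, w3] = count *)
addTrigrams[fileIndex_, data_List] :=
  (trigramHash[fileIndex][Sequence @@ #1] = #2) &amp; @@@ data;

addTrigrams[1, trigrams];                     (* the sample list defined earlier *)
addTrigrams[2, {{{"i", "love", "you"}, 7}}];  (* made-up data for a second file *)

trigramHash[1]["i", "love", "you"]
trigramHash[2]["i", "love", "you"]
</code></pre>

<p>The two lookups at the end query the same trigram in each file's table. All definitions still hang off the single symbol <code>trigramHash</code>, which is the point of indexing through <code>SubValues</code> rather than creating one symbol per file.</p>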
 
