Note that there are some explanatory texts on larger screens.

plurals
  1. POString parsing and matching algorithm
    primarykey
    data
    text
    <p><em>I am solving the following problem:</em></p> <p>Suppose I have a list of software packages and their names might looks like this <em>(the only known thing is that these names are formed like <code>SOMETHING + VERSION</code>, meaning that the version always comes <strong>after</strong> the name)</em>:</p> <pre><code>Efficient.Exclusive.Zip.Archiver-PROPER.v.122.24-EXTENDED Efficient.Exclusive.Zip.Archiver.123.01 Efficient-Exclusive.Zip.Archiver(2011)-126.24-X Zip.Archiver14.06 Zip-Archiver.v15.08-T Custom.Zip.Archiver1.08 Custom.Zip.Archiver1 </code></pre> <p>Now, I need to parse this list and select only <strong>latest versions</strong> of each package. For this example the expected result would be:</p> <pre><code>Efficient-Exclusive.Zip.Archiver(2011)-126.24-X Zip-Archiver.v15.08-T Custom.Zip.Archiver1.08 </code></pre> <hr> <p>Current approach that I use can be described the following way:</p> <pre><code>Split the initial strings into groups by their starting letter, ignoring spaces, case and special symbols. (`E`, `Z`, `C` for the example list above) Foreach element { Apply the regular expression (or a set of regular expressions), which tries to deduce the version from the string and perform the following conversion `STRING -&gt; (VERSION, STRING_BEFORE_VERSION)` // Example for this step: // 'Efficient.Exclusive.Zip.Archiver-PROPER.v.122.24-EXTENDED' -&gt; // (122.24, Efficient.Exclusive.Zip.Archiver-PROPER) Search through the corresponding group (in this example - the 'E' group) and find every other strings, which starts from the 'STRING_BEFORE_VERSION' or from it's significant part. This comparison is performed in ignore-case and ignore-special-symbols mode. // The matches for this step: // Efficient.Exclusive.Zip.Archiver-PROPER, {122.24} // Efficient.Exclusive.Zip.Archiver, {123.01} // Efficient-Exclusive.Zip.Archiver, {126.24, 2011} // The last one will get picked, because year is ignored. Get the possible version from each match, ***pick the latest, yield that match.*** Remove every possible match (including the initial element) from the list. } </code></pre> <p>This algorithm (as I assume) should work for something like <code>O(N * V + N lg N * M)</code>, where <code>M</code> stands for the average string matching time and <code>V</code> stands for the version regexp working time.</p> <hr> <p><em>However, I suspect there is a better solution <strong>(there always is!)</strong>, maybe specific data structure or better matching approach.</em></p> <p>If you <strong>can suggest something</strong> or <strong>make some notes on the current approach</strong>, please do not hesitate to do this.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload