<p>OK, I know that SO is not a pure forum and maybe I shouldn't answer my own question, but I'd like to share my results.</p>
<p>Finally, thanks to you guys, I managed to optimize my text preprocessing. First of all, I simplified the long expression from my question (following Josh Kelley's answer):</p>
<pre>[0-9]|[^\w]|(\b\w{1,2}\b)</pre>
<p>It does the same as the original one but is much simpler. Then, again following Josh Kelley's suggestion, I compiled this regex into an assembly. I found a great example of compiling expressions into an assembly <a href="http://www.dijksterhuis.org/regular-expressions-advanced/" rel="nofollow noreferrer">here</a>. I did that because this regex is used many, many times; after reading a few articles about compiled regexes, that was my decision. I also dropped the last expression, the one applied after eliminating stop words (it made no real sense).</p>
<p>The execution time on a 12KiB text file was ~15ms. This covers only the expression shown above.</p>
<p>The last step was the stop words. I decided to test 3 different options (execution times are for the same 12KiB text file).</p>
<h3>One big Regular Expression</h3>
<p>One expression containing all the stop words, compiled into an assembly (mquander's suggestion). Nothing to explain here.</p>
<ul> <li>Execution time: ~215ms</li> </ul>
<h3>String.Replace()</h3>
<p>People say that this can be faster than Regex, so for each stop word I called the <code>string.Replace()</code> method. That takes many loops, with this result:</p>
<ul> <li>Execution time: ~65ms</li> </ul>
<h3>LINQ</h3>
<p>The method presented by LBushkin. Nothing more to say.</p>
<ul> <li>Execution time: ~2.5ms</li> </ul>
<p>I can only say wow. Just compare the execution time of the first option with the last one! Big thanks, LBushkin!</p>
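<p>For readers who want to see the overall shape of the pipeline, here is a minimal sketch. It is not the exact code from the answers referenced above: LBushkin's LINQ method is not reproduced in this post, so the filter below is only one plausible form of it. The class name <code>Preprocessor</code>, the method <code>Clean</code>, and the three-word stop list are hypothetical, and <code>RegexOptions.Compiled</code> is used as a simpler stand-in for compiling the regex into a separate assembly.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Preprocessor
{
    // The simplified expression from this post: digits, non-word characters,
    // and standalone words of one or two characters.
    // Assumption: RegexOptions.Compiled stands in for a precompiled assembly.
    private static readonly Regex Noise = new Regex(
        @"[0-9]|[^\w]|(\b\w{1,2}\b)", RegexOptions.Compiled);

    // Hypothetical stop-word list; a HashSet makes each lookup cheap.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "the", "and", "with" }, StringComparer.OrdinalIgnoreCase);

    public static string Clean(string text)
    {
        // Replace each match with a space so neighbouring words stay separated.
        string stripped = Noise.Replace(text, " ");

        // Split into words, drop the stop words, and join the rest back together.
        IEnumerable<string> kept = stripped
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !StopWords.Contains(word));

        return string.Join(" ", kept);
    }

    public static void Main()
    {
        // prints "cat sat dogs was"
        Console.WriteLine(Clean("The cat sat with 2 dogs, and it was OK."));
    }
}
```

<p>The likely reason a set-based filter beats the other two options is that it walks the text once and does one hash lookup per word, whereas <code>string.Replace()</code> rescans the whole text for every stop word. Treat that as an explanation under the assumptions above, not as a measurement.</p>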
 
