    <p>To answer your question, "am I asking too much of regexps?": yes.</p> <p>I spent the better part of two years working on a profanity filter using regular expressions and finally gave up. During that time, I tried all of these things:</p> <ul> <li>Pre-compiling</li> <li>Character classes (punctuation, whitespace, etc.)</li> <li>Non-capturing groups (mentioned above; they can greatly reduce memory use and increase speed)</li> <li>Combining similar regexps (also mentioned above)</li> <li>Trimming whitespace (str.trim())</li> <li>Case handling (str.toLowerCase())</li> <li>Packing and unpacking whitespace (converting multiple adjacent whitespace characters to a single space and vice versa)</li> <li>Writing my own custom regexp engine (strongly discouraged, as it is complex and does not scale)</li> </ul> <p>Nothing worked well, and as my blacklist grew, my system slowed down. In the end I gave up and implemented a linear analysis filter, which is now the core part of CleanSpeak, <a href="http://www.inversoft.com/products/cleanspeak-profanity-filter-moderation-software" rel="nofollow noreferrer">my company's profanity filtering product</a>.</p> <p>Once we stopped using regexps, we were also able to add multi-threading and other optimizations, and we went from handling 600-700 messages per second to 10,000+ messages per second.</p> <p>Lastly, we found that performing linear analysis made the filter more accurate and allowed us to solve the "Scunthorpe problem" and many of the other problems people have mentioned in the comments here.</p> <p>You can certainly try all of the things I mention above and see whether they get your performance up, but it is a hard problem to solve because regexps weren't really designed for language analysis. They were designed for text analysis, which is a very different problem.</p>
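<p>For readers wondering what a linear pass might look like, here is a minimal sketch in Python. It is an illustration under stated assumptions, not CleanSpeak's actual algorithm, which this answer does not describe. The function name <code>find_blocked_words</code> and the sample blocklist are hypothetical. The idea is to walk the message once, split it into words at non-alphanumeric characters, and check each lowercased word against a set, so the cost per message does not grow with the size of the blocklist.</p>

```python
# Illustrative only: a single-pass, whole-word blocklist check.
# The function name and the sample words are assumptions for this sketch.

def find_blocked_words(message, blocklist):
    """Scan message once, left to right, and return (start_index, word)
    for every whole word whose lowercase form is in blocklist."""
    hits = []
    start = None  # index where the current word began, or None between words
    # A trailing sentinel space flushes the final word inside the loop.
    for i, ch in enumerate(message + " "):
        if ch.isalnum():
            if start is None:
                start = i
        elif start is not None:
            word = message[start:i].lower()
            if word in blocklist:  # average O(1) set lookup
                hits.append((start, word))
            start = None
    return hits


BLOCKLIST = frozenset({"darn", "heck"})

print(find_blocked_words("Well, DARN it!  What the heck?", BLOCKLIST))
print(find_blocked_words("Greetings from Heckington", BLOCKLIST))
```

<p>Because only whole words are compared, a blocked word embedded in a longer, innocent word (the "Scunthorpe problem") is not flagged in this sketch. A real filter would also need to handle deliberate obfuscation such as inserted punctuation or character substitution, which this example does not attempt.</p>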