    <p><b>Note:</b> I am the author of RegexKit et al.</p> <p>This is a fairly complicated answer. :)</p> <p>First, matching a thousand regexes with any of the commonly available regex engine implementations is going to be fairly slow, save for perhaps the TCL and TRE regex engines. The reason why <code>RegexKit.framework</code> greatly outperforms <code>RegexKitLite</code> for this task is that <code>RegexKit.framework</code> has quite a bit of non-trivial, optimized code for just this task. That code exists because it's used in Safari AdBlock, which needs to perform bulk matches of regexes against URLs. It keeps the list of regexes in sorted order, based on the number of times they made a successful match. This is based on the observation that some regex patterns used in Safari AdBlock match much more frequently than others, and trying those first dramatically reduces the number of regexes that need to be tried to determine if there's a 'hit'. There is also a small negative hit cache, along with a lot of multithreading code to do the matches in parallel. None of this will ever make it into the <code>Lite</code> version, as it is definitely not a lightweight feature: there's probably 60-70KB of code just to implement this one feature alone, not to mention the huge memory footprint of keeping a thousand compiled regexes around.</p> <p>Using <code>RegexKitLite</code> to do this kind of pattern matching is bound to be very, very slow. The first problem is that it only keeps a small cache of compiled regexes that have recently been used. By default, the cache is set to just <code>23</code>, so tossing a thousand regexes at it is going to cause every regex to be recompiled each time it's used.</p> <p>As others have pointed out, <code>RegexKit.framework</code> isn't really set up to be used on the iPhone. 
Even if you got around the "linking to external frameworks" provision, the default build of <code>RegexKit.framework</code> does not include the <code>arm</code> architecture in its fat binary (it includes <code>ppc</code>, <code>ppc64</code>, <code>i386</code>, and <code>x86_64</code>). What you really need to do is set up a new build target that creates a static library. Not terribly hard to do, really.</p> <p>I'm afraid that if this kind of pattern matching is something you need to do, you're probably going to have to roll your own regex engine. What you need is a regex engine that can take your thousand regexes and concatenate them together, such as "<code>r1|r2|r3|r4</code>". Most regex engines, and in particular <code>pcre</code> and <code>ICU</code> (the ones used by <code>RegexKit.framework</code> and <code>RegexKitLite</code>, respectively), evaluate such a regex in an almost left-to-right manner. What's needed is an almost DFA-like engine that evaluates all possible states concurrently. See <a href="http://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow noreferrer">this link</a> for more information. I've built such a regex engine, one that even handles back-references (much easier to do than everyone says) in ~<code>O(M*log2(N))</code> (M being the size of the text to match, N being the size of the regex) time, but it's not finished. If it were, it would cut through this kind of problem like a plasma torch through butter.</p> <p>I am aware of at least one person porting <code>RegexKit.framework</code> to the iPhone, though: <a href="http://cocoamug.com/adblock/" rel="nofollow noreferrer">Mobile Safari AdBlock</a>. AFAIK, it's also a port of the desktop version of Safari AdBlock. I don't know many details, but I think it requires a jailbroken iPhone to install.</p> <p>In summary, I don't think there are any turn-key solutions available for iPhone development that do anything close to what you need. 
Your best bet, other than creating your own regex engine, is to look into the <a href="http://www.laurikari.net/tre/" rel="nofollow noreferrer">TRE regex engine</a> and try some experiments using concatenated regexes. Be prepared to roll up your sleeves, though, as you're going to have to get your hands very dirty and deal with the guts of Cocoa's strings, Unicode encodings, and all kinds of other unpleasant stuff, the kind of stuff that <code>RegexKitLite</code> takes care of for you behind the scenes.</p>
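<p>To make the "most frequently matching regexes first, plus a small negative hit cache" idea described above concrete, here is a minimal sketch in Python. It is not the <code>RegexKit.framework</code> implementation (that is Objective-C and also multithreaded); the class name <code>HitOrderedMatcher</code>, its methods, and the cache size are all hypothetical and chosen only for illustration.</p>

```python
import re
from collections import OrderedDict


class HitOrderedMatcher:
    """Illustrative only: tries regexes in descending order of past hits."""

    def __init__(self, patterns, negative_cache_size=64):
        # Compile every pattern once up front; each entry is [regex, hit_count].
        self._entries = [[re.compile(p), 0] for p in patterns]
        # Small LRU cache of inputs already known to match nothing.
        self._negative = OrderedDict()
        self._negative_cache_size = negative_cache_size

    def matches(self, text):
        # Negative hit cache: skip all regexes for a known miss.
        if text in self._negative:
            self._negative.move_to_end(text)
            return False
        for entry in self._entries:
            if entry[0].search(text):
                entry[1] += 1
                # Keep the list sorted by hit count; the sort is stable,
                # so patterns with equal counts keep their relative order.
                self._entries.sort(key=lambda e: e[1], reverse=True)
                return True
        # Remember the miss and evict the least recently used entry if full.
        self._negative[text] = True
        if len(self._negative) > self._negative_cache_size:
            self._negative.popitem(last=False)
        return False

    def pattern_order(self):
        # Current evaluation order, most frequently matching pattern first.
        return [entry[0].pattern for entry in self._entries]
```

<p>With a list such as <code>[r"/banners/", r"ads\.", r"doubleclick"]</code>, a URL that hits <code>ads\.</code> moves that pattern to the front, so later URLs try it first. Re-sorting on every hit is the simplest way to show the idea; a real implementation would likely reorder less often.</p>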