StackOverflow2013

<h2>Edit</h2> <p>After reading your further explanations, I would say that my previous proposal and MRAB's are fairly similar, and neither will help here. Your problem <em>is</em> actually a problem of <em>nested structures</em>.</p> <p>Think of your 'prefix' and 'suffix' as symbols. You could just as well replace them with an opening and a closing parenthesis, and what you want is to match only the innermost (that is, the deepest) pair.</p> <p>For example, if your prefix is <code>ABC</code> and your suffix is <code>XYZ</code>:</p> <pre><code>ABChello worldABCfooABCbarXYZ</code></pre> <p>you want to get only <code>ABCbarXYZ</code>.</p> <p>It's the same if the prefix is <code>(</code> and the suffix is <code>)</code>. Given the string:</p> <pre><code>(hello world(foo(bar)</code></pre> <p>you would ideally match only <code>(bar)</code>.</p> <p>You need a <a href="http://en.wikipedia.org/wiki/Context-free_grammar" rel="nofollow">context-free grammar</a> (as programming languages have: <a href="http://www.lysator.liu.se/c/ANSI-C-grammar-y.html" rel="nofollow">C grammar</a>, <a href="http://docs.python.org/reference/grammar.html" rel="nofollow">Python grammar</a>) and a <a href="http://www.ling.helsinki.fi/kit/2008s/clt231/nltk-0.9.5/doc/en/ch07.html" rel="nofollow">parser</a>, or you can write your own by combining regular expressions with the loops and data structures of your programming language.</p> <p>This is not possible with regular expressions alone. They can help inside your algorithm, but they are not designed to handle nesting on their own; they are simply not the right tool for the job. You cannot inflate tires with a screwdriver. You will therefore need some external mechanism, though not a complicated one, to store the context: your position in the stack of nested pairs. You may still be able to use your regular expression within each individual context.
</p> <p><em>Finite state machines</em> have a <em>finite</em> number of states, while nested structures can have <em>arbitrary depth</em>, which would require your automaton to grow without bound. Languages that allow arbitrary nesting are therefore not <a href="http://en.wikipedia.org/wiki/Regular_language" rel="nofollow"><em>regular languages</em></a>.</p> <blockquote> <p>Since recursion in a grammar allows the definition of nested syntactic structures, any language (including any programming language) which allows nested structures is a context-free language, not a regular language. For example, the set of strings consisting of balanced parentheses [like a LISP program with the alphanumerics removed] is a context-free language (<a href="http://www.augustana.ca/~jmohr/courses/common/csc370/lecture_notes/chomsky.html" rel="nofollow">see here</a>).</p> </blockquote> <h2>Former proposal (not relevant anymore)</h2> <p>If I do:</p> <pre><code>>>> import re
>>> s = "ABC\ncontent 1\n123\ncontent 2\nABC\ncontent 3\nXYZ"
>>> r = re.compile(r'A+B+C+[^A]+[^B]+[^C]+XYZ', re.I)
>>> re.findall(r, s)
</code></pre> <p>I get:</p> <pre><code>['ABC\ncontent 3\nXYZ']
</code></pre> <p>Is that what you want?</p>
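<p>Returning to the nested problem from the <em>Edit</em> section, here is a minimal Python sketch of the "regex plus external stack" approach. The helper name <code>innermost_pairs</code> is hypothetical, and it assumes the prefix and suffix are distinct literal strings that never overlap each other. The regular expression only finds the delimiters; a plain list used as a stack stores the nesting context, and a pair is reported when its suffix closes a prefix with no other prefix opened in between.</p> <pre><code>import re


def innermost_pairs(text, prefix, suffix):
    """Hypothetical helper: return every prefix...suffix pair that has no
    other prefix inside it, e.g. '(bar)' in '(hello world(foo(bar)'.

    Assumes prefix and suffix are distinct, non-overlapping literals.
    The regex only finds delimiters; the stack keeps the nesting context
    that a regular expression alone cannot keep.
    """
    # Group 1 matches the prefix, group 2 the suffix (both escaped literals).
    token = re.compile("(%s)|(%s)" % (re.escape(prefix), re.escape(suffix)))
    stack = []          # start offsets of prefixes not yet closed
    prev_open = False   # True if the previous delimiter was a prefix
    pairs = []
    for m in token.finditer(text):
        if m.group(1) is not None:
            stack.append(m.start())
            prev_open = True
        else:
            if stack:
                start = stack.pop()
                # Innermost: nothing was opened between this prefix and suffix.
                if prev_open:
                    pairs.append(text[start:m.end()])
            # A suffix with no open prefix is simply ignored.
            prev_open = False
    return pairs


print(innermost_pairs("ABChello worldABCfooABCbarXYZ", "ABC", "XYZ"))  # prints ['ABCbarXYZ']
print(innermost_pairs("(hello world(foo(bar)", "(", ")"))  # prints ['(bar)']
</code></pre> <p>With balanced input such as <code>((a)(b))</code> it returns <code>['(a)', '(b)']</code>, and unmatched prefixes or stray suffixes are skipped. The regular expression is still used in each context; only the bookkeeping that needs unbounded memory lives outside it.</p>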
