Capturing negative matches in Python regular expressions
<p>I'm trying to do a non-greedy negative match, and I need to capture it as well. I'm using these flags in Python, re.DOTALL | re.LOCALE | re.MULTILINE, to do multi-line cleanup of some text-file 'databases' in which each field begins on a new line with a backslash. Each record begins with an \lx field.</p>
<pre><code>\lx foo
\ps n
\nt note 1
\ps v
\nt note
\ge happy
\nt note 2
\ge lonely
\nt note 3
\ge lonely
\dt 19/Dec/2011
\lx bar
...
</code></pre>
<p>I'm trying to ensure that each \ge field has a \ps field somewhere above it within its record, one for one. Currently, one \ps is often followed by several \ge fields and thus needs to be copied down, as with the two lonely \ge fields above.</p>
<p>Here's most of the needed logic: after any \ps field, but before encountering another \ps or \lx, find a \ge, then find another \ge. Capture everything so that the \ps field can be copied down to just before the second \ge.</p>
<p>And here's my non-functional attempt. Replace this:</p>
<pre><code>^(\\ps\b.*?\n)((?!^\\(ps|lx)*?)^(\\ge.*?\n)((?!^\\ps)*?)^(\\ge.*?\n)
</code></pre>
<p>with this:</p>
<pre><code>\1\2\3\4\1\5
</code></pre>
<p>I'm getting a memory error even on a tiny file (34 lines long). Of course, even if this worked, I would have to run it multiple times, since it only tries to handle a second \ge, and not a third or fourth one. So any ideas in that regard would interest me as well.</p>
<p><strong>UPDATE:</strong> Alan Moore's solution worked great, although there were cases that required a little tweaking. Sadly, I had to turn off DOTALL, since otherwise I couldn't prevent the first .* from including subsequent \ps fields, even with the non-greedy .*? form. But I was delighted to learn about the (?s) modifier just now at regular-expressions.info. This allowed me to turn off DOTALL in general but still use it in the other regexes for which it <strong>is</strong> essential.</p>
<p>Here is the suggested regex, condensed down to the one-line format I need:</p>
<pre><code>^(?P&lt;PS_BLOCK&gt;(?P&lt;PS_LINE&gt;\\ps.*\n)(?:(?!\\(?:ps|lx|ge)).*\n)*\\ge.*\n)(?P&lt;GE_BLOCK&gt;(?:(?!\\(?:ps|lx|ge)).*\n)*\\ge.*\n)
</code></pre>
<p>That worked, but when I modified the example above, it inserted the \ps above "note 2". It also treated \lxs and \ge2 the same as \lx and \ge, so it needed a few \b word boundaries. So I went with a slightly tweaked version:</p>
<pre><code>^(?P&lt;PS_BLOCK&gt;(?P&lt;PS_LINE&gt;\\ps\b.*\n)(?:(?!\\(?:ps|lx|ge)\b).*\n)*\\ge\b.*\n)(?P&lt;AFTER_GE1&gt;(?:(?!\\(?:ps|lx|ge)\b).*\n)*)(?P&lt;GE2_LINE&gt;\\ge\b.*\n)
</code></pre>
<p>and this replacement string:</p>
<pre><code>\g&lt;PS_BLOCK&gt;\g&lt;AFTER_GE1&gt;\g&lt;PS_LINE&gt;\g&lt;GE2_LINE&gt;
</code></pre>
<p>Thanks again!</p>
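<p>For reference, here is a minimal sketch of how the tweaked regex and replacement string above might be applied from Python. The function name copy_ps_down, the constant names, and the loop are my own illustration, not part of the suggested answer. It assumes only re.MULTILINE is set (DOTALL off, as described above) and that every field line ends with a newline. Because each pass handles only one extra \ge per \ps block, the substitution is repeated until nothing changes.</p>

```python
import re

# The tweaked pattern from the update, split across adjacent raw strings
# purely for readability; Python concatenates them into the same regex.
PATTERN = re.compile(
    r"^(?P<PS_BLOCK>(?P<PS_LINE>\\ps\b.*\n)"
    r"(?:(?!\\(?:ps|lx|ge)\b).*\n)*\\ge\b.*\n)"
    r"(?P<AFTER_GE1>(?:(?!\\(?:ps|lx|ge)\b).*\n)*)"
    r"(?P<GE2_LINE>\\ge\b.*\n)",
    re.MULTILINE,  # ^ matches at every line start; DOTALL is deliberately off
)
REPLACEMENT = r"\g<PS_BLOCK>\g<AFTER_GE1>\g<PS_LINE>\g<GE2_LINE>"

# Sample record from the question (raw string, so backslashes stay literal).
SAMPLE = r"""\lx foo
\ps n
\nt note 1
\ps v
\nt note
\ge happy
\nt note 2
\ge lonely
\nt note 3
\ge lonely
\dt 19/Dec/2011
\lx bar
"""


def copy_ps_down(text):
    # Hypothetical helper: rerun the substitution until a pass makes no
    # replacements, so third and fourth \ge fields are handled as well.
    while True:
        text, count = PATTERN.subn(REPLACEMENT, text)
        if count == 0:
            return text


if __name__ == "__main__":
    print(copy_ps_down(SAMPLE), end="")
```

<p>On the sample record this should take two substituting passes: the first copies \ps v above the first lonely \ge, and the second, starting from that new \ps line, copies it above the remaining one. The loop stops because every substitution leaves one fewer \ge without its own \ps.</p>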