StackOverflow2013

Python Regex Engine - "look-behind requires fixed-width pattern" Error
<p>I am trying to handle unmatched double quotes within a string in CSV format.</p>
<p>To be precise,</p>
<pre><code>"It "does "not "make "sense", Well, "Does "it"
</code></pre>
<p>should be corrected to</p>
<pre><code>"It" "does" "not" "make" "sense", Well, "Does" "it"
</code></pre>
<p>So basically, what I am trying to do is</p>
<blockquote>
<p>replace all the ' " '</p>
<ol>
<li>Not preceded by a beginning of line or a comma (and)</li>
<li>Not followed by a comma or an end of line</li>
</ol>
<p>with ' " " '</p>
</blockquote>
<p>For that I use the regex below:</p>
<pre><code>(?<!^|,)"(?!,|$)
</code></pre>
<p>The problem is that while Ruby regex engines ( <a href="http://www.rubular.com/">http://www.rubular.com/</a> ) can parse the regex, Python regex engines (<a href="https://pythex.org/">https://pythex.org/</a> , <a href="http://www.pyregex.com/">http://www.pyregex.com/</a>) throw the following error:</p>
<pre><code>Invalid regular expression: look-behind requires fixed-width pattern
</code></pre>
<p>And Python 2.7.3 throws</p>
<pre><code>sre_constants.error: look-behind requires fixed-width pattern
</code></pre>
<p>Can anyone tell me what vexes Python here?</p>
<p>==================================================================================</p>
<h1>EDIT:</h1>
<p>Following Tim's response, I got the output below for a multi-line string:</p>
<pre><code>>>> str = """ "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it" """
>>> re.sub(r'\b\s*"(?!,|$)', '" "', str)
' "It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" " '
</code></pre>
<p>At the end of each line, two double quotes were added next to 'it'.</p>
<p>So I made a very small change to the regex to handle a newline.</p>
<pre><code>re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)
</code></pre>
<p>But this gives the output</p>
<pre><code>>>> re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)
' "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it" " '
</code></pre>
<p>Only the last 'it' still has two double quotes.</p>
<p>But I wonder why the '$' end-of-line anchor does not recognize that the line has ended.</p>
<p>==================================================================================</p>
<p>The final answer is</p>
<pre><code>re.sub(r'\b\s*"(?!,|[ \t]*$)', '" "', str,flags=re.MULTILINE)
</code></pre>
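The behaviour described above can be reproduced with a short script. This is only a sketch, assuming Python 3's standard `re` module; the names `SAMPLE`, `FIXED`, `lookbehind_error` and `close_quotes` are made up for illustration and are not part of the original post. It checks that the original look-behind is rejected, that splitting it into two separate look-behinds at least compiles, and that the final regex from the question gives the desired text on a multi-line input.

```python
import re

# Input and desired output taken from the question.
SAMPLE = '"It "does "not "make "sense", Well, "Does "it"'
FIXED = '"It" "does" "not" "make" "sense", Well, "Does" "it"'


def lookbehind_error(pattern):
    """Return the compile error message for pattern, or None if it compiles.

    Hypothetical helper, used only to inspect how `re` reacts to a pattern.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Inside the look-behind, the alternative ^ has width 0 and , has width 1.
# The widths differ, so `re` refuses to compile the pattern.
print(lookbehind_error(r'(?<!^|,)"(?!,|$)'))

# Two separate look-behinds, each with a fixed width, are accepted.
# This only shows what compiles; it is not the fix the question ends with.
print(lookbehind_error(r'(?<!^)(?<!,)"(?!,|$)'))


def close_quotes(text):
    """Hypothetical wrapper around the final regex from the question."""
    # [ \t]*$ lets the look-ahead skip trailing blanks before the line end;
    # re.MULTILINE makes $ match at the end of every line, not only the last.
    return re.sub(r'\b\s*"(?!,|[ \t]*$)', '" "', text, flags=re.MULTILINE)


# Two lines; the second has a trailing space, as in the question's string.
multi = SAMPLE + "\n" + SAMPLE + " "
print(close_quotes(multi) == FIXED + "\n" + FIXED + " ")
```

As far as I can tell, this also accounts for the '$' puzzle: without `re.MULTILINE`, `$` matches only at the very end of the string (or just before a final newline), and even with the flag, a trailing space between the last quote and the line end stops a bare `$` from matching directly after the quote.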
