Half-space in regex
<p>I am supposed to write a small program that takes a Persian text and, in certain places, replaces the space with a half-space. The half-space, or <a href="http://en.wikipedia.org/wiki/Zero-width_non-joiner" rel="nofollow">zero-width non-joiner</a> (ZWNJ), is used in some languages to prevent a <a href="http://en.wikipedia.org/wiki/Typographic_ligature" rel="nofollow">ligature</a> when normalizing text. Its Unicode character is supposedly <code>'\u200c'</code>, and in some text editors it can be produced with SHIFT+SPACE:</p>
<pre><code>import re

txt = input('Please enter a Persian text: ')
original_pattern = r'\b(\w+)\s*(ها|هايي|هايم|هاي)\b'
new_pattern = r'\1 \2'
new_txt = re.sub(original_pattern, new_pattern, txt)
print(new_txt)
</code></pre>
<p>In the code above, <code>new_pattern</code> is supposed to introduce a half-space between <code>\1</code> and <code>\2</code>; currently there is a regular space between them.</p>
<p>The question is: how can I put a half-space there? I tried the following, and in both cases got a syntax error:</p>
<pre><code>new_pattern = ur'\1\u200c\2'
new_pattern = r'\1\u200c\2'
</code></pre>
<p>By the way, although the Wikipedia article gives the Unicode code point for ZWNJ as U+200C, it doesn't seem to work that way in the Python shell; it actually doubles the space:</p>
<pre><code>&gt;&gt;&gt; print ('He is a',u'\u200c','boy')
He is a ‌ boy
&gt;&gt;&gt; print ("کتاب",u"\u200c","ها")
کتاب ‌ ها
&gt;&gt;&gt; print ("کتاب ها")
کتاب ها
&gt;&gt;&gt;
</code></pre>
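<p>One possible way to approach this, offered as a minimal sketch rather than a definitive answer and assuming Python 3: the <code>ur''</code> prefix is not valid there, and inside a raw string <code>\u200c</code> stays a literal backslash sequence that the <code>re</code> replacement template does not understand. Passing a function as the replacement avoids template escapes entirely. The names <code>ZWNJ</code>, <code>PATTERN</code> and <code>add_half_space</code> are hypothetical, introduced only for illustration; the pattern is copied from the question.</p>

```python
import re

# Zero-width non-joiner (U+200C), written in a normal (non-raw) string
# so that Python itself decodes the \u escape.
ZWNJ = "\u200c"

# Same pattern as in the question: a word, optional whitespace, then a suffix.
PATTERN = re.compile(r'\b(\w+)\s*(ها|هايي|هايم|هاي)\b')


def add_half_space(text):
    """Join the word and its suffix with a ZWNJ instead of a space."""
    # A function replacement sidesteps the escape rules of re templates.
    return PATTERN.sub(lambda m: m.group(1) + ZWNJ + m.group(2), text)


if __name__ == "__main__":
    result = add_half_space("کتاب ها")
    print(result)
    # Show the code points to confirm that U+200C replaced the space.
    print([hex(ord(ch)) for ch in result])
```

<p>As for the "doubled" space in the shell session: <code>print</code> puts a separator (a single space by default) between its arguments, so passing the ZWNJ as its own argument surrounds it with two spaces. Concatenating the strings, or calling <code>print</code> with <code>sep=''</code>, should avoid that.</p>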
 
