<p>I have something rather complicated, but I wrote it in a hurry, and for the moment it does the job.</p>

<p>Note that:</p>

<ul>
<li><p>I added 'in France' after <strong>Meanwhile is the studio 7 album by British pop band 10cc</strong>,<br> and only <strong>British</strong> is modified.</p></li>
<li><p>In <strong>by the german band Genesis 8 and was released in 1978</strong>, the '1978' isn't modified while the '8' is.</p></li>
</ul>

<p>That is why the code is complicated.</p>

<p>But I fear that, despite this complication, it won't be exact for every possible sentence.</p>

<p>It should be improved so that <strong>idi</strong> is always the correct musical group's name, rather than always the first match as it is in the present solution. But that is hard work without knowing exactly what you want.</p>

<pre><code>ss ='''british 7
German 8
France 90'''

text = '''&lt;s id="69-7"&gt;...Meanwhile is the studio 7 album by British pop band 10cc in France.&lt;/s&gt;
&lt;s id="15-8"&gt;...And Then There Were Three... is the ninth studio album by the german band Genesis 8 and was released in 1978.&lt;/s&gt;
&lt;s id="1990-2"&gt;Magnum Nitro Express is a France centerfire fire rifle cartridge 90.&lt;/s&gt;
'''

import re

# Map each lowercased keyword of ss to its number
regx = re.compile('^(.+?)[ \t]+(\d+)',re.MULTILINE)
dico = dict((a.lower(),b) for (a,b) in regx.findall(ss))
print 'dico==',dico
print '\n\n'

# The capturing group keeps the tags in the result of split(),
# so the sentence bodies are at the even indexes
rogx = re.compile('(&lt;s id="[\d-]+"&gt;|&lt;/s&gt;\r?\n)')
splitted = rogx.split(text)
print 'splitted==\n',splitted
print '=================\n'

def repl(mat):
    # idi is the first keyword found in the current sentence
    # (the global list 'the' is set in the loop below)
    idi = (b for (a,b) in the if b).next().lower()
    x,y = mat.groups()
    if x:
        if dico[idi.lower()]==x:
            return '&lt;w2&gt;%s&lt;/w2&gt;' % x
        else:
            return x
    if y :
        if y.lower()==idi:
            return '&lt;w1&gt;%s&lt;/w1&gt;' % y
        else:
            return y

# Group 1: any number; group 2: any keyword of dico
rigx = re.compile('(\d+)|(' + '|'.join(dico.keys()) + ')',re.IGNORECASE)

for i,el in enumerate(splitted[0::2]):
    if el:
        print '-----------------------------'
        print '* index in splitted==',2*i
        print '\n* el==\n',repr(el)
        print '\n* rigx.findall(el)==\n',rigx.findall(el)
        the = rigx.findall(el)
        print '\n* modified el:\n',rigx.sub(repl,el)
        splitted[2*i] = rigx.sub(repl,el)

print '\n\n##################################\n\n'
print 'modified splitted==\n',splitted
print
print ''.join(splitted)
</code></pre>

<p>result</p>

<pre><code>dico== {'german': '8', 'british': '7', 'france': '90'}



splitted==
['', '&lt;s id="69-7"&gt;', '...Meanwhile is the studio 7 album by British pop band 10cc in France.', '&lt;/s&gt;\n', '', '&lt;s id="15-8"&gt;', '...And Then There Were Three... is the ninth studio album by the german band Genesis 8 and was released in 1978.', '&lt;/s&gt;\n', '', '&lt;s id="1990-2"&gt;', 'Magnum Nitro Express is a France centerfire fire rifle cartridge 90.', '&lt;/s&gt;\n', '']
=================

-----------------------------
* index in splitted== 2

* el==
'...Meanwhile is the studio 7 album by British pop band 10cc in France.'

* rigx.findall(el)==
[('7', ''), ('', 'British'), ('10', ''), ('', 'France')]

* modified el:
...Meanwhile is the studio &lt;w2&gt;7&lt;/w2&gt; album by &lt;w1&gt;British&lt;/w1&gt; pop band 10cc in France.
-----------------------------
* index in splitted== 6

* el==
'...And Then There Were Three... is the ninth studio album by the german band Genesis 8 and was released in 1978.'

* rigx.findall(el)==
[('', 'german'), ('8', ''), ('1978', '')]

* modified el:
...And Then There Were Three... is the ninth studio album by the &lt;w1&gt;german&lt;/w1&gt; band Genesis &lt;w2&gt;8&lt;/w2&gt; and was released in 1978.
-----------------------------
* index in splitted== 10

* el==
'Magnum Nitro Express is a France centerfire fire rifle cartridge 90.'

* rigx.findall(el)==
[('', 'France'), ('90', '')]

* modified el:
Magnum Nitro Express is a &lt;w1&gt;France&lt;/w1&gt; centerfire fire rifle cartridge &lt;w2&gt;90&lt;/w2&gt;.


##################################



modified splitted==
['', '&lt;s id="69-7"&gt;', '...Meanwhile is the studio &lt;w2&gt;7&lt;/w2&gt; album by &lt;w1&gt;British&lt;/w1&gt; pop band 10cc in France.', '&lt;/s&gt;\n', '', '&lt;s id="15-8"&gt;', '...And Then There Were Three... is the ninth studio album by the &lt;w1&gt;german&lt;/w1&gt; band Genesis &lt;w2&gt;8&lt;/w2&gt; and was released in 1978.', '&lt;/s&gt;\n', '', '&lt;s id="1990-2"&gt;', 'Magnum Nitro Express is a &lt;w1&gt;France&lt;/w1&gt; centerfire fire rifle cartridge &lt;w2&gt;90&lt;/w2&gt;.', '&lt;/s&gt;\n', '']

&lt;s id="69-7"&gt;...Meanwhile is the studio &lt;w2&gt;7&lt;/w2&gt; album by &lt;w1&gt;British&lt;/w1&gt; pop band 10cc in France.&lt;/s&gt;
&lt;s id="15-8"&gt;...And Then There Were Three... is the ninth studio album by the &lt;w1&gt;german&lt;/w1&gt; band Genesis &lt;w2&gt;8&lt;/w2&gt; and was released in 1978.&lt;/s&gt;
&lt;s id="1990-2"&gt;Magnum Nitro Express is a &lt;w1&gt;France&lt;/w1&gt; centerfire fire rifle cartridge &lt;w2&gt;90&lt;/w2&gt;.&lt;/s&gt;
</code></pre>

<h2>EDIT 1</h2>

<p>I eliminated replmodel().</p>

<p>repl() now reads the value of rigx.findall(el);<br> I added the line <strong>the = rigx.findall(el)</strong> for that.</p>
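<p>For readers on Python 3, here is a minimal sketch of the same approach: tag the first keyword found in each sentence and the number paired with it. The names <code>number_for</code>, <code>tag_sentence</code> and <code>tag_text</code> are hypothetical, chosen only for this illustration. The <code>re.escape</code> call and the guard for sentences without any keyword are additions that are not in the code above. Like the code above, it still picks the first keyword, so the limitation about <strong>idi</strong> remains.</p>

```python
import re

# Keyword/number pairs, one pair per line (same sample data as the answer).
pairs = '''british 7
German 8
France 90'''

text = '''<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc in France.</s>
<s id="15-8">...And Then There Were Three... is the ninth studio album by the german band Genesis 8 and was released in 1978.</s>
<s id="1990-2">Magnum Nitro Express is a France centerfire fire rifle cartridge 90.</s>
'''

# Map each lowercased keyword to its number.
pair_re = re.compile(r'^(.+?)[ \t]+(\d+)', re.MULTILINE)
number_for = {word.lower(): num for word, num in pair_re.findall(pairs)}

# The capturing group keeps the tags in the split result,
# so the sentence bodies sit at the even indexes.
tag_re = re.compile(r'(<s id="[\d-]+">|</s>\r?\n)')

# Group 1 matches any run of digits, group 2 matches any known keyword.
# re.escape is an added precaution in case a keyword contains regex characters.
token_re = re.compile(
    r'(\d+)|(' + '|'.join(map(re.escape, number_for)) + ')',
    re.IGNORECASE)


def tag_sentence(sentence):
    """Tag the first keyword in `sentence` and the number paired with it."""
    first = next((kw.lower() for _, kw in token_re.findall(sentence) if kw), None)
    if first is None:
        return sentence  # added guard: no keyword, leave the sentence untouched

    def repl(match):
        num, kw = match.groups()
        if num is not None:
            # Only the number paired with the first keyword is tagged.
            return '<w2>%s</w2>' % num if number_for[first] == num else num
        # Only occurrences of the first keyword are tagged.
        return '<w1>%s</w1>' % kw if kw.lower() == first else kw

    return token_re.sub(repl, sentence)


def tag_text(source):
    """Split on the <s> tags, tag each sentence body, and join everything back."""
    parts = tag_re.split(source)
    parts[0::2] = [tag_sentence(part) for part in parts[0::2]]
    return ''.join(parts)


result = tag_text(text)
print(result)
```

<p>Passing the first keyword through a closure, instead of reading a global list inside <code>repl()</code>, keeps each sentence's state local to one call of <code>tag_sentence</code>.</p>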