
How to extract matching strings into a defaultdict(set)? Python
<p>I have a text file with lines like the one below, where an English sentence is followed by a Spanish sentence and the corresponding translation table, each delimited by "<code>{##}</code>". (If you know it, this is the output of <code>giza-pp</code>.)</p>
<blockquote> <p>you have requested a debate on this subject in the course of the next few days , during this part-session . {##} sus señorías han solicitado un debate sobre el tema para los próximos días , en el curso de este período de sesiones . {##} 0-0 0-1 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 12-10 13-11 14-11 15-12 16-13 17-14 9-15 10-16 11-17 18-18 17-19 19-21 20-22</p> </blockquote>
<p>The translation table is read as follows: <code>0-0 0-1</code> means that the 0th word in English (i.e. <code>you</code>) matches the 0th and 1st words in Spanish (i.e. <code>sus señorías</code>).</p>
<p>Let's say I want to know the Spanish translation of <code>course</code> in this sentence. Normally I'd do it this way:</p>
<pre><code>from collections import defaultdict

# x is one line of the file
eng, spa, trans = x.split(" {##} ")
eng_words, spa_words = eng.split(" "), spa.split(" ")

# map each English word index to the set of aligned Spanish word indices
tt = defaultdict(set)
for s, t in [i.split("-") for i in trans.split(" ")]:
    tt[int(s)].add(int(t))

query = 'course'
for i in sorted(tt[eng_words.index(query)]):
    print(spa_words[i])
</code></pre>
<p><strong>Is there a simpler way to do the above? Maybe <code>regex</code>? <code>line.find()</code>?</strong></p>
<p>After some tries, I had to do the following to cover other issues such as MWEs (multi-word expressions) and missing translations:</p>
<pre><code>def getTranslation(gizaline, query):
    src, trg, trans = gizaline.split(" {##} ")
    src_words, trg_words = src.split(" "), trg.split(" ")
    tt = defaultdict(set)
    for s, t in [i.split("-") for i in trans.split(" ")]:
        tt[int(s)].add(int(t))
    try:
        idx = src_words.index(query)
    except ValueError:
        # query may be one part of a hyphenated MWE such as "part-session"
        idx = None
        for n, w in enumerate(src_words):
            if "-" + query in w or query + "-" in w:
                idx = n
                break
    if idx is None:
        return "#NULL"
    query_translated = [trg_words[i] for i in sorted(tt[idx])]
    if len(query_translated) &gt; 0:
        return ":".join(query_translated)
    else:
        return "#NULL"
</code></pre>
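<p>Since the question asks whether a regex could simplify this, here is one possible sketch, not a definitive answer. It pulls the index pairs out of the translation table with <code>re.findall</code> and stores them in a <code>defaultdict(set)</code> keyed by integer word index. The names <code>parse_alignment</code>, <code>lookup</code> and <code>SAMPLE</code> are made up for illustration, and the sketch assumes the three fields are always separated by exactly <code> {##} </code> and that words are whitespace-delimited.</p>

```python
import re
from collections import defaultdict

# The sample line from the question, split across literals for readability.
SAMPLE = (
    "you have requested a debate on this subject in the course of the next "
    "few days , during this part-session . {##} "
    "sus señorías han solicitado un debate sobre el tema para los próximos "
    "días , en el curso de este período de sesiones . {##} "
    "0-0 0-1 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9 12-10 13-11 14-11 15-12 16-13 "
    "17-14 9-15 10-16 11-17 18-18 17-19 19-21 20-22"
)


def parse_alignment(table):
    """Hypothetical helper: map each source index to its target indices."""
    aligned = defaultdict(set)
    # Each "s-t" pair becomes a (source, target) tuple of digit strings.
    for src_idx, trg_idx in re.findall(r"(\d+)-(\d+)", table):
        aligned[int(src_idx)].add(int(trg_idx))
    return aligned


def lookup(gizaline, query):
    """Hypothetical helper: target words aligned to every exact match of query."""
    src, trg, table = gizaline.split(" {##} ")
    src_words, trg_words = src.split(), trg.split()
    aligned = parse_alignment(table)
    hits = []
    for i, word in enumerate(src_words):
        if word == query:
            # .get avoids inserting empty sets for unaligned words.
            hits.extend(trg_words[j] for j in sorted(aligned.get(i, ())))
    return hits


if __name__ == "__main__":
    print(lookup(SAMPLE, "course"))
    print(lookup(SAMPLE, "you"))
```

<p>Unlike <code>list.index</code>, this visits every occurrence of the query word, so a word that appears twice in the source sentence contributes the translations from both positions. An empty list signals that the word is absent or unaligned. Matching on parts of hyphenated tokens is left out of this sketch.</p>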
 
