Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>The first to do in these cases is always to profile:</p> <pre><code>15147/297 0.032 0.000 0.041 0.000 tree.py:102(__eq__) 15400/200 0.031 0.000 0.106 0.001 tree.py:399(convert) 1 0.023 0.023 0.129 0.129 plcfrs_cython.pyx:52(parse) 6701/1143 0.022 0.000 0.043 0.000 heapdict.py:45(_min_heapify) 18212 0.017 0.000 0.023 0.000 plcfrs_cython.pyx:38(__richcmp__) 10975/10875 0.017 0.000 0.035 0.000 tree.py:75(__init__) 5772 0.016 0.000 0.050 0.000 tree.py:665(__init__) 960 0.016 0.000 0.025 0.000 plcfrs_cython.pyx:118(deduced_from) 46938 0.014 0.000 0.014 0.000 tree.py:708(_get_node) 25220/2190 0.014 0.000 0.016 0.000 tree.py:231(subtrees) 10975 0.013 0.000 0.023 0.000 tree.py:60(__new__) 49441 0.013 0.000 0.013 0.000 {isinstance} 16748 0.008 0.000 0.015 0.000 {hasattr} </code></pre> <p>The First thing I noticed is that very few functions are from the cython module itself. Most of them come from the tree.py module and maybe is that the bottleneck.</p> <p>Focusing on the cython side I see the <strong>richcmp</strong> function:</p> <p>we can optimize it simply by adding the type of the values in the method declaration</p> <pre><code>def __richcmp__(ChartItem self, ChartItem other, int op): .... </code></pre> <p>This brings down the value </p> <pre><code>ncalls tottime percall cumtime percall filename:lineno(function) .... 18212 0.011 0.000 0.015 0.000 plcfrs_cython.pyx:38(__richcmp__) </code></pre> <p>Adding the elif syntax instead of the single if will enable <a href="http://wiki.cython.org/enhancements/switch" rel="nofollow">the switch optimization of cython</a></p> <pre><code> if op == 0: return self.label &lt; other.label or self.vec &lt; other.vec elif op == 1: return self.label &lt;= other.label or self.vec &lt;= other.vec elif op == 2: return self.label == other.label and self.vec == other.vec elif op == 3: return self.label != other.label or self.vec != other.vec elif op == 4: return self.label &gt; other.label or self.vec &gt; other.vec elif op == 5: return self.label &gt;= other.label or self.vec &gt;= other.vec </code></pre> <p>obtaining:</p> <pre><code>17963 0.002 0.000 0.002 0.000 plcfrs_cython.pyx:38(__richcmp__) </code></pre> <p>trying to figure out where that tree.py:399 convert comes from I found out that this function inside dopg.py takes all that time</p> <pre><code> def removeids(tree): """ remove unique IDs introduced by the Goodman reduction """ result = Tree.convert(tree) for a in result.subtrees(lambda t: '@' in t.node): a.node = a.node.rsplit('@', 1)[0] if isinstance(tree, ImmutableTree): return result.freeze() return result </code></pre> <p>Now I am not sure if each node in the tree is a ChartItem and if the <strong>getitem</strong> value is being used somewhere else but adding this changes :</p> <pre><code>cdef class ChartItem: cdef public str label cdef public str root cdef public long vec cdef int _hash __slots__ = ("label", "vec", "_hash") def __init__(ChartItem self, label, int vec): self.label = intern(label) #.rsplit('@', 1)[0]) self.root = intern(label.rsplit('@', 1)[0]) self.vec = vec self._hash = hash((self.label, self.vec)) def __hash__(self): return self._hash def __richcmp__(ChartItem self, ChartItem other, int op): if op == 0: return self.label &lt; other.label or self.vec &lt; other.vec elif op == 1: return self.label &lt;= other.label or self.vec &lt;= other.vec elif op == 2: return self.label == other.label and self.vec == other.vec elif op == 3: return self.label != other.label or self.vec != other.vec elif op == 4: return self.label &gt; other.label or self.vec &gt; other.vec elif op == 5: return self.label &gt;= other.label or self.vec &gt;= other.vec def __getitem__(ChartItem self, int n): if n == 0: return self.root elif n == 1: return self.vec def __repr__(self): #would need bitlen for proper padding return "%s[%s]" % (self.label, bin(self.vec)[2:][::-1]) </code></pre> <p>and inside of mostprobableparse:</p> <pre><code>from libc cimport pow def mostprobableparse... ... cdef dict parsetrees = &lt;dict&gt;defaultdict(float) cdef float prob m = 0 for n,(a,prob) in enumerate(derivations): parsetrees[a] += pow(e,prob) m += 1 </code></pre> <p>I get:</p> <pre><code> 189345 function calls (173785 primitive calls) in 0.162 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 6701/1143 0.025 0.000 0.037 0.000 heapdict.py:45(_min_heapify) 1 0.023 0.023 0.120 0.120 plcfrs_cython.pyx:54(parse) 960 0.018 0.000 0.030 0.000 plcfrs_cython.pyx:122(deduced_from) 5190/198 0.011 0.000 0.015 0.000 tree.py:102(__eq__) 6619 0.006 0.000 0.006 0.000 heapdict.py:67(_swap) 9678 0.006 0.000 0.008 0.000 plcfrs_cython.pyx:137(concat) </code></pre> <p>so the next steps are to optimize heapify and deduced_from</p> <p>deduce_from can be optimized a bit more:</p> <pre><code>cdef inline deduced_from(ChartItem Ih, double x, pyCx, pyunary, pylbinary, pyrbinary, int bitlen): cdef str I = Ih.label cdef int Ir = Ih.vec cdef list result = [] cdef dict Cx = &lt;dict&gt;pyCx cdef dict unary = &lt;dict&gt;pyunary cdef dict lbinary = &lt;dict&gt;pylbinary cdef dict rbinary = &lt;dict&gt;pyrbinary cdef ChartItem Ilh cdef double z cdef double y cdef ChartItem I1h for rule, z in unary[I]: result.append((ChartItem(rule[0][0], Ir), ((x+z,z), (Ih,)))) for rule, z in lbinary[I]: for I1h, y in Cx[rule[0][2]].items(): if concat(rule[1], Ir, I1h.vec, bitlen): result.append((ChartItem(rule[0][0], Ir ^ I1h.vec), ((x+y+z, z), (Ih, I1h)))) for rule, z in rbinary[I]: for I1h, y in Cx[rule[0][1]].items(): if concat(rule[1], I1h.vec, Ir, bitlen): result.append((ChartItem(rule[0][0], I1h.vec ^ Ir), ((x+y+z, z), (I1h, Ih)))) return result </code></pre> <p>I will stop here although I am confident that we can keep optimizing as more insight is acquired on the problem.</p> <p>A series of unittest would be useful to assert that each optimization don't introduce any subtle error.</p> <p>A side note, try to use spaces instead of tabs.</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload