Performance Bottleneck with Python mapping data structure
<p>I am facing a performance problem with one of the data structures used in a bigger Python project.</p>
<p>Basically, I am importing a tab-delimited file. Using the normal Python open(...) file iterator, I split each line with line.split("\t"). Now I want to insert each column value into some sort of dictionary that returns an ID for the value, and that is where it gets slow:</p>
<p>In general, the dictionary class looks like this:</p>
<pre><code>import math


class Dictionary(list):
    def getBitLength(self):
        if len(self) == 0:
            return 0
        else:
            return math.log(len(self), 2)

    def insertValue(self, value):
        self.append(value)
        return len(self) - 1

    def getValueForValueId(self, valueId):
        return self[valueId]

    def getValueIdForValue(self, value):
        # Both "in" and index() scan the list linearly.
        if value in self:
            return self.index(value)
        else:
            return self.insertValue(value)
</code></pre>
<p>The basic idea was that the valueId is the index of the value in the dictionary list.</p>
<p>Profiling the program tells me that more than 50% of the runtime is spent in getValueIdForValue(...).</p>
<pre><code>1566562 function calls in 23.218 seconds

Ordered by: cumulative time
List reduced from 93 to 10 due to restriction &lt;10&gt;

240000   13.341    0.000   16.953    0.000 Dictionary.py:22(getValueIdForValue)
206997    3.196    0.000    3.196    0.000 :0(index)
</code></pre>
<p>The problem is that this is just a small test. In the real application, this function would be called several million times, which would increase the runtime considerably.</p>
<p>Of course I could inherit from the Python dict, but then the performance problem is quite similar, since I need to get the key for a given value (in case the value has already been inserted into the dictionary).</p>
<p>Since I am not a Python pro yet, can you guys give me any tips on how to make this a bit more efficient?</p>
<p>Best &amp; thanks for the help,</p>
<p>n3otec</p>
<p>===</p>
<p>Thanks guys!</p>
<p>Performance with bidict is much better:</p>
<pre><code>240000    2.458    0.000    8.546    0.000 Dictionary.py:34(getValueIdForValue)
230990    1.678    0.000    5.134    0.000 Dictionary.py:27(insertValue)
</code></pre>
<p>Best, n3otec</p>
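<p>For reference, a minimal sketch of the two-way mapping idea behind that speed-up. It does not use the bidict package and is not the code from Dictionary.py. It assumes a plain list for the ID-to-value direction and a built-in dict as a reverse index for the value-to-ID direction; the attribute names _values and _ids are made up for illustration, and all values are assumed to be hashable (strings from line.split("\t") are).</p>

```python
import math


class Dictionary:
    """Hypothetical rework of the Dictionary class from the question."""

    def __init__(self):
        self._values = []  # position in this list is the valueId
        self._ids = {}     # reverse index: value to valueId (hash lookup)

    def __len__(self):
        return len(self._values)

    def getBitLength(self):
        n = len(self._values)
        return 0 if n == 0 else math.log(n, 2)

    def insertValue(self, value):
        # Append the value and record its ID in the reverse index.
        valueId = len(self._values)
        self._values.append(value)
        self._ids[value] = valueId
        return valueId

    def getValueForValueId(self, valueId):
        return self._values[valueId]

    def getValueIdForValue(self, value):
        # One dict lookup replaces the linear "in" test plus index() scan.
        valueId = self._ids.get(value)
        if valueId is None:
            valueId = self.insertValue(value)
        return valueId


d = Dictionary()
ids = [d.getValueIdForValue(v) for v in ["a", "b", "a", "c", "b"]]
print(ids)  # [0, 1, 0, 2, 1]
```

<p>The trade-off is that every value is stored twice (once in the list, once as a dict key), in exchange for an average constant-time lookup in both directions.</p>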
 
