StackOverflow2013

<p>You need to profile your code to find out where the time is being spent.</p>
<p>That doesn't necessarily mean using Python's profiler.</p>
<p>For example, you can just try parsing the same CSV string 1000000 times with different methods. Take the fastest method and divide its total time by 1000000; now you know how much CPU time it takes to parse one string.</p>
<p>Try to break the program into parts and work out what resources are really required by each part.</p>
<p>The parts that need the most CPU per input line are your bottlenecks.</p>
<p>On my computer, the program below outputs this:</p>
<pre><code>ChanVal0 took 0.210402965546 seconds
ChanVal1 took 0.350302934647 seconds
ChanVal2 took 0.558166980743 seconds
ChanVal3 took 0.691503047943 seconds
</code></pre>
<p>So you see that about half the time there is taken up by <code>parseFromCsv</code>. But also that quite a lot of time is taken extracting the values and storing them in the class.</p>
<p>If the class isn't used right away, it might be faster to store the raw data and use properties to parse the <code>csvString</code> on demand.</p>
<pre><code>from time import time
import re


class ChanVal0(object):
    # Baseline: only stores the raw string, no parsing at all.
    def __init__(self, csvString=None, **kwargs):
        self.csvString = csvString
        for key in kwargs:
            setattr(self, key, kwargs[key])


class ChanVal1(object):
    # Splits the string but does not extract individual fields.
    def __init__(self, csvString=None, **kwargs):
        if csvString is not None:
            self.parseFromCsv(csvString)
        for key in kwargs:
            setattr(self, key, kwargs[key])

    def parseFromCsv(self, csvString):
        self.lst = csvString.split(',')


class ChanVal2(object):
    # Splits the string and stores three fields as attributes.
    def __init__(self, csvString=None, **kwargs):
        if csvString is not None:
            self.parseFromCsv(csvString)
        for key in kwargs:
            setattr(self, key, kwargs[key])

    def parseFromCsv(self, csvString):
        lst = csvString.split(',')
        self.eventTime = lst[1]
        self.eventTimeExact = long(lst[2])
        self.other_clock = lst[3]


class ChanVal3(object):
    # Extracts the same three fields with a regex; the values stay strings.
    splitter = re.compile("[^,]*,(?P<eventTime>[^,]*),(?P<eventTimeExact>[^,]*),(?P<other_clock>[^,]*)")

    def __init__(self, csvString=None, **kwargs):
        if csvString is not None:
            self.parseFromCsv(csvString)
        self.__dict__.update(kwargs)

    def parseFromCsv(self, csvString):
        self.__dict__.update(self.splitter.match(csvString).groupdict())


s = "chan,2007-07-13T23:24:40.143,0,0188878425-079,0,0,True,S-4001,UNSIGNED_INT,name1,module1"
RUNS = 100000

# Time RUNS constructions of each class on the same input line (Python 2).
for cls in ChanVal0, ChanVal1, ChanVal2, ChanVal3:
    start_time = time()
    for i in xrange(RUNS):
        cls(s)
    print "%s took %s seconds" % (cls.__name__, time() - start_time)
</code></pre>
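<p>The "parse on demand" idea could look roughly like the sketch below. This is only an illustration, not part of the benchmark above: the class name <code>LazyChanVal</code> and the helper <code>_get_fields</code> are made up for this example, the <code>**kwargs</code> handling is left out for brevity, and <code>int</code> is used in place of <code>long</code> so that it also runs on Python 3.</p>

```python
class LazyChanVal(object):
    """Hypothetical variant: keep the raw CSV line, parse only on first access."""

    def __init__(self, csvString=None):
        self.csvString = csvString
        self._fields = None  # cache for the split result, filled lazily

    def _get_fields(self):
        # Split once, on first use, then reuse the cached list.
        if self._fields is None:
            self._fields = self.csvString.split(',')
        return self._fields

    @property
    def eventTime(self):
        return self._get_fields()[1]

    @property
    def eventTimeExact(self):
        # int() stands in for the long() used in the Python 2 benchmark.
        return int(self._get_fields()[2])

    @property
    def other_clock(self):
        return self._get_fields()[3]


s = "chan,2007-07-13T23:24:40.143,0,0188878425-079,0,0,True,S-4001,UNSIGNED_INT,name1,module1"
cv = LazyChanVal(s)   # cheap: nothing is parsed yet
event_time = cv.eventTime  # first access triggers the split
```

<p>Construction then costs about as much as <code>ChanVal0</code>, and the split is only paid for objects whose fields are actually read. Whether that is a net win depends on how many objects get accessed, so it is worth timing it the same way as the classes above.</p>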
