<p>Here are three possibilities:</p>
<pre><code>foo = """
this is
a multi-line string.
"""

def f1(foo=foo):
    # Naive approach: let the built-in string method do the work.
    return iter(foo.splitlines())

def f2(foo=foo):
    # Lower-level approach: build each line one character at a time.
    retval = ''
    for char in foo:
        retval += char if not char == '\n' else ''
        if char == '\n':
            yield retval
            retval = ''
    if retval:
        yield retval

def f3(foo=foo):
    # find-based approach: slice between successive newline positions.
    prevnl = -1
    while True:
        nextnl = foo.find('\n', prevnl + 1)
        if nextnl &lt; 0: break
        yield foo[prevnl + 1:nextnl]
        prevnl = nextnl

if __name__ == '__main__':
    for f in f1, f2, f3:
        print(list(f()))
</code></pre>
<p>Running this as the main script confirms the three functions are equivalent. With <code>timeit</code> (and a <code>* 100</code> for <code>foo</code> to get substantial strings for more precise measurement):</p>
<pre><code>$ python -mtimeit -s'import asp' 'list(asp.f3())'
1000 loops, best of 3: 370 usec per loop
$ python -mtimeit -s'import asp' 'list(asp.f2())'
1000 loops, best of 3: 1.36 msec per loop
$ python -mtimeit -s'import asp' 'list(asp.f1())'
10000 loops, best of 3: 61.5 usec per loop
</code></pre>
<p>Note we need the <code>list()</code> call to ensure the iterators are traversed, not just built.</p>
<p>In other words, the naive implementation is so much faster it isn't even funny: 6 times faster than my attempt with <code>find</code> calls, which in turn is 4 times faster than a lower-level approach.</p>
<p>Lessons to retain: measurement is always a good thing (but must be accurate); string methods like <code>splitlines</code> are implemented in very fast ways; putting strings together by programming at a very low level (especially by loops of <code>+=</code> of very small pieces) can be quite slow.</p>
<p><strong>Edit</strong>: added @Jacob's proposal, slightly modified to give the same results as the others (trailing blanks on a line are kept), i.e.:</p>
<pre><code>from cStringIO import StringIO

def f4(foo=foo):
    stri = StringIO(foo)
    while True:
        nl = stri.readline()
        if nl != '':
            yield nl.strip('\n')
        else:
            raise StopIteration
</code></pre>
<p>Measuring gives:</p>
<pre><code>$ python -mtimeit -s'import asp' 'list(asp.f4())'
1000 loops, best of 3: 406 usec per loop
</code></pre>
<p>not quite as good as the <code>.find</code> based approach. Still, it is worth keeping in mind because it might be less prone to small off-by-one bugs. Any loop where you see occurrences of +1 and -1, like my <code>f3</code> above, should automatically trigger off-by-one suspicions, and so should many loops which lack such tweaks and should have them. I believe my code is right, though, since I was able to check its output against the other functions.</p>
<p>But the split-based approach still rules.</p>
<p>An aside: possibly better style for <code>f4</code> would be:</p>
<pre><code>from cStringIO import StringIO

def f4(foo=foo):
    stri = StringIO(foo)
    while True:
        nl = stri.readline()
        if nl == '': break
        yield nl.strip('\n')
</code></pre>
<p>at least, it's a bit less verbose. The need to strip trailing <code>\n</code>s unfortunately prohibits the clearer and faster replacement of the <code>while</code> loop with <code>return iter(stri)</code> (the <code>iter</code> part whereof is redundant in modern versions of Python, I believe since 2.3 or 2.4, but it's also innocuous). Maybe worth trying, also:</p>
<pre><code>    return itertools.imap(lambda s: s.strip('\n'), stri)
</code></pre>
<p>or variations thereof, but I'm stopping here since it's pretty much a theoretical exercise with respect to the <code>split</code> based, simplest and fastest, one.</p>
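<p>The listings above are Python 2 (<code>cStringIO</code>, <code>itertools.imap</code>, <code>raise StopIteration</code> inside a generator). As a rough sketch only, here is how the split-based, <code>find</code>-based and <code>StringIO</code>-based variants might look in Python 3. The names <code>FOO</code>, <code>lines_split</code>, <code>lines_find</code>, <code>lines_stringio</code> and <code>time_all</code> are hypothetical and not part of the original answer, <code>io.StringIO</code> is assumed as the stand-in for <code>cStringIO</code>, and no timings are claimed for it.</p>

```python
import io
import timeit

# Hypothetical sample text, mirroring the multi-line string used above.
FOO = "\nthis is\na multi-line string.\n"


def lines_split(text=FOO):
    """Split-based variant: delegate to str.splitlines()."""
    return iter(text.splitlines())


def lines_find(text=FOO):
    """find-based variant: slice between successive newline positions."""
    prev_nl = -1
    while True:
        next_nl = text.find("\n", prev_nl + 1)
        if next_nl < 0:
            break  # no further newline; any unterminated tail is dropped
        yield text[prev_nl + 1:next_nl]
        prev_nl = next_nl


def lines_stringio(text=FOO):
    """StringIO-based variant: iterate the buffer and trim the newline."""
    for line in io.StringIO(text):
        yield line.rstrip("\n")


def time_all(text=FOO * 100, number=100):
    """Return seconds taken by each variant for `number` full traversals."""
    variants = (lines_split, lines_find, lines_stringio)
    return {
        fn.__name__: timeit.timeit(lambda fn=fn: list(fn(text)), number=number)
        for fn in variants
    }
```

<p>Calling <code>time_all()</code> returns a dict of elapsed seconds per variant, so the comparison can be repeated on whatever interpreter is at hand. One caveat of mirroring the <code>find</code> loop as written: a final segment with no trailing newline is not yielded, whereas <code>splitlines</code> does return it, so the variants only agree on newline-terminated input.</p>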