<p>You don't have to sort the data at all. A simple solution might be:</p>
<pre><code>import csv
import itertools

def row_grouper(reader, size=5):
    iterrows = iter(reader)
    current = list(itertools.islice(iterrows, size))
    if len(current) &lt; size:
        return  # not enough rows for even one group
    yield current
    for next_row in iterrows:
        current.pop(0)            # drop the oldest row
        current.append(next_row)  # add the newest row
        yield current

reader = csv.reader(open(filename))
for i, row_group in enumerate(row_grouper(reader)):
    if all(float(row[1]) &lt; 40 for row in row_group):
        print("%d %d" % (i, i + 5))  # i is the index of the first row in the group.
        break  # stop processing other rows.
</code></pre>
<p>The <code>row_grouper</code> function is a generator that yields lists of <code>size</code> (5 by default) consecutive rows. On every step it removes the first row of the group and adds the new row at the end.</p>
<hr>
<p>Instead of a plain <code>list</code> you can use a <code>deque</code> (from <code>collections</code>) and replace the <code>pop(0)</code> in <code>row_grouper</code> with a <code>popleft()</code> call, which is more efficient, although this doesn't matter much if the list has only 5 elements.</p>
<p>Alternatively, you can follow martineau's suggestion: create the <code>deque</code> with the <code>maxlen</code> keyword argument and avoid <code>pop</code>ping altogether. 
This is about twice as fast as using a deque's <code>popleft</code>, which is about twice as fast as using the <code>list</code>'s <code>pop(0)</code>.</p>
<hr>
<p><strong>Edit:</strong> To check more than one condition you can use more than one <code>row_grouper</code> and use <code>itertools.tee</code> to obtain copies of the iterable.</p>
<p>For example:</p>
<pre><code>import itertools as it

def check_condition(group, row_index, limit, found):
    if group is None or found:
        return False
    return all(float(row[row_index]) &lt; limit for row in group)

f_iter, s_iter, t_iter = it.tee(iter(reader), 3)
groups = row_grouper(f_iter, 10), row_grouper(s_iter, 5), row_grouper(t_iter, 25)

found_first = found_second = found_third = False

# izip_longest is called zip_longest on Python 3
for index, (first, second, third) in enumerate(it.izip_longest(*groups)):
    if check_condition(first, 1, 40, found_first):
        # stuff
        found_first = True
    if check_condition(second, 3, 40, found_second):
        # stuff
        found_second = True
    if check_condition(third, 3, 40, found_third):
        # stuff
        found_third = True
    if found_first and found_second and found_third:
        # stop the code if we matched all the conditions once.
        break
</code></pre>
<p>The first part simply imports <code>itertools</code> (and assigns it the alias <code>it</code> to avoid typing <code>itertools</code> every time).</p>
<p>I've defined the <code>check_condition</code> function because the conditions are getting more complicated and you don't want to repeat them over and over. As you can see, the last line of <code>check_condition</code> is the same as the condition used before: it checks whether the current "row group" satisfies the property. Since we plan to iterate over the file only once, and we cannot stop the loop when only one condition is met (we'd miss the other conditions), we must use a flag that tells us whether the condition on (e.g.) time was already met. 
As you can see in the <code>for</code> loop, we <code>break</code> out of the loop once all the conditions have been met.</p>
<p>Now, the line:</p>
<pre><code>f_iter, s_iter, t_iter = it.tee(iter(reader), 3)
</code></pre>
<p>creates an iterator over the rows of <code>reader</code> and makes 3 independent copies of it. This means that the loop:</p>
<pre><code>for row in f_iter:
    print(row)
</code></pre>
<p>will print all the rows of the file, just like <code>for row in reader</code>. Note, however, that <code>itertools.tee</code> lets us obtain copies of the rows <em>without</em> reading the file more than once.</p>
<p>Afterwards, we must pass these rows to <code>row_grouper</code> in order to verify the conditions:</p>
<pre><code>groups = row_grouper(f_iter, 10), row_grouper(s_iter, 5), row_grouper(t_iter, 25)
</code></pre>
<p>Finally we have to loop over the "row groups". To do this simultaneously we use <code>itertools.izip_longest</code> (renamed to <code>itertools.zip_longest</code>, without the <code>i</code>, in Python 3). It works just like <code>zip</code>, creating tuples of corresponding elements (e.g. <code>zip([1, 2, 3], ["a", "b", "c"]) -&gt; [(1, "a"), (2, "b"), (3, "c")]</code>). The difference is that <code>izip_longest</code> <em>pads</em> the shorter iterables with <code>None</code>s. This ensures that we check the conditions on all the possible groups (and that's also why <code>check_condition</code> has to check whether <code>group</code> is <code>None</code>).</p>
<p>To obtain the current row index we wrap everything in <code>enumerate</code>, just like before. Inside the <code>for</code> loop the code is pretty simple: you check each condition using <code>check_condition</code> and, if it is met, you do what you have to do <em>and</em> set the flag for that condition (so that in the following iterations the check will always return <code>False</code>).</p>
<p>(Note: I must say I did not test the code. 
I'll test it when I have a bit of time; in any case, I hope it gives you some ideas. Also check out the documentation for <code>itertools</code>.)</p>
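<p>The <code>maxlen</code> variant mentioned above is never spelled out, so here is a minimal Python 3 sketch of it, combined with the <code>tee</code>/<code>zip_longest</code> loop. The names <code>first_matches</code> and <code>SAMPLE</code>, the shape of the <code>conditions</code> mapping, and the in-memory data are assumptions made for illustration; they are not part of the original code.</p>

```python
import csv
import io
from collections import deque
from itertools import tee, zip_longest


def row_grouper(rows, size=5):
    """Yield every run of `size` consecutive rows as a tuple (sliding window)."""
    window = deque(maxlen=size)  # a full deque drops its oldest item on append
    for row in rows:
        window.append(row)
        if len(window) == size:
            yield tuple(window)  # snapshot, so later appends cannot alter it


def first_matches(rows, conditions):
    """Return {name: index of the first row of the first matching window}.

    `conditions` (hypothetical shape) maps a name to (column, limit, size).
    A name maps to None if no window of that size satisfies the condition.
    """
    names = list(conditions)
    copies = tee(rows, len(names))  # one pass over the data, several consumers
    groups = [row_grouper(copy, conditions[name][2])
              for copy, name in zip(copies, names)]
    found = dict.fromkeys(names)  # every value starts as None
    for index, windows in enumerate(zip_longest(*groups)):
        for name, window in zip(names, windows):
            column, limit, _ = conditions[name]
            # window is None once a grouper with a larger size is exhausted
            if found[name] is None and window is not None:
                if all(float(row[column]) < limit for row in window):
                    found[name] = index
        if all(value is not None for value in found.values()):
            break  # every condition matched once, stop reading rows
    return found


# Made-up sample data: an id column followed by two value columns.
SAMPLE = "0,50,10\n1,30,50\n2,20,50\n3,10,30\n4,35,20\n5,60,10\n6,5,5\n"

reader = csv.reader(io.StringIO(SAMPLE))
print(first_matches(reader, {"a": (1, 40, 3), "b": (2, 40, 2)}))
```

<p>Yielding a tuple snapshot instead of the <code>deque</code> itself costs a small copy per step, but it keeps each group stable even if the caller stores it.</p>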
 
