  1. How can I iterate over a data file without code duplication in Python?
    <p>I want to write a script to process some data files. The data files are just ASCII text with columns of data; here is a simple example.</p>
    <p>The first column is an ID number, in this case from 1 to 3. The second column is a value of interest. (The actual files I'm using have many more IDs and values, but let's keep it simple here.)</p>
    <p>data.txt contents:</p>
    <pre><code>1 5
1 4
1 10
1 19
2 15
2 18
2 20
2 21
3 50
3 52
3 55
3 70
</code></pre>
    <p>I want to iterate over the data, extract the values for each ID, and process them, i.e. get all values for ID 1 and do something with them, then get all values for ID 2, etc.</p>
    <p>I can write this in Python:</p>
    <pre><code>#!/usr/bin/env python

def processValues(values):
    print("Will do something with data here: ", values)

f = open('data.txt', 'r')
datalines = f.readlines()
f.close()

currentID = 0
first = True

for line in datalines:
    fields = line.split()

    # if we've moved onto a new ID,
    # then process the values we've collected so far
    if (fields[0] != currentID):

        # but if this is our first iteration, then
        # we just need to initialise our ID variable
        if (not first):
            processValues(values) # do something useful

        currentID = fields[0]
        values = []
        first = False

    values.append(fields[1])

processValues(values) # do something with the last values
</code></pre>
    <p>The problem is that <code>processValues()</code> must be called again at the end. This requires code duplication, and means that I might one day write a script like this, forget to put the extra <code>processValues()</code> at the end, and therefore miss the last ID. It also requires storing whether it is our 'first' iteration, which is annoying.</p>
    <p>Is there any way to do this without having two function calls to <code>processValues()</code> (one inside the loop for each new ID, one after the loop for the last ID)?</p>
    <p>The only way I can think of is to store the line number and check in the loop whether we're at the last line. But that seems to defeat the point of 'foreach'-style processing, where we store the line itself and not the index or the total number of lines. This would also apply to other scripting languages like Perl, where it would be common to iterate over lines with <code>while(&lt;FILE&gt;)</code> and not have an idea of the number of lines remaining. Is it always necessary to write the function call again at the end?</p>
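One possible way to avoid the second call, sketched here as an illustration and not as a definitive answer, is to let `itertools.groupby` from the standard library do the grouping, so the processing function is called in exactly one place. The names `process_values` and `process_by_id` are hypothetical stand-ins, and the sketch assumes that all lines for a given ID are adjacent, as they are in the sample data.

```python
from itertools import groupby


def process_values(values):
    # Hypothetical stand-in for the real per-ID work.
    print("Will do something with data here:", values)
    return values


def process_by_id(lines):
    """Group consecutive lines by their first field; process each group once."""
    # Split each non-blank line into whitespace-separated fields.
    rows = (line.split() for line in lines if line.strip())
    results = []
    # groupby yields one (key, group) pair per run of equal keys, including
    # the last run, so no extra call is needed after the loop.
    for current_id, group in groupby(rows, key=lambda fields: fields[0]):
        values = [fields[1] for fields in group]
        results.append((current_id, process_values(values)))
    return results


# Sample data taken from the question; values stay as strings, as in the original.
sample = """1 5
1 4
1 10
1 19
2 15
2 18
2 20
2 21
3 50
3 52
3 55
3 70
""".splitlines()

results = process_by_id(sample)
```

Because an open file is itself an iterable of lines, the same function should also accept a file object, for example `with open('data.txt') as f: process_by_id(f)`. If IDs were not adjacent in the file, `groupby` would produce several groups for the same ID, so the data would need sorting first or a dictionary-based approach instead.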