<p>Having such a monolithic function that takes a filename instead of an open file is by itself not very Pythonic. You are trying to implement a stream processor here (<code>file stream -&gt; line stream -&gt; CSV record stream -&gt; [transformer -&gt;] data stream</code>), so using a generator is actually a good idea. I'd slightly refactor this to be a bit more modular:</p>

<pre><code>import csv
from collections import namedtuple

def csv_rows(infile, header):
    reader = csv.reader(infile, delimiter="\t")
    if header:
        next(reader)
    return reader

def data_sets(infile, header):
    gene_data = namedtuple("Data", 'id, name, q, start, end, sym')
    for row in csv_rows(infile, header):
        yield gene_data(*row)

def read_file_type1(infile, header=True):
    # for this file type, we only need to pass the caller the raw
    # data objects
    return data_sets(infile, header)

def read_file_type2(infile, header=True):
    # for this file type, we have to pre-process the data sets
    # before yielding them. A good way to express this is using a
    # generator expression (we could also add a filtering condition here)
    return (transform_data_set(x) for x in data_sets(infile, header))

# Usage sample:
with open("...", "r") as f:
    for obj in read_file_type1(f):
        print(obj)
</code></pre>

<p>As you can see, we have to pass the <code>header</code> argument all the way through the function chain. This is a strong hint that an object-oriented approach would be appropriate here. The fact that we obviously face a hierarchical type structure here (basic data file, type1, type2) supports this.</p>
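<p>Roughly, that object-oriented version could look like the sketch below. The class names, the <code>GeneData</code> record, and the type-2 transform are illustrative assumptions, not part of the original code: the base class owns the <code>header</code> flag once, and subclasses only override the per-record <code>transform</code> hook.</p>

```python
import csv
from collections import namedtuple

GeneData = namedtuple("GeneData", "id, name, q, start, end, sym")

class DataFile:
    """Base reader: file stream -> CSV rows -> data objects."""

    def __init__(self, infile, header=True):
        self.infile = infile
        self.header = header  # stored once, no longer threaded through calls

    def rows(self):
        reader = csv.reader(self.infile, delimiter="\t")
        if self.header:
            next(reader)  # skip the header line
        return reader

    def __iter__(self):
        for row in self.rows():
            yield self.transform(GeneData(*row))

    def transform(self, data):
        # type-1 behaviour: pass raw data objects through unchanged
        return data

class Type2File(DataFile):
    def transform(self, data):
        # hypothetical pre-processing step for type-2 files
        return data._replace(name=data.name.upper())
```

<p>Usage mirrors the generator version: <code>for obj in Type2File(f): ...</code> — the caller picks a class instead of a function, and adding a new file type means adding one small subclass.</p>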