
plurals
  1. POScript to convert Huge Three column table into table
    primarykey
    data
    text
    <p>I have a set of data (CSV files) in the following 3 column format:</p>
<pre><code>A, B, C
3277,4733,54.1
3278,4741,51.0
3278,4750,28.4
3278,4768,36.0
3278,4776,50.1
3278,4784,51.4
3279,4792,82.6
3279,4806,78.2
3279,4814,36.4
</code></pre>
    <p>And I need to get a three-way contingency table like: (sorry, this doesn't look completely good)</p>
<pre><code>A /B    4733  4741  4750  4768  4776  4784  4792  4806  4814
3277  C 54.1
3278            51  28.4    36  50.1  51.4
3279                                        82.6  78.2  36.4
</code></pre>
    <p>This is similar to an Excel "pivot table", an OpenOffice DataPilot, or R's "table(x,y,z)".</p>
    <p>The problem is that my dataset is HUGE (more than 500,000 total rows, with about 400 different factors in A and B). The limits of OOo, MSO and R prevent me from achieving this.</p>
    <p>I am sure a Python script could be used to create such a table. Both A and B are numbers (but can be treated as strings).</p>
    <p>Has anyone dealt with this? (Pseudocode or code in C or Java is also welcome, but I prefer Python as it is faster to implement :)</p>
    <p><strong>Edit:</strong> Almost have it, thanks to John Machin. The following Python script <em>almost</em> provides what I am looking for. However, when writing the output file I can see that the values in the "headers" I am writing (taken from the first row) do not correspond to the other rows.</p>
<pre><code>from collections import defaultdict as dd
d = dd(lambda: dd(float))

input = open("input.txt")
output = open("output.txt","w")
while 1:
    line = input.readline()
    if not line:
        break
    line = line.strip('\n').strip('\r')
    splitLine = line.split(',')
    if (len(splitLine) &lt;3):
        break
    d[splitLine[0]][splitLine[1]] = splitLine[2]

output.write("\t")
for k,v in d.items()[0][1].items():
    output.write(str(k)+"\t")
output.write("\n")
for k,v in d.items():
    output.write(k+"\t")
    for k2,v2 in v.items():
        output.write(str(v2)+"\t")
    output.write("\n")
</code></pre>
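The header mismatch described in the edit comes from two things: the header is built from the B values of the first A row only, and each later row writes just the values it happens to hold, with no placeholder for missing B columns. A minimal sketch of one way around this, assuming Python 3 and that A and B can be treated as strings, is to collect every distinct B value first and then emit one cell per B for every row. The names `pivot` and `write_pivot` are hypothetical helpers introduced here for illustration, not part of the original script.

```python
import csv
import io
from collections import defaultdict


def pivot(rows):
    """Return ({A: {B: C}}, sorted A labels, sorted B labels) from 3-field rows."""
    table = defaultdict(dict)
    columns = set()
    for row in rows:
        if len(row) < 3:
            continue  # skip blank or malformed lines instead of stopping
        a, b, c = (field.strip() for field in row[:3])
        table[a][b] = c
        columns.add(b)
    # Labels are sorted as strings, which matches numeric order only when
    # all labels have the same number of digits (true for the sample data).
    return table, sorted(table), sorted(columns)


def write_pivot(rows, out):
    """Write a tab-separated pivot table with one column per distinct B."""
    table, row_labels, col_labels = pivot(rows)
    out.write("\t".join([""] + col_labels) + "\n")
    for a in row_labels:
        # Emit an empty cell when this A has no value for a given B,
        # so every row lines up with the header.
        cells = [table[a].get(b, "") for b in col_labels]
        out.write("\t".join([a] + cells) + "\n")


# Small demonstration on part of the sample data from the question.
sample = "A, B, C\n3277,4733,54.1\n3278,4741,51.0\n3278,4750,28.4\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the "A, B, C" header line
result = io.StringIO()
write_pivot(reader, result)
print(result.getvalue())
```

For real files, the same `write_pivot` call can be given a `csv.reader` over an opened input file and an opened output file. With about 400 distinct values on each axis, the table holds at most 160,000 cells, so keeping it in memory should not be a concern.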
 
