
Building sorted data table using hash

I have built several scripts to organize data output from test equipment, but I am hitting a mental roadblock with this one.

The test equipment monitors four types of input (Data1, Data2, Data3, Data4) from multiple subjects (with identifiers ID1, ID2, etc.), and records each in intervals with a date and time stamp. The CSV file dumped by the equipment is organized like this:

```
Start,Date,Time0
Subject,ID1,ID2,[...],ID#
Date,Time1
Data1,aa1,aa2,[...],aa#
Data2,ba1,ba2,[...],ba#
Data3,ca1,ca2,[...],ca#
Data4,da1,da2,[...],da#
Date,Time2
Data1,ab1,ab2,[...],ab#
Data2,bb1,bb2,[...],bb#
Data3,cb1,cb2,[...],cb#
Data4,db1,db2,[...],db#
```

...and so on.

"Start" marks the beginning of the data; "Subject" marks the line containing the subject IDs; "Data1" through "Data4" mark the lines containing the data for that data type in the time interval given by the preceding date/time line.

The output is thus split into many small blocks, which is an unfortunate choice on the part of the equipment manufacturer, especially as data is collected every few minutes over several days or weeks. To analyze the data without having to manually select every sixth line, I need to regroup it by data type, like this:

```
Data1,Subject,ID1,ID2,[...],ID#
Date,Time1,aa1,aa2,[...],aa#
Date,Time2,ab1,ab2,[...],ab#
...
Data2,Subject,ID1,ID2,[...],ID#
Date,Time1,ba1,ba2,[...],ba#
Date,Time2,bb1,bb2,[...],bb#
...
```

The goal is to have each of the four data types in a separate block, so that the timecourse data for any given subject (ID1 through ID#) sits in a single column, with date and time as the initial columns. ("DataX" and "Subject" above are simply used as column headers.)

Currently I do this by putting each line into a separate array. This was a quick-and-dirty way of getting things done: the script grabs the time and date, pushes the ID line into each of four arrays (one per data type), then appends each data line to the appropriate array based on its data type. Output simply prints each array line by line, adds a blank line, then prints the next array. This works, but ideally I would like to sort the data columns across by subject ID before printing, without losing the vertical sort by date and time stamp. (Because the data is already vertically sorted, I do not currently sort the arrays before printing.)

What is the simplest way to do this? Mentally I am having trouble working out how to associate the data in row Y, column X with the subject ID in column X of the CSV file. Every other data output file I have worked with either keeps the subject ID as the first item in each line or has one file per subject, which makes this easier.

Note: because the time and date are on their own line, I keep each in a variable; when the script detects a line containing a new time and/or date, it updates the variable value.

Edit -- I have incorporated some of Borodin's suggestions (leaving the filehandle reading line by line rather than by paragraph). I pull the subject line into an array (@ids), and push the data rows into a hash keyed by date/time and ID:

```perl
my ($datatype, @fields) = @line;
push @keys, $datatype unless exists $data{$datatype};

my $datetime = "$date,$time";
push @timestamps, $datetime unless exists $data{$datetime};

for my $i ( 0 .. $#fields ) {
    push @{ $data{$datetime}{$ids[$i]} }, $fields[$i];
}
```

I also drop the date/time pairs into a second array (@timestamps) to maintain their order. The problem at this point is printing the values back out. I am currently trying:

```perl
foreach my $date (keys %data) {
    print OUT $date;
    foreach my $id (@ids) {
        foreach my $s (keys %{ $data{$date} }) {
            if ( exists $data{$date}{$id} ) {
                print OUT ",", $data{$date}{$id};
            }
            else {
                print OUT ",";
            }
        }
    }
    print OUT "\n";    # close printing on a given date
}
```

but I keep getting garbage output: it prints the references rather than the actual values. Dumper output looks like this:

```perl
$VAR1 = {
          'date,time' => {
                           'ID1' => [ '0.00' ],
                           'ID2' => [ '0.12' ],
                           'ID3' => [ '0.17' ],
                           'ID4' => [ '0.22' ]
                         }
        };
```

and the printed output looks like this:

```
date,time,ARRAY(0x7f91c1030f60),ARRAY(0x7f91c1030f60),ARRAY(0x7f91c1030f60),ARRAY(0x7f91c1030f60)
```

Sorry the examples so far have been causing issues in interpretation. There is a lot of excess data and text in the input files; I have only included a highly simplified version of the portions I am trying to extract and sort.
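
For reference, the `ARRAY(0x...)` strings appear because each cell of `%data` holds an array reference, which stringifies when printed directly. Below is a minimal sketch of a print loop that dereferences those values instead, assuming `@ids`, `@timestamps` and `%data` are populated as described above; the sample values and the semicolon used to join multiple readings in one cell are illustrative placeholders, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative stand-ins for the structures built while parsing;
# in the real script these come from the CSV file.
my @ids        = qw(ID1 ID2 ID3 ID4);
my @timestamps = ('date,time');
my %data = (
    'date,time' => {
        ID1 => ['0.00'],
        ID2 => ['0.12'],
        ID3 => ['0.17'],
        ID4 => ['0.22'],
    },
);

# Walk @timestamps to keep the original date/time order,
# and dereference each array ref instead of printing it raw.
for my $datetime (@timestamps) {
    my @row = ($datetime);
    for my $id (@ids) {
        my $values = $data{$datetime}{$id};                # array reference, or undef
        push @row, defined $values ? join(';', @$values) : '';
    }
    print join(',', @row), "\n";                           # date,time,0.00,0.12,0.17,0.22
}
```

The difference from the loop shown earlier is that the inner loop over `keys %{$data{$date}}` is dropped (it only repeats the same cell), and the array reference is expanded with `@$values` (or `$values->[0]` for a single reading) rather than printed as-is.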