
Building sorted data table using hash

I have built several scripts to organize data output from test equipment, but I am hitting a mental roadblock with this one.

The test equipment monitors four types of input (Data1, Data2, Data3, Data4) from multiple subjects (with identifiers ID1, ID2, etc.), and records each in intervals with a date and time stamp. The CSV file dumped by the equipment is organized like this:

```
Start,Date,Time0
Subject,ID1,ID2,[...],ID#
Date,Time1
Data1,aa1,aa2,[...],aa#
Data2,ba1,ba2,[...],ba#
Data3,ca1,ca2,[...],ca#
Data4,da1,da2,[...],da#
Date,Time2
Data1,ab1,ab2,[...],ab#
Data2,bb1,bb2,[...],bb#
Data3,cb1,cb2,[...],cb#
Data4,db1,db2,[...],db#
```

...and so on.

"Start" marks the beginning of the data; "Subject" marks the line containing the subject IDs; "Data1" through "Data4" mark the lines containing the data for that data type in the time interval given by the preceding date/time line.

The output is thus split into many small blocks, which is an unfortunate choice on the part of the equipment manufacturer, especially as data is collected every few minutes over several days or weeks. To analyze the data without having to manually select every sixth line, I need to regroup it by data type, like this:

```
Data1,Subject,ID1,ID2,[...],ID#
Date,Time1,aa1,aa2,[...],aa#
Date,Time2,ab1,ab2,[...],ab#
...
Data2,Subject,ID1,ID2,[...],ID#
Date,Time1,ba1,ba2,[...],ba#
Date,Time2,bb1,bb2,[...],bb#
...
```

The goal is to have each of the four data types in a separate block, so that the timecourse data for any given subject (ID1 through ID#) sits in a single column, with date and time as the initial columns. ("DataX" and "Subject" above are simply used as column headers.)

Currently I do this by putting each line into a separate array. This was a quick-and-dirty way of getting things done: the script grabs the time and date, pushes the ID line into each of four arrays (one per data type), then appends each data line to the appropriate array based on its data type. Output simply prints each array line by line, adds a blank line, then prints the next array. This works, but ideally I would like to sort the data columns across by subject ID before printing, without losing the vertical sort by date and time stamp. (Because the data is already vertically sorted, I do not currently sort the arrays before printing.)

What is the simplest way to do this? Mentally I am having trouble working out how to associate the data in row Y, column X with the subject ID in column X of the CSV file. Every other data output file I have worked with either keeps the subject ID as the first item in each line or has one file per subject, which makes this easier.

Note: because the time and date are on their own line, I keep each in a variable; when the script detects a line containing a new time and/or date, it updates the variable value.

Edit -- I have incorporated some of Borodin's suggestions (leaving the filehandle reading line by line rather than by paragraph). I pull the subject line into an array (@ids), and push the data rows into a hash keyed by date/time and ID:

```perl
my ($datatype, @fields) = @line;
push @keys, $datatype unless exists $data{$datatype};

my $datetime = "$date,$time";
push @timestamps, $datetime unless exists $data{$datetime};

for my $i ( 0 .. $#fields ) {
    push @{ $data{$datetime}{$ids[$i]} }, $fields[$i];
}
```

I also drop the date/time pairs into a second array (@timestamps) to maintain their order. The problem at this point is printing the values back out. I am currently trying:

```perl
foreach my $date (keys %data) {
    print OUT $date;
    foreach my $id (@ids) {
        foreach my $s (keys %{ $data{$date} }) {
            if ( exists $data{$date}{$id} ) {
                print OUT ",", $data{$date}{$id};
            }
            else {
                print OUT ",";
            }
        }
    }
    print OUT "\n";    # close printing on a given date
}
```

but I keep getting garbage output: it prints the references rather than the actual values. Dumper output looks like this:

```perl
$VAR1 = {
          'date,time' => {
                           'ID1' => [ '0.00' ],
                           'ID2' => [ '0.12' ],
                           'ID3' => [ '0.17' ],
                           'ID4' => [ '0.22' ]
                         }
        };
```

and the printed output looks like this:

```
date,time,ARRAY(0x7f91c1030f60),ARRAY(0x7f91c1030f60),ARRAY(0x7f91c1030f60),ARRAY(0x7f91c1030f60)
```

Sorry the examples so far have been causing issues in interpretation. There is a lot of excess data and text in the input files; I have only included a highly simplified version of the portions I am trying to extract and sort.
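
For reference, the `ARRAY(0x...)` strings appear because each cell of `%data` holds an array reference, which stringifies when printed directly. Below is a minimal sketch of a print loop that dereferences those values instead, assuming `@ids`, `@timestamps` and `%data` are populated as described above; the sample values and the semicolon used to join multiple readings in one cell are illustrative placeholders, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative stand-ins for the structures built while parsing;
# in the real script these come from the CSV file.
my @ids        = qw(ID1 ID2 ID3 ID4);
my @timestamps = ('date,time');
my %data = (
    'date,time' => {
        ID1 => ['0.00'],
        ID2 => ['0.12'],
        ID3 => ['0.17'],
        ID4 => ['0.22'],
    },
);

# Walk @timestamps to keep the original date/time order,
# and dereference each array ref instead of printing it raw.
for my $datetime (@timestamps) {
    my @row = ($datetime);
    for my $id (@ids) {
        my $values = $data{$datetime}{$id};                # array reference, or undef
        push @row, defined $values ? join(';', @$values) : '';
    }
    print join(',', @row), "\n";                           # date,time,0.00,0.12,0.17,0.22
}
```

The difference from the loop shown earlier is that the inner loop over `keys %{$data{$date}}` is dropped (it only repeats the same cell), and the array reference is expanded with `@$values` (or `$values->[0]` for a single reading) rather than printed as-is.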