Consolidating a data table in Scala
    <p>I am working on a small data analysis tool, and practicing/learning Scala in the process. However, I got stuck on a small problem.</p>
    <p>Assume data of this type:</p>
    <pre><code>X Gr1 x_11 ... x_1n
X Gr2 x_21 ... x_2n
..
X GrK x_k1 ... x_kn
Y Gr1 y_11 ... y_1n
Y Gr3 y_31 ... y_3n
..
Y Gr(K-1) ...
</code></pre>
    <p>Here I have entries (X, Y, ...) that may or may not exist in up to K groups, with a series of values for each group. What I want to do is pretty simple (in theory): I would like to consolidate the rows that belong to the same "entity" in different groups. So instead of multiple lines that start with <code>X</code>, I want to have one row with all values from <code>x_11</code> to <code>x_kn</code> in columns.</p>
    <p>What makes things complicated, however, is that not all entities exist in all groups. So wherever there is "missing data" I would like to pad with, for instance, zeroes or some string that denotes a missing value. So if I have (X, Y, Z) in up to 3 groups, the table I want to have is as follows:</p>
    <pre><code>X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 N/A N/A y_31 y_32
Z N/A N/A z_21 z_22 N/A N/A
</code></pre>
    <p>I have been stuck trying to figure this out. Is there a smart way to use List functions to solve this?</p>
    <hr>
    <p>I wrote this simple loop:</p>
    <pre><code>for {
  (id, hitlist) &lt;- hits.groupBy(_.acc)
  h &lt;- hitlist
} println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
</code></pre>
    <p>to be able to generate tables that look like the first example above. Note that my original data is of a different format and layout, but that has little to do with the problem at hand, so I have skipped all steps regarding parsing. I should be able to use <code>groupBy</code> in a better way that actually solves this for me, but I can't seem to get there.</p>
    <p>Then I modified my loop, mapping the <code>hits</code> to <code>ratios</code> and appending them to one another:</p>
    <pre><code>for ((id, hitlist) &lt;- hits.groupBy(_.acc)) {
  val l = hitlist.map(_.ratios).foldRight(List[Double]()) {
    (l1: List[Double], l2: List[Double]) =&gt; l1 ::: l2
  }
  println(id + "\t" + l.mkString("\t"))
  //println(id + "\t" + h.sampleId + "\t" + h.ratios.mkString("\t"))
}
</code></pre>
    <p>That gets me one step closer, but still no cigar! Instead of a fully padded "matrix" I get a jagged table. Taking the example above:</p>
    <pre><code>X x_11 x_12 x_21 x_22 x_31 x_32
Y y_11 y_12 y_31 y_32
Z z_21 z_22
</code></pre>
    <p>Any ideas as to how I can pad the table so that values from the respective groups are aligned with one another? I should be able to use <code>_.sampleId</code>, which holds the "group membership" for each "hit", but I am not sure exactly how. <code>hits</code> is a List of type <code>Hit</code>, which is practically a wrapper for each row, giving convenience methods for getting individual values, so essentially a tuple that has "named indices" (such as <code>.acc</code>, <code>.sampleId</code>, ...).</p>
    <p>(I would like to solve this problem without hardcoding the number of groups, as it might change from case to case.)</p>
    <p>Thanks!</p>
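One possible way to get the padded layout described above is to collect the distinct group ids first and then, for every entity, look up each group in turn and fall back to a block of placeholders when that group is missing. The following is only a minimal sketch under assumptions: the `Hit` case class is a hypothetical stand-in for the real wrapper (only `acc`, `sampleId` and `ratios` come from the question), `PadTable` and `consolidate` are illustrative names, groups are ordered by plain string sorting, and the block width is taken from the longest `ratios` list.

```scala
// Hypothetical stand-in for the question's Hit wrapper; only the three
// members used in the question (acc, sampleId, ratios) are modelled.
final case class Hit(acc: String, sampleId: String, ratios: List[Double])

object PadTable {
  // Returns one row per entity: the id followed by one block of values per
  // group, with `missing` filling the blocks of groups the entity lacks.
  def consolidate(hits: List[Hit], missing: String = "N/A"): List[List[String]] = {
    // Every group that occurs anywhere in the data, in a fixed order.
    // Assumption: plain string sorting is acceptable (note "Gr10" < "Gr2").
    val groups = hits.map(_.sampleId).distinct.sorted
    // Assumption: all groups carry the same number of values; the longest
    // list is used as the block width so shorter ones are padded as well.
    val width = hits.map(_.ratios.length).foldLeft(0)(_ max _)
    val emptyBlock = List.fill(width)(missing)

    hits.groupBy(_.acc).toList.sortBy(_._1).map { case (id, hitlist) =>
      // Index this entity's hits by group so that lookups can fail cleanly.
      val byGroup = hitlist.map { h =>
        h.sampleId -> h.ratios.map(_.toString).padTo(width, missing)
      }.toMap
      id :: groups.flatMap(g => byGroup.getOrElse(g, emptyBlock))
    }
  }
}
```

Each row can then be printed with `row.mkString("\t")`, as in the loops above. Nothing here hardcodes the number of groups, since `groups` is derived from the data itself.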