Transform raw data into relational data
<h2>Intro</h2>
<p>I've been given a messy Excel dump loaded straight into a table. Now I need to turn that mess into something useful. The dump has duplicates and inconsistencies... good times!</p>
<p>I've been striking out on every approach so far :( Hope you can help me out.</p>
<p>Given this example data set:</p>
<pre><code>ExcelDump
+----+------+------+------+
| ID | Col1 | Col2 | Col3 |
+----+------+------+------+
|  1 |      |      | C    |
|  1 |      | B    | C    |
|  1 | A    | B    | D    |
|  1 | E    | B    | C    |
|  2 | A    | B    | C    |
|  2 | A    | B    | C    |
|  3 | A    | B    | C    |
|  3 | A    | B    | F    |
|  4 | A    | B    | C    |
|  4 | G    | B    | C    |
+----+------+------+------+
</code></pre>
<p>One possible result could be:</p>
<pre><code>OutputTable
+----+------+------+------+
| ID | Col1 | Col2 | Col3 |
+----+------+------+------+
|  1 | A    | B    | C    |
|  2 | A    | B    | C    |
|  3 | A    | B    | C    |
|  4 | A    | B    | C    |
+----+------+------+------+
</code></pre>
<p>Nice and neat: a unique ID key, with the data merged together in a way that makes sense.</p>
<h2>How to choose which data is correct?</h2>
<p>You've probably noticed that another possible result could be:</p>
<pre><code>+----+------+------+------+
| ID | Col1 | Col2 | Col3 |
+----+------+------+------+
|  1 | E    | B    | C    |
|  2 | A    | B    | C    |
|  3 | A    | B    | F    |
|  4 | G    | B    | C    |
+----+------+------+------+
</code></pre>
<p>This is where it gets complicated. I want to be able to choose the set that makes the most sense, based on some conditions I can manipulate.</p>
<p>For instance, I want to set up a condition that says: <strong>"Choose the most common non-null value; if there is no single most common value, take the <em>first</em> non-null value found."</strong> This condition should be applied to each group of rows that share an ID. The result of that condition would be:</p>
<pre><code>+----+------+------+------+
| ID | Col1 | Col2 | Col3 |
+----+------+------+------+
|  1 | A    | B    | C    |
|  2 | A    | B    | C    |
|  3 | A    | B    | C    |
|  4 | A    | B    | C    |
+----+------+------+------+
</code></pre>
<p>If I later find out that this assumption was wrong and the condition should instead be: <strong>"Choose the most common non-null value; if there is no single most common value, take the <em>last</em> non-null value found."</strong> then the result would be:</p>
<pre><code>+----+------+------+------+
| ID | Col1 | Col2 | Col3 |
+----+------+------+------+
|  1 | E    | B    | C    |
|  2 | A    | B    | C    |
|  3 | A    | B    | F    |
|  4 | G    | B    | C    |
+----+------+------+------+
</code></pre>
<p>So basically I want to select values based on a set of conditions applied to each group of IDs.</p>
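<p>To make the two conditions concrete, here is a minimal Python sketch of the selection rule. It is only an illustration under stated assumptions, not a database solution: the rows are assumed to be available in table order, empty cells are represented as <code>None</code>, and the names <code>ROWS</code>, <code>pick</code> and <code>merge</code> are hypothetical. It also reads "first/last value found" literally, as the first or last non-null value in the whole group rather than among the tied values only.</p>

```python
from collections import Counter

# Example rows from ExcelDump, in table order; None stands for an empty cell.
ROWS = [
    (1, None, None, "C"),
    (1, None, "B", "C"),
    (1, "A", "B", "D"),
    (1, "E", "B", "C"),
    (2, "A", "B", "C"),
    (2, "A", "B", "C"),
    (3, "A", "B", "C"),
    (3, "A", "B", "F"),
    (4, "A", "B", "C"),
    (4, "G", "B", "C"),
]


def pick(values, tie_break="first"):
    """Most common non-null value; on a tie, the first or last non-null value."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # every cell in this column was empty for this ID
    counts = Counter(present)
    top = max(counts.values())
    winners = [v for v, n in counts.items() if n == top]
    if len(winners) == 1:
        return winners[0]
    # Assumption: a tie falls back to the first/last non-null value of the group.
    return present[0] if tie_break == "first" else present[-1]


def merge(rows, tie_break="first"):
    """Collapse rows that share an ID into one row, column by column."""
    groups = {}
    for row_id, *cols in rows:
        groups.setdefault(row_id, []).append(cols)
    return [
        # zip(*group) transposes the group's rows into per-column value lists.
        (row_id, *(pick(column, tie_break) for column in zip(*group)))
        for row_id, group in groups.items()
    ]


if __name__ == "__main__":
    for row in merge(ROWS, "first"):
        print(row)
```

<p>Switching the condition is then a matter of calling <code>merge(ROWS, "last")</code> instead of <code>merge(ROWS, "first")</code>; the tie-break only has a defined meaning if the row order is stable, which a real table would need an explicit ordering column to guarantee.</p>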