Shuffling a large text file without/with group order maintained
<p>Instead of writing a script, is there a one-liner to shuffle a large tab-separated text file based on the unique elements in the first column? That is, for each unique element in the first column, the output should contain the same number of rows, and that number is specified by the user.</p>

<p>There are two output possibilities: maintaining the row order, or randomizing the row order.</p>

<p>Input:</p>

<pre><code>chr1 3003204 3003454 * 37 +
chr1 3003235 3003485 * 37 +
chr1 3003148 3003152 * 37 -
chr1 3003461 3003711 * 37 +
chr11 71863609 71863647 * 37 +
chr11 71864025 71864275 * 37 +
chr11 71864058 71864308 * 37 -
chr11 71864534 71864784 * 37 +
chrY 90828920 90829170 * 23 -
chrY 90829096 90829346 * 23 +
chrY 90828924 90829174 * 23 -
chrY 90828925 90829175 * 23 -
</code></pre>

<p>Output (1 row per category; the number of rows is defined by the user).</p>

<p>Output1 (randomized - row order will change):</p>

<pre><code>chr1 3003235 3003485 * 37 +
chr11 71863609 71863647 * 37 +
chrY 90828925 90829175 * 23 -
</code></pre>

<p>Output2 (not randomized - row order will be maintained):</p>

<pre><code>chr1 3003204 3003454 * 37 +
chr11 71863609 71863647 * 37 +
chrY 90828920 90829170 * 23 -
</code></pre>

<p>I tried using <code>sort -u</code> with <code>cut</code> on the first column to fetch the unique elements, and then running a combination of <code>grep</code> and <code>head</code> for each element to generate the output file, which can be randomized using <code>shuf</code>. There might be a better solution, as the file can be huge (&gt; 50 million lines).</p>

<p>Cheers</p>
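<p>For comparison with the <code>grep</code>/<code>head</code> loop described above, here is a minimal sketch of a single-pass alternative using <code>awk</code>. It is not a tested solution for the 50-million-line file. It assumes the input really is tab-separated, and the function names <code>first_n_per_group</code> and <code>random_n_per_group</code>, the optional seed argument, and the file name in the usage note are all made up for illustration. The random variant uses per-group reservoir sampling, so only <code>n</code> rows per distinct first-column value are held in memory.</p>

```shell
#!/bin/sh
# Hypothetical helpers; $1 = rows to keep per first-column value, $2 = input file.

# Keep the first n rows of each group, preserving the input row order.
first_n_per_group() {
    awk -F '\t' -v n="$1" '++seen[$1] <= n' "$2"
}

# Keep n randomly chosen rows of each group (reservoir sampling).
# Groups are printed in the order in which they first appear.
# $3 is an optional seed (an assumption of this sketch; defaults to 0).
random_n_per_group() {
    awk -F '\t' -v n="$1" -v seed="${3:-0}" '
        BEGIN { srand(seed) }
        {
            c = ++count[$1]
            if (c == 1) order[++groups] = $1   # remember first-seen group order
            if (c <= n) {
                res[$1, c] = $0                # reservoir not full yet: store row
            } else {
                j = int(rand() * c) + 1        # uniform integer in 1..c
                if (j <= n) res[$1, j] = $0    # replace a slot with probability n/c
            }
        }
        END {
            for (g = 1; g <= groups; g++) {
                k = order[g]
                m = (count[k] < n) ? count[k] : n
                for (i = 1; i <= m; i++) print res[k, i]
            }
        }' "$2"
}
```

<p>Usage might look like <code>first_n_per_group 1 input.tsv</code> for the order-preserving output and <code>random_n_per_group 1 input.tsv "$(date +%s)"</code> for the randomized one, where <code>input.tsv</code> is a placeholder name. With a fixed seed the random variant gives the same sample on every run, so pass a varying seed if a fresh sample is wanted each time.</p>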