<p>You need an <a href="http://en.wikipedia.org/wiki/Intersection_%28set_theory%29" rel="noreferrer"><em>intersection</em></a> of the two files: the lines from file1 and file2 that share some criterion. Think of it from a set-theory perspective: you have two sets with some elements in common, and you need a new set containing just those elements. There's a bit more to it than that, though, because the lines aren't really equal...</p>
<p>So, let's say you read file1, and that's of type <code>List[Input1]</code>. We could code it like this, without getting into any details of what <code>Input1</code> is:</p>
<pre><code>import scala.io.Source

case class Input1(line: String)
val f1: List[Input1] = Source.fromFile("file1.csv").getLines().map(Input1(_)).toList
</code></pre>
<p>We can do the same thing for file2 and <code>List[Input2]</code>:</p>
<pre><code>case class Input2(line: String)
val f2: List[Input2] = Source.fromFile("file2.csv").getLines().map(Input2(_)).toList
</code></pre>
<p>You might be wondering why I created two different classes if they have the exact same definition. Well, if you were reading structured data, you <em>would</em> have two different types, so let's see how to handle that more complex case.</p>
<p>OK, so how do we match them, since <code>Input1</code> and <code>Input2</code> are different types? Well, the lines are matched by keys, which, according to your code, are the first column in each.
So let's create a class <code>Key</code>, and conversions <code>Input1 =&gt; Key</code> and <code>Input2 =&gt; Key</code>:</p>
<pre><code>case class Key(key: String)

// Keeps the first comma-separated field (using a regex would be better)
def Input1IsKey(input: Input1): Key = Key(input.line.split(",").head)
def Input2IsKey(input: Input2): Key = Key(input.line.split(",").head)
</code></pre>
<p>OK, now that we can produce a common <code>Key</code> from <code>Input1</code> and <code>Input2</code>, let's get the intersection of them:</p>
<pre><code>val intersection = f1.map(Input1IsKey).toSet intersect f2.map(Input2IsKey).toSet
</code></pre>
<p>So we can compute the intersection of the keys we want, but we don't have the lines! The problem is that, for each key, we need to know which line it came from. Consider that we have a set of keys, and for each key we want to keep track of a value: that's exactly what a <code>Map</code> is! So we can build this:</p>
<pre><code>val m1 = f1.map(input =&gt; Input1IsKey(input) -&gt; input).toMap
val m2 = f2.map(input =&gt; Input2IsKey(input) -&gt; input).toMap
</code></pre>
<p>So the output can be produced like this:</p>
<pre><code>val output = intersection.map(key =&gt; m1(key).line + ", " + m2(key).line)
</code></pre>
<p>All you have to do now is output that.</p>
<p>Let's consider some improvements to this code. First, note that the output produced above repeats the key: that's exactly what your code does, but not what you want in the example. Let's change, then, <code>Input1</code> and <code>Input2</code> to separate the key from the rest of the line:</p>
<pre><code>case class Input1(key: String, rest: String)
case class Input2(key: String, rest: String)
</code></pre>
<p>It's now a bit harder to initialize f1 and f2. Instead of using <code>split</code>, which unnecessarily breaks the whole line into fields (and costs some performance), we'll divide the line right at the first comma: everything before it is the key, everything after it is the rest.
The method <code>span</code> does that:</p>
<pre><code>def breakLine(line: String): (String, String) = line.span(',' != _)
</code></pre>
<p>Play a bit with the <code>span</code> method in the REPL to get a better understanding of it: it splits a string at the first character that fails the predicate, so <code>"a,b,c".span(',' != _)</code> gives <code>("a", ",b,c")</code>, with the comma kept at the start of the second part. As for <code>(',' != _)</code>, that's just an abbreviated form of <code>(x =&gt; ',' != x)</code>.</p>
<p>Next, we need a way to create <code>Input1</code> and <code>Input2</code> from a tuple (the result of <code>breakLine</code>):</p>
<pre><code>def TupleIsInput1(tuple: (String, String)) = Input1(tuple._1, tuple._2)
def TupleIsInput2(tuple: (String, String)) = Input2(tuple._1, tuple._2)
</code></pre>
<p>We can now read the files:</p>
<pre><code>val f1: List[Input1] =
  Source.fromFile("file1.csv").getLines().map(breakLine).map(TupleIsInput1).toList
val f2: List[Input2] =
  Source.fromFile("file2.csv").getLines().map(breakLine).map(TupleIsInput2).toList
</code></pre>
<p>Another thing we can simplify is the intersection. A <code>Map</code>'s keys already form a set (its <code>keySet</code>), so we can create the maps first and then intersect their key sets:</p>
<pre><code>case class Key(key: String)
def Input1IsKey(input: Input1): Key = Key(input.key)
def Input2IsKey(input: Input2): Key = Key(input.key)

// We now only keep the "rest" as the map value
val m1 = f1.map(input =&gt; Input1IsKey(input) -&gt; input.rest).toMap
val m2 = f2.map(input =&gt; Input2IsKey(input) -&gt; input.rest).toMap

val intersection = m1.keySet intersect m2.keySet
</code></pre>
<p>And the output is computed like this:</p>
<pre><code>val output = intersection.map(key =&gt; key.key + m1(key) + m2(key))
</code></pre>
<p>Note that <code>key.key</code> extracts the plain string (concatenating the <code>Key</code> itself would produce text like <code>Key(1)</code>), and that I don't append a comma anymore: the rest of both f1 and f2 already starts with a comma.</p>
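<p>Putting the final version together, here is a self-contained sketch you can paste into a Scala script or REPL. To run it without file1.csv and file2.csv, it assumes the lines are already in memory. The <code>CsvJoin</code> object, the <code>join</code> helper and the sample data are hypothetical names for illustration. To read real files, replace the sample lists with <code>Source.fromFile(...).getLines().toList</code>.</p>

```scala
// Sketch of the final approach: split each line at the first comma,
// key a Map by the first column, intersect the key sets, then glue
// the remaining columns of both files together.
case class Input1(key: String, rest: String)
case class Input2(key: String, rest: String)
case class Key(key: String)

object CsvJoin {
  // "a,b,c" becomes ("a", ",b,c"): span stops at the first comma and keeps it in the rest
  def breakLine(line: String): (String, String) = line.span(',' != _)

  // Hypothetical helper: takes already-read lines so the sketch runs without files
  def join(lines1: Seq[String], lines2: Seq[String]): Set[String] = {
    val f1 = lines1.map(breakLine).map { case (k, r) => Input1(k, r) }
    val f2 = lines2.map(breakLine).map { case (k, r) => Input2(k, r) }

    // Only the "rest" is kept as the map value; the key lives in the Map's key
    val m1: Map[Key, String] = f1.map(in => Key(in.key) -> in.rest).toMap
    val m2: Map[Key, String] = f2.map(in => Key(in.key) -> in.rest).toMap

    // A Map's keySet is already a Set, so intersect it directly
    val intersection = m1.keySet intersect m2.keySet

    // Both rests start with a comma, so no separator is added here
    intersection.map(key => key.key + m1(key) + m2(key))
  }
}

// Hypothetical sample data standing in for file1.csv and file2.csv
val file1 = List("1,apple,red", "2,banana,yellow", "3,cherry,red")
val file2 = List("2,5.00", "3,7.50", "4,1.25")

// Sorted because a Set has no defined order; prints:
// 2,banana,yellow,5.00
// 3,cherry,red,7.50
CsvJoin.join(file1, file2).toList.sorted.foreach(println)
```

<p>Keys that appear in only one file (here <code>1</code> and <code>4</code>) are dropped, as expected from an intersection. If a key appears more than once in the same file, <code>toMap</code> keeps only the last line for it, so this sketch assumes the first column is unique within each file.</p>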