<p>I see a couple things amiss with this code. First is this line:</p>

<pre><code>record = open(sys.argv[1])
</code></pre>

<p>I find it odd that this <code>record</code> variable is never used anywhere else in the code. Even though the <code>mapper</code> function is defined as follows:</p>

<pre><code>def mapper(record):
    ...
</code></pre>

<p>...<em>that</em> <code>record</code> is local to the <code>mapper</code> function. It's in a different scope than the first <code>record</code>. Whatever data is passed to <code>mapper</code> is assigned to its local <code>record</code> and used accordingly, and the file object assigned to the first <code>record</code> is never touched. I don't feel that this is tied to the error, though. Because that first <code>record</code> is not used anywhere else, you can pretty safely delete that line.</p>

<p>Then there's the <code>reducer</code> function:</p>

<pre><code>def reducer(key, value): # reducer should take 2 inputs according to the task
    if key in new_dict: # checking if key already added to dict
        new_dict[key].extend(list(value)) # if yes just append all records to the value
    new_dict[key] = list(value) # if not create new key and assign record to value
    for key in new_dict:
        if len(new_dict[key]) == 27: # checks to emit only records found in both tables
            mr.emit(new_dict[key])
</code></pre>

<p>Your own comments offer the clue to the problem here. First you say you're checking to see if the key is already in the dict. If so, just append all records to the value. If not, create a new key and assign the record to the value.</p>

<p>The problem is with the line associated with the "if not" comment. If it's truly what should be done if the first <code>if</code> test fails, then it should be prefaced by an <code>else</code> line:</p>

<pre><code>    ...
    if key in new_dict: # checking if key already added to dict
        new_dict[key].extend(list(value)) # if yes just append all records to the value
    else:
        new_dict[key] = list(value) # if not create new key and assign record to value
    ...
</code></pre>

<p>The way you wrote it, even if that <code>if</code> test succeeds and it appends the data to the existing value for the key, it's going to immediately stomp over that change. In other words, the value for that key isn't going to grow. It's always going to represent the most recently submitted value for the key.</p>

<p>Here's the full code edited with all the suggested changes:</p>

<pre><code>import MapReduce
import sys

"""
Word Count Example in the Simple Python MapReduce Framework
"""

mr = MapReduce.MapReduce()

# =============================
# Do not modify above this line

def mapper(record):
    key = record[1] # assign order_id from each record as key
    value = list(record) # assign whole record as value for each key
    mr.emit_intermediate(key, value) # emit key-value pairs

new_dict = {} # create dict to keep track of records

def reducer(key, value):
    if key not in new_dict:
        new_dict[key] = value
    else:
        new_dict[key].extend(value)
    for key in new_dict:
        if len(new_dict[key]) == 27:
            mr.emit(new_dict[key])

# Do not modify below this line
# =============================
if __name__ == '__main__':
    inputdata = open(sys.argv[1])
    mr.execute(inputdata, mapper, reducer)
</code></pre>
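<p>To see the overwrite in isolation, here is a minimal sketch that does not depend on the <code>MapReduce</code> module at all. The helper names <code>accumulate_buggy</code> and <code>accumulate_fixed</code> and the sample key <code>"order_1"</code> are made up for illustration; they only mirror the two versions of the <code>if</code> block discussed above.</p>

```python
def accumulate_buggy(store, key, values):
    # Hypothetical helper mirroring the original reducer logic.
    # There is no else, so the assignment below always runs and
    # discards whatever extend() just added.
    if key in store:
        store[key].extend(list(values))
    store[key] = list(values)


def accumulate_fixed(store, key, values):
    # Hypothetical helper mirroring the corrected logic.
    # The assignment only runs the first time a key is seen.
    if key in store:
        store[key].extend(list(values))
    else:
        store[key] = list(values)


buggy, fixed = {}, {}

# Two batches of made-up values arrive for the same key.
for key, values in [("order_1", ["a", "b"]), ("order_1", ["c"])]:
    accumulate_buggy(buggy, key, values)
    accumulate_fixed(fixed, key, values)

print(buggy)  # {'order_1': ['c']}
print(fixed)  # {'order_1': ['a', 'b', 'c']}
```

<p>If you'd rather avoid the explicit branch, <code>store.setdefault(key, []).extend(values)</code> should behave the same way as the fixed version.</p>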