Python - implementing join in MapReduce - problems with reducer output
<p>This is a call for help with a homework task in the Data Science course I am taking on Coursera, since I could not get any advice on the Coursera forum. I've written my code, but unfortunately the output does not match the expected result. Here's the problem at hand:</p>
<p>Task: Implement a relational join as a MapReduce query</p>
<p>Input (Mapper):</p>
<p>The input will be database records formatted as lists of Strings. Every list element corresponds to a different field in its corresponding record. The first item (index 0) in each record is a string that identifies which table the record originates from. This field has two possible values:</p>
<ol>
<li>‘line_item’ indicates that the record is a line item.</li>
<li>‘order’ indicates that the record is an order.</li>
</ol>
<p>The second element (index 1) in each record is the order_id. LineItem records have 17 elements including the identifier string. Order records have 10 elements including the identifier string.</p>
<p>Output (Reducer):</p>
<p>The output should be a joined record.</p>
<p>The result should be a single list of length 27 that contains the fields from the order record followed by the fields from the line item record. Each list element should be a string.</p>
<p>My code is:</p>
<pre><code>import MapReduce
import sys

"""
Word Count Example in the Simple Python MapReduce Framework
"""

mr = MapReduce.MapReduce()

# =============================
# Do not modify above this line

record = open(sys.argv[1])  # this reads the input, given by instructor

def mapper(record):
    key = record[1]  # assign order_id from each record as key
    value = list(record)  # assign whole record as value for each key
    mr.emit_intermediate(key, value)  # emit key-value pairs

def reducer(key, value):
    new_dict = {}  # create dict to keep track of records
    if not key in new_dict:
        new_dict[key] = value
    else:
        new_dict[key].extend(value)
    for key in new_dict:
        if len(new_dict[key]) == 27:
            mr.emit(new_dict[key])

# Do not modify below this line
# =============================
if __name__ == '__main__':
    inputdata = open(sys.argv[1])
    mr.execute(inputdata, mapper, reducer)
</code></pre>
<p>The error message I am getting is "Expected: 31 records, got 0".</p>
<p>Also, expected output records should look like this: just one list with all fields lumped together, without any de-duplication.</p>
<pre><code>["order", "5", "44485", "F", "144659.20", "1994-07-30", "5-LOW", "Clerk#000000925", "0", "quickly. bold deposits sleep slyly. packages use slyly", "line_item", "5", "37531", "35", "3", "50", "73426.50", "0.08", "0.03", "A", "F", "1994-08-08", "1994-10-13", "1994-08-26", "DELIVER IN PERSON", "AIR", "eodolites. fluffily unusual"]
</code></pre>
<p>Sorry for the long question, and it may be a bit of a mess, but I hope the answer will be obvious to someone.</p>
<p>Similar code which worked for me:</p>
<pre><code>def mapper(record):
    # key: document identifier
    # value: document contents
    friend = record[0]
    value = 1
    mydict = {}
    mr.emit_intermediate(friend, value)
    mydict[friend] = int(value)

def reducer(friend, value):
    # key: word
    # value: list of occurrence counts
    newdict = {}
    if not friend in newdict:
        newdict[friend] = value
    else:
        newdict[friend] = newdict[friend] + 1
    for friend in newdict:
        mr.emit((friend, (newdict[friend])))
</code></pre>
<p>Thanks! Sergey</p>
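For reference, here is a minimal sketch of how the join described in the task could be written as a reduce-side join. It rests on an assumption about the course framework: that the reducer receives, for each order_id, a list of all the records emitted for that key (a list of lists). If that holds, the length of that list is the number of records for the key, not 27, which may explain "got 0". `MiniMapReduce` below is a hypothetical stand-in written only so the example runs on its own; it is not the course's `MapReduce` module, and the sample records are shortened to four fields for illustration.

```python
from collections import defaultdict


class MiniMapReduce:
    """Hypothetical stand-in for the course framework (assumed interface)."""

    def __init__(self):
        self.intermediate = defaultdict(list)  # key -> list of emitted values
        self.result = []                       # final emitted records

    def emit_intermediate(self, key, value):
        self.intermediate[key].append(value)

    def emit(self, value):
        self.result.append(value)

    def execute(self, records, mapper, reducer):
        # Map phase: every input record goes through the mapper.
        for record in records:
            mapper(record)
        # Reduce phase: each key is reduced with the list of all its values.
        for key, values in self.intermediate.items():
            reducer(key, values)
        return self.result


mr = MiniMapReduce()


def mapper(record):
    # Key on order_id (index 1); keep the whole record as the value.
    mr.emit_intermediate(record[1], list(record))


def reducer(key, list_of_values):
    # Split the records that share this order_id by their source table.
    orders = [r for r in list_of_values if r[0] == "order"]
    line_items = [r for r in list_of_values if r[0] == "line_item"]
    # Emit one joined record per (order, line item) pair: order fields first.
    for order in orders:
        for item in line_items:
            mr.emit(order + item)


# Shortened, made-up sample records (the real ones have 10 and 17 fields).
sample = [
    ["order", "5", "44485", "F"],
    ["line_item", "5", "37531", "35"],
    ["line_item", "5", "11111", "36"],
    ["order", "6", "99999", "O"],
]
joined = mr.execute(sample, mapper, reducer)
```

With full-length records, each `order + item` concatenation would have 10 + 17 = 27 elements, matching the expected output format. An order with no matching line items (order_id "6" in the sample) produces no output, which is the usual inner-join behavior.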