Note that there are some explanatory texts on larger screens.

plurals
  1. POEditing a pandas script to ignore but not remove data then match & updating + comparing to prevent wasteful saves + slicing data to match with?
    primarykey
    data
    text
    <p>I've got some issue with one of my scripts... I'll put the problems in bullets.</p> <ul> <li><strong>Issue/Question 1 - Comparing the original testing.csv to the modified one before saving, if different it should save, if the same it should not save.</strong> <ul> <li>In my code below, the data comes out the same but for some reason it thinks it's different and I can't find out why...</li> </ul></li> <li><strong>Issue/Question 2 - Ignoring certain data during a match</strong> <ul> <li>I'm wanting to match using <code>MATCH2</code> but ignore the parenthesis data for example in the last classes data <code>MATCH2</code> has <code>Mdata (D)</code> it needs to match by <code>MData</code></li> </ul></li> <li><strong>Issue/Question 3 - Slicing data to match with</strong> <ul> <li>I'm wanting to find a way so if I wanted to use <code>MATCH1</code> I could set <code>MATCH1</code> so that it only uses <code>MATCH1[-1:]</code> which would ultimately give me numbers in this example.</li> </ul></li> </ul> <h3><code>Testing.py</code></h3> <pre><code>import re import pandas from pandas.util.testing import assert_frame_equal # each block in the text file will be one element of this list matchers = [[]] i = 0 with open('testing.txt') as infile: for line in infile: line = line.strip() # Blocks are seperated by blank lines if len(line) == 0: i += 1 matchers.append([]) # assume there are always two blank lines between items # and just skip to the lext line infile.next() continue matchers[i].append(line) # This regular expression matches the variable number of students in each block studentlike = re.compile('(\d+) (.+) (\d+/\d+)') # These are the names of the fields we expect at the end of each block datanames = ['Data', 'misc2', 'bla3'] # We will build a table containing a list of elements for each student table = [] for matcher in matchers: # We use an iterator over the block lines to make indexing simpler it = iter(matcher) # The first two elements are match values m1, m2 = it.next(), it.next() # then there are a number of students students = [] for possiblestudent in it: m = studentlike.match(possiblestudent) if m: students.append(list(m.groups())) else: break # After the students come the data elements, which we read into a dictionary # We also add in the last possible student line as that didn't match the student re dataitems = dict(item.split() for item in [possiblestudent] + list(it)) # Finally we construct the table for student in students: # We use the dictionary .get() method to return blanks for the missing fields table.append([m1, m2] + student + [dataitems.get(d, '') for d in datanames]) textcols = ['MATCH2', 'MATCH1', 'TITLE01', 'MATCH3','TITLE02', 'Data', 'misc2', 'bla3'] csvdata = pandas.read_csv('testing.csv') csvdata_old = csvdata.copy() textdata = pandas.DataFrame(table, columns=textcols) # Add any new columns newCols = textdata.columns - csvdata.columns for c in newCols: csvdata[c] = None mergecols = ['MATCH2', 'MATCH1', 'MATCH3'] csvdata.set_index(mergecols, inplace=True, drop=False) csvdata_old.set_index(mergecols, inplace=True, drop=False) textdata.set_index(mergecols, inplace=True,drop=False) csvdata.update(textdata) try: assert_frame_equal(csvdata, csvdata_old) print "True (Same)" except: csvdata.to_csv('testing.csv', index=False) print "False (Different)" </code></pre> <h3><code>testing.txt</code></h3> <pre><code>MData DMATCH1 3 Tommy 144512/23332 1 Jim 90000/222311 1 Elz M 90000/222311 1 Ben 90000/222311 Data $50.90 misc2 $10.40 bla3 $20.20 MData DMATCH2 4 James Smith 2333/114441 4 Mike 90000/222311 4 Jessica Long 2333/114441 Data $50.90 bla3 $5.44 Mdata DMATCH3 5 Joe Reane 0/0 5 Peter Jones 90000/222311 Data $10.91 misc2 $420.00 bla3 $210.00 </code></pre> <h3><code>testing.csv</code></h3> <pre><code>MATCH1,MATCH2,TITLE,TITLE,TITLE,TITLE,TITLE,TITLE,MATCH3,DATA,TITLE,TITLE DMATCH1,MData (N/A),data,data,data,data,data,data,Tommy,55,data,data DMATCH1,MData (N/A),data,data,data,data,data,data,Ben,54,data,data DMATCH1,MData (N/A),data,data,data,data,data,data,Jim,52,data,data DMATCH1,MData (N/A),data,data,data,data,data,data,Elz M,22,data,data DMATCH2,MData (B/B),data,data,data,data,data,data,James Smith,15,data,data DMATCH2,MData (B/B),data,data,data,data,data,data,Jessica Long,224,data,data DMATCH2,MData (B/B),data,data,data,data,data,data,Mike,62,data,data DMATCH3,Mdata (D),data,data,data,data,data,data,Joe Reane,66,data,data DMATCH3,Mdata (D),data,data,data,data,data,data,Peter Jones,256,data,data DMATCH3,Mdata (D),data,data,data,data,data,data,Lesley Lope,5226,data,data </code></pre> <h3>Desired <code>testing.csv</code> after script has been ran...</h3> <pre><code>MATCH1,MATCH2,TITLE,TITLE.1,TITLE.2,TITLE.3,TITLE.4,TITLE.5,MATCH3,DATA,TITLE.6,TITLE.7,Data,TITLE01,TITLE02,bla3,misc2 DMATCH1,MData (N/A),data,data,data,data,data,data,Tommy,55,data,data,$50.90,3,144512/23332,$20.20,$10.40 DMATCH1,MData (N/A),data,data,data,data,data,data,Ben,54,data,data,$50.90,1,90000/222311,$20.20,$10.40 DMATCH1,MData (N/A),data,data,data,data,data,data,Jim,52,data,data,$50.90,1,90000/222311,$20.20,$10.40 DMATCH1,MData (N/A),data,data,data,data,data,data,Elz M,22,data,data,$50.90,1,90000/222311,$20.20,$10.40 DMATCH2,MData (B/B),data,data,data,data,data,data,James Smith,15,data,data,$50.90,4,2333/114441,$5.44, DMATCH2,MData (B/B),data,data,data,data,data,data,Jessica Long,224,data,data,$50.90,4,2333/114441,$5.44, DMATCH2,MData (B/B),data,data,data,data,data,data,Mike,62,data,data,$50.90,4,90000/222311,$5.44, DMATCH3,Mdata (D),data,data,data,data,data,data,Joe Reane,66,data,data,$10.91,5,0/0,$210.00,$420.00 DMATCH3,Mdata (D),data,data,data,data,data,data,Peter Jones,256,data,data,$10.91,5,90000/222311,$210.00,$420.00 DMATCH3,Mdata (D),data,data,data,data,data,data,Lesley Lope,5226,data,data,,,,, </code></pre> <p>I'd greatly appreciate the help if anyone can :)</p> <h1>Edit for bheklilr</h1> <h3><code>testing.txt</code></h3> <pre><code>Mdata DMATCH3 5 Joe Reane 0/0 5 Peter Jones 90000/222311 Data $10.91 misc2 $420.00 bla3 $210.00 </code></pre> <h3><code>testing.csv</code></h3> <pre><code>MATCH1,MATCH2,TITLE,MATCH3,DATA,TITLE DMATCH3,Mdata (D),data,Joe Reane,66,data DMATCH3,Mdata (D),data,Peter Jones,256,data DMATCH3,Mdata (D),data,Lesley Lope,5226,data </code></pre> <h3>Desired <code>testing.csv</code> after script has been ran...</h3> <pre><code>MATCH1,MATCH2,TITLE,MATCH3,DATA,TITLE.1,Data,TITLE01,TITLE02,bla3,misc2 DMATCH3,Mdata (D),data,Joe Reane,66,data,$10.91,5,0/0,$210.00,$420.00 DMATCH3,Mdata (D),data,Peter Jones,256,data,$10.91,5,90000/222311,$210.00,$420.00 DMATCH3,Mdata (D),data,Lesley Lope,5226,data,,,,, </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload