<p>I got a little bored. Try this on for size. I didn't actually have a chance to check that it parses and writes correctly, but other than that I believe it should run given some info. This problem is a good opportunity to use queueing. Let me know how fast it runs!</p>
<pre><code>from threading import Thread
import Queue
import csv
import os
import time

# queue a parser thread writes finished line items to; the author
# thread drains it and writes them to the csv
write_q = Queue.Queue()
# queue filled with files to parse
read_q = Queue.Queue()
# queue of files whose size changed during a read. You can preload
# this queue to optimize, but the program handles any file that
# changes during operation
moving_q = Queue.Queue()
# given csv labels
labels = ['date', 'message', 'action', 'details']
# global for the writer thread so it knows when to close
files_to_parse = True

# parsing function for any number of threads
def file_parser():
    # each parser thread runs until both input queues are empty
    while True:
        moving = False
        # prefer a file from the moving queue, else take a non-moving file
        try:
            if not moving_q.empty():
                try:
                    f_path = moving_q.get(False)
                    moving = True
                # another thread may have snatched it between the
                # check and the get; just try again
                except Queue.Empty:
                    continue
            else:
                f_path = read_q.get(False)
        # all files have been dealt with
        except Queue.Empty:
            print "Done parsing"
            return

        # parse the file, checking that it is not modified mid-read
        with open(f_path, 'r') as f:
            reader = csv.DictReader(f, labels, delimiter=' ', restkey='rest')
            # file size when we started reading
            pre = os.path.getsize(f_path)
            # buffer output lines so we can discard them and re-queue
            # the file if it is updated during the read
            line_items = []
            for line in reader:
                post = os.path.getsize(f_path)
                if pre != post:
                    # file changed: re-queue it and drop partial output
                    moving_q.put(f_path)
                    line_items = None
                    break
                # parse the line and add it to the output list
                if line.get('rest'):
                    line['details'] += ' %s' % (' '.join(line['rest']))
                line_items.append(','.join([os.path.basename(f_path),
                                            line['date'], line['message'],
                                            line['action'], line['details']]) + '\n')

        # don't read and write in the same thread; push the finished
        # lines to the author thread instead
        if line_items:
            write_q.put(line_items)
        # acknowledge the task on whichever queue the file came from,
        # even if it was re-queued, so the join() calls can return
        if moving:
            moving_q.task_done()
        else:
            read_q.task_done()

# author thread that writes items to file while the parser threads
# complete tasks. Should help speed up IO bound processing
def file_author(out_file):
    with open(out_file, 'w') as f:
        # run until the parsers are done and the write queue is drained
        while files_to_parse or not write_q.empty():
            # only one writer thread, so write items as they arrive
            if not write_q.empty():
                line_items = write_q.get(False)
                for line_item in line_items:
                    f.write(line_item)
                write_q.task_done()
            # sleep in the downtime so we don't overload the PC
            else:
                time.sleep(.1)
    print "Done writing"

if __name__ == "__main__":
    # list of file names as you had before
    listing = []
    outfile = "MyNewCSVfile.csv"
    # you can optimize by putting known "moving" files straight onto
    # moving_q, but the program handles either way
    for infile in listing:
        # _path is the directory prefix from your original code
        read_q.put(_path + infile)

    # make a writer thread
    t = Thread(target=file_author, args=(outfile,))
    t.daemon = True
    t.start()

    # make some parser threads
    for i in range(10):
        t = Thread(target=file_parser)
        t.daemon = True
        t.start()

    # wait for the parsers, then for the writer to drain its queue
    read_q.join()
    moving_q.join()
    write_q.join()
    # tell the author thread to close
    files_to_parse = False
    time.sleep(.1)
    print "Complete"
</code></pre>
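<p>For context, the shutdown logic above leans entirely on Queue's task_done()/join() bookkeeping: join() blocks until every item that was put() has been matched by a task_done() call. Here is a minimal sketch of that pattern on its own (the worker and item names are just illustrative, not part of the program above):</p>
<pre><code>from threading import Thread
import Queue

q = Queue.Queue()

def worker():
    while True:
        item = q.get()      # blocks until an item is available
        print "handling", item
        q.task_done()       # one task_done() per successful get()

t = Thread(target=worker)
t.daemon = True             # daemon so it dies with the main thread
t.start()

for i in range(5):
    q.put(i)

q.join()                    # returns only once all 5 items are acknowledged
print "all items processed"
</code></pre>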