<p>I got a little bored. Try this on for size. I didn't actually have a chance to check that it parses and writes correctly, but other than that I believe it should run given some info. This problem is a good opportunity to use queueing. Let me know how fast it runs!</p>
<pre><code>from threading import Thread
import Queue
import csv
import os
import time

# queue a parser thread writes finished line items to; the author
# thread drains it and writes them to the csv
write_q = Queue.Queue()
# queue filled with files to parse
read_q = Queue.Queue()
# queue of files whose size changed during a read. You can preload
# this queue to optimize, but the program handles any file that
# changes during operation
moving_q = Queue.Queue()
# given csv labels
labels = ['date', 'message', 'action', 'details']
# global for the writer thread so it knows when to close
files_to_parse = True

# parsing function for any number of threads
def file_parser():
    # each parser thread runs until both input queues are empty
    while True:
        moving = False
        # prefer a file from the moving queue, else take a non-moving file
        try:
            if not moving_q.empty():
                try:
                    f_path = moving_q.get(False)
                    moving = True
                # another thread may have snatched it between the
                # check and the get; just try again
                except Queue.Empty:
                    continue
            else:
                f_path = read_q.get(False)
        # all files have been dealt with
        except Queue.Empty:
            print "Done parsing"
            return

        # parse the file, checking that it is not modified mid-read
        with open(f_path, 'r') as f:
            reader = csv.DictReader(f, labels, delimiter=' ', restkey='rest')
            # file size when we started reading
            pre = os.path.getsize(f_path)
            # buffer output lines so we can discard them and re-queue
            # the file if it is updated during the read
            line_items = []
            for line in reader:
                post = os.path.getsize(f_path)
                if pre != post:
                    # file changed: re-queue it and drop partial output
                    moving_q.put(f_path)
                    line_items = None
                    break
                # parse the line and add it to the output list
                if line.get('rest'):
                    line['details'] += ' %s' % (' '.join(line['rest']))
                line_items.append(','.join([os.path.basename(f_path),
                                            line['date'], line['message'],
                                            line['action'], line['details']]) + '\n')

        # don't read and write in the same thread; push the finished
        # lines to the author thread instead
        if line_items:
            write_q.put(line_items)
        # acknowledge the task on whichever queue the file came from,
        # even if it was re-queued, so the join() calls can return
        if moving:
            moving_q.task_done()
        else:
            read_q.task_done()

# author thread that writes items to file while the parser threads
# complete tasks. Should help speed up IO bound processing
def file_author(out_file):
    with open(out_file, 'w') as f:
        # run until the parsers are done and the write queue is drained
        while files_to_parse or not write_q.empty():
            # only one writer thread, so write items as they arrive
            if not write_q.empty():
                line_items = write_q.get(False)
                for line_item in line_items:
                    f.write(line_item)
                write_q.task_done()
            # sleep in the downtime so we don't overload the PC
            else:
                time.sleep(.1)
    print "Done writing"

if __name__ == "__main__":
    # list of file names as you had before
    listing = []
    outfile = "MyNewCSVfile.csv"
    # you can optimize by putting known "moving" files straight onto
    # moving_q, but the program handles either way
    for infile in listing:
        # _path is the directory prefix from your original code
        read_q.put(_path + infile)

    # make a writer thread
    t = Thread(target=file_author, args=(outfile,))
    t.daemon = True
    t.start()

    # make some parser threads
    for i in range(10):
        t = Thread(target=file_parser)
        t.daemon = True
        t.start()

    # wait for the parsers, then for the writer to drain its queue
    read_q.join()
    moving_q.join()
    write_q.join()
    # tell the author thread to close
    files_to_parse = False
    time.sleep(.1)
    print "Complete"
</code></pre>
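<p>For context, the shutdown logic above leans entirely on Queue's task_done()/join() bookkeeping: join() blocks until every item that was put() has been matched by a task_done() call. Here is a minimal sketch of that pattern on its own (the worker and item names are just illustrative, not part of the program above):</p>
<pre><code>from threading import Thread
import Queue

q = Queue.Queue()

def worker():
    while True:
        item = q.get()      # blocks until an item is available
        print "handling", item
        q.task_done()       # one task_done() per successful get()

t = Thread(target=worker)
t.daemon = True             # daemon so it dies with the main thread
t.start()

for i in range(5):
    q.put(i)

q.join()                    # returns only once all 5 items are acknowledged
print "all items processed"
</code></pre>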