StackOverflow2013

plurals

POPython OOP Project Organization
primarykey
Id
9217591
data
AcceptedAnswerId
9217667
AnswerCount
1
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2012-02-09T19:42:48.910
FavoriteCount
1
LastActivityDate
2012-02-09T19:48:12.907
LastEditDate
LastEditorUserId
0
OwnerUserId
1193173
ParentId
0
PostTypeId
1
Score
0
ViewCount
369
LastEditorDisplayName
text
Body
I'm a bit new to Python dev -- I'm creating a larger project for some web scraping. I want to approach this as "Pythonically" as possible, and would appreciate some help with the project structure. Here's how I'm doing it now:

Basically, I have a base class for an object whose purpose is to go to a website and parse some specific data on it into its own array, jobs[].

minion.py
<pre><code>class minion:

    # Empty getJobs() function to be defined by object pre-instantiation
    def getJobs(self):
        pass

    # Constructor for a minion that requires site authorization
    # Ex: minCity1 = minion('http://portal.com/somewhere', 'user', 'password')
    # or minCity2 = minion('http://portal.com/somewhere')
    def __init__(self, title, URL, user='', password=''):
        self.title = title
        self.URL = URL
        self.user = user
        self.password = password
        self.jobs = []
        if (user == '' and password == ''):
            self.reqAuth = 0
        else:
            self.reqAuth = 1

    def displayjobs(self):
        for j in self.jobs:
            j.display()
</code></pre>

I'm going to have about 100 different data sources. The way I'm doing it now is to just create a separate module for each "Minion", which defines (and binds) a more tailored getJobs() function for that object.

Example: minCity1.py
<pre><code>from minion import minion
from BeautifulSoup import BeautifulSoup
import urllib2
from job import job

# MINION CONFIG
minTitle = 'Some city'
minURL = 'http://www.somewebpage.gov/'

# Here we define a function that will be bound to this object's getJobs function
def getJobs(self):
    page = urllib2.urlopen(self.URL)
    soup = BeautifulSoup(page)

    # For each row
    for tr in soup.findAll('tr'):
        tJob = job()
        span = tr.findAll(['span', 'class="content"'])

        # If row has 5 spans, pull data from span 2 and 3 ( [1] and [2] )
        if len(span) == 5:
            tJob.title = span[1].a.renderContents()
            tJob.client = 'Some City'
            tJob.source = minURL
            tJob.due = span[2].div.renderContents().replace(' ', '')
            self.jobs.append(tJob)

# Don't forget to bind the function to the object!
minion.getJobs = getJobs

# Instantiate the object
mCity1 = minion(minTitle, minURL)
</code></pre>

I also have a separate module which simply contains a list of all the instantiated minion objects (which I have to update each time I add one):

minions.py
<pre><code>from minion_City1 import mCity1
from minion_City2 import mCity2
from minion_City3 import mCity3
from minion_City4 import mCity4

minionList = [mCity1, mCity2, mCity3, mCity4]
</code></pre>

main.py references minionList for all of its activities for manipulating the aggregated data.

This seems a bit chaotic to me, and was hoping someone might be able to outline a more Pythonic approach.

Thank you, and sorry for the long post!
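For comparison, here is a minimal sketch of the same layout using one subclass per data source instead of assigning `minion.getJobs` at module level. That assignment rebinds the method on the class itself, so every minion instance ends up sharing whichever function was assigned last. This is one possible arrangement, not the accepted answer's method. It assumes Python 3.6+ (the original code is Python 2), and the names `Minion`, `CityOneMinion`, `CityTwoMinion`, `registry`, and `get_jobs` are hypothetical. The actual fetching and parsing is stubbed out with placeholder dictionaries.

```python
class Minion:
    """Base class: one subclass per data source."""

    # Hypothetical registry; filled in automatically as subclasses are defined.
    registry = []

    # Per-source configuration, overridden by each subclass.
    title = None
    url = None

    def __init_subclass__(cls, **kwargs):
        # Called once for every subclass definition; records the subclass.
        super().__init_subclass__(**kwargs)
        Minion.registry.append(cls)

    def __init__(self, user="", password=""):
        self.user = user
        self.password = password
        # Same rule as the original reqAuth flag: auth is needed if either is set.
        self.req_auth = bool(user or password)
        self.jobs = []

    def get_jobs(self):
        # Each subclass supplies its own scraping logic.
        raise NotImplementedError


class CityOneMinion(Minion):
    title = "Some city"
    url = "http://www.somewebpage.gov/"

    def get_jobs(self):
        # Placeholder for the real fetch-and-parse step.
        self.jobs.append({"title": "example job", "client": self.title, "source": self.url})


class CityTwoMinion(Minion):
    title = "Another city"          # hypothetical second source
    url = "http://example.invalid/"  # placeholder URL

    def get_jobs(self):
        # Placeholder for the real fetch-and-parse step.
        self.jobs.append({"title": "other job", "client": self.title, "source": self.url})


# Replaces the hand-maintained minionList: instantiate every registered subclass.
minion_list = [cls() for cls in Minion.registry]
for m in minion_list:
    m.get_jobs()
```

A subclass is only registered once the module defining it has been imported, so with one module per source something still has to import those modules, for example a package `__init__` or a discovery loop.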
Tags
<python><oop><structure>
Title
Python OOP Project Organization
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USdru
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. This table or related slice is empty.
CommentsPostId
1. COYou may want to consider [scrapy](http://scrapy.org/), or at least study its [architecture](http://doc.scrapy.org/en/latest/topics/architecture.html).
 singulars
 PostPostId
 POPython OOP Project Organization
 UserUserId
 USFrancis Avila
2. CO"This seems a bit chaotic to me"? Why? How? Please explain what **specifically** seems chaotic. You seem to have a lot of customization going on. How else do you think you might approach it?
 singulars
 PostPostId
 POPython OOP Project Organization
 UserUserId
 USS.Lott
