StackOverflow2013

plurals

PO Python/Django Extract and append only new links

primarykey
Id: 20849254

data
AcceptedAnswerId: 0
AnswerCount: 1
ClosedDate:
CommentCount: 0
CommunityOwnedDate:
CreationDate: 2013-12-30T23:22:20.233
FavoriteCount: 0
LastActivityDate: 2014-01-06T20:40:55.863
LastEditDate: 2014-01-02T11:24:27.173
LastEditorUserId: 2816158
OwnerUserId: 2816158
ParentId: 0
PostTypeId: 1
Score: 0
ViewCount: 151
LastEditorDisplayName:

text
Body
I am putting together a project using Python 2.7 and Django 1.5 on Windows 7. I have the following view:

views.py:

<pre><code>import re
import urllib2
import urlparse

from bs4 import BeautifulSoup
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger
from django.shortcuts import render_to_response


def foo():
    site = "http://www.foo.com/portal/jobs"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    jobpass = urllib2.urlopen(req)
    soup = BeautifulSoup(jobpass)
    # Turn relative hrefs into absolute URLs
    for tag in soup.find_all('a', href=True):
        tag['href'] = urlparse.urljoin('http://www.businessghana.com/portal/', tag['href'])
    # Keep only anchors whose href matches '.getJobInfo', as HTML strings
    return map(str, soup.find_all('a', href=re.compile('.getJobInfo')))


def example():
    site = "http://example.com"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    jobpass = urllib2.urlopen(req)
    soup = BeautifulSoup(jobpass)
    return map(str, soup.find_all('a', href=re.compile('.display-job')))


# These run once, when the module is first imported (not on every request)
foo_links = foo()
example_links = example()


def all_links():
    # map() returns a list on Python 2.7, so the two results can be concatenated
    return foo_links + example_links


def display_links(request):
    name = all_links()
    paginator = Paginator(name, 25)  # 25 links per page
    page = request.GET.get('page')
    try:
        name = paginator.page(page)
    except PageNotAnInteger:
        # page is missing or not a number: show the first page
        name = paginator.page(1)
    except EmptyPage:
        # page is past the end: show the last page
        name = paginator.page(paginator.num_pages)
    return render_to_response('jobs.html', {'name': name})
</code></pre>

My template looks like this:

<pre><code>{# name is the Page object built in display_links #}
<ol>
{% for link in name %}
    <li>{{ link|safe }}</li>
{% endfor %}
</ol>

<div class="pagination">
    {% if name.has_previous %}
        <a href="?page={{ name.previous_page_number }}">Previous</a>
    {% endif %}

    Page {{ name.number }} of {{ name.paginator.num_pages }}.

    {% if name.has_next %}
        <a href="?page={{ name.next_page_number }}">next</a>
    {% endif %}
</div>
</code></pre>

Right now, as my code stands, every time I run it, it scrapes all the links on the front pages of the selected sites and presents them, paginated, from scratch. I don't think it's a good idea for the script to re-read and re-write all the previously extracted links every time, so I would like it to check for and append only the new links.
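The "check for and append only new links" step can be expressed independently of any storage. Below is a minimal sketch, assuming links are compared as exact strings (so a URL that differs only by a trailing slash or query string counts as new). `filter_new_links` and the example.com URLs are hypothetical; the function works the same on bare URLs or on the anchor-tag strings that `foo()` returns, and it runs on both Python 2.7 and Python 3.

```python
def filter_new_links(scraped_links, seen_links):
    """Return the links in scraped_links that are not in seen_links.

    Links are compared as exact strings. Order of first appearance is
    kept, and duplicates inside scraped_links itself are dropped too.
    """
    seen = set(seen_links)  # set gives constant-time membership checks
    new_links = []
    for link in scraped_links:
        if link not in seen:
            new_links.append(link)
            seen.add(link)  # a link repeated in this batch is kept once
    return new_links


# Hypothetical data: links stored by an earlier run vs. today's scrape.
stored = ["http://example.com/job/1", "http://example.com/job/2"]
scraped = [
    "http://example.com/job/2",  # already stored, skipped
    "http://example.com/job/3",
    "http://example.com/job/3",  # duplicate within the batch, skipped
    "http://example.com/job/4",
]
stored.extend(filter_new_links(scraped, stored))
print(stored)
# ['http://example.com/job/1', 'http://example.com/job/2',
#  'http://example.com/job/3', 'http://example.com/job/4']
```

Building a set from the stored links once per run keeps the total work linear in the number of stored plus scraped links, instead of comparing every scraped link against every stored one.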
I would like to save the previously scraped links so that over the course of, say, a week, all the links that have appeared on the front pages of these sites will be available on my site as older pages. This is my first programming project and I don't know how to incorporate this logic into my code.

UPDATE: My model looks like this:

<pre><code>from django.db import models


class jobLinks(models.Model):
    links = models.URLField()  # the scraped job URL
    pub_date = models.DateTimeField('date retrieved')  # verbose name 'date retrieved'

    def __unicode__(self):  # Python 2: text shown for this object in the admin/shell
        return self.links
</code></pre>

Any help, pointers, or references will be greatly appreciated.

Regards,
Max
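To keep links between runs, the two fields of the jobLinks model (`links` and `pub_date`) can back a table in which each link appears only once. The sketch below is a stand-in for the Django ORM, not the project's actual code: it uses Python's standard `sqlite3` module so it runs on its own, and the table name `job_links` and the functions `create_table`, `save_new_links` and `links_newest_first` are hypothetical. Inside the Django project, a rough equivalent would be adding `unique=True` to the `links` field and calling `jobLinks.objects.get_or_create(links=url, defaults={'pub_date': timezone.now()})` (with `timezone` from `django.utils`) for each scraped URL. The printed output shown is from Python 3.

```python
import sqlite3
from datetime import datetime


def create_table(conn):
    # Same two fields as the jobLinks model. UNIQUE makes the database
    # itself reject a second copy of a link, as a safety net.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_links ("
        " id INTEGER PRIMARY KEY,"
        " links TEXT NOT NULL UNIQUE,"
        " pub_date TEXT NOT NULL)"
    )


def save_new_links(conn, urls, now=None):
    """Insert only the URLs not already stored; return the ones added."""
    if now is None:
        now = datetime.now()
    # Fixed-width timestamp, so sorting the text sorts chronologically.
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    added = []
    for url in urls:
        row = conn.execute(
            "SELECT 1 FROM job_links WHERE links = ?", (url,)
        ).fetchone()
        if row is None:  # not stored by an earlier run or earlier in this batch
            conn.execute(
                "INSERT INTO job_links (links, pub_date) VALUES (?, ?)",
                (url, stamp),
            )
            added.append(url)
    conn.commit()
    return added


def links_newest_first(conn):
    # Most recently retrieved first; id breaks ties within one run.
    rows = conn.execute(
        "SELECT links FROM job_links ORDER BY pub_date DESC, id DESC"
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical usage with an in-memory database and two scraping runs.
conn = sqlite3.connect(":memory:")
create_table(conn)
day1 = datetime(2014, 1, 1, 9, 0)
day2 = datetime(2014, 1, 2, 9, 0)
save_new_links(conn, ["http://example.com/job/1",
                      "http://example.com/job/2"], now=day1)
added = save_new_links(conn, ["http://example.com/job/2",
                              "http://example.com/job/3"], now=day2)
print(added)  # ['http://example.com/job/3']
print(links_newest_first(conn))
# ['http://example.com/job/3', 'http://example.com/job/2', 'http://example.com/job/1']
```

Passing the newest-first list to `Paginator` would put the most recent links on page 1 and push earlier ones onto later pages, which matches the "older pages" behaviour the question asks for.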
Tags: <python><django>
Title: Python/Django Extract and append only new links

singulars
PostAcceptedAnswerId: (empty)
PostParentId: (empty)
PostTypePostTypeId: PT Question
UserLastEditorUserId: US fromPythonImportNoob
UserOwnerUserId: US fromPythonImportNoob

plurals
PostLinksPostIdRelatedPostId: (empty)
PostLinksRelatedPostIdPostId: (empty)
PostsAcceptedAnswerId: (empty)
PostsParentIdCreationDate:
1. PO
   singulars
   PostTypePostTypeId: PT Answer
VotesPostIdCreationDate: (empty)
CommentsPostId: (empty)
