Note that there are some explanatory texts on larger screens.

plurals
  1. POPython/Django Extract and append only new links
    primarykey
    data
    text
    <p>I am putting together a project using Python 2.7 Django 1.5 on Windows 7. I have the following view:</p> <p>views.py:</p> <pre><code>def foo(): site = "http://www.foo.com/portal/jobs" hdr = {'User-Agent' : 'Mozilla/5.0'} req = urllib2.Request(site, headers=hdr) jobpass = urllib2.urlopen(req) soup = BeautifulSoup(jobpass) for tag in soup.find_all('a', href = True): tag['href'] = urlparse.urljoin('http://www.businessghana.com/portal/', tag['href']) return map(str, soup.find_all('a', href = re.compile('.getJobInfo'))) def example(): site = "http://example.com" hdr = {'User-Agent' : 'Mozilla/5.0'} req = urllib2.Request(site, headers=hdr) jobpass = urllib2.urlopen(req) soup = BeautifulSoup(jobpass) return map(str, soup.find_all('a', href = re.compile('.display-job'))) foo_links = foo() example_links = example() def all_links(): return (foo_links + example_links) def display_links(request): name = all_links() paginator = Paginator(name, 25) page = request.GET.get('page') try: name = paginator.page(page) except PageNotAnInteger: name = paginator.page(1) except EmptyPage: name = paginator.page(paginator.num_pages) return render_to_response('jobs.html', {'name' : name}) </code></pre> <p>my template looks like this:</p> <pre><code>&lt;ol&gt; {% for link in name %} &lt;li&gt; {{ link|safe }}&lt;/li&gt; {% endfor %} &lt;/ol&gt; &lt;div class="pagination"&gt; &lt;span class= "step-links"&gt; {% if name.has_previous %} &lt;a href="?page={{ names.previous_page_number }}"&gt;Previous&lt;/a&gt; {% endif %} &lt;span class = "current"&gt; Page {{ name.number }} of {{ name.paginator.num_pages}}. &lt;/span&gt; {% if name.has_next %} &lt;a href="?page={{ name.next_page_number}}"&gt;next&lt;/a&gt; {% endif %} &lt;/span&gt; &lt;/div&gt; </code></pre> <p>Right now as my code stands, anytime I run it, it scraps all the links on the frontpage of the sites selected and presents them paginated <strong>all afresh</strong>. However, I don't think its a good idea for the script to read/write all the links that had previously extracted links all over again and therefore would like to check for and append only new links. I would like to save the previously scraped links so that over the course of say, a week, all the links that have appeared on the frontpage of these sites will be available on my site as older pages. </p> <p>It's my first programming project and don't know how to incorporate this logic into my code.</p> <p>UPDATE:</p> <p>My model looks like this:</p> <pre><code>from django.db import models class jobLinks(models.Model): links = models.URLField() pub_date = models.DateTimeField('date retrieved') def __unicode__(self): return self.links </code></pre> <p>Any help/pointers/references will be greatly appreciated.</p> <p>regards, Max</p>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload