BeautifulSoup: Extract links to file and append new links only on subsequent rerun
<p>I have the code below to extract links from specific sites.</p>

<pre><code>from bs4 import BeautifulSoup
import urllib2, sys
import re

def jobsinghana():
    site = "http://www.jobsinghana.com/jobs"
    hdr = {'User-Agent' : 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    mayday = urllib2.urlopen(req)
    soup = BeautifulSoup(mayday)
    jobs = soup.find_all('a', {'class' : 'hover'})
    print str(jobs).strip('[]')

def modernghana():
    site = "http://www.modernghana.com/GhanaHome/classifieds/list_classifieds.asp?menu_id=7&amp;sub_menu_id=362&amp;gender=&amp;cat_id=173&amp;has_price=2"
    hdr = {'User-Agent' : 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    jobpass = urllib2.urlopen(req)
    soup = BeautifulSoup(jobpass)
    jobs = soup.find_all('a', href = re.compile('show_classifieds'))
    for a in jobs:
        header = a.parent.find_previous_sibling('h3').text
        a.string = header
        print a

jobsinghana = jobsinghana()
modernghana = modernghana()

alllinks = open('content.html', 'w')
alllinks.write("\n".join((jobsinghana, modernghana)))
allinks.close()
</code></pre>

<ol>
<li><p>The last 3 lines are supposed to write the extracted links to a file, but I'm getting the error below:</p>
<pre><code>TypeError: sequence item 0: expected string, NoneType found
</code></pre></li>
<li><p>I also notice that the code extracts all the links all over again every time I run the program, but because most of those links would have been extracted during an earlier run, I'm interested in extracting and appending only the new links to the file on subsequent runs.</p></li>
</ol>
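A likely cause of the `TypeError` is that both scraper functions `print` their results instead of `return`ing them, so each call evaluates to `None` and `"\n".join(...)` fails on the first item. (There is also a typo in the final line: `allinks.close()` should be `alllinks.close()`.) For the second problem, one common pattern is to read the links already on disk into a set and append only the ones not seen before. The sketch below illustrates both ideas with the network scraping stubbed out by sample data; the sample links and the `append_new_links` helper are illustrative, not part of the original code:

```python
import os
import tempfile

def jobsinghana():
    # In the real code this would scrape the page with BeautifulSoup and
    # *return* the links (not print them); printing makes the function
    # return None, which is what triggers the TypeError in join().
    return ['<a class="hover" href="/jobs/1">Job 1</a>',
            '<a class="hover" href="/jobs/2">Job 2</a>']

def append_new_links(path, links):
    """Append to `path` only those links not already present in it."""
    try:
        with open(path) as f:
            seen = set(line.rstrip('\n') for line in f)
    except IOError:          # first run: the file does not exist yet
        seen = set()
    new = [link for link in links if link not in seen]
    with open(path, 'a') as f:   # 'a' appends instead of overwriting
        for link in new:
            f.write(link + '\n')
    return new

# Demo: the second run appends only the link that was not already on file.
path = os.path.join(tempfile.mkdtemp(), 'content.html')
first_run = append_new_links(path, jobsinghana())
second_run = append_new_links(path, jobsinghana() + ['<a href="/jobs/3">Job 3</a>'])
```

Opening the file in append mode (`'a'`) rather than write mode (`'w'`) is what preserves the links from earlier runs; the set lookup keeps duplicates out.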