  Python: Printing data from a specific href (with ID tag)
    <p>I'm new to Python and trying to build one of my first web scrapers. I want to go to a page, open a bunch of subpages, find a specific link on each subpage (identified by an ID), and then print that link's data. Right now I get the error 'list indices must be integers, not str', which means I'm doing something wrong in (at least) the last line of code.</p>
    <p>What I'm really unsure about is what I need to do to grab and parse the href data from a specific link, because I think the rest (loading the subpages) is working. The scraper is supposed to grab the URLs of all the Danish communes and print them, so the first line of output should be:</p>
    <p><a href="http://www.albertslund.dk" rel="nofollow">http://www.albertslund.dk</a> (followed by 97 more)</p>
    <p>Anyway, here's the code. I hope someone can tell me what I'm doing wrong. Thanks a bunch in advance.</p>
    <pre><code>from BeautifulSoup import BeautifulSoup
from mechanize import Browser

f = open("kommuneadresser.txt", "w")

br = Browser()
url = "https://bdkv2.borger.dk/foa/Sider/default.aspx?fk=22&amp;foaid=11541520"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
link = soup.findAll('a')
kommunelink = link[21:116]

#we create a loop - for every single kommunelink in the list,
#'something' will happen
for kommune in kommunelink:
    #the link-part is saved as a string
    kommuneurl = kommune['href']
    #we construct a new url from two strings
    fuldurl = "https://bdkv2.borger.dk/" + kommuneurl
    #we open the page and save it in a variable
    kommuneside = br.open(fuldurl)
    #we read the page
    html2 = kommuneside.read()
    #we handle the page in beautifulsoup
    soup2 = BeautifulSoup(html2)
    #we find a specific link on the page
    hjemmesidelink = soup2.findAll('a', attras={'ID':"uscAncHomesite"})
    print hjemmesidelink['href']
</code></pre>
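The reported error is consistent with `findAll` returning a list of matches rather than a single tag, so indexing the result with `'href'` fails. With BeautifulSoup the usual options are to call `find` (which returns one tag or `None`) or to take element `[0]` of the list. Note also that `attras` looks like a typo for `attrs`. The sketch below illustrates the same idea, picking the `href` of the one anchor with a given id, using only the standard library's `html.parser` in Python 3 instead of BeautifulSoup. The class `AnchorHrefFinder`, the helper `find_href_by_id`, and the sample markup are hypothetical names introduced for illustration; the real subpages may be structured differently.

```python
from html.parser import HTMLParser


class AnchorHrefFinder(HTMLParser):
    """Remember the href of the first <a> tag whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.href = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so an "ID" attribute in the markup arrives here as "id".
        if tag != "a" or self.href is not None:
            return
        attributes = dict(attrs)
        if attributes.get("id") == self.target_id:
            self.href = attributes.get("href")


def find_href_by_id(html, target_id):
    """Return the href of the matching anchor, or None if there is none."""
    finder = AnchorHrefFinder(target_id)
    finder.feed(html)
    return finder.href


# Hypothetical markup standing in for one subpage.
sample_html = (
    '<p><a href="/other">Other link</a></p>'
    '<p><a ID="uscAncHomesite" href="http://www.albertslund.dk">Homepage</a></p>'
)

print(find_href_by_id(sample_html, "uscAncHomesite"))
```

Returning `None` when no anchor matches lets the calling loop skip a subpage that lacks the link instead of raising an exception.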