Well, your code kind of already tells you what's going on. In your lambda you are only grabbing absolute links that start with `http://` (which means you are not grabbing https links, FWIW). You should grab all of the links and check whether they start with `http` or not. If they don't, they are relative links, and since you know what the `current_page` is, you can use it to build an absolute link.

Here's a modification to your code. Excuse my Python as it's a little rusty, but I ran it and it worked in Python 2.7 for me. You'll want to clean it up and add some edge-case/error handling, but you get the gist:

```python
#!/usr/bin/python
from bs4 import BeautifulSoup
import urllib2
import random
import urlparse


class Crawler(object):
    """Randomly crawls pages, collecting every link it finds."""

    def __init__(self):
        self.soup = None                              # Beautiful Soup object
        self.current_page = "http://www.python.org/"  # Current page's address
        self.links = set()                            # Queue with every link fetched
        self.visited_links = set()
        self.counter = 0                              # Simple counter for debug purposes

    def open(self):
        # Open url
        print self.counter, ":", self.current_page
        res = urllib2.urlopen(self.current_page)
        html_code = res.read()
        self.visited_links.add(self.current_page)

        # Fetch every link on the page
        self.soup = BeautifulSoup(html_code)

        page_links = []
        try:
            for link in [h.get('href') for h in self.soup.find_all('a')]:
                print "Found link: '" + link + "'"
                if link.startswith('http'):
                    # Already absolute (covers both http and https)
                    page_links.append(link)
                    print "Adding link " + link + "\n"
                elif link.startswith('/'):
                    # Root-relative: prepend the current page's scheme and host
                    parts = urlparse.urlparse(self.current_page)
                    absolute = parts.scheme + '://' + parts.netloc + link
                    page_links.append(absolute)
                    print "Adding link " + absolute + "\n"
                else:
                    # Relative: resolve against the current page
                    page_links.append(self.current_page + link)
                    print "Adding link " + self.current_page + link + "\n"
        except Exception, ex:  # Magnificent exception handling
            print ex

        # Update links
        self.links = self.links.union(set(page_links))

        # Choose a random url from the non-visited set
        self.current_page = random.sample(self.links.difference(self.visited_links), 1)[0]
        self.counter += 1

    def run(self):
        # Crawl 3 webpages (or stop early if every url has been fetched)
        while len(self.visited_links) < 3 and self.visited_links != self.links:
            self.open()

        for link in self.links:
            print link


if __name__ == '__main__':
    C = Crawler()
    C.run()
```
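As an aside (this is not in the code above), the standard library can do the relative-to-absolute conversion for you: `urlparse.urljoin` resolves absolute, root-relative, and relative hrefs in one call, so the three branches could collapse into a single line. A minimal sketch, assuming the same Python 2.7 environment as the answer:

```python
import urlparse

base = "http://www.python.org/"  # e.g. self.current_page

# urljoin resolves each href against the base url, whatever form it takes
for href in ("http://docs.python.org/", "/about/", "community/"):
    print urlparse.urljoin(base, href)

# Prints:
#   http://docs.python.org/
#   http://www.python.org/about/
#   http://www.python.org/community/
```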