StackOverflow2013


plurals

PO Scraping with BeautifulSoup and multiple paragraphs
primarykey
Id
8331579
data
AcceptedAnswerId
8332008
AnswerCount
3
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2011-11-30T19:16:37.870
FavoriteCount
2
LastActivityDate
2016-09-29T11:47:18.873
LastEditDate
2011-11-30T21:18:02.007
LastEditorUserId
0
OwnerUserId
1074057
ParentId
0
PostTypeId
1
Score
9
ViewCount
5576
LastEditorDisplayName
user212218
text
Body
I'm trying to scrape a speech from a website using BeautifulSoup. I'm encountering problems, however, since the speech is divided into many different paragraphs. I'm extremely new to programming and am having trouble figuring out how to deal with this. The HTML of the page looks like this:

<pre><code>Thank you very much. Mr. Speaker, Vice President Cheney, Members of Congress, distinguished guests, fellow citizens: As we gather tonight, our Nation is at war; our economy is in recession; and the civilized world faces unprecedented dangers. Yet, the state of our Union has never been stronger. We last met in an hour of shock and suffering. In 4 short months, our Nation has comforted the victims, begun to rebuild New York and the Pentagon, rallied a great coalition, captured, arrested, and rid the world of thousands of terrorists, destroyed Afghanistan's terrorist training camps, saved a people from starvation, and freed a country from brutal oppression. The American flag flies again over our Embassy in Kabul. Terrorists who once occupied Afghanistan now occupy cells at Guantanamo Bay. And terrorist leaders who urged followers to sacrifice their lives are running for their own.
</code></pre>

It continues on like that for a while, with multiple paragraph tags. I'm trying to extract all of the text within the span. I've tried a couple of different ways to get the text, but both have failed to get the text that I want. The first I tried is:

<pre><code>import urllib2,sys
from BeautifulSoup import BeautifulSoup, NavigableString

address = 'http://www.presidency.ucsb.edu/ws/index.php?pid=29644&st=&st1=#axzz1fD98kGZW'
html = urllib2.urlopen(address).read()

soup = BeautifulSoup(html)

thespan = soup.find('span', attrs={'class': 'displaytext'})
print thespan.string
</code></pre>

which gives me:

<blockquote> Mr. Speaker, Vice President Cheney, Members of Congress, distinguished guests, fellow citizens: As we gather tonight, our Nation is at war; our economy is in recession; and the civilized world faces unprecedented dangers. Yet, the state of our Union has never been stronger. </blockquote>

That is the portion of the text up until the first paragraph tag. I then tried:

<pre><code>import urllib2,sys
from BeautifulSoup import BeautifulSoup, NavigableString

address = 'http://www.presidency.ucsb.edu/ws/index.php?pid=29644&st=&st1=#axzz1fD98kGZW'
html = urllib2.urlopen(address).read()

soup = BeautifulSoup(html)

thespan = soup.find('span', attrs={'class': 'displaytext'})

for section in thespan:
    paragraph = section.findNext('p')
    if paragraph and paragraph.string:
        print '>', paragraph.string
    else:
        print '>', section.parent.next.next.strip()
</code></pre>

This gave me the text between the first paragraph tag and the second paragraph tag. So, I'm looking for a way to get the entire text, instead of just sections.
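Editorial sketch, not part of the original question: the goal described above amounts to collecting every text node nested anywhere inside the target span, rather than a single child string. The example below illustrates that idea with Python 3's standard-library `html.parser` instead of BeautifulSoup, so it runs without third-party packages. The class `SpanTextCollector`, the helper `span_text`, and the `sample` markup are hypothetical names and data made up for illustration. The sample assumes the page leaves its `<p>` tags unclosed, which is a guess about the real markup.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect every text node found inside <span class="...">."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.span_depth = 0  # 0 means we are outside the target span
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Only <span> tags affect nesting, so unclosed <p> tags are harmless.
        if tag != "span":
            return
        if self.span_depth:
            self.span_depth += 1
        elif dict(attrs).get("class") == self.css_class:
            self.span_depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.span_depth:
            self.span_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.span_depth and text:
            self.chunks.append(text)


def span_text(html, css_class="displaytext"):
    """Return all text inside the target span, one paragraph per chunk."""
    parser = SpanTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.chunks)


# Hypothetical markup modelled on the structure described in the question.
sample = (
    '<div><span class="displaytext">Yet, the state of our Union has never been stronger.'
    '<p>We last met in an hour of shock and suffering.'
    '<p>The American flag flies again over our Embassy in Kabul.</span>'
    '<p>Footer text outside the span.</div>'
)

print(span_text(sample))
```

With BeautifulSoup itself, the analogous approach would presumably be to join all text nodes under the span (for example via `findAll(text=True)`) instead of reading `.string`, which only helps when a tag has a single string child.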
Tags
<python><beautifulsoup><web-scraping>
Title
Scraping with BeautifulSoup and multiple paragraphs
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. This table or related slice is empty.
UserOwnerUserId
1. USuser1074057
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO Scraping with BeautifulSoup and multiple paragraphs
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO Scraping with BeautifulSoup and multiple paragraphs
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO Scraping with BeautifulSoup and multiple paragraphs
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
