StackOverflow2013

plurals

BeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?
primarykey
Id
4362981
data
AcceptedAnswerId
4363338
AnswerCount
2
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2010-12-06T03:39:50.547
FavoriteCount
3
LastActivityDate
2010-12-06T08:27:20.427
LastEditDate
2010-12-06T05:31:13.770
LastEditorUserId
511200
OwnerUserId
511200
ParentId
0
PostTypeId
1
Score
16
ViewCount
28767
LastEditorDisplayName
text
Body
My source code looks like:
<pre><code><h3>Header3 (Start here)</h3>
<ul>
    <li>List items</li>
    <li>Etc...</li>
</ul>
<h3>Header 3</h3>
<ul>
    <li>List items</li>
    <ul>
        <li>Nested list items</li>
        <li>Nested list items</li></ul>
    <li>List items</li>
</ul>
<h2>Header 2 (end here)</h2>
</code></pre>
I'd like all the "li" tags following the first "h3" tag and stopping at the next "h2" tag, including all nested li tags.
<blockquote> firstH3 = soup.find('h3') </blockquote>
correctly finds the place I'd like to start.
<pre><code>firstH3 = soup.find('h3') # Start here
uls = []
for nextSibling in firstH3.findNextSiblings():
    if nextSibling.name == 'h2':
        break
    if nextSibling.name == 'ul':
        uls.append(nextSibling)
</code></pre>
gives me a list of ULs, each with LI contents that I need.
EXCERPT OF THE "uls" LIST:
<pre><code><ul>
...
<li><a href="/wiki/Agent_Cody_Banks" title="Agent Cody Banks">Agent Cody Banks</a> (2003)</li>
<li><a href="/wiki/Agent_Cody_Banks_2:_Destination_London" title="Agent Cody Banks 2: Destination London">Agent Cody Banks 2: Destination London</a> (2004)</li>
<li>Air Bud series:
<ul>
<li><a href="/wiki/Air_Bud:_World_Pup" title="Air Bud: World Pup">Air Bud: World Pup</a> (2000)</li>
<li><a href="/wiki/Air_Bud:_Seventh_Inning_Fetch" title="Air Bud: Seventh Inning Fetch">Air Bud: Seventh Inning Fetch</a> (2002)</li>
<li><a href="/wiki/Air_Bud:_Spikes_Back" title="Air Bud: Spikes Back">Air Bud: Spikes Back</a> (2003)</li>
<li><a href="/wiki/Air_Buddies" title="Air Buddies">Air Buddies</a> (2006)</li>
</ul>
</li>
<li><a href="/wiki/Akeelah_and_the_Bee" title="Akeelah and the Bee">Akeelah and the Bee</a> (2006)</li>
...
</ul>
</code></pre>
But I'm unsure of where to go from here. I'm a newbie programmer trying to jump in to Python by building a script that scrapes <a href="http://en.wikipedia.org/wiki/2000s_in_film" rel="noreferrer">http://en.wikipedia.org/wiki/2000s_in_film</a> and extracts a list of "Movie Title (Year)".
<hr>
Update: Final Code:
<pre><code>lis = []
for ul in uls:
    for li in ul.findAll('li'):
        if li.find('ul'):
            break
        lis.append(li)

for li in lis:
    print li.text.encode("utf-8")
</code></pre>
The If-->break throws out the LI's that contain UL's since the nested LI's are now duplicated. Print output is now:
<blockquote>
<ul>
<li>102 Dalmatians(2000)</li>
<li>10th & Wolf(2006)</li>
<li>11:14(2006)</li>
<li>12:08 East of Bucharest(2006)</li>
<li>13 Going on 30(2004)</li>
<li>1408(2007)</li>
<li>...</li>
</ul>
</blockquote>
Thanks
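For anyone reading this later, here is a minimal sketch of the same idea, assuming the newer bs4 package, where findNextSiblings and findAll are spelled find_next_siblings and find_all. The HTML sample and the leaf_items name are made up for illustration; they are a shortened stand-in for the Wikipedia page, not the real markup. The sketch uses continue rather than break when it meets an li that wraps a nested ul, because break would leave the inner loop and skip every later li in that same ul.

```python
from bs4 import BeautifulSoup

# Hypothetical sample, loosely modelled on the excerpt in the question.
HTML = """
<h3>Header3 (Start here)</h3>
<ul>
  <li>Agent Cody Banks (2003)</li>
  <li>Air Bud series:
    <ul>
      <li>Air Bud: World Pup (2000)</li>
      <li>Air Buddies (2006)</li>
    </ul>
  </li>
  <li>Akeelah and the Bee (2006)</li>
</ul>
<h3>Header 3</h3>
<ul>
  <li>1408 (2007)</li>
</ul>
<h2>Header 2 (end here)</h2>
<ul>
  <li>Not wanted (1999)</li>
</ul>
"""


def leaf_items(html):
    """Collect the text of each <li> between the first <h3> and the next <h2>,
    skipping any <li> that itself wraps a nested <ul>."""
    soup = BeautifulSoup(html, "html.parser")
    first_h3 = soup.find("h3")
    titles = []
    for sibling in first_h3.find_next_siblings():
        if sibling.name == "h2":
            break  # stop at the next <h2>
        if sibling.name != "ul":
            continue
        # find_all is recursive, so nested <li>s are visited as well.
        for li in sibling.find_all("li"):
            if li.find("ul"):
                continue  # parent item; its nested <li>s are handled on their own
            titles.append(li.get_text(strip=True))
    return titles


if __name__ == "__main__":
    for title in leaf_items(HTML):
        print(title)
```

With this sample, "Akeelah and the Bee (2006)" is still collected even though it follows the "Air Bud series:" parent item in the same ul.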
Tags
<python><screen-scraping><beautifulsoup>
Title
BeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USdanneu
UserOwnerUserId
1. USdanneu
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POBeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POBeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 POBeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. You're asking the wrong question. You've already done what you have in your question title and are asking how to fill a table/object/something. Please update your question to reflect that (and indicate what you mean by a table - database table or dictionary or something else or you don't know).
 singulars
 PostPostId
 POBeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?
 UserUserId
 USChris Morgan
2. I didn't mean to obfuscate my question with that last sentence, so I'll clarify it. Now I have a list of <ul>s with child <li>s that may or may not contain a nested/child <ul> (with more <li>s). I'm unsure of how to extract all the <li>s. I'll change the title to better reflect the nested UL question.
 singulars
 PostPostId
 POBeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?
 UserUserId
 USdanneu
