StackOverflow2013

plurals

POPython iterate through section using lxml
primarykey
Id
14654417
data
AcceptedAnswerId
14654489
AnswerCount
2
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2013-02-01T20:26:32.867
FavoriteCount
0
LastActivityDate
2013-02-02T13:57:51.673
LastEditDate
2013-02-02T02:10:57.863
LastEditorUserId
1995132
OwnerUserId
1995132
ParentId
0
PostTypeId
1
Score
1
ViewCount
4522
LastEditorDisplayName
text
Body
I have a webpage that I am currently parsing with BeautifulSoup, but it is quite slow, so I have decided to try lxml, which I have read is very fast. I am struggling to get my code to iterate over only the section I want; I am not sure how to use lxml and I can't find clear documentation on it. Here is my code:
<pre><code>import urllib, urllib2
from lxml import etree

def wgetUrl(target):
    try:
        req = urllib2.Request(target)
        req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3 Gecko/2008092417 Firefox/3.0.3')
        response = urllib2.urlopen(req)
        outtxt = response.read()
        response.close()
    except:
        return ''
    return outtxt

newUrl = 'http://www.tv3.ie/3player'
data = wgetUrl(newUrl)
parser = etree.HTMLParser()
tree = etree.fromstring(data, parser)
for elem in tree.iter("div"):
    print elem.tag, elem.attrib, elem.text
</code></pre>
This returns all the divs, but how do I tell it to iterate only through the div with id='slider1'?
<pre><code>div {'style': 'position: relative;', 'id': 'slider1'} None
</code></pre>
This does not work:
<pre><code>for elem in tree.iter("slider1"):
</code></pre>
I know this is probably a dumb question but I can't figure it out. Thanks!
** EDIT **
With your help, after adding this code I now get the output below:
<pre><code>for elem in tree.xpath("//div[@id='slider1']//div[@id='gridshow']"):
    print elem[0].tag, elem[0].attrib, elem[0].text
    print elem[1].tag, elem[1].attrib, elem[1].text
    print elem[2].tag, elem[2].attrib, elem[2].text
    print elem[3].tag, elem[3].attrib, elem[3].text
    print elem[4].tag, elem[4].attrib, elem[4].text
</code></pre>
Output:
<pre><code>a {'href': '/3player/show/392/57922/1/Tallafornia', 'title': '3player | Tallafornia, 11/01/2013. The Tallafornia crew are back, living in a beachside villa in Santa Ponsa, Majorca. As the crew settle in, the egos grow bigger than ever and cause tension'} None
h3 {} None
span {'id': 'gridcaption'} The Tallafornia crew are back, living in a beachside vill...
span {'id': 'griddate'} 11/01/2013
span {'id': 'gridduration'} 00:27:52
</code></pre>
That is all brilliant, but I am missing part of the a tag above. Could the parser be handling the markup incorrectly? I'm not getting the following:
<pre><code><img alt="3player | Tallafornia, 11/01/2013. The Tallafornia crew are back, living in a beachside villa in Santa Ponsa, Majorca. As the crew settle in, the egos grow bigger than ever and cause tension" src='http://content.tv3.ie/content/videos/0378/tallaforniaep2_fri11jan2013_3player_1_57922_180x102.jpg' class='shadow smallroundcorner'></img>
</code></pre>
Any ideas why it doesn't pull this? Thanks again, very helpful posts.
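A minimal sketch of the two points raised above, under stated assumptions: the markup in `SAMPLE` is invented for illustration (it is not the real tv3.ie page, and `thumb.jpg` is a placeholder), it is well-formed so that it can be parsed without `HTMLParser`, and it is written for Python 3 with `print()` instead of the Python 2 print statement used in the question. If lxml is not installed, it falls back to the standard library's `ElementTree`, which offers the same `find`/`iter` calls used here. It suggests that `iter()` filters on tag names only, so the div has to be selected by its `id` with a path expression, and that the `<img>` is a child of the `<a>` element and so has to be reached through it.

```python
# Sketch only: SAMPLE is invented markup, not the real page from the question.
try:
    from lxml import etree  # third-party library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same find/iter calls

SAMPLE = """
<html><body>
  <div id="header">ignored</div>
  <div id="slider1" style="position: relative;">
    <div id="gridshow">
      <a href="/3player/show/392/57922/1/Tallafornia" title="Tallafornia">
        <img alt="Tallafornia" src="thumb.jpg" class="shadow smallroundcorner"/>
      </a>
      <h3>Tallafornia</h3>
      <span id="griddate">11/01/2013</span>
      <span id="gridduration">00:27:52</span>
    </div>
  </div>
</body></html>
""".strip()

tree = etree.fromstring(SAMPLE)

# iter("slider1") would look for <slider1> tags; select on the id attribute instead.
slider = tree.find(".//div[@id='slider1']")

# Walk only that div and its descendants, in document order.
tags = [elem.tag for elem in slider.iter()]
print(tags)

# The <img> sits inside the <a>, so it is a child of the link, not of gridshow.
link = slider.find(".//a")
img = link.find("img")
print(img.get("alt"), img.get("src"))
```

With the `HTMLParser` tree from the question, the same `find` and `iter` calls should apply; in the indexed form used in the edit, the image would be `elem[0][0]` rather than a direct child of `elem`.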
Tags
<python><parsing><iteration><lxml>
Title
Python iterate through section using lxml
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USmcquaim
UserOwnerUserId
1. USmcquaim
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POPython iterate through section using lxml
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. This table or related slice is empty.
