StackOverflow2013


plurals

PO
primarykey
Id
1274074
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2009-08-13T19:26:19.293
FavoriteCount
0
LastActivityDate
2009-08-13T20:47:13.530
LastEditDate
2009-08-13T20:47:13.530
LastEditorUserId
138475
OwnerUserId
138475
ParentId
1274020
PostTypeId
2
Score
7
ViewCount
0
LastEditorDisplayName
text
Body
Using regex to parse HTML is probably not the best way to go. You might take a look at <a href="http://php.net/manual/en/domdocument.loadhtml.php" rel="nofollow noreferrer">DOMDocument::loadHTML</a>, which will allow you to work with an HTML document using DOM methods (and XPath queries, for instance, if you know those). You might also want to take a look at <a href="http://framework.zend.com/manual/en/zend.dom.html" rel="nofollow noreferrer"><code>Zend_Dom</code></a> and <a href="http://framework.zend.com/manual/en/zend.dom.query.html" rel="nofollow noreferrer"><code>Zend_Dom_Query</code></a>, btw, which are quite nice if you can use some parts of Zend Framework in your application. They are used to fetch data from HTML pages when doing functional testing with <a href="http://framework.zend.com/manual/en/zend.test.html" rel="nofollow noreferrer"><code>Zend_Test</code></a>, for instance -- and work quite well ;-) It may seem harder at first... But, considering the mess some HTML pages are, it is probably a much wiser idea...
<hr>
EDIT after the comment and the edit of the OP
Here are a couple of thoughts about an input tag, to begin with something "simple":
<ul>
<li>it can spread across several lines</li>
<li>it can have many attributes</li>
<li>considering only name and value are of interest to you, you have to deal with the fact that those two can be in any possible order</li>
<li>attributes can have double quotes, single quotes, or even nothing around their values</li>
<li>tags / attributes can be either lower-case or upper-case</li>
<li>tags don't always have to be closed</li>
</ul>
Well, some of those points are not valid HTML, but they still work in the most common web browsers, so they have to be taken into account... With only those points, I wouldn't like to be the one writing the regex ^^ But I suppose there might be other difficulties I didn't think about. On the other side, you have DOM and XPath...
To get the value of an input name="q" (example is <a href="http://www.google.fr/search?q=test&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a" rel="nofollow noreferrer">this page</a>), it's a matter of something like this:
<pre><code>$url = 'http://www.google.fr/search?q=test&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a';
$html = file_get_contents($url);

$dom = new DOMDocument();
if (@$dom->loadHTML($html)) { // yep, not necessarily valid-html...
    $xpath = new DOMXPath($dom);
    $nodeList = $xpath->query('//input[@name="q"]');
    if ($nodeList->length > 0) {
        for ($i=0 ; $i<$nodeList->length ; $i++) {
            $node = $nodeList->item($i);
            var_dump($node->getAttribute('value'));
        }
    }
} else {
    // too bad...
}
</code></pre>
What matters here? The XPath query, and only that... And is there anything static/constant in it? Well, I say I want all <code>&lt;input&gt;</code> that have a <code>name</code> attribute that is equal to "<code>q</code>". And it just works: I'm getting this result:
<pre><code>string 'test' (length=4)
string 'test' (length=4)
</code></pre>
(I checked: there are two input name="q" on the page ^^ ) Do I know the structure of the page? Absolutely not ;-) I just know I/you/we want input tags named q ;-) And that's what we get ;-)
<hr>
EDIT 2: and a bit of fun with select and options
Well, just for fun, here's what I came up with for select and option:
<pre><code>$url = 'http://www.google.fr/language_tools?hl=fr';
$html = file_get_contents($url);

$dom = new DOMDocument();
if (@$dom->loadHTML($html)) { // yep, not necessarily valid-html...
    $xpath = new DOMXPath($dom);
    $nodeListSelects = $xpath->query('//select');
    if ($nodeListSelects->length > 0) {
        for ($i=0 ; $i<$nodeListSelects->length ; $i++) {
            $nodeSelect = $nodeListSelects->item($i);
            $name = $nodeSelect->getAttribute('name');
            $nodeListOptions = $xpath->query('option[@selected="selected"]', $nodeSelect); // We want options that are inside the current select
            if ($nodeListOptions->length > 0) {
                for ($j=0 ; $j<$nodeListOptions->length ; $j++) {
                    $nodeOption = $nodeListOptions->item($j);
                    $value = $nodeOption->getAttribute('value');
                    var_dump("name='$name' => value='$value'");
                }
            }
        }
    }
} else {
    // too bad...
}
</code></pre>
And I get as an output:
<pre><code>string 'name='sl' => value='fr'' (length=23)
string 'name='tl' => value='en'' (length=23)
string 'name='sl' => value='en'' (length=23)
string 'name='tl' => value='fr'' (length=23)
string 'name='sl' => value='en'' (length=23)
string 'name='tl' => value='fr'' (length=23)
</code></pre>
Which is what I expected. Some explanations? Well, first of all, I get all the select tags of the page, and keep their name in memory. Then, for each one of those, I get the selected option tags that are its children (there's always only one, btw). And here, I have the value. A bit more complicated than the previous example... But still much easier than regex, I believe... Took me maybe 10 minutes, not more... And I still won't have the courage (madness?) to start thinking about some kind of mutant regex that would be able to do that :-D Oh, and, as a side note: I still have no idea what the structure of the HTML document looks like: I have not even taken a single look at its source ^^ I hope this helps a bit more... Who knows, maybe I'll convince you regex is not a good idea when it comes to parsing HTML... maybe? ;-) Still: have fun!
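The difficulties listed for the input tag (tags spread across several lines, mixed case, mixed quoting, attributes in any order) can also be checked offline, without fetching a page. The following is only a minimal sketch and not part of the original answer: the markup in <code>$html</code> is made up for illustration, and it assumes PHP's DOM extension (built on libxml) is available. It uses <code>libxml_use_internal_errors()</code> rather than <code>@</code> to silence parser warnings, which is a choice of this sketch.

```php
<?php
// Hypothetical markup, invented for this sketch: it exercises the quirks
// from the list above (multi-line tag, upper-case names, single quotes,
// unquoted values, attributes in varying order, unclosed tags).
$html = <<<'HTML'
<form>
<INPUT
    type="text"
    VALUE='first'
    name="q">
<input value=second name=other>
<input name='q' class="x" value="third">
</form>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Same query as in the answer: every input whose name attribute equals "q".
$xpath = new DOMXPath($dom);
$values = [];
foreach ($xpath->query('//input[@name="q"]') as $node) {
    $values[] = $node->getAttribute('value');
}

var_dump($values);
```

The query is the same one-liner as before. Handling of case, quoting and attribute order is left to the HTML parser, which is the point of the comparison with a regex.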
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POExtract form fields using RegEx
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USPascal MARTIN
UserOwnerUserId
1. USPascal MARTIN
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POExtract form fields using RegEx
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COThanks, but please read my edit above.
 singulars
 PostPostId
 PO
 UserUserId
 USAlix Axel
2. COI've edited my answer a couple of times, to give a couple of examples, using XPath, for input and select+option tags. Hope this helps :-) -- sorry, but I will definitely not try to write any regex to do that; don't want to end up insane a few days before my holidays ^^
 singulars
 PostPostId
 PO
 UserUserId
 USPascal MARTIN
