StackOverflow2013


plurals

POFind All URL That Is Not An HTML Attribute or Content of A Hyperlink Tag
primarykey
Id
20588499
data
AcceptedAnswerId
20648925
AnswerCount
3
ClosedDate
CommentCount
9
CommunityOwnedDate
CreationDate
2013-12-14T21:32:31.157
FavoriteCount
0
LastActivityDate
2013-12-18T19:34:17.943
LastEditDate
2013-12-18T01:58:32.207
LastEditorUserId
706394
OwnerUserId
706394
ParentId
0
PostTypeId
1
Score
-1
ViewCount
805
LastEditorDisplayName
text
Body
I'm trying to write a regex that matches every URL that is neither an attribute value of an element nor the content of a hyperlink. Should match: <pre><code> 1. This is a url http://www.google.com </code></pre> Should not match: <pre><code> 1. <a href="http://www.google.com">Google</a> 2. <a href="http://www.google.com">http://www.google.com</a> 3. <img src="http://www.google.com/image.jpg"> 4. <div data-url="http://www.google.com"></div> </code></pre> I'm currently using this regex to match all URLs. I think I know what I have to detect, but I can't figure out how to express it as a regex. <pre><code>\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|] </code></pre> <hr> EDITED What I'm trying to achieve is the following. I want to convert this string: <pre><code>This is a url http://www.google.com <a href="http://www.google.com" title="Go to Google">Google</a><a href="http://www.google.com">http://www.google.com</a><img src="http://www.google.com/image.jpg"><div data-url="http://www.google.com"></div> </code></pre> to this one: <pre><code>This is a url <a href="http://www.google.com">http://www.google.com</a> <a href="http://www.google.com" title="Go to Google">Google</a><a href="http://www.google.com">http://www.google.com</a><img src="http://www.google.com/image.jpg"><div data-url="http://www.google.com"></div> </code></pre> Preprocessing by removing the tags and then putting them back doesn't solve the problem, since it ends up removing all data attributes of the existing hyperlink elements. It also doesn't handle URLs that appear in attributes other than href. So far nobody has suggested a working solution, and I haven't found a way to do this with an HTML parser either. It actually seems more doable with a regex. <hr> EDITED 2 After an attempt based on Dean's suggestion, I'm about ready to rule out an HTML parser for this task, because it cannot process the string without turning it into a valid HTML document.
Here's the code based on the suggested example, plus a fix to handle exclusion case 2. <pre><code>
Document doc = Jsoup.parseBodyFragment(htmlText);
final List<TextNode> nodesToChange = new ArrayList<TextNode>();

NodeTraversor nd = new NodeTraversor(new NodeVisitor() {
    @Override
    public void tail(Node node, int depth) {
        if (node instanceof TextNode) {
            TextNode textNode = (TextNode) node;
            Node parent = node.parent();
            // Exclusion case 2: skip text that is already the content of a hyperlink.
            if (parent.nodeName().equals("a")) {
                return;
            }
            String text = textNode.getWholeText();
            List<String> allMatches = new ArrayList<String>();
            Matcher m = Pattern.compile("\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]")
                    .matcher(text);
            while (m.find()) {
                allMatches.add(m.group());
            }
            // Remember every text node that contains at least one bare URL.
            if (allMatches.size() > 0) {
                nodesToChange.add(textNode);
            }
        }
    }

    @Override
    public void head(Node node, int depth) {
    }
});
nd.traverse(doc.body());
</code></pre> This code adds HTML, HEAD and BODY tags to the result. The only workaround I can think of is to check whether HTML, HEAD and BODY tags exist in the original string and, if they don't, strip them out after processing. I hope someone has a better suggestion than this hack. Using jsoup is already very expensive in terms of processing time, so I really don't want to add more overhead if I don't have to.
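For illustration only, here is one way the conversion described above might be sketched with a plain regex and no parser. The idea is to let the pattern consume the things that must be left alone (whole `<a>...</a>` elements and any other tag, attributes included) before it ever gets a chance to see the URLs inside them, and to rewrite only what is matched by the URL alternative. The class name `BareUrlLinker` and the method `linkBareUrls` are hypothetical names made up for this sketch. The URL pattern is the one from the question, with its scheme group made non-capturing. This assumes attribute values never contain a literal `>` and that `<a>` elements are not nested; it is not a general HTML tokenizer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for this sketch; not part of any library.
class BareUrlLinker {

    // URL pattern from the question (scheme group made non-capturing).
    private static final String URL =
        "\\b(?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

    // Alternatives are tried left to right at each position:
    //   1. a whole <a ...>...</a> element (copied through unchanged)
    //   2. any other single tag with its attributes (copied through unchanged)
    //   3. a bare URL in text (captured in the named group "url")
    private static final Pattern TOKEN = Pattern.compile(
        "(?is:<a\\b[^>]*>.*?</a\\s*>)"
        + "|<[^>]*>"
        + "|(?<url>" + URL + ")");

    static String linkBareUrls(String html) {
        Matcher m = TOKEN.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group("url");
            // Only the third alternative sets "url"; tags and anchors pass through as-is.
            String replacement = (url == null)
                ? m.group()
                : "<a href=\"" + url + "\">" + url + "</a>";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkBareUrls(
            "This is a url http://www.google.com <img src=\"http://www.google.com/image.jpg\">"));
    }
}
```

Because the input is never parsed into a document, nothing such as HTML, HEAD or BODY tags is added to the result, and existing attributes are copied through untouched. The trade-off is that malformed markup (for example an unclosed `<a>` or a stray `<` in text) can cause the tag alternatives to swallow more than intended.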
Tags
<java><html><regex><url>
Title
Find All URL That Is Not An HTML Attribute or Content of A Hyperlink Tag
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USjuminoz
UserOwnerUserId
1. USjuminoz
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
2. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POFind All URL That Is Not An HTML Attribute or Content of A Hyperlink Tag
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTDownMod
CommentsPostId
