StackOverflow2013


plurals

PO
primarykey
Id
20590738
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
4
CommunityOwnedDate
CreationDate
2013-12-15T03:00:57.570
FavoriteCount
0
LastActivityDate
2013-12-18T19:34:17.943
LastEditDate
2017-05-23T10:32:18.890
LastEditorUserId
-1
OwnerUserId
406712
ParentId
20588499
PostTypeId
2
Score
1
ViewCount
0
LastEditorDisplayName
text
Body
<h1>Expecting Valid HTML Output</h1>
<p>Here is a rough guide to get you started.</p>
<ol>
<li>Use an HTML5 parsing engine like <a href="http://jsoup.org/" rel="nofollow noreferrer">jsoup Java HTML Parser</a>
<ul>
<li>The HTML5 specification handles invalid HTML in a well-defined way, so the results are predictable.</li>
<li>This parsing engine also provides HTML modification methods.</li>
</ul></li>
<li><p>Parse your HTML something like this:</p>
<pre><code>String html = "This is a url http://www.google.com <a href=\"http://www.google.com\" title=\"Go to Google\">Google</a>";
Document doc = Jsoup.parseBodyFragment(html);
Element body = doc.body();
</code></pre></li>
<li>Find all your <strong>text nodes</strong> (non-HTML element bits)
<ul>
<li>You can find an example of a <a href="https://stackoverflow.com/a/6594828/406712">jsoup text iterator in this answer</a>.</li>
</ul></li>
<li>Test to see if the <strong>text</strong> looks like a link (use your regex)</li>
<li>Replace the text as indicated in the <a href="https://stackoverflow.com/a/6594828/406712">same example</a>.</li>
<li>Obtain the HTML of the complete <strong>modified</strong> document.</li>
<li>Sit back and enjoy.</li>
</ol>
<h1>Edit 1 - The Crazy World of replacing in Invalid HTML</h1>
<p>It seems the author of this question has indicated that the content is <strong>not</strong> valid HTML and requires the <strong>invalid HTML</strong> to be maintained. As such, an HTML parser shouldn't be used, since any HTML parser would likely output valid HTML when saving.</p>
<p>As indicated in my comment to the original question, you can use negative look-behinds in regex. But only a fool would parse HTML with RegEx - apparently we aren't, so here is one possible example.</p>
<p><strong>I wouldn't use this in production code - but it answers OP's question</strong></p>
<h1>The RegEx</h1>
<p>Unfortunately Java doesn't support unlimited look-behinds, so I have included the following limits:</p>
<ul>
<li>Tag name - max of 255 characters</li>
<li>Spaces - max of 30 characters</li>
<li>Attribute contents (including attributes and values) - max of 4098 characters</li>
</ul>
<h1>Negative Look-behind</h1>
<p><img src="https://www.debuggex.com/i/3K6yUtvMaXs0HQOV.png" alt="Regular expression visualization"> Note that this visualization is not exact: <code>[\p{L}0-9_.-]</code> was replaced with <code>[A-Z0-9_.-]</code> to get the visualization to work, but <code>\p{L}</code> is technically more correct, since any Unicode letter is possible.</p>
<h1>Complete Regex</h1>
<pre class="lang-regex prettyprint-override"><code># Negative look-behind
(?<!
## N1: Looks like an HTML attribute value inside an HTML tag
### N1: Tag name
<[A-Z0-9]{1,255}
### N1: Any HTML attributes and values
(?:\s{1,30}[^<>]{0,4098})?
### N1: The beginning of an HTML attribute with value
\s{1,30}
[\p{L}0-9_.-]{1,255}
\s{0,30}=\s{0,30}
### N1: Optional HTML attribute quotes
["']?
|
## N2: Looks like the start of an HTML tag text content
### N2: Tag name
<[A-Z0-9]{1,255}\s{1,30}
### N2: All HTML attributes and values
[^<>]{0,4098}
### N2: End of HTML opening tag
>
)
## Positive match: The URL value
((?:https?|ftp|file)://[-a-zA-Z0-9+&@\#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@\#/%=~_|])
</code></pre>
<h1>The Java</h1>
<pre class="lang-java prettyprint-override"><code>import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.*;

class CrazyInvalidHtmlUrlTextFindAndReplacer {
    public static final String EXAMPLE_TEST = "This is a url http://www.google.com <a href=\"http://www.google.com\" title=\"Go to Google\">Google</a><a href=\"http://www.google.com\">http://www.google.com</a><img src=\"http://www.google.com/image.jpg\"><div data-url=\"http://www.google.com\"></div>";
    public static final String EXPECTED_OUTPUT_TEST = "This is a url <a href=\"http://www.google.com\">http://www.google.com</a> <a href=\"http://www.google.com\" title=\"Go to Google\">Google</a><a href=\"http://www.google.com\">http://www.google.com</a><img src=\"http://www.google.com/image.jpg\"><div data-url=\"http://www.google.com\"></div>";

    public static void main (String[] args) throws java.lang.Exception {
        System.out.println("Starting our non-HTML search and replace...");

        StringBuffer resultString = new StringBuffer();
        String subjectString = EXAMPLE_TEST;

        System.out.println(subjectString);

        try {
            // Pattern.COMMENTS lets the pattern carry whitespace and # comments,
            // which is why every fragment ends with \n.
            Pattern regex = Pattern.compile(
                "# Negative lookbehind\n" +
                "(?<!\n" +
                "## N1: Looks like an HTML attribute value inside an HTML tag\n" +
                "### N1: Tag name\n" +
                "<[A-Z0-9]{1,255}\n" +
                "### N1: Any HTML attributes and values\n" +
                "(?:\\s{1,30}[^<>]{0,4098})?\n" +
                "### N1: The beginning of an HTML attribute with value\n" +
                "\\s{1,30}\n" +
                "[\\p{L}0-9_.-]{1,255}\n" +
                "\\s{0,30}=\\s{0,30}\n" +
                "### N1: Optional HTML attribute quotes\n" +
                "[\"']?\n" +
                "|\n" +
                "## N2: Looks like the start of an HTML tag text content\n" +
                "### N2: Tag name\n" +
                "<[A-Z0-9]{1,255}\\s{1,30}\n" +
                "### N2: All HTML attributes and values\n" +
                "[^<>]{0,4098}\n" +
                "### N2: End of HTML opening tag\n" +
                ">\n" +
                ")\n" +
                "## Positive match: The URL value\n" +
                "((?:https?|ftp|file)://[-a-zA-Z0-9+&@\\#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@\\#/%=~_|])",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS);

            Matcher regexMatcher = regex.matcher(subjectString);
            while (regexMatcher.find()) {
                System.out.println("text");
                try {
                    // You can vary the replacement text for each match on-the-fly
                    // !!!!!!!!!
                    // @todo Escape the attribute values and content text.
                    // !!!!!!!!!
                    regexMatcher.appendReplacement(resultString, "<a href=\"$1\">$1</a>");
                } catch (IllegalStateException ex) {
                    // appendReplacement() called without a prior successful call to find()
                    System.out.println("IllegalStateException");
                } catch (IllegalArgumentException ex) {
                    // Syntax error in the replacement text (unescaped $ signs?)
                    System.out.println("IllegalArgumentException");
                } catch (IndexOutOfBoundsException ex) {
                    // Non-existent backreference used in the replacement text
                    System.out.println("IndexOutOfBoundsException");
                }
            }
            regexMatcher.appendTail(resultString);
        } catch (PatternSyntaxException ex) {
            // Syntax error in the regular expression
            System.out.println("PatternSyntaxException");
            System.out.println(ex.toString());
        }

        System.out.println("result:");
        System.out.println(resultString.toString());

        if (resultString.toString().equals(EXPECTED_OUTPUT_TEST)) {
            System.out.println("success!!!!");
        } else {
            System.out.println("failure - expected:");
            System.out.println(EXPECTED_OUTPUT_TEST);
        }
    }
}
</code></pre>
<p>I have no idea what the performance of this would be like: look-behinds are <strong>expensive</strong>, and that is on top of RegEx generally being expensive too.</p>
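<p>The <code>@todo</code> in the Java above leaves the escaping of the attribute value and link text open. Here is a minimal sketch of one way that step might look, using only the standard library. The class name <code>EscapedLinkReplacer</code> and the helpers <code>escapeHtml</code> and <code>linkify</code> are hypothetical names made up for this illustration, and the URL pattern is deliberately simplified: it keeps only the positive match and leaves out the negative look-behind, so it is not a drop-in replacement for the full regex.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedLinkReplacer {

    // Assumption: a simplified URL pattern for illustration only.
    // It omits the negative look-behind from the full regex above.
    private static final Pattern URL = Pattern.compile(
            "(?:https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]",
            Pattern.CASE_INSENSITIVE);

    /** Escapes the characters that are significant in HTML text and attribute values. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps each matched URL in an anchor, escaping it for both href and link text. */
    static String linkify(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer result = new StringBuffer();
        while (m.find()) {
            String url = escapeHtml(m.group());
            String anchor = "<a href=\"" + url + "\">" + url + "</a>";
            // The replacement is now a literal string rather than a "$1" template,
            // so quote it to keep any '$' or '\' from being read as a group reference.
            m.appendReplacement(result, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("See http://example.com/?a=1&b=2 for details"));
    }
}
```

<p>With this sketch, the <code>&amp;</code> in a query string is written as <code>&amp;amp;</code> in both the <code>href</code> and the link text. Whether that is desirable depends on whether the surrounding invalid HTML already contains escaped entities, so treat it as a starting point rather than a finished answer.</p>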
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POFind All URL That Is Not An HTML Attribute or Content of A Hyperlink Tag
  singulars
  PostTypePostTypeId
  PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USDean Taylor
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
  singulars
  PostPostId
  PO
  UserUserId
  This table or related slice is empty.
  VoteTypeVoteTypeId
  VTUpMod
CommentsPostId
