<p>Contrary to all the answers here, for what you're trying to do regex is a perfectly valid solution. This is because you are NOT trying to match balanced tags; THAT would be impossible with regex! You are only matching what's inside one tag, and that's perfectly regular.</p>
<p>Here's the problem, though: you can't do it with just one regex. You need one match to capture an <code>&lt;input&gt;</code> tag, then further processing on that tag. Note that this only works if none of the attribute values contain a <code>&gt;</code> character, so it's not perfect, but it should suffice for sane inputs.</p>
<p>Here's some Perl code to show you what I mean:</p>
<pre><code>use strict;
use warnings;

# Slurp the whole input: files named on the command line, or STDIN.
my $html = do { local $/; &lt;&gt; };

# m{...} delimiters, so the "/" in "/&gt;" (and in the comments) can't end the pattern early.
my @input_tags = $html =~ m{
    (
        &lt;input                     # Starts with "&lt;input"
        (?=[^&gt;]*?type="hidden")   # Lookahead: make sure this tag has type="hidden"
        [^&gt;]+                     # Grab the rest of the tag...
        /&gt;                        # ...except for the /&gt;, which is grabbed here
    )
}xg;

# Now each member of @input_tags is something like
# &lt;input type="hidden" name="SaveRequired" value="False" /&gt;
my @hidden_fields;
foreach my $input_tag (@input_tags) {
    my $hash_ref = {};
    # Extract each field with its own match, so attribute order doesn't matter.
    ($hash_ref-&gt;{"name"})  = $input_tag =~ /name="([^"]*)"/;
    ($hash_ref-&gt;{"value"}) = $input_tag =~ /value="([^"]*)"/;
    # Collect the result (or otherwise process $hash_ref here).
    push @hidden_fields, $hash_ref;
}
</code></pre>
<p>The basic principle here is: don't try to do too much with one regular expression. As you noticed, a single regular expression enforces a fixed order on the parts it matches (a pattern for <code>name</code> followed by <code>value</code> won't match a tag that lists them the other way round). So what you need to do instead is first match the CONTEXT of what you're trying to extract, then do sub-matching on the data you want.</p>
<p><b>EDIT:</b> However, I will agree that in general, using an HTML parser is probably easier and better, and you really should consider redesigning your code or re-examining your objectives. :-) But I had to post this answer as a counter to the knee-jerk reaction that parsing any subset of HTML is impossible: HTML and XML are both irregular when you consider the entire specification, but the syntax of a single tag is decently regular, certainly within the power of PCRE.</p>
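<p>For comparison, here is a minimal sketch of the parser route. It assumes the CPAN module HTML::TokeParser (part of the HTML-Parser distribution, not core Perl) is installed; <code>hidden_fields</code> is a hypothetical helper name, not part of any library.</p>
<pre><code>use strict;
use warnings;
use HTML::TokeParser;   # CPAN (HTML-Parser distribution), not a core module

# Hypothetical helper: name/value pairs of every &lt;input type="hidden"&gt; tag.
sub hidden_fields {
    my ($html) = @_;
    my $parser = HTML::TokeParser-&gt;new(\$html)
        or die "cannot create parser: $!";
    my @fields;
    # get_tag('input') returns [$tag, \%attr, \@attrseq, $text] per &lt;input&gt; start tag
    while (my $token = $parser-&gt;get_tag('input')) {
        my $attr = $token-&gt;[1];    # attribute names come back lowercased
        next unless lc($attr-&gt;{type} // '') eq 'hidden';
        push @fields, { name =&gt; $attr-&gt;{name}, value =&gt; $attr-&gt;{value} };
    }
    return @fields;
}

# Attribute order does not matter, and the "&gt;" inside the quoted value is fine.
my @fields = hidden_fields(
    '&lt;input value="a&gt;b" type="hidden" name="SaveRequired" /&gt;'
  . '&lt;input type="text" name="q"&gt;'
);
print "$_-&gt;{name} = $_-&gt;{value}\n" for @fields;
</code></pre>
<p>Because the parser tokenizes the tag itself, this version tolerates a <code>&gt;</code> inside a quoted value and also finds <code>&lt;input&gt;</code> tags that are not self-closed, two cases the regex version misses.</p>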