<p>As was mentioned, you need some structure in your regex. In refactoring your code, I made a couple of assumptions:</p>
<ul>
<li>You don't want to just print it out in a tab-delimited format.</li>
<li>The only reason for the <code>$x</code> variable is so that you only print one line. (Although a <code>last</code> at the end of the loop would have worked just fine.)</li>
</ul>
<p>Having assumed these things, I decided that, in addressing your question, I would:</p>
<ol>
<li>Show you how to make a good <em>modifiable</em> regex.</li>
<li>Code very simple "semantic actions" which store the data and let you use it as you please.</li>
</ol>
<p>In addition, it should be noted that I changed the input to a <code>__DATA__</code> section, and that output is restricted to STDERR through the use of <code>Smart::Comments</code> comments, which help me inspect my structures.</p>
<p>First, the code preamble:</p>
<pre><code>use strict;    # always in development!
use warnings;  # always in development!
use English qw&lt;$LIST_SEPARATOR&gt;;  # It's just helpful.
#use re 'debug';
#use Smart::Comments;
</code></pre>
<p>Note the commented-out <code>use re</code>. If you uncomment it to see how a regular expression gets parsed, it will put out a lot of information that you probably don't want to see (though you can make your way through it with a little knowledge about regex parsing). It's commented out because it is just not newbie-friendly, and it will monopolize your output. (For more about that, see <a href="http://search.cpan.org/perldoc?re" rel="nofollow noreferrer">re</a>.)</p>
<p>Also commented out is the <code>use Smart::Comments</code> line. I recommend it, but you can get by with <code>Data::Dumper</code> and <code>print Dumper( \%hash )</code> lines. (See <a href="http://search.cpan.org/perldoc?Smart::Comments" rel="nofollow noreferrer"><code>Smart::Comments</code></a>.)</p>
<h3>Specifying the Expression</h3>
<p>But on to the regex.
I used the exploded (<code>/x</code>) form of regex so that each part of the whole can be explained in a comment (see <a href="http://search.cpan.org/perldoc?perlre" rel="nofollow noreferrer">perlre</a>). For the second argument, we want a single alphanumeric character OR a quoted string (with allowed escapes).</p>
<p>I also used a list of modifier names, so that the "language" can grow.</p>
<p>I build the next regex in a <code>do</code> block, or as I like to call it, a "localization block", so that I can localize <code>$LIST_SEPARATOR</code> (aka <code>$"</code>) to be the regex alternation character (<code>|</code>). Thus when the list is interpolated into the regex, it is interpolated as an alternation.</p>
<p>I'll give you time to look at the second regex before talking about it.</p>
<pre><code># Modifiable list of modifiers
my @mod_names = qw&lt;constant fixup private&gt;;

# Break out the more complex chunks into separate expressions
my $arg2_regex = qr{
      \p{IsAlnum}              # accept a single alphanumeric character
    |                          # OR
      "                        # starts with a double quote
      (?&gt;                   # -&gt; we just want to group, not capture;
                               #    '?&gt;' also makes the group atomic (no backtracking into it)
          [^\\"\P{IsPrint}]+   # any printable character as long as it is not
                               # a backslash or a double quote
        | \\"                  # but we will accept a backslash followed by
                               # a double quote
        | (?:\\\\)+            # OR any number of doubled backslashes
      )*                       # any number of these
      "                        # ends with a double quote
}msx;

my $line_RE = do {
    local $LIST_SEPARATOR = '|';
    qr{ \A                     # the beginning
        \s*                    # however much whitespace you need
        # A sequence of modifier names followed by space
        ((?: (?: @mod_names ) \s+ )*)
        ( \p{IsAlnum}+ )       # at least one alphanumeric character
        \s*                    # any amount of whitespace
        =                      # an equals sign
        \s*                    # any amount of whitespace
        &lt;                   # open angle bracket
        (\p{IsAlnum}+)         # alphanumeric first argument
        \s+                    # required whitespace
        ( $arg2_regex )        # previously specified arg #2 expression
        [^&gt;]*?               # (lazily) anything that is not a close bracket
        &gt;                   # close angle bracket
    }msx;
};
</code></pre>
<p>The second regex just says that we want any number of recognized "modifiers", each followed by whitespace, and then an alphanumeric identifier. (I'm not sure why you don't want underscores; I don't include them, regardless.)</p>
<p>That is followed by any amount of whitespace and an equals sign. Since the sets of alphanumeric characters, whitespace characters, and the equals sign are all disjoint, there is no reason to require whitespace between them. On the other side of the equals sign, the value is delimited by angle brackets, so I don't see any reason to <em>require</em> whitespace on that side either. Before the equals sign, all you've allowed is alphanumerics and whitespace, and on the other side, everything has to be inside angle brackets. Requiring whitespace gains you nothing, while not requiring it is more fault-tolerant. Ignore all that and change the <code>*</code>s to <code>+</code>s if you are expecting machine-generated input with fixed spacing.</p>
<p>On the other side of the equals sign, we require an angle bracket pair. Inside it comes an alphanumeric first argument, then whitespace, then a second argument that is EITHER a single alphanumeric character (based on your spec) OR a double-quoted string, which can contain escaped backslashes or quotes and even a close angle bracket, as long as that bracket comes before the closing quote.</p>
<h3>Storing the Data</h3>
<p>Once the specification has been made, here's just one of the things you can do with it. I don't know what you want to do with this besides print it out, which I'm going to assume is not the whole purpose of the script, so I simply store the data.</p>
<pre><code>### $line_RE
my %fixup_map;
while ( my $line = &lt;DATA&gt; ) {
    ### $line
    my ( $mod_text, $identifier, $first_arg, $second_arg )
        = ( $line =~ /$line_RE/ )
        ;
    die "Did not parse: $line" unless defined $identifier;
    $fixup_map{$identifier}
        = { modifiers_for =&gt; { map { $_ =&gt; 1 } split /\s+/, $mod_text }
          , first_arg     =&gt; $first_arg
          , second_arg    =&gt; $second_arg
          };
    ### $fixup_map{$identifier} : $fixup_map{$identifier}
}

__DATA__
constant fixup ConfigAlarms = &lt;U1 0&gt;
constant fixup ConfigAlarms2 = &lt;U1 2&gt;
constant fixup private AlarmFileName = &lt;A "C:\\TMP\\ALARM.LOG"&gt;
</code></pre>
<p>At the end you can see the <code>__DATA__</code> section. When you're at an early stage, as you seem to be here, it's most convenient to dispense with I/O logic and read from the built-in <code>DATA</code> filehandle, as I do here.</p>
<p>I collect the modifiers in a hash, so that my semantic actions can look like this:</p>
<pre><code>#...
my $data = $fixup_map{$id};
#...
if ( $data-&gt;{modifiers_for}{private} ) {
    #...
}
</code></pre>
<h3>Soap Box</h3>
<p>The main problem, however, is that you don't seem to have a plan. For the second "argument" in the angle brackets, you have a regex that specifies <em>only</em> a single alphanumeric character, but you want to expand it to allow escaped strings. I have to expect that you are implementing a small subset and want to gradually expand it to do other things. If you neglect good design from the beginning, it's only going to become more and more of a headache to implement the full-featured "parser".</p>
<p>You may want to implement multi-line values at some point. If getting from a single alphanumeric character to a quote-delimited argument is already a struggle, consider that the changes multi-line values would force on the line-by-line method and on the regex dwarf that complexity gap.</p>
<p>So I advise you to use the code here only as a <em>guideline</em> for handling growing complexity. I'm answering a question and indicating a direction, not designing or coding a project, so my regex code isn't as expandable as it probably should be.
</p>
<p>If the parsing job were complex enough, I would specify a minimal lookahead grammar for <a href="http://search.cpan.org/perldoc?Parse::RecDescent" rel="nofollow noreferrer"><code>Parse::RecDescent</code></a> and stick to coding the semantic actions. That's another recommendation.</p>
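<p>If you would rather not install <code>Smart::Comments</code>, the same approach can be sketched as a self-contained script that inspects the result with the core <code>Data::Dumper</code> module instead. Treat it as a minimal sketch under a few assumptions, not a tested drop-in replacement. The sample lines sit in a heredoc (the hypothetical <code>$sample</code>) rather than <code>__DATA__</code>. The regexes are condensed versions of the ones above, with the doubled-backslash group written as the non-capturing <code>(?:\\\\)+</code>. The final loop is just a hypothetical example of a semantic action.</p>
<pre><code>use strict;
use warnings;
use English qw&lt;$LIST_SEPARATOR&gt;;
use Data::Dumper;    # core module, standing in for Smart::Comments

# Hypothetical sample input: the three lines from the __DATA__ section above.
my $sample = &lt;&lt;'END';
constant fixup ConfigAlarms = &lt;U1 0&gt;
constant fixup ConfigAlarms2 = &lt;U1 2&gt;
constant fixup private AlarmFileName = &lt;A "C:\\TMP\\ALARM.LOG"&gt;
END

my @mod_names = qw&lt;constant fixup private&gt;;

# Arg #2: one alphanumeric character OR a double-quoted string
# containing printable characters, \" escapes, or doubled backslashes.
my $arg2_regex = qr{
      \p{IsAlnum}
    | " (?&gt; [^\\"\P{IsPrint}]+ | \\" | (?:\\\\)+ )* "
}x;

my $line_RE = do {
    local $LIST_SEPARATOR = '|';    # join the modifier list as an alternation
    qr{ \A \s*
        ( (?: (?: @mod_names ) \s+ )* )    # 1: modifiers
        ( \p{IsAlnum}+ )                   # 2: identifier
        \s* = \s* &lt;
        ( \p{IsAlnum}+ )                   # 3: first argument
        \s+
        ( $arg2_regex )                    # 4: second argument
        [^&gt;]*? &gt;
    }x;
};

# Semantic action: store each parsed line, keyed by its identifier.
my %fixup_map;
for my $line ( split /\n/, $sample ) {
    my ( $mod_text, $identifier, $first_arg, $second_arg ) = $line =~ $line_RE
        or die "Did not parse: $line\n";
    $fixup_map{$identifier} = {
        modifiers_for =&gt; { map { $_ =&gt; 1 } split /\s+/, $mod_text },
        first_arg     =&gt; $first_arg,
        second_arg    =&gt; $second_arg,
    };
}

# Inspect the structure; Sortkeys gives a stable key order.
{
    local $Data::Dumper::Sortkeys = 1;
    print Dumper( \%fixup_map );
}

# Using the modifier hash in a (hypothetical) semantic action.
for my $id ( sort keys %fixup_map ) {
    my $data = $fixup_map{$id};
    print "$id is private\n" if $data-&gt;{modifiers_for}{private};
}
</code></pre>
<p>With the three sample lines, the final loop should report only <code>AlarmFileName</code> as private. Note that <code>Dumper</code> output goes to STDOUT here, whereas <code>Smart::Comments</code> writes to STDERR.</p>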