<p>Against my better judgment, I will try to help you again.</p>
<p>The issue is not how to find a blank line. The issue is not which regex to use. The fundamental issue is understanding how to analyze a problem and turn that analysis into code.</p>
<p>In this case the problem is "How do I parse this format?"</p>
<p>I've written a parser for you. I have also taken the time to write a detailed description of the process I used to write it.</p>
<p>WARNING: The parser is not carefully tested for all cases. It does not have enough error handling built in. For those features, you can request a rate card or write them yourself.</p>
<p>Here's the data sample you provided (I'm not sure which of your several questions I pulled this from):</p>
<pre><code>constant fixup GemEstabCommDelay = &lt;U2 20&gt;
  vid = 6
  name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
  units = "s"
  min = &lt;U2 0&gt;
  max = &lt;U2 1800&gt;
  default = &lt;U2 20&gt;

constant fixup private GemConstantFileName = &lt;A "C:\\TMP\\CONST.LOG"&gt;
  vid = 4
  name = ""  units = ""

constant fixup private GemAlarmFileName = &lt;A "C:\\TMP\\ALARM.LOG"&gt;
  vid = 0
  name = ""  units = ""
</code></pre>
<p>Before you can write a parser for a data file, you need a description of the structure of the file. If you are using a standard format (say, XML), you can read the existing specification. If you are using some home-grown format, you get to write it yourself.</p>
<p>So, based on the sample data, we can see that:</p>
<ol>
<li>Data is broken into blocks.</li>
<li>Each block starts with the word <code>constant</code> in column 0.</li>
<li>Each block ends with a blank line.</li>
<li>A block consists of a start line and zero or more additional lines.</li>
<li>The start line consists of the keyword <code>constant</code> followed by one or more whitespace-delimited words, an <code>=</code> sign, and a <code>&lt;&gt;</code>-quoted data value.
<ul>
<li>The last keyword appears to be the name of the constant. 
Call it <code>constant_name</code>.</li>
<li>The <code>&lt;&gt;</code>-quoted data appears to be a combined type/value specifier.</li>
<li>Earlier keywords appear to specify additional metadata about the constant. Let's call those <code>options</code>.</li>
</ul></li>
<li>The additional lines specify additional key/value pairs. Let's call them attributes. An attribute may have a single value (bare, as in <code>vid = 6</code>, or quoted, as in <code>units = "s"</code>) or a type/value specifier.</li>
<li>One or more attributes may appear on a single line.</li>
</ol>
<p>Okay, so now we have a rough spec. What do we do with it?</p>
<p>How is the format structured? Consider the logical units of organization from largest to smallest. These will determine the structure and flow of our code.</p>
<ul>
<li>A FILE is made of BLOCKS.</li>
<li>BLOCKS are made of LINES.</li>
</ul>
<p>So our parser should decompose a file into blocks, and then handle the blocks.</p>
<p>Now we rough out a parser in comments:</p>
<pre><code># Parse a constant spec file.
#   Until file is done:
#     Read in a whole block
#     Parse the block and return key/value pairs for a hash.
#     Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
#   Return ref to big hash with all block data
</code></pre>
<p>Now we start to fill in some code:</p>
<pre><code># Parse a constant spec file.
sub parse_constant_spec {
    my $fh = shift;

    my %spec;

    # Until file is done:
    # Read in a whole block
    while( my $block = read_block($fh) ) {

        # Parse the block and return key/value pairs for a hash.
        my %constant = parse_block( $block );

        # Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
        $spec{ $constant{const_name} } = \%constant;
    }

    # Return ref to big hash with all block data
    return \%spec;
}
</code></pre>
<p>But it won't work yet. The <code>parse_block</code> and <code>read_block</code> subs haven't been written. At this stage that's OK. The point is to rough in features in small, understandable chunks. 
Every once in a while, to keep things readable, you need to gloss over the details and drop in a subroutine. Otherwise you wind up with monstrous 1000-line subs that are impossible to debug.</p>
<p>Now we know we need to write a couple of subs to finish up, et voilà:</p>
<pre><code>#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my $fh = \*DATA;

print Dumper parse_constant_spec( $fh );


# Parse a constant spec file.
# Pass in a handle to process.
# As long as it acts like a file handle, it will work.
sub parse_constant_spec {
    my $fh = shift;

    my %spec;

    # Until file is done:
    # Read in a whole block
    while( my $block = read_block($fh) ) {

        # Parse the block and return key/value pairs for a hash.
        my %constant = parse_block( $block );

        # Store a ref to the hash in a big hash of all blocks, keyed by constant_name.
        $spec{ $constant{const_name} } = \%constant;
    }

    # Return ref to big hash with all block data
    return \%spec;
}

# Read a constant definition block from a file handle.
# Return an empty list when there is no data left in the file.
# Otherwise return an array ref containing the lines in the block.
sub read_block {
    my $fh = shift;

    my @lines;
    my $block_started = 0;

    while( my $line = &lt;$fh&gt; ) {

        $block_started++ if $line =~ /^constant/;

        if( $block_started ) {

            last if $line =~ /^\s*$/;

            push @lines, $line;
        }
    }

    return \@lines if @lines;

    return;
}


sub parse_block {
    my $block = shift;
    my ($start_line, @attribs) = @$block;

    my %constant;

    # Break down first line:
    # First separate assignment from option list.
    # (Limit of 2, so only the first '=' splits the line.)
    my ($start_head, $start_tail) = split /=/, $start_line, 2;

    # work on option list
    my @options = split /\s+/, $start_head;

    # Recover constant_name from options:
    $constant{const_name} = pop @options;
    $constant{options} = \@options;

    # Now we parse the value/type specifier
    @constant{'type', 'value' } = parse_type_value_specifier( $start_tail );

    # Parse attribute lines.
    # Since we've already got multiple per line, get them all at once.
    chomp @attribs;
    my $attribs = join ' ', @attribs;

    # We have one long line of mixed key = "value", key = &lt;TYPE VALUE&gt; and key = value
    @attribs = $attribs =~ /(\w+\s*=\s*(?:".*?"|&lt;.*?&gt;|\S+))/g;

    for my $attrib ( @attribs ) {
        my ($name, $value) = split /\s*=\s*/, $attrib, 2;

        if( $value =~ /^"/ ) {
            $value =~ s/^"|"\s*$//g;
        }
        elsif( $value =~ /^&lt;/ ) {
            # Parse this attribute's own specifier, not the start line's.
            $value = [ parse_type_value_specifier( $value ) ];
        }
        # Anything else is a bare value such as 6; keep it as-is.

        $constant{ $name } = $value;
    }

    return %constant;
}

sub parse_type_value_specifier {
    my $tvs = shift;

    my ($type, $value) = $tvs =~ /&lt;(\w+)\s+(.*?)&gt;/;

    return $type, $value;
}

__DATA__
constant fixup GemEstabCommDelay = &lt;U2 20&gt;
  vid = 6
  name = "ESTABLISHCOMMUNICATIONSTIMEOUT"
  units = "s"
  min = &lt;U2 0&gt;
  max = &lt;U2 1800&gt;
  default = &lt;U2 20&gt;

constant fixup private GemConstantFileName = &lt;A "C:\\TMP\\CONST.LOG"&gt;
  vid = 4
  name = ""  units = ""

constant fixup private GemAlarmFileName = &lt;A "C:\\TMP\\ALARM.LOG"&gt;
  vid = 0
  name = ""  units = ""
</code></pre>
<p>The above code is far from perfect. IMO, <code>parse_block</code> is too long and ought to be broken into smaller subs. Also, there isn't nearly enough validation and enforcement of well-formed input. Variable names and descriptions could be clearer, but I don't really understand the semantics of your data format. Better names would more closely match the semantics of the data format.</p>
<p>Despite these issues, it does parse your format and produce a big, handy data structure that can be stuffed into whatever output format you want.</p>
<p>If you use this format in many places, I recommend putting the parsing code into a module. See <a href="http://perldoc.perl.org/perlmod.html" rel="noreferrer">perldoc perlmod</a> for more info.</p>
<p>Now, please stop using global variables and ignoring good advice. Please start reading the perldoc, read <em>Learning Perl</em> and <em>Perl Best Practices</em>, and use <code>strict</code> and <code>warnings</code>. 
While I am throwing reading lists around, go read <a href="http://c2.com/cgi/wiki?GlobalVariablesAreBad" rel="noreferrer">Global Variables are Bad</a> and then wander around the wiki to read and learn. I learned more about writing software by reading c2 than I did in school.</p>
<p>If you have questions about how this code works, why it is laid out as it is, or what other choices could have been made, speak up and ask. I am willing to help a willing student.</p>
<p>Your English is good, but it is clear you are not a native speaker. I may have used too many complex sentences. If you need parts of this written in simple sentences, I can try to help. I understand that working in a foreign language is very difficult.</p>
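<p>The remark that <code>parse_block</code> is too long and ought to be broken into smaller subs can be made concrete. Below is a minimal sketch of one way to split it, assuming the block format described in the spec above. The helper names <code>parse_start_line</code>, <code>parse_attributes</code> and <code>parse_attribute_value</code> are hypothetical. Keeping the attributes under a separate <code>attributes</code> key, so that an attribute called <code>type</code> or <code>value</code> cannot overwrite the constant's own type and value, is a design assumption and not something the format requires.</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: parse_start_line, parse_attributes and parse_attribute_value
# are hypothetical helper names, not part of the original parser.

# Parse one block (an array ref of lines) into a hash of constant data.
sub parse_block {
    my $block = shift;
    my ( $start_line, @attrib_lines ) = @$block;

    my %constant = parse_start_line($start_line);

    # Assumption: attributes live under their own key, so an attribute
    # named "type" or "value" cannot overwrite the start-line data.
    my %attribs = parse_attributes(@attrib_lines);
    $constant{attributes} = \%attribs;

    return %constant;
}

# Split the start line into options, constant name and type/value.
sub parse_start_line {
    my $line = shift;
    chomp $line;

    # Limit of 2: only the first '=' separates the head from the value.
    my ( $head, $tail ) = split /=/, $line, 2;
    die "Malformed start line: $line\n" unless defined $tail;

    my @options    = split ' ', $head;    # e.g. constant fixup private Name
    my $const_name = pop @options;

    my ( $type, $value ) = parse_type_value_specifier($tail);

    return (
        const_name => $const_name,
        options    => \@options,
        type       => $type,
        value      => $value,
    );
}

# Join the attribute lines and collect every key = value pair.
sub parse_attributes {
    my @lines = @_;
    chomp @lines;
    my $text = join ' ', @lines;

    my %attribs;

    # A value is "quoted", <TYPE VALUE>, or a bare token such as 6.
    while ( $text =~ /(\w+)\s*=\s*("[^"]*"|<[^>]*>|\S+)/g ) {
        my ( $key, $raw ) = ( $1, $2 );
        $attribs{$key} = parse_attribute_value($raw);
    }

    return %attribs;
}

# Strip quotes from a string; turn <TYPE VALUE> into [TYPE, VALUE].
sub parse_attribute_value {
    my $raw = shift;

    if ( $raw =~ /^"(.*)"$/ ) {
        return $1;
    }
    if ( $raw =~ /^</ ) {
        return [ parse_type_value_specifier($raw) ];
    }
    return $raw;    # bare value, kept as-is
}

# Same as in the full parser: <TYPE VALUE> becomes (TYPE, VALUE).
sub parse_type_value_specifier {
    my $tvs = shift;
    my ( $type, $value ) = $tvs =~ /<(\w+)\s+(.*?)>/;
    return ( $type, $value );
}

# Usage: parse one block; several attributes may share a line.
my @block = (
    qq{constant fixup GemEstabCommDelay = <U2 20>\n},
    qq{  vid = 6\n},
    qq{  name = "ESTABLISHCOMMUNICATIONSTIMEOUT"\n},
    qq{  units = "s"\n},
    qq{  min = <U2 0>  max = <U2 1800>  default = <U2 20>\n},
);

my %constant = parse_block( \@block );
print "$constant{const_name}: $constant{type} $constant{value}\n";
# prints "GemEstabCommDelay: U2 20"
```

<p>Each helper handles one level of the format (start line, attribute list, single value), so each can be checked on its own. <code>read_block</code> and <code>parse_constant_spec</code> would not need to change, although any code that expects attributes at the top level of each constant's hash would.</p>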
 
