<p><strong>Update:</strong> It looks like the fields are actually tab separated, not space separated. If that is guaranteed, just split on <code>\t</code>.</p>
<p>First, let's see why <code>(".*?"|\S+)</code> "does not work". Specifically, look at <code>".*?"</code>. That matches zero or more characters enclosed in double quotes, as few as possible. The field that is giving you problems is <code>""C:\Program Files\ABC\ABC XYZ""</code>. Each <code>""</code> at the beginning and end of that field matches <code>".*?"</code> on its own, because <code>""</code> consists of zero characters surrounded by double quotes. The text in between is then picked up piecemeal by <code>\S+</code>, which stops at every space.</p>
<p>It is better to match as specifically as possible rather than splitting. So, if you have a configuration file with directives and a fixed format, write a regular expression that follows that format as closely as possible.</p>
<p>Move the quotation marks outside of the capturing parentheses if you don't want them.</p>
<pre><code>#!/usr/bin/perl

use strict;
use warnings;

my $s = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};

# One capture group per field; the path is whatever sits between the doubled quotes.
my @parts = $s =~ m{\A(\w+) ([0-9]) (""[^"]+"") (\w+) ([0-9]) ([0-9]{2})};

use Data::Dumper;
print Dumper \@parts;
</code></pre>
<p>Output:</p>
<pre><code>$VAR1 = [
          'StartProgram',
          '1',
          '""C:\\Program Files\\ABC\\ABC XYZ""',
          'CleanProgramTimeout',
          '1',
          '30'
        ];
</code></pre>
<p>In that vein, here is a more involved script:</p>
<pre><code>#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my @strings = split /\n/, &lt;&lt;'EO_TEXT';
StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30
StartProgram 1 c:\opt\perl CleanProgramTimeout 1 30
EO_TEXT

# Named captures; the path is either a ""...""-quoted string or a run of non-whitespace.
my $re = qr{
    (?&lt;directive&gt;StartProgram)\s+
    (?&lt;instance&gt;[0-9][0-9]?)\s+
    (?&lt;path&gt;"".+?""|\S+)\s+
    (?&lt;timeout_directive&gt;CleanProgramTimeout)\s+
    (?&lt;timeout_instance&gt;[0-9][0-9]?)\s+
    (?&lt;timeout_seconds&gt;[0-9]{2})
}x;

for (@strings) {
    if ( $_ =~ $re ) {
        print Dumper \%+;
    }
}
</code></pre>
<p>Output:</p>
<pre><code>$VAR1 = {
          'timeout_directive' =&gt; 'CleanProgramTimeout',
          'timeout_seconds' =&gt; '30',
          'path' =&gt; '""C:\\Program Files\\ABC\\ABC XYZ""',
          'directive' =&gt; 'StartProgram',
          'timeout_instance' =&gt; '1',
          'instance' =&gt; '1'
        };
$VAR1 = {
          'timeout_directive' =&gt; 'CleanProgramTimeout',
          'timeout_seconds' =&gt; '30',
          'path' =&gt; 'c:\\opt\\perl',
          'directive' =&gt; 'StartProgram',
          'timeout_instance' =&gt; '1',
          'instance' =&gt; '1'
        };
</code></pre>
<p><strong>Update:</strong> I cannot get <code>Text::Balanced</code> or <code>Text::ParseWords</code> to parse this correctly. I suspect the problem is the doubled quotation marks that delimit the substring that should not be split. The following code is my best (not very good) attempt at solving the generic problem: split on whitespace, then selectively glue the pieces of a quoted field back together.</p>
<pre><code>#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my $s = q{StartProgram 1 ""C:\Program Files\ABC\ABC XYZ"" CleanProgramTimeout 1 30};
my $t = q{StartProgram 1 c:\opt\perl CleanProgramTimeout 1 30};

print Dumper parse_line($s);
print Dumper parse_line($t);

sub parse_line {
    my ($line) = @_;

    # The capturing group keeps the whitespace separators in the list,
    # so a quoted field can be reassembled exactly as it appeared.
    my @parts = split /(\s+)/, $line;

    my @real_parts;

    for (my $i = 0; $i &lt; @parts; $i += 1) {
        # Ordinary token: keep it unless it is a whitespace separator.
        unless ( $parts[$i] =~ /^""/ ) {
            push @real_parts, $parts[$i] if $parts[$i] =~ /\S/;
            next;
        }

        # Quoted field: append pieces until the closing "" shows up.
        # The bounds check stops the loop if the closing "" is missing.
        my $part;
        do {
            $part .= $parts[$i++];
        } until ( $part =~ /""$/ || $i &gt;= @parts );

        push @real_parts, $part;
    }

    return \@real_parts;
}
</code></pre>
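<p>For the tab-separated case mentioned in the first update, a minimal sketch could look like the following. It assumes exactly one tab between fields and rebuilds the sample line with explicit tabs, since the original input file is not shown. Stripping the doubled quotation marks from the path is an optional extra step, not something the question requires.</p>

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Assumption: fields are separated by exactly one tab character.
# The sample line is rebuilt here with explicit tabs for illustration.
my $line = join "\t",
    'StartProgram', '1', '""C:\Program Files\ABC\ABC XYZ""',
    'CleanProgramTimeout', '1', '30';

# A negative limit keeps trailing empty fields, should a line end in tabs.
my @fields = split /\t/, $line, -1;

# Optional: strip the doubled quotation marks around the path field.
(my $path = $fields[2]) =~ s/\A""(.*)""\z/$1/;

print scalar(@fields), " fields\n";
print "path: $path\n";
```

<p>Because the split happens only on tabs, the spaces inside the path never come into play, and no quote handling is needed to keep the field together.</p>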
 
