<p>There are three issues to address in your question:</p>
<ul>
<li><p>The technical question of how to strip whitespace, INCLUDING assorted newlines, from a string.</p></li>
<li><p>The general question of how to process the file format described. I will present a different solution, which works if the file is small enough to slurp into a single string in memory.</p></li>
<li><p>Reading the file in chunks (e.g. line by line) to avoid slurping the whole file into memory.</p></li>
</ul>
<hr>
<ol>
<li><p><strong>To strip both whitespace and assorted newlines from a non-title line</strong> (e.g. your <code>_chomp_</code> line), you can do:</p>
<pre><code>$lines =~ s/\s+//g; # \s already matches "\n" and "\r", so no separate newline class is needed
</code></pre></li>
<li><p>In addition, <strong>if your file is small enough that slurping it all into memory as a single long string is an option</strong>, you can (at the cost of slightly slower code) replace the logic in your sample code with shorter and, hopefully, more readable logic:</p>
<pre><code>my @records = split(/[\r\n]+(?=&gt;)/, $text); # Split before each line starting with "&gt;", keeping the "&gt;"
foreach my $record (@records) {
    my ($title, $rest) = ($record =~ /^(&gt;[^\n\r]+)(.*)$/s) or next; # Skip any text before the first title
    $rest =~ s/\s+//g; # Strip whitespace AND newlines.
    print New_File "$title\n$rest\n";
}
</code></pre></li>
<li><p>However, if the data is large enough that you <strong>must</strong> read it in chunks (for text, a chunk is usually one line), you have a problem with BOTH your proposed code and the code I showed above.</p>
<p>Perl's standard line-by-line reading via the <code>&lt;&gt;</code> operator (or <a href="http://perldoc.perl.org/functions/readline.html" rel="nofollow">readline</a>) uses the input record separator (<code>$/</code>) to decide what a line ending is, and that is "\n" by default.
If your file is entirely "\r"-separated, it will be read as a single giant line, meaning you <strong>will</strong> slurp the file in whether you like it or not. Changing <code>$/</code> to "\r" won't help with assorted newlines either: "\n"-terminated lines would then run together, and each "\r\n" would leave a stray "\n" at the start of the following line.</p>
<p>Unfortunately, <code>$/</code> (the input record separator) must be a string and cannot be a regular expression.</p>
<p>Therefore, if you absolutely MUST read a file with arbitrary newlines in chunks because of its size, <strong>you need to read the file in fixed-size blocks instead of line by line</strong>, then parse the individual lines out of those blocks, carrying any incomplete line at the end of one block over to the next.</p>
<p>To do such reading, set <code>$/</code> to a reference to an integer (e.g. <code>$/ = \4096;</code>) and then use <code>readline()</code> / <code>&lt;&gt;</code>; each read then returns up to that many characters. A plain integer will not work: <code>$/ = 4096</code> makes the string "4096" the separator.</p>
<p>Please note that the module mentioned in cjm's answer (PerlIO::eol) takes exactly this latter approach, but it is implemented as an XS module and thus does the work in C code (its <code>PerlIOEOL_get_base()</code> function uses a 4k buffer).</p></li>
</ol>
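<p>As a quick check of the whitespace point: in Perl, <code>\s</code> already matches <code>\r</code> and <code>\n</code> (as well as spaces, tabs and form feeds), so a single substitution removes all of them. A minimal sketch; the sub name <code>strip_ws</code> is just an illustration, not something from your code:</p>
<pre><code>use strict;
use warnings;

# Hypothetical helper: return a copy of $line with every whitespace
# character removed. \s covers " ", "\t", "\n", "\r" and "\f".
sub strip_ws {
    my ($line) = @_;
    (my $copy = $line) =~ s/\s+//g;
    return $copy;
}

print strip_ws("ACGT \tGG\r\nTT\r"), "\n";   # prints ACGTGGTT
</code></pre>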
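<p>Here is a self-contained version of the slurp approach that can be run on an in-memory string. It assumes, as in your question, that each record starts with a line beginning with <code>&gt;</code>; the sub name <code>parse_records</code> and the sample data are made up for illustration. Splitting with the lookahead <code>(?=&gt;)</code> keeps the <code>&gt;</code> on every title, and using a character class instead of a capturing group keeps the separators out of the result list.</p>
<pre><code>use strict;
use warnings;

# Hypothetical helper: split a slurped string into [title, sequence] pairs.
# Each record starts with a ">" title line; line endings may be \n, \r or \r\n.
sub parse_records {
    my ($text) = @_;
    my @records;
    # Split on a run of newline characters followed by ">". The lookahead
    # leaves ">" on each title, and the character class (no capture group)
    # keeps the separators out of the result list.
    for my $chunk (split /[\r\n]+(?=>)/, $text) {
        my ($title, $rest) = $chunk =~ /\A(>[^\r\n]*)(.*)\z/s
            or next;                    # skip any text before the first title
        $rest =~ s/\s+//g;              # strip whitespace AND newlines
        push @records, [ $title, $rest ];
    }
    return @records;
}

# Sample data with mixed line endings (made up for this example).
my $text = ">seq1\r\nAC GT\rGG\n>seq2\nTT\r\n";
for my $rec (parse_records($text)) {
    print "$rec->[0]\n$rec->[1]\n";     # prints >seq1, ACGTGG, >seq2, TT
}
</code></pre>
<p>This still holds the whole file in memory, so it only suits the small-file case.</p>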
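<p>A rough pure-Perl sketch of block-based reading (PerlIO::eol does the equivalent in C). It assumes a handle already opened for reading; the sub name <code>each_line</code>, its callback interface and the default block size of 4096 characters are my own choices for illustration. The subtle case is a <code>\r\n</code> pair split across two blocks, so a trailing <code>\r</code> is held back until the next block shows whether a <code>\n</code> follows.</p>
<pre><code>use strict;
use warnings;

# Hypothetical helper: read $fh in fixed-size blocks and call $cb->($line)
# once per line, whatever the terminator ("\n", "\r" or "\r\n").
sub each_line {
    my ($fh, $cb, $block_size) = @_;
    my $size = $block_size || 4096;
    my $buf  = '';
    # $/ is a reference to an integer only for the read itself, so the
    # callback still runs with the normal record separator.
    while (defined(my $block = do { local $/ = \$size; readline($fh) })) {
        $buf .= $block;
        # Emit each line whose terminator is complete. A "\r" at the very end
        # of the buffer is held back, since the next block may start with "\n".
        while ($buf =~ s/\A([^\r\n]*)(?:\r\n|\n|\r(?=.))//s) {
            my $line = $1;
            $cb->($line);
        }
    }
    if (length $buf) {                  # last line: no terminator, or a lone "\r"
        $buf =~ s/\r\z//;
        $cb->($buf);
    }
    return;
}
</code></pre>
<p>Typical use would be something like <code>each_line($fh, sub { my ($line) = @_; print "$line\n" });</code> on a handle opened with <code>open</code>. Only the current block plus one partial line is kept in memory, so memory use is bounded by the block size and the longest line.</p>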