Why can't I use the map function to create a good hash from a simple data file in Perl?
<p><strong>The post is updated. If you've already read the question, please jump to the Solution part. Thanks!</strong></p>
<p>Here's the minimized code that exhibits my problem:</p>
<p>The input data file for the test was saved with Windows' built-in Notepad in UTF-8 encoding. It has the following three lines:</p>
<pre>
abacus æbәkәs
abalone æbәlәuni
abandon әbændәn
</pre>
<p>The Perl script file was also saved with Windows' built-in Notepad in UTF-8 encoding. It contains the following code:</p>
<pre><code>#!perl -w
use Data::Dumper;
use strict;
use autodie;
open my $in,'&lt;',"./hash_test.txt";
open my $out,'&gt;',"./hash_result.txt";
my %hash = map {split/\t/,$_,2} &lt;$in&gt;;
print $out Dumper(\%hash),"\n";
print $out "$hash{abacus}";
print $out "$hash{abalone}";
print $out "$hash{abandon}";
</code></pre>
<p>In the output, the hash table seems to be okay:</p>
<pre>
$VAR1 = {
          'abalone' => 'æbәlәuni
',
          'abandon' => 'әbændәn',
          'abacus' => 'æbәkәs
'
        };
</pre>
<p>But it actually is not, because I only get two values instead of three:</p>
<pre>
æbәlәuni
әbændәn
</pre>
<p>Perl gives the following warning message:</p>
<p><code>Use of uninitialized value $hash{"abacus"} in string at C:\test2.pl line 11, &lt;$in&gt; line 3.</code></p>
<p>Where's the problem? Can someone kindly explain? Thanks.</p>
<p><strong>The Solution</strong></p>
<p>Millions of thanks to all of you guys :) The culprit has finally been found and the problem is fixable :) As @Sinan insightfully pointed out, I'm now 100% sure that the culprit is the three-byte BOM, which Notepad added to my data file when it was saved as UTF-8 and which Perl does not strip on its own. Because the BOM sits at the very start of the file, it becomes part of the first key, so <code>$hash{abacus}</code> does not exist. Although many suggested that I should use "&lt;:utf8" and "&gt;:utf8" to read and write the files, these utf-8 settings do not solve the problem by themselves. Instead they may cause some other problems.</p>
<p>To really solve the problem, all I actually need is to add one line of code to force Perl to skip the BOM:</p>
<pre><code>#!perl -w
use Data::Dumper;
use strict;
use autodie;
open my $in,'&lt;',"./hash_test.txt";
open my $out,'&gt;',"./hash_result.txt";
seek $in,3,0; # force Perl to ignore the BOM!
my %hash = map {split/\t/,$_,2} &lt;$in&gt;;
print $out Dumper(\%hash);
print $out $hash{abacus};
print $out $hash{abalone};
print $out $hash{abandon};
</code></pre>
<p>Now, the output is exactly what I expected:</p>
<pre>
$VAR1 = {
          'abalone' => 'æbәlәuni
',
          'abandon' => 'әbændәn',
          'abacus' => 'æbәkәs
'
        };
æbәkәs
æbәlәuni
әbændәn
</pre>
<p>Please note that the script is saved in UTF-8 encoding and the code does not have to include any utf-8 labels, because the input file and the output file are both pre-saved in UTF-8 encoding.</p>
<p>Finally, thanks again to all of you. And thank you, @Sinan, for the insightful guidance. Without your help, I would have stayed in the dark for God knows how long.</p>
<p><strong>Note</strong> To clarify a little more, if I use:</p>
<pre><code>open my $in,'&lt;:utf8',"./hash_test.txt";
open my $out,'&gt;:utf8',"./hash_result.txt";
my %hash = map {split/\t/,$_,2} &lt;$in&gt;;
print $out Dumper(\%hash);
print $out $hash{abacus};
print $out $hash{abalone};
print $out $hash{abandon};
</code></pre>
<p>The output is this:</p>
<pre>
$VAR1 = {
          'abalone' => "\x{e6}b\x{4d9}l\x{4d9}uni
",
          'abandon' => "\x{4d9}b\x{e6}nd\x{4d9}n",
          "\x{feff}abacus" => "\x{e6}b\x{4d9}k\x{4d9}s
"
        };
æbәlәuni
әbændәn
</pre>
<p>And the warning message:</p>
<pre>
Use of uninitialized value in print at C:\hash_test.pl line 13, &lt;$in&gt; line 3.
</pre>
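The `seek $in,3,0` fix above assumes the file always starts with a BOM; if the same script is later run on a file saved without one, it would skip the first three bytes of real data. A possible alternative, sketched here under assumptions, is to decode the input as UTF-8 and remove U+FEFF from the first line only when it is actually there. The helper name `read_pairs` and the temporary test file are hypothetical and not part of the original post; `File::Temp` is used only to keep the sketch self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the original post): read tab-separated
# key/value lines from a UTF-8 file, tolerating an optional leading BOM.
sub read_pairs {
    my ($path) = @_;
    open my $in, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my %hash;
    while (my $line = <$in>) {
        # After decoding, the BOM is the single character U+FEFF.
        $line =~ s/\A\x{FEFF}// if $. == 1;
        $line =~ s/\r?\n\z//;          # drop LF or CRLF line endings
        next unless length $line;      # skip empty lines
        my ($key, $value) = split /\t/, $line, 2;
        $hash{$key} = $value;
    }
    close $in;
    return %hash;
}

# Build a small sample file that starts with the BOM bytes EF BB BF.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':raw';
print {$fh} "\xEF\xBB\xBF", "abacus\t\xC3\xA6b\n", "abalone\tuni\n";
close $fh;

my %hash = read_pairs($path);

binmode STDOUT, ':encoding(UTF-8)';
print "$_ => $hash{$_}\n" for sort keys %hash;
```

With this approach the keys are plain `abacus` and `abalone` whether or not the file carries a BOM, and the values no longer end in a newline because the line ending is removed before splitting.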