Perl Japanese to English filename replacement
<p>I put together a Perl script that replaces Japanese file names with English file names, but there are still a couple of things that I don’t quite understand.</p>
<p>I have the following configuration. <em>Client OS:</em></p>
<p>Windows XP Japan</p>
<p>Notepad++, installed</p>
<p><em>Server:</em></p>
<p>Red Hat Enterprise Linux Server release 6.2</p>
<p>Perl v5.10.1</p>
<p>VIM : VIM version 7.2.411</p>
<p>Xterm : ASTEC-X version 6.0</p>
<p>CSH: tcsh 6.17.00 (Astron)</p>
<p>The source files are Japanese .csv files generated on Windows. I saw posts about using utf8 and encoding conversion in Perl, and I hope to understand better why I didn’t need anything mentioned in the other threads.</p>
<p>Here is the script that worked. My questions are below.</p>
<pre><code>#!/usr/bin/perl
my $work_dir = "/nas1_home4/fsomeguy/someplace";
opendir(DIR, $work_dir) or die "Cannot open directory";
my @files = readdir(DIR);
foreach (@files) {
    my $original_file = $_;
    s/機/-machine_/;    # replace 機 with -machine_
    my $new_file = $_;
    if ($new_file ne $original_file) {
        print "Rename " . $original_file . " to " . $new_file . "\n";
        rename("${work_dir}/${original_file}", "${work_dir}/${new_file}")
            or print "Warning: rename failed because: $!\n";
    }
}
</code></pre>
<p><strong>Questions:</strong></p>
<p>1) Why isn’t utf8 required in this sample? In what type of examples would I need it? <code>use utf8;</code> was discussed here: <a href="https://stackoverflow.com/questions/15210532/use-utf8-gives-me-wide-character-in-print">use utf8 gives me &#39;Wide character in print&#39;</a>. But if I add <code>use utf8;</code>, this script no longer works.</p>
<p>2) Why isn’t encoding manipulation required in this sample?<br>
I actually wrote the script in Windows using Notepad++ (pasting the Japanese characters from Windows XP Japan’s Explorer into my script). In Xterm and VIM, the characters show up as garbage characters. But I didn’t have to deal with encoding manipulation either, which was discussed here: <a href="https://stackoverflow.com/questions/2855707/how-can-i-convert-japanese-characters-to-unicode-in-perl">How can I convert japanese characters to unicode in Perl?</a></p>
<p>Thanks.</p>
<p><strong><em>UPDATES 1</em></strong></p>
<p><em>Testing a simple localization sample in Perl for filename and file text replacement in Japanese</em></p>
<p>In Windows XP, copy the 南 character from within a .csv data file to the clipboard, then use it as both the file name (i.e. 南.txt) and the file content (南). In Notepad++, reading the file under encoding UTF-8 shows \x93\xEC; reading it under SHIFT_JIS displays 南.</p>
<p><strong>Script:</strong></p>
<p>Use the following Perl script, south.pl, which will be run on a Linux server with Perl 5.10.</p>
<pre><code>#!/usr/bin/perl
use feature qw(say);
use strict;
use warnings;
use utf8;
use Encode qw(decode encode);

my $user_dir = "/usr/frank";
my $work_dir = "${user_dir}/test_south";

# forward declare the function prototypes
sub fileProcess;

opendir(DIR, ${work_dir}) or die "Cannot open directory " . ${work_dir};

# readdir OPTION 1 - shift_jis
#my @files = map { Encode::decode("shift_jis", $_); } readdir DIR; # Note filename could not be decoded as shift_jis
#binmode(STDOUT,":encoding(shift_jis)");

# readdir OPTION 2 - utf8
my @files = map { Encode::decode("utf8", $_); } readdir DIR; # Note filename could be decoded as utf8
binmode(STDOUT,":encoding(utf8)"); # setting display to output utf8

say @files;

# pass an array reference of files that will be modified
fileNameTranslate();
fileProcess();

closedir(DIR);

exit;

sub fileNameTranslate {
    foreach (@files) {
        my $original_file = $_;
        #print "original_file: " . "$original_file" . "\n";
        s/南/south/;
        my $new_file = $_;
        # print "new_file: " . "$_" . "\n";
        if ($new_file ne $original_file) {
            print "Rename " . $original_file . " to \n\t" . $new_file . "\n";
            rename("${work_dir}/${original_file}", "${work_dir}/${new_file}")
                or print "Warning: rename failed because: $!\n";
        }
    }
}

sub fileProcess {
    # file process OPTION 3, open file as shift_jis, the search and replace would work
    # open (IN1, "&lt;:encoding(shift_jis)", "${work_dir}/south.txt") or die "Error: south.txt\n";
    # open (OUT1, "+&gt;:encoding(shift_jis)" , "${work_dir}/south1.txt") or die "Error: south1.txt\n";

    # file process OPTION 4, open file as utf8, the search and replace would not work
    open (IN1, "&lt;:encoding(utf8)", "${work_dir}/south.txt") or die "Error: south.txt\n";
    open (OUT1, "+&gt;:encoding(utf8)" , "${work_dir}/south1.txt") or die "Error: south1.txt\n";

    while (&lt;IN1&gt;) {
        print $_ . "\n";
        chomp;
        s/南/south/g;
        print OUT1 "$_\n";
    }

    close IN1;
    close OUT1;
}
</code></pre>
<p><strong>Result:</strong></p>
<p>(BAD) Uncomment Option 1 and 3 (comment Option 2 and 4).<br>
Setup: readdir encoding SHIFT_JIS; file open encoding SHIFT_JIS.<br>
Result: file name replacement failed.<br>
Error: utf8 "\x93" does not map to Unicode at .//south.pl line 68. \x93</p>
<p>(BAD) Uncomment Option 2 and 4 (comment Option 1 and 3).<br>
Setup: readdir encoding utf8; file open encoding utf8.<br>
Result: file name replacement worked, south.txt generated. But south1.txt file content replacement failed; it has the content \x93 ().<br>
Error: "\x{fffd}" does not map to shiftjis at .//south.pl line 25. ... -Ao?= (Bx{fffd}.txt</p>
<p>(GOOD) Uncomment Option 2 and 3 (comment Option 1 and 4).<br>
Setup: readdir encoding utf8; file open encoding SHIFT_JIS.<br>
Result: file name replacement worked, south.txt generated. South1.txt file content replacement worked; it has the content south.</p>
<p><strong>Conclusion:</strong></p>
<p>I had to use different encoding schemes for this example to work properly: utf8 for readdir, and SHIFT_JIS for file processing, since the content of the .csv file was SHIFT_JIS encoded.</p>
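<p>The conclusion above depends on the same character being stored as different bytes in different encodings. The following is a minimal sketch of that point, not part of the original scripts: it encodes 南 (U+5357) both ways with the core Encode module and prints the bytes. The helper name <code>hex_bytes</code> and the variable names are made up for illustration, and the character is written as an escape so the sketch does not depend on the encoding of the script file itself.</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join " ", map { sprintf "%02X", $_ } unpack("C*", $bytes);
}

# U+5357 is the character 南, written as an escape to keep the source ASCII.
my $south = "\x{5357}";

# The same character becomes a different byte sequence in each encoding.
my $sjis_bytes = encode("shiftjis", $south);
my $utf8_bytes = encode("UTF-8",    $south);

print "shift_jis: ", hex_bytes($sjis_bytes), "\n";   # 93 EC
print "utf-8:     ", hex_bytes($utf8_bytes), "\n";   # E5 8D 97

# Decoding with the encoding the bytes were written in recovers the character.
my $roundtrip = decode("shiftjis", $sjis_bytes);
print "shift_jis round trip ok\n" if $roundtrip eq $south;

# Decoding Shift_JIS bytes as UTF-8 does not recover the character.
my $mismatch = decode("UTF-8", $sjis_bytes);
print "utf-8 decode of shift_jis bytes does not match\n" if $mismatch ne $south;
```

<p>Under these assumptions, a file whose content was saved as SHIFT_JIS has to be opened with a SHIFT_JIS layer before a pattern containing 南 can match it, which is consistent with the (GOOD) combination reported above.</p>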