StackOverflow2013

Perl save UTF-8 text problem
<p>I am playing around with pplog, a single-file, file-based blog.</p>
<p>The code that writes a post to a file:</p>
<pre><code>open(my $fh, '>', "$config_postsDatabaseFolder/$i.$config_dbFilesExtension")
    or die "Cannot open post file: $!";
my $date = getdate($config_gmt);
# Fields are separated by '"': 0: Title, 1: Content, 2: Date, 3: Category, 4: FileName
print $fh $title.'"'.$content.'"'.$date.'"'.$category.'"'.$i;
print 'Your post '. $title.' has been saved. <a href="?page=1">Go to Index</a>';
close($fh);
</code></pre>
<p>The input text:</p>
<pre><code>春眠不覺曉，處處聞啼鳥．夜來風雨聲，花落知多小．
</code></pre>
<p>After it is stored in the file, it becomes:</p>
<pre><code>春眠不覺�›�，處處聞啼鳥．夜來風�›�聲，花落知多小．
</code></pre>
<p>If I edit the file in Eclipse, I can make it display normally again. The problem occurs when the text is printed to the file.</p>
<p>Some basic info: Strawberry Perl 5.12, without <code>use utf8;</code>. I tried adding <code>use utf8;</code>, but it had no effect.</p>
<p>Thank you.</p>
<p>--- EDIT --- Thanks for the comments. I traced the code:</p>
<p>The code that adds new content:</p>
<pre><code># Blog Add New Entry Page
my $pass = r('pass');
#BK 7JUL09 patch from fedekun, fix post with no title that caused zero-byte message...
my $title = r('title');
my $content = '';
if($config_useHtmlOnEntries == 0)
{
    $content = bbcode(r('content'));
}
else
{
    $content = basic_r('content');
}
my $category = r('category');
my $isPage = r('isPage');

sub r
{
    escapeHTML(param($_[0]));
}
</code></pre>
<p><code>sub r</code> passes each parameter through CGI.pm's <code>param</code> and <code>escapeHTML</code> functions.</p>
<p>In CGI.pm:</p>
<pre><code>sub escapeHTML {
    # hack to work around earlier hacks
    push @_,$_[0] if @_==1 && $_[0] eq 'CGI';
    my ($self,$toencode,$newlinestoo) = CGI::self_or_default(@_);
    return undef unless defined($toencode);
    $toencode =~ s{&}{&amp;}gso;
    $toencode =~ s{<}{&lt;}gso;
    $toencode =~ s{>}{&gt;}gso;
    if ($DTD_PUBLIC_IDENTIFIER =~ /[^X]HTML 3\.2/i) {
        # $quot; was accidentally omitted from the HTML 3.2 DTD -- see
        # <http://validator.w3.org/docs/errors.html#bad-entity> /
        # <http://lists.w3.org/Archives/Public/www-html/1997Mar/0003.html>.
        $toencode =~ s{"}{&#34;}gso;
    }
    else {
        $toencode =~ s{"}{&quot;}gso;
    }
    # Handle bug in some browsers with Latin charsets
    if ($self->{'.charset'} && (uc($self->{'.charset'}) eq 'ISO-8859-1' # This line causes the trouble: it treats the Chinese characters as ISO-8859-1
                             || uc($self->{'.charset'}) eq 'WINDOWS-1252')) {
        $toencode =~ s{'}{&#39;}gso;
        $toencode =~ s{\x8b}{&#8249;}gso;
        $toencode =~ s{\x9b}{&#8250;}gso;
        if (defined $newlinestoo && $newlinestoo) {
            $toencode =~ s{\012}{&#10;}gso;
            $toencode =~ s{\015}{&#13;}gso;
        }
    }
    return $toencode;
}
</code></pre>
<p>Tracing the problem further, I found that the browser defaulted to ISO-8859-1. Even when I manually set it to UTF-8, it still sent the string back to the server as ISO-8859-1.</p>
<p>Finally, I changed the header call:</p>
<pre><code>print header(-charset => 'utf-8'),
'<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
</code></pre>
<p>After I added the <code>-charset => 'utf-8'</code> parameter to <code>header</code>, the Chinese poem is saved intact.</p>
<p>Thanks to Schwern's comments, which inspired me to trace the problem and learn a lesson.</p>
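<p>The following minimal, self-contained Perl sketch shows why exactly these two characters broke. "曉" (U+66C9) is encoded in UTF-8 as E6 9B 89, and "雨" (U+96E8) as E9 9B A8. Both contain the byte 0x9B. When the charset is ISO-8859-1, the <code>escapeHTML</code> excerpt above rewrites that byte to <code>&#8250;</code> (rendered as "›"). This leaves the stray lead and trail bytes that appear as "�›�". That pattern matches the fact that only these two characters were damaged in the saved file. The helper <code>latin1_workaround</code> is hypothetical and is not part of pplog or CGI.pm. It copies only the two byte substitutions from the excerpt. The sketch assumes <code>param</code> returns undecoded UTF-8 bytes, which is CGI.pm's default unless its <code>-utf8</code> option is used. It also shows that decoding to Perl characters first avoids the substitution, and it saves the result through an <code>:encoding(UTF-8)</code> layer.</p>

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempfile);

# Hypothetical helper: only the two byte substitutions that CGI.pm's
# escapeHTML applies when the charset is ISO-8859-1 or WINDOWS-1252.
sub latin1_workaround {
    my ($s) = @_;
    $s =~ s{\x8b}{&#8249;}g;
    $s =~ s{\x9b}{&#8250;}g;
    return $s;
}

# U+66C9 and U+96E8 are the two characters that were garbled.
# Their UTF-8 encodings are E6 9B 89 and E9 9B A8; both contain 0x9B.
my $chars = "\x{66C9}\x{96E8}";
my $bytes = encode('UTF-8', $chars);

# Case 1: undecoded UTF-8 bytes (assumed to be what param returns by default).
# Each 0x9B byte is replaced, leaving orphaned lead/trail bytes around
# "&#8250;", which a browser shows as an invalid byte, then "›", then another.
my $broken = latin1_workaround($bytes);

# Case 2: decode first. Each character is now a single code point above
# 0xFF, so neither substitution matches and the text is unchanged.
my $safe = latin1_workaround(decode('UTF-8', $bytes));

# Save the decoded text through an explicit encoding layer, using a
# three-argument open with a lexical filehandle, then read it back.
my (undef, $path) = tempfile(UNLINK => 1);
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot write $path: $!";
print {$out} $safe;
close($out) or die "Cannot close $path: $!";

open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot read $path: $!";
my $roundtrip = do { local $/; <$in> };
close($in);

printf "broken bytes: %s\n", join(' ', map { sprintf '%02X', ord } split //, $broken);
printf "safe intact:  %s\n", ($safe eq $chars ? 'yes' : 'no');
printf "file intact:  %s\n", ($roundtrip eq $chars ? 'yes' : 'no');
```

<p>Encode and File::Temp are core modules, so this should run on a stock Perl. The undecoded case shows the 0x9B bytes replaced, and both checks report <code>yes</code>. Setting the charset to UTF-8, as in the fix above, solves the problem from the other side: <code>escapeHTML</code> then skips these substitutions entirely.</p>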
