  1. How to make Mason2 UTF-8 clean?
    <p>Reformulating the question, because</p>
<ul>
<li>@optional <a href="https://stackoverflow.com/questions/5858596/how-to-make-mason2-utf-8-clean#comment48774392_6807411">asked me</a></li>
<li>it wasn't clear, and it linked one <a href="https://metacpan.org/release/HTML-Mason" rel="nofollow noreferrer">HTML::Mason</a>-based solution, <a href="http://www.cybaea.net/Blogs/TechNotes/Mason-utf-8-clean.html" rel="nofollow noreferrer">Four easy steps to make Mason UTF-8 Unicode clean with Apache, mod_perl, and DBI</a>, which caused confusion</li>
<li>the original is 4 years old, and in the meantime (in 2012) Poet was created</li>
</ul>
<p><em>Comment: This question has already earned the "popular question" badge, so I'm probably not the only hopeless person. :)</em></p>
<p>Unfortunately, demonstrating the <em>full</em> problem stack leads to a very long question, and it is very <a href="https://metacpan.org/pod/Mason" rel="nofollow noreferrer">Mason</a> specific.</p>
<p>First, the opinions-only part :)</p>
<p>I have been using HTML::Mason for ages, and am now trying to use Mason2. <a href="https://metacpan.org/release/Poet" rel="nofollow noreferrer">Poet</a> and <a href="https://metacpan.org/pod/Mason" rel="nofollow noreferrer">Mason</a> are the most advanced frameworks on CPAN. I found nothing comparable that, out of the box, allows writing such clean (but very hackable :)) web apps, with many batteries included (logging, caching, config management, native PSGI support, etc.).</p>
<p>Unfortunately, the author doesn't care about the rest of the world: by default it is ASCII only, <em>without <strong>any</strong> manual, FAQ or advice about <strong>how</strong> to use it with Unicode</em>.</p>
<p>Now the facts. Demo. 
Create a Poet app:</p>
<pre><code>poet new my   # the "my" directory is the $poet_root
mkdir -p my/comps/xls
cd my/comps/xls
</code></pre>
<p>and add the following to <code>dhandler.mc</code> (it demonstrates the two basic problems):</p>
<pre><code>&lt;%class&gt;
has 'dwl';
use Excel::Writer::XLSX;
&lt;/%class&gt;
&lt;%init&gt;
my $file = $m-&gt;path_info;
$file =~ s/[^\w\.]//g;
my $cell = lc join ' ', "ÅNGSTRÖM", "in the", $file;
if( $.dwl ) {
    # create the xlsx in memory, writing through the in-memory filehandle
    my $excel;
    open my $fh, '&gt;', \$excel or die "Failed open scalar: $!";
    my $workbook  = Excel::Writer::XLSX-&gt;new( $fh );
    my $worksheet = $workbook-&gt;add_worksheet();
    $worksheet-&gt;write(0, 0, $cell);
    $workbook-&gt;close();
    # Poet/Mason output
    $m-&gt;clear_buffer;
    $m-&gt;res-&gt;content_type("application/vnd.ms-excel");
    $m-&gt;print($excel);
    $m-&gt;abort();
}
&lt;/%init&gt;
&lt;table border=1&gt;
&lt;tr&gt;&lt;td&gt;&lt;% $cell %&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;a href="?dwl=yes"&gt;download &lt;% $file %&gt;&lt;/a&gt;
</code></pre>
<p>and run the app</p>
<pre><code>../bin/run.pl
</code></pre>
<p>Go to <a href="http://0:5000/xls/hello.xlsx" rel="nofollow noreferrer">http://0:5000/xls/hello.xlsx</a> and you will get:</p>
<pre><code>+----------------------------+
| ÅngstrÖm in the hello.xlsx |
+----------------------------+
download hello.xlsx
</code></pre>
<p>Clicking <a href="http://0.0.0.0:5000/xls/hello.xlsx?dwl=yes" rel="nofollow noreferrer">download hello.xlsx</a> puts <code>hello.xlsx</code> in your downloads.</p>
<p>The above demonstrates the first problem: 
the component's source isn't "under" <code>use utf8;</code>, so <code>lc</code> doesn't understand the non-ASCII characters.</p>
<p>The second problem is the following: try <a href="http://0:5000/xls/h%C3%A9ll%C3%B3.xlsx" rel="nofollow noreferrer">http://0:5000/xls/hélló.xlsx</a> or <a href="http://0:5000/xls/h%C3%A9ll%C3%B3.xlsx" rel="nofollow noreferrer">http://0:5000/xls/h%C3%A9ll%C3%B3.xlsx</a> and you will see:</p>
<pre><code>+--------------------------+
| ÅngstrÖm in the hll.xlsx |
+--------------------------+
download hll.xlsx  #note the wrong filename
</code></pre>
<p>Of course: the input (the <code>path_info</code>) isn't decoded, so the script works with UTF-8 encoded octets and not with Perl characters.</p>
<p>So, telling Perl "the source is in UTF-8" by adding <code>use utf8;</code> to the <code>&lt;%class&gt;</code> block results in:</p>
<pre><code>+--------------------------+
| �ngstr�m in the hll.xlsx |
+--------------------------+
download hll.xlsx
</code></pre>
<p>Adding <code>use feature 'unicode_strings'</code> (or <code>use 5.014;</code>) is even worse:</p>
<pre><code>+----------------------------+
| �ngstr�m in the h�ll�.xlsx |
+----------------------------+
download h�ll�.xlsx
</code></pre>
<p><strong>Of course</strong>, the source now contains wide characters, so it needs <code>Encode::encode_utf8</code> at the output.</p>
<p>One could try a filter such as:</p>
<pre><code>&lt;%filter uencode&gt;&lt;% Encode::encode_utf8($yield-&gt;()) %&gt;&lt;/%filter&gt;
</code></pre>
<p>and filter the whole output:</p>
<pre><code>% $.uencode {{
&lt;table border=1&gt;
&lt;tr&gt;&lt;td&gt;&lt;% $cell %&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;a href="?dwl=yes"&gt;download &lt;% $file %&gt;&lt;/a&gt;
% }}
</code></pre>
<p>but this helps only partially, because you still need to take care of the encoding in the <code>&lt;%init&gt;</code> or <code>&lt;%perl&gt;</code> blocks. 
Encoding/decoding <strong>inside</strong> the Perl code in many places (<em>read: not at the borders</em>) leads to spaghetti code.</p>
<p>The encoding/decoding should clearly be done <strong>somewhere</strong> at the <a href="https://metacpan.org/release/Poet" rel="nofollow noreferrer">Poet</a>/<a href="https://metacpan.org/pod/Mason" rel="nofollow noreferrer">Mason</a> borders - of course, Plack operates on the byte level.</p>
<hr>
<p>Partial solution.</p>
<p>Happily, <a href="https://metacpan.org/release/Poet" rel="nofollow noreferrer">Poet</a> cleverly allows modifying its (and Mason's) parts, so in <code>$poet_root/lib/My/Mason</code> you could modify <code>Compilation.pm</code> to:</p>
<pre><code>override 'output_class_header' =&gt; sub {
    return join("\n",
        super(), qq(
        use 5.014;
        use utf8;
        use Encode;
        )
    );
};
</code></pre>
<p>which will insert the wanted preamble into <strong>every</strong> Mason component. (Don't forget to touch every component, or simply remove the compiled objects from <code>$poet_root/data/obj</code>.)</p>
<p>You could also <strong>try</strong> to handle the requests/responses at the borders, by editing <code>$poet_root/lib/My/Mason/Request.pm</code> to:</p>
<pre><code>#found this code somewhere on the net
use Encode;
override 'run' =&gt; sub {
    my($self, $path, $args) = @_;

    #decode values - but still missing the "keys" decode
    foreach my $k (keys %$args) {
        $args-&gt;set($k, decode_utf8($args-&gt;get($k)));
    }

    my $result = super();

    #encode the output - BUT THIS BREAKS the inline XLS
    $result-&gt;output( encode_utf8($result-&gt;output()) );
    return $result;
};
</code></pre>
<p>Encoding everything is the wrong strategy; it <strong>breaks</strong>, e.g., 
the XLS.</p>
<p>So, 4 years later (I asked the original question in 2011) I <em>still</em> don't know :( how to correctly use Unicode in <a href="https://metacpan.org/pod/Mason" rel="nofollow noreferrer">Mason2</a> applications, and there still isn't any documentation or helper for it. :(</p>
<p>The main questions are:</p>
<ul>
<li>where (which methods should be modified with Moose's method modifiers) and how to correctly decode the inputs, and where to encode the output (in the Poet/Mason app)</li>
<li>but only the textual ones, e.g. <code>text/plain</code> or <code>text/html</code> and such...</li>
<li>and how to do the above "surprise free" - i.e. in a way that simply works. ;)</li>
</ul>
<p>Could someone please help with real code - what should I modify in the above?</p>