How do I find the length of a Unicode string in Perl?
<p>The <code>perldoc</code> page for <a href="http://perldoc.perl.org/functions/length.html" rel="noreferrer">length()</a> tells me that I should use <code>bytes::length(EXPR)</code> to find the length of a Unicode string in bytes, and the <a href="http://perldoc.perl.org/bytes.html" rel="noreferrer">bytes</a> page echoes this.</p>
<pre><code>use bytes;
$ascii = 'Lorem ipsum dolor sit amet';
$unicode = 'Lørëm ípsüm dölör sît åmét';
print "ASCII: " . length($ascii) . "\n";
print "ASCII bytes: " . bytes::length($ascii) . "\n";
print "Unicode: " . length($unicode) . "\n";
print "Unicode bytes: " . bytes::length($unicode) . "\n";
</code></pre>
<p>The output of this script, however, disagrees with the manpage:</p>
<pre><code>ASCII: 26
ASCII bytes: 26
Unicode: 35
Unicode bytes: 35
</code></pre>
<p>It seems to me that length() and bytes::length() return the same value for both ASCII &amp; Unicode strings. I have my editor set to write files as UTF-8 by default, so I figure Perl is interpreting the whole script as Unicode. Does that mean length() automatically handles Unicode strings properly?</p>
<p><strong>Edit:</strong> See my comment; my question doesn't make a whole lot of sense, because length() is <em>not</em> working "properly" in the above example: it is showing the length of the Unicode string in bytes, not characters. The reason I originally stumbled across this is a program in which I need to set the Content-Length header (in bytes) in an HTTP message. I had read up on Unicode in Perl and was expecting to have to do some fanciness to make things work, but when length() returned exactly what I needed right off the bat, I was confused! See the accepted answer for an overview of <code>use utf8</code>, <code>use bytes</code>, and <code>no bytes</code> in Perl.</p>
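<p>To make the character/byte distinction described in the edit concrete, here is a minimal sketch. It assumes the script file is saved as UTF-8 and that <code>use utf8</code> is in effect, so the string literals are decoded into characters. The helper name <code>lengths</code> is hypothetical and introduced only for illustration; it is not part of Perl or of the question's original code.</p>

```perl
use strict;
use warnings;
use utf8;                 # assumption: this source file is saved as UTF-8
use Encode qw(encode);    # core module, used to get the encoded byte string

my $ascii   = 'Lorem ipsum dolor sit amet';
my $unicode = 'Lørëm ípsüm dölör sît åmét';

# Hypothetical helper: returns (character count, UTF-8 byte count)
# for a string that Perl already holds as decoded characters.
sub lengths {
    my ($str) = @_;
    my $chars = length $str;                    # counts characters
    my $bytes = length encode('UTF-8', $str);   # counts bytes after encoding
    return ($chars, $bytes);
}

binmode STDOUT, ':encoding(UTF-8)';   # encode output so non-ASCII prints cleanly
for my $s ($ascii, $unicode) {
    my ($chars, $bytes) = lengths($s);
    print "$s: $chars characters, $bytes bytes as UTF-8\n";
}
```

<p>With the two strings from the question, this should report 26 characters and 26 bytes for the ASCII string, and 26 characters and 35 bytes for the accented one. The second number is the one a Content-Length header needs, provided the body is actually sent as UTF-8.</p>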
 
