StackOverflow2013


plurals

PO
primarykey
Id
19596309
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
9
CommunityOwnedDate
CreationDate
2013-10-25T17:45:29.623
FavoriteCount
0
LastActivityDate
2013-10-28T20:54:42.173
LastEditDate
2013-10-28T20:54:42.173
LastEditorUserId
477572
OwnerUserId
477572
ParentId
19594864
PostTypeId
2
Score
3
ViewCount
0
LastEditorDisplayName
text
Body
In UTF-8, every ASCII character (code points <code>0</code> to <code>127</code>) is represented by a single byte of the form <code>0xxxxxxx</code>, and every code point above <code>127</code> is represented by a multi-byte sequence. A multi-byte sequence consists of a leading byte followed by one or more continuation bytes. The high-order bits of the leading byte tell us how long the sequence is: the byte starts with two or more 1s followed by a 0, i.e. <code>110</code>, <code>1110</code> or <code>11110</code>. (The original UTF-8 design also allowed <code>111110</code> and longer; the current standard, RFC 3629, stops at four bytes.) The number of high-order 1s equals the total number of bytes in the sequence, leading byte included, i.e.

<pre><code>110   means 1 leading byte + 1 continuation byte
1110  means 1 leading byte + 2 continuation bytes
11110 means 1 leading byte + 3 continuation bytes
</code></pre>

Continuation bytes, which follow a leading byte, have the format <code>10xxxxxx</code>.

Applying the above to your <code>$test</code> string:

We have three <code>ord('X')</code> bytes, all ASCII characters below <code>128</code>, so each of them counts as one character per byte.

Then we have a <code>chr(241)</code>, whose binary representation is <code>11110001</code>, so it is a leading byte, since it starts with two or more high-order 1s. Its four high-order 1s mean that the code point it starts consists of 1 leading byte plus 3 continuation bytes, so <code>mb_strlen()</code> treats the 3 <code>ord('X')</code> bytes that remain in the string as continuation bytes*, and although together with the <code>chr(241)</code> they make four bytes, they are counted as one UTF-8 code point.

*Those trailing <code>'X'</code>s are not valid continuation bytes, since they do not have the <code>10xxxxxx</code> form. Nevertheless, <code>mb_strlen()</code> consumes up to 3 more bytes after the <code>chr(241)</code>, as explained above.
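The leading-byte rules above can be sketched in a few lines of PHP. This is only an illustration of the bit patterns, not how <code>mb_strlen()</code> is implemented; the function name <code>utf8SequenceLength()</code> is a hypothetical helper invented for this example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHP or mbstring): returns how many bytes
 * a UTF-8 sequence starting with $byte claims to occupy, judged only by the
 * high-order bits of that byte. Returns 0 for a byte that cannot start a
 * sequence (a continuation byte, or a pattern beyond 11110xxx).
 */
function utf8SequenceLength(int $byte): int
{
    if (($byte & 0x80) === 0x00) {
        return 1; // 0xxxxxxx: single-byte ASCII
    }
    if (($byte & 0xE0) === 0xC0) {
        return 2; // 110xxxxx: leading byte + 1 continuation byte
    }
    if (($byte & 0xF0) === 0xE0) {
        return 3; // 1110xxxx: leading byte + 2 continuation bytes
    }
    if (($byte & 0xF8) === 0xF0) {
        return 4; // 11110xxx: leading byte + 3 continuation bytes
    }
    return 0;     // 10xxxxxx (continuation) or not allowed as a first byte
}

// Print the bit pattern and the claimed sequence length for a few bytes.
foreach ([ord('X'), 0xC0, 0xE0, 241, 0x82] as $byte) {
    printf("%08b -> %d\n", $byte, utf8SequenceLength($byte));
}
```

For <code>chr(241)</code> the helper reports 4, which matches the "1 leading byte + 3 continuation bytes" reading used for the <code>$test</code> string.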
You can test this by adding another <code>'X'</code> to, or removing <code>'X'</code>s from, the end of the <code>$test</code> string.

UPDATE: Verifying the findings:

<pre><code>/*
 * The following strings are not valid UTF-8.
 * We test whether mb_strlen() consumes invalid UTF-8 byte strings
 * as if they were valid (driven by the leading bytes).
 */

/*
 * 0xc0 as a leading byte should consume one continuation byte,
 * so the length reported should be 6
 */
$test = 'XXX' . chr(0xc0) . 'XXX';
echo '6 == ', mb_strlen($test, 'UTF8'), PHP_EOL;

/*
 * 0xe0 as a leading byte should consume two continuation bytes,
 * so the length reported should be 5
 */
$test = 'XXX' . chr(0xe0) . 'XXX';
echo '5 == ', mb_strlen($test, 'UTF8'), PHP_EOL;

// results in 6 == 6 and 5 == 5
</code></pre>

UPDATE 2: An example of constructing the same symbol with <code>chr()</code> in a single-byte encoding and in UTF-8.

<pre><code>$euroSignAscii = chr(0x80); // Windows-1252 ("extended ASCII"); ISO-8859-1 (Latin-1) has no euro sign
$euroSignUtf8 = chr(0xe2) . chr(0x82) . chr(0xac); // UTF-8
</code></pre>

When you echo the above strings, take note of the encoding of your console or web page: if it is Windows-1252, <code>$euroSignAscii</code> will display correctly; if it is UTF-8, <code>$euroSignUtf8</code> will display correctly.

<hr>

Links:

A good reference is the relevant <a href="http://en.wikipedia.org/wiki/UTF-8" rel="nofollow">UTF-8 article on Wikipedia</a>.

A classic post from Joel Spolsky: <a href="http://www.joelonsoftware.com/articles/Unicode.html" rel="nofollow">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)</a>

And to get a feel for the encoding: <a href="http://www.utf8-chartable.de/unicode-utf8-table.pl" rel="nofollow">UTF-8 encoding table and Unicode characters</a>
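As a complement to the tests in the answer, here is a minimal sketch of how one might check whether a byte string is well-formed UTF-8 before trusting a character count. It assumes the mbstring extension is loaded and uses only <code>mb_check_encoding()</code>, <code>mb_strlen()</code>, <code>strlen()</code> and <code>bin2hex()</code>; the variable names are chosen for this example.

```php
<?php
declare(strict_types=1);

// Requires the mbstring extension.

// An invalid string: 0xc0 announces a continuation byte, but 'X' is not one.
$invalid = 'XXX' . chr(0xc0) . 'XXX';

// A valid three-byte sequence: the euro sign (U+20AC) in UTF-8.
$euroSignUtf8 = chr(0xe2) . chr(0x82) . chr(0xac);

var_dump(mb_check_encoding($invalid, 'UTF-8'));      // bool(false)
var_dump(mb_check_encoding($euroSignUtf8, 'UTF-8')); // bool(true)

echo strlen($euroSignUtf8), PHP_EOL;             // 3 (bytes)
echo mb_strlen($euroSignUtf8, 'UTF-8'), PHP_EOL; // 1 (code point)
echo bin2hex($euroSignUtf8), PHP_EOL;            // e282ac
```

Checking validity first avoids relying on how <code>mb_strlen()</code> happens to count malformed input, which may differ between PHP versions.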
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POPHP mb_strlen with string append chr(241)
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USIoannis Lalopoulos
UserOwnerUserId
1. USIoannis Lalopoulos
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POPHP mb_strlen with string append chr(241)
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
CommentsPostId
