StackOverflow2013

plurals

PO
primarykey
Id
15980690
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
6
CommunityOwnedDate
CreationDate
2013-04-12T20:56:17.990
FavoriteCount
0
LastActivityDate
2013-04-12T23:41:44.240
LastEditDate
2013-04-12T23:41:44.240
LastEditorUserId
1633117
OwnerUserId
1633117
ParentId
15979519
PostTypeId
2
Score
8
ViewCount
0
LastEditorDisplayName
text
Body
First of all, note that there are no functions in Lua's <code>string</code> library that know anything about Unicode/multibyte encodings (source: Programming in Lua, 3rd edition). As far as Lua is concerned, strings are simply made up of bytes. If you are using UTF-8 encoded strings, it's up to you to figure out which bytes make up a character. Therefore, <code>string.len</code> will give you the number of bytes, not the number of characters, and <code>string.sub</code> will give you a substring of bytes, not a substring of characters.
Some UTF-8 basics: If you need some refreshing on the conceptual basics of Unicode, you should check out <a href="http://www.joelonsoftware.com/articles/Unicode.html">this article</a>. UTF-8 is one possible (and very important) encoding of Unicode - and probably the one you are dealing with. As opposed to the fixed-width UTF-32, it uses a variable number of bytes (from 1 to 4) to encode each character. In particular, the ASCII characters 0 to 127 are represented with a single byte, so that ASCII strings can be correctly interpreted as UTF-8 (and vice versa, if you only use those 128 characters). All other characters start with a byte in the range from 194 to 244, which signals that more bytes follow to encode a full character. This range is further subdivided, so that you can tell from this byte whether 1, 2 or 3 more bytes follow. Those additional bytes are called continuation bytes and are guaranteed to be taken only from the range from 128 to 191.
Therefore, by looking at a single byte we know where it stands in a character: <ul>
<li>If it's in <code>[0,127]</code>, it's a single-byte (ASCII) character</li>
<li>If it's in <code>[128,191]</code>, it's part of a longer character and meaningless on its own</li>
<li>If it's in <code>[194,244]</code>, it marks the beginning of a longer character (and tells us how long that character is)</li>
</ul> This information is enough to count characters, split a UTF-8 string into characters and do all sorts of other UTF-8-sensitive manipulations.
Some pattern matching basics: For the task at hand we need a few of Lua's pattern matching constructs: <code>[...]</code> is a character class that matches a single character (or rather byte) out of those inside the class. E.g. <code>[abc]</code> matches either <code>a</code>, <code>b</code> or <code>c</code>. You can define ranges using a hyphen. Therefore <code>[\33-\127]</code>, for example, matches any single one of the bytes from <code>33</code> to <code>127</code>. Note that <code>\127</code> is an escape sequence you can use in any Lua string (not just patterns) to specify a byte by its numerical value instead of the corresponding ASCII character. For instance, <code>"a"</code> is the same as <code>"\97"</code>. You can negate a character class by starting it with <code>^</code> (so that it matches any single byte that is not part of the class). <code>*</code> repeats the previous token 0 or more times (arbitrarily many times - as often as possible). <code>$</code> is an anchor. If it's the last character of the pattern, the pattern will only match at the end of the string.
Combining all of that... ...your problem reduces to a one-liner: <pre><code>local function lastChar(s)
    return string.match(s, "[^\128-\191][\128-\191]*$")
end
</code></pre> This will match a byte that is not a UTF-8 continuation byte (i.e., one that is either a single-byte character or the byte that marks the beginning of a longer character).
Then it matches an arbitrary number of continuation bytes (this cannot go past the current character, due to the range chosen), followed by the end of the string (<code>$</code>). Therefore, this will give you all the bytes that make up the last character in the string. It produces the desired output for all 4 of your examples. Equivalently, you can use <code>gsub</code> to remove that last character from your string: <pre><code>local function deleteLastCharacter(s)
    -- The parentheses keep only gsub's first result (the new string)
    -- and drop its second one (the number of substitutions).
    return (string.gsub(s, "[^\128-\191][\128-\191]*$", ""))
end
</code></pre> The match is the same, but instead of returning the matched substring, we replace it with <code>""</code> (i.e. remove it) and return the modified string. The extra parentheses matter because <code>gsub</code> also returns the number of substitutions, which would otherwise be passed on to the caller as a second return value.
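The same byte ranges can also be used for the other manipulations mentioned above, such as counting characters and splitting a string into characters. The following is only a minimal sketch: the names <code>CHAR_PATTERN</code>, <code>utf8Length</code> and <code>utf8Chars</code> are made up for illustration and are not part of Lua or of the original answer, and the code assumes the input is valid UTF-8 (malformed input is not detected).

```lua
-- One UTF-8 character: a non-continuation byte followed by any
-- number of continuation bytes (128 to 191).
-- Assumption: the input is valid UTF-8; nothing here validates it.
local CHAR_PATTERN = "[^\128-\191][\128-\191]*"

-- Count characters by counting the bytes that are NOT continuation bytes.
-- gsub's second return value is the number of matches it replaced.
local function utf8Length(s)
    local _, count = string.gsub(s, "[^\128-\191]", "")
    return count
end

-- Split a string into a table of characters (each a 1- to 4-byte string).
local function utf8Chars(s)
    local chars = {}
    for ch in string.gmatch(s, CHAR_PATTERN) do
        chars[#chars + 1] = ch
    end
    return chars
end

-- "naïve", with the ï written out as its two UTF-8 bytes (195, 175)
local s = "na\195\175ve"
print(#s)             --> 6 (bytes)
print(utf8Length(s))  --> 5 (characters)
print(#utf8Chars(s))  --> 5
```

Writing the non-ASCII character as decimal escapes keeps the example independent of the encoding of the source file itself.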
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. PODetect if last character is not multibyte in Lua
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USMartin Ender
UserOwnerUserId
1. USMartin Ender
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. PODetect if last character is not multibyte in Lua
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
CommentsPostId
