StackOverflow2013

plurals

PO
primarykey
Id
4611560
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
6
CommunityOwnedDate
CreationDate
2011-01-06T03:49:23.613
FavoriteCount
0
LastActivityDate
2011-01-06T04:21:34.573
LastEditDate
2011-01-06T04:21:34.573
LastEditorUserId
139937
OwnerUserId
139937
ParentId
4611425
PostTypeId
2
Score
1
ViewCount
0
LastEditorDisplayName
text
Body
I may be able to offer some insight, but it's hard to tell whether my answer will be "helpful". First, I only speak and read English, so I do not speak or read Chinese. I do happen to be the author of <a href="http://regexkit.sourceforge.net/RegexKitLite/index.html" rel="nofollow">RegexKitLite</a>, which is an Objective-C wrapper around the ICU regex engine. This is obviously not <code>perl</code>, :). Despite this, the ICU regex engine happens to have a feature that sounds remarkably like what you're trying to do. Specifically, the ICU regex engine provides the <code>UREGEX_UWORD</code> modifier option, which can be turned on dynamically via the normal <code>(?w:...)</code> syntax. This modifier does the following: <blockquote> Controls the behavior of \b in a pattern. If set, word boundaries are found according to the definitions of word found in Unicode UAX 29, Text Boundaries. By default, word boundaries are identified by means of a simple classification of characters as either “word” or “non-word”, which approximates traditional regular expression behavior. The results obtained with the two options can be quite different in runs of spaces and other non-word characters. </blockquote> You can use this in a regex like <code>(?w:\b(.*?)\b)</code> to "extract" words from a string. The ICU regex engine has a fairly powerful "word breaking engine" that is specifically designed to find word breaks in written languages that, unlike English, do not put an explicit space 'character' between words. Again, not reading or writing these languages, my understanding is that text in them looks "itisroughlysomethinglikethis". The ICU word breaking engine uses heuristics, and occasionally dictionaries, to find the word breaks. It is my understanding that Thai happens to be a particularly difficult case.
In fact, I happen to use <code>ฉันกินข้าว</code> (Thai for "I eat rice", or so I was told) with a regex of <code>(?w)\b\s*</code> to perform a <code>split</code> operation on the string to extract the words. Without <code>(?w)</code> you cannot split on word breaks. With <code>(?w)</code> it results in the words <code>ฉัน</code>, <code>กิน</code>, and <code>ข้าว</code>. If the above "sounds like the problem you're having", then this could be the reason. If this is the case, then I am not aware of any way to accomplish this in <code>perl</code>, but I wouldn't consider this opinion an authoritative answer, since I use the ICU regex engine more often than the <code>perl</code> one and am clearly not properly motivated to find a working <code>perl</code> solution when I've already got one :). Hope this helps.
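To make the dictionary idea above concrete, here is a minimal Python sketch of greedy longest-match segmentation. It is not ICU's algorithm and does not use the ICU regex engine; the function name `segment` and both toy dictionaries are assumptions made up for this illustration. It only shows why some word list or heuristic is needed when the text carries no spaces.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation (illustrative only, not ICU's method).

    At each position, take the longest dictionary word that starts there.
    If no dictionary word matches, emit a single character and move on.
    """
    max_len = max(map(len, dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionaries; a real segmenter needs far larger word lists.
TOY_ENGLISH = {"it", "is", "roughly", "something", "like", "this"}
TOY_THAI = {"ฉัน", "กิน", "ข้าว"}

print(segment("itisroughlysomethinglikethis", TOY_ENGLISH))
# ['it', 'is', 'roughly', 'something', 'like', 'this']
print(segment("ฉันกินข้าว", TOY_THAI))
# ['ฉัน', 'กิน', 'ข้าว']
```

Greedy matching can pick the wrong split when one dictionary word is a prefix of another, which is one reason a production word breaker combines dictionaries with additional heuristics.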
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POHow to count the Chinese word in a file using regex in perl?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USjohne
UserOwnerUserId
1. USjohne
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTDownMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTDownMod
CommentsPostId