StackOverflow2013

plurals

POEfficient data structure/algorithm for transliteration based word lookup
primarykey
Id
7537662
data
AcceptedAnswerId
7541274
AnswerCount
2
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2011-09-24T07:20:33.050
FavoriteCount
5
LastActivityDate
2011-09-24T19:59:13.710
LastEditDate
2011-09-24T19:48:21.460
LastEditorUserId
761555
OwnerUserId
761555
ParentId
0
PostTypeId
1
Score
6
ViewCount
1173
LastEditorDisplayName
text
Body
I'm looking for an efficient data structure/algorithm for storing a word list and searching it for transliteration-based word lookup (as Google does at <a href="http://www.google.com/transliterate/" rel="nofollow">http://www.google.com/transliterate/</a>, though I'm not trying to use the Google transliteration API). Unfortunately, the natural language I'm working on doesn't have a Soundex implementation, so I'm on my own. For an open source project, I'm currently storing the word list in plain arrays and dynamically generating a regular expression (based on user input) to match against them. It works fine, but a regular expression is more powerful and more resource intensive than I need. For example, I'm afraid this solution will drain too much battery if I port it to handheld devices, as searching over thousands of words with a regular expression is too costly. There must be a better way to accomplish this for complex languages; how does the Pinyin input method work, for example? Any suggestions on where to start? Thanks in advance.
<hr>
Edit: If I understand correctly, this is what @Dialecticus suggested. I want to transliterate from Language1, which has 3 characters <code>a,b,c</code>, to Language2, which has 6 characters <code>p,q,r,x,y,z</code>. Because the two languages differ in their number of characters and in their phones, it is often not possible to define a one-to-one mapping. Let's assume that, phonetically, this is our associative array/transliteration table:
<pre><code>a -> p, q
b -> r
c -> x, y, z
</code></pre>
We also have a list of valid words for Language2 in a plain array:
<pre><code>...
px
qy
...
</code></pre>
If the user types <code>ac</code>, the possible combinations after transliteration step 1 are <code>px, py, pz, qx, qy, qz</code>. In step 2 we have to search the valid word list and eliminate every one of them except <code>px</code> and <code>qy</code>.
<hr>
What I'm doing currently is not that different from the above approach. Instead of generating the possible combinations from the transliteration table, I'm building a regular expression <code>[pq][xyz]</code> and matching it against my valid word list, which yields <code>px</code> and <code>qy</code>. I'm eager to know if there is a better method than that.
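For concreteness, the two approaches described in the question can be sketched roughly as follows. This is only an illustration in Python, not the project's actual code: the names `TABLE`, `VALID_WORDS`, `candidates_by_product`, and `candidates_by_regex` are hypothetical, the word `rx` is a made-up extra entry added to show filtering, and the sketch assumes every target symbol is a single character so that a character class such as `[pq]` is sufficient.

```python
import re
from itertools import product

# Transliteration table from the example (Language1 -> Language2).
TABLE = {"a": ["p", "q"], "b": ["r"], "c": ["x", "y", "z"]}

# Valid Language2 words; "rx" is a made-up entry, not from the question.
VALID_WORDS = ["px", "qy", "rx"]


def candidates_by_product(typed, table=TABLE, words=VALID_WORDS):
    """Step 1: expand every combination; step 2: keep only valid words."""
    valid = set(words)
    combos = ("".join(parts) for parts in product(*(table[ch] for ch in typed)))
    return [word for word in combos if word in valid]


def candidates_by_regex(typed, table=TABLE, words=VALID_WORDS):
    """Build a pattern such as [pq][xyz] and scan the whole word list."""
    pattern = "".join(
        "[" + "".join(re.escape(sym) for sym in table[ch]) + "]" for ch in typed
    )
    matcher = re.compile(pattern)
    return [word for word in words if matcher.fullmatch(word)]


print(candidates_by_product("ac"))
print(candidates_by_regex("ac"))
```

In this sketch the first function's cost grows with the number of generated combinations, while the second scans every word in the list on each lookup, which is the cost the question is concerned about.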
Tags
<algorithm><data-structures><transliteration>
Title
Efficient data structure/algorithm for transliteration based word lookup
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USMehdi
UserOwnerUserId
1. USMehdi
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POEfficient data structure/algorithm for transliteration based word lookup
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POEfficient data structure/algorithm for transliteration based word lookup
 UserUserId
 USRifat
 VoteTypeVoteTypeId
 VTFavorite
3. VO
 singulars
 PostPostId
 POEfficient data structure/algorithm for transliteration based word lookup
 UserUserId
 USMehdi
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
1. COJust to be clear, will the end result of transliteration always have to be made up of words from the list of valid words?
 singulars
 PostPostId
 POEfficient data structure/algorithm for transliteration based word lookup
 UserUserId
 USMAK
2. CO@MAK, Preferably yes. There is no point suggesting a word that doesn't make sense.
 singulars
 PostPostId
 POEfficient data structure/algorithm for transliteration based word lookup
 UserUserId
 USMehdi
