
Efficient data structure/algorithm for transliteration-based word lookup
<p>I'm looking for an efficient data structure/algorithm for storing and searching words in a transliteration-based word lookup (like Google does: <a href="http://www.google.com/transliterate/" rel="nofollow">http://www.google.com/transliterate/</a>, but I'm not trying to use the Google Transliteration API). Unfortunately, the natural language I'm working on doesn't have any Soundex implementation, so I'm on my own.</p>
<p>For an open-source project, I'm currently using plain arrays to store the word list and dynamically generating a regular expression (based on user input) to match against them. It works fine, but a regular expression is more powerful and more resource-intensive than I need. For example, I'm afraid this solution will drain too much battery if I port it to handheld devices, because matching a regular expression against thousands of words is too costly.</p>
<p>There must be a better way to accomplish this for complex languages. How does a Pinyin input method work, for example? Any suggestions on where to start?</p>
<p>Thanks in advance.</p>
<hr>
<p>Edit: If I understand correctly, this is what @Dialecticus suggested:</p>
<p>I want to transliterate from <strong>Language1</strong>, which has 3 characters <code>a,b,c</code>, to <strong>Language2</strong>, which has 6 characters <code>p,q,r,x,y,z</code>. Because the two languages differ in the number of characters they have and in their phones, it is often not possible to define a one-to-one mapping.</p>
<p>Let's assume this is our phonetic transliteration table (an associative array):</p>
<pre><code>a -&gt; p, q
b -&gt; r
c -&gt; x, y, z
</code></pre>
<p>We also have a list of valid words for <strong>Language2</strong>, stored in a plain array:</p>
<pre><code>...
px
qy
...
</code></pre>
<p>If the user types <code>ac</code>, transliteration step 1 produces the possible combinations <code>px, py, pz, qx, qy, qz</code>. In step 2, we search the valid word list and eliminate every combination except <code>px</code> and <code>qy</code>.</p>
<hr>
<p>What I'm currently doing is not that different from the approach above. Instead of generating every possible combination from the transliteration table, I build the regular expression <code>[pq][xyz]</code> and match it against my valid word list, which yields <code>px</code> and <code>qy</code>.</p>
<p>I'd like to know whether there is a better method than that.</p>
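<p>A minimal Python sketch of the two-step approach from the edit (expand every combination, then filter against the valid word list), using the toy table and words from the example. The names <code>TRANSLITERATION</code>, <code>VALID_WORDS</code>, <code>candidates</code> and <code>lookup</code> are hypothetical, and the sketch assumes every input character appears in the table (an unknown character raises <code>KeyError</code>).</p>

```python
from itertools import product

# Toy transliteration table from the question (hypothetical name):
# each Language1 character maps to its possible Language2 characters.
TRANSLITERATION = {
    "a": ["p", "q"],
    "b": ["r"],
    "c": ["x", "y", "z"],
}

# Valid Language2 words; a set gives average O(1) membership checks.
VALID_WORDS = {"px", "qy"}


def candidates(user_input):
    """Step 1: expand each input character into all its transliterations.

    Assumes every character of user_input is a key in TRANSLITERATION.
    """
    choices = [TRANSLITERATION[ch] for ch in user_input]
    # product() yields one tuple per combination, in table order.
    return ["".join(combo) for combo in product(*choices)]


def lookup(user_input):
    """Step 2: keep only the candidates that are valid Language2 words."""
    return [word for word in candidates(user_input) if word in VALID_WORDS]


if __name__ == "__main__":
    print(candidates("ac"))  # ['px', 'py', 'pz', 'qx', 'qy', 'qz']
    print(lookup("ac"))      # ['px', 'qy']
```

<p>The set makes each step-2 check cheap, but step 1 produces the product of the per-character choice counts, so the number of candidates grows exponentially with input length: ten characters with three choices each already yield 3^10 = 59,049 candidates.</p>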
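<p>For comparison, a sketch of the regular-expression approach described in the question: each input character becomes a character class, so <code>ac</code> becomes <code>[pq][xyz]</code>, which is then matched against every word in the list. The names are hypothetical, and the extra entries in <code>WORD_LIST</code> are made-up fillers standing in for the elided words. <code>fullmatch</code> is used so that a longer word such as <code>pxy</code> is not accepted on a prefix match.</p>

```python
import re

# Same toy table as in the question (hypothetical name).
TRANSLITERATION = {
    "a": ["p", "q"],
    "b": ["r"],
    "c": ["x", "y", "z"],
}

# "px" and "qy" come from the question; the rest are hypothetical fillers.
WORD_LIST = ["pr", "px", "pxy", "qy", "xy"]


def build_pattern(user_input):
    """Turn each input character into a character class: "ac" -> "[pq][xyz]"."""
    classes = (
        "[" + "".join(re.escape(c) for c in TRANSLITERATION[ch]) + "]"
        for ch in user_input
    )
    return re.compile("".join(classes))


def regex_lookup(user_input):
    """Scan the whole word list, keeping words that match the pattern exactly."""
    pattern = build_pattern(user_input)
    return [word for word in WORD_LIST if pattern.fullmatch(word)]


if __name__ == "__main__":
    print(build_pattern("ac").pattern)  # [pq][xyz]
    print(regex_lookup("ac"))           # ['px', 'qy']
```

<p>Each query compiles a new pattern and tests it against every word, so the cost grows linearly with the size of the word list, which is the cost the question is concerned about.</p>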