<p>From what I understand, you have an input string S in an alphabet (let's call it A1) and you want to convert it to the string S', its equivalent in another alphabet A2. More precisely, if I understand correctly, you want to generate a list [S'1,S'2,...,S'n] of output strings that might be equivalent to S.</p>
<p>One approach that comes to mind: for each word in the list of valid words in A2, generate the list of strings in A1 that map to it. Using the example in your edit, we have</p>
<pre><code>px-&gt;ac
qy-&gt;ac
pr-&gt;ab
</code></pre>
<p>(I have added an extra valid word <code>pr</code> for clarity.)</p>
<p>Now that we know which sequences of input symbols will always map to a valid word, we can use our table to build a <a href="http://en.wikipedia.org/wiki/Trie" rel="nofollow">Trie</a>.</p>
<p>Each node will hold a pointer to a list of valid words in A2 that map to the sequence of symbols in A1 forming the path from the root of the Trie to that node.</p>
<p>Thus for our example, the Trie would look something like this</p>
<pre><code>           Root (empty)
                | a
                V
      +---Node (empty)---+
      | b                | c
      V                  V
  Node (pr)         Node (px,qy)
</code></pre>
<p>Starting at the root node, each symbol consumed moves us from the current node to its child marked with that symbol, until we have read the entire string. If at any point no transition is defined for a symbol, the entered string does not exist in our trie and thus does not map to a valid word in our target language. Otherwise, at the end of the process, the list of words associated with the current node is the list of valid words the input string maps to.</p>
<p>Apart from the initial cost of building the trie (the trie can be shipped pre-built if we never want the list of valid words to change), finding the list of matching valid words takes O(n) time, where n is the length of the input.
</p>
<p>Using a Trie also provides the advantage that you can use it to find all valid words that can be generated by adding more symbols to the end of the input, i.e. a prefix match. For example, if fed the input symbol 'a', we can use the trie to find all valid words whose input spelling begins with 'a' ('px','qy','pr'). But doing that is not as fast as finding the exact match.</p>
<p>Here's a quick hack at a solution (in Java):</p>
<pre><code>import java.util.*;

class TrieNode {
    // Child nodes: the size of the array depends on your alphabet size;
    // here we are only using the lowercase English characters 'a'-'z'.
    TrieNode[] next = new TrieNode[26];
    // Valid target-language words that the path to this node maps to.
    List&lt;String&gt; words;

    public TrieNode() {
        words = new ArrayList&lt;String&gt;();
    }
}

class Trie {
    private TrieNode root = null;

    public void addWord(String sourceLanguage, String targetLanguage) {
        root = add(root, sourceLanguage.toCharArray(), 0, targetLanguage);
    }

    private static int convertToIndex(char c) {
        // You need to change this for your alphabet.
        return (c - 'a');
    }

    // Recursively walk (and create) the path for s, then store targ at its end.
    private TrieNode add(TrieNode cur, char[] s, int pos, String targ) {
        if (cur == null) {
            cur = new TrieNode();
        }
        if (s.length == pos) {
            cur.words.add(targ);
        } else {
            int i = convertToIndex(s[pos]);
            cur.next[i] = add(cur.next[i], s, pos + 1, targ);
        }
        return cur;
    }

    public List&lt;String&gt; findMatches(String text) {
        return find(root, text.toCharArray(), 0);
    }

    // Follow one edge per input symbol; a missing node means no match.
    private List&lt;String&gt; find(TrieNode cur, char[] s, int pos) {
        if (cur == null) {
            return new ArrayList&lt;String&gt;();
        } else if (pos == s.length) {
            return cur.words;
        } else {
            return find(cur.next[convertToIndex(s[pos])], s, pos + 1);
        }
    }
}

class MyMiniTransliterator {
    public static void main(String[] args) {
        Trie t = new Trie();
        t.addWord("ac", "px");
        t.addWord("ac", "qy");
        t.addWord("ab", "pr");

        System.out.println(t.findMatches("ac")); // prints [px, qy]
        System.out.println(t.findMatches("ab")); // prints [pr]
        System.out.println(t.findMatches("ba")); // prints [] since this does not match anything
    }
}
</code></pre>
<p>This is a
very simple trie with no compression or speedups, and it only works on lowercase English characters for the input language. But it can easily be modified for other character sets.</p>
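<p>The prefix match mentioned above could be sketched roughly as follows. This is only an illustration under the same assumption as before (input symbols are 'a'-'z'); the names <code>PrefixNode</code>, <code>PrefixTrie</code> and <code>wordsWithPrefix</code> are made up for this example. It walks down to the node for the prefix and then collects every word stored in that node's subtree, which is why it costs more than an exact lookup.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names, used for illustration only.
class PrefixNode {
    // One slot per input symbol; assumes the input alphabet is 'a'-'z'.
    PrefixNode[] next = new PrefixNode[26];
    // Target-language words whose source spelling ends exactly at this node.
    List<String> words = new ArrayList<>();
}

class PrefixTrie {
    private final PrefixNode root = new PrefixNode();

    // Register that the source spelling maps to the given target word.
    // Assumes every character of source is in 'a'-'z'.
    public void addWord(String source, String target) {
        PrefixNode cur = root;
        for (char c : source.toCharArray()) {
            int i = c - 'a';
            if (cur.next[i] == null) {
                cur.next[i] = new PrefixNode();
            }
            cur = cur.next[i];
        }
        cur.words.add(target);
    }

    // Return every target word whose source spelling starts with the prefix.
    public List<String> wordsWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        PrefixNode cur = root;
        for (char c : prefix.toCharArray()) {
            int i = c - 'a';
            // A symbol outside the assumed alphabet or a missing edge means no match.
            if (i < 0 || i >= cur.next.length || cur.next[i] == null) {
                return out;
            }
            cur = cur.next[i];
        }
        collect(cur, out);
        return out;
    }

    // Depth-first walk gathering the words stored at this node and below it.
    private void collect(PrefixNode node, List<String> out) {
        out.addAll(node.words);
        for (PrefixNode child : node.next) {
            if (child != null) {
                collect(child, out);
            }
        }
    }
}

class PrefixMatchDemo {
    public static void main(String[] args) {
        PrefixTrie t = new PrefixTrie();
        t.addWord("ac", "px");
        t.addWord("ac", "qy");
        t.addWord("ab", "pr");
        System.out.println(t.wordsWithPrefix("a")); // prints [pr, px, qy]
    }
}
```

<p>The running time here depends on the size of the subtree under the prefix, not just on the length of the input, so a short prefix over a large word list can return a lot of results.</p>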