Note that there are some explanatory texts on larger screens.

plurals
  1. POEfficient algorithm for converting a character set into a nfa/dfa
    primarykey
    data
    text
    <p>I'm currently working on a scanner generator. The generator already works fine. But when using character classes the algorithm gets very slow.</p> <p>The scanner generator produces a scanner for UTF8 encoded files. The full range of characters (0x000000 to 0x10ffff) should be supported.</p> <p>If I use large character sets, like the any operator '.' or the unicode property {L}, the nfa (and also the dfa) contains a lot of states ( > 10000 ). So the convertation for nfa to dfa and create the minimal dfa takes a long time (even if the output minimal dfa contains only a few states). </p> <p>Here's my current implementation of creating a character set part of the nfa.</p> <pre><code>void CreateNfaPart(int startStateIndex, int endStateIndex, Set&lt;int&gt; characters) { transitions[startStateIndex] = CreateEmptyTransitionsArray(); foreach (int character in characters) { // get the utf8 encoded bytes for the character byte[] encoded = EncodingHelper.EncodeCharacter(character); int tStartStateIndex = startStateIndex; for (int i = 0; i &lt; encoded.Length - 1; i++) { int tEndStateIndex = transitions[tStartStateIndex][encoded[i]]; if (tEndStateIndex == -1) { tEndStateIndex = CreateState(); transitions[tEndStateIndex] = CreateEmptyTransitionsArray(); } transitions[tStartStateIndex][encoded[i]] = tEndStateIndex; tStartStateIndex = tEndStateIndex; } transitions[tStartStateIndex][encoded[encoded.Length - 1]] = endStateIndex; } </code></pre> <p>Does anyone know how to implement the function much more efficiently to create only the necessary states?</p> <p>EDIT:</p> <p>To be more specific I need a function like:</p> <pre><code>List&lt;Set&lt;byte&gt;[]&gt; Convert(Set&lt;int&gt; characters) { ??????? } </code></pre> <p>A helper function to convert a character (int) to a UTF8 encoding byte[] is defined as:</p> <pre><code>byte[] EncodeCharacter(int character) { ... } </code></pre>
    singulars
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload