<p>There are two basic solutions to this kind of problem.</p>
<ol>
<li>Define the action so it can be executed safely multiple times,</li>
<li>Change the syntax so the action is only executed once.</li>
</ol>
<p>In this case, I would choose a hybrid approach. Use actions to record the start and ending positions of a <code>name</code>: these actions can be executed safely many times since they just record locations. Once you are sure you are past the name, execute a different action that will only execute once.</p>
<pre><code>/* C code */
char *name_start, *name_end;

/* Ragel code */
action markNameStart { name_start = p; }
action markNameEnd { name_end = p; }
action nameAction {
    /* Clumsy since name is not nul-terminated */
    fputs("Name = ", stdout);
    fwrite(name_start, 1, name_end - name_start, stdout);
    fputc('\n', stdout);
}
name = space* %markNameStart
       (alnum+ %markNameEnd &lt;: space*)+ %nameAction ;
main := name ":" name ;
</code></pre>
<p>Here, the syntax for <code>name</code> includes arbitrary spaces and at least one alphanumeric character. When the first alphanumeric character is encountered, its location is saved in <code>name_start</code>. Whenever a run of alphanumeric characters ends, the location of the following character is saved in <code>name_end</code>. The <code>&lt;:</code> is technically unnecessary, but it reduces how often the <code>markNameEnd</code> action is executed.</p>
<p>Just be sure not to place such an expression next to any spaces.</p>
<p><strong>I have not tested the above code.</strong> You should look at the Graphviz visualization of the state machine before you use it.</p>
<h2>What Ragel is doing</h2>
<p>With your original code, let's suppose the input is as follows:</p>
<pre> Hello world : Goodbye world
</pre>
<p>The Ragel machine scans from left to right, finds the start of a <code>name</code>, and scans over the alphanumeric characters.</p>
<pre> Hello world : Goodbye world
      ↑
</pre>
<p>The next character is a space.
So either we have encountered a space inside a word, or the first space after the end of a word. How does Ragel choose?</p>
<p><strong>Ragel chooses both options, at the same time.</strong> This is very important. Ragel is trying to simulate a nondeterministic finite automaton, but since your computer is deterministic, the easiest way to do that is to convert the NFA to a DFA that simulates any number of copies of the NFA in parallel. Since the NFAs have a finite number of states (hence the name), the DFAs also have a finite number of states, so this technique works.</p>
<p>After encountering the space, you have one NFA in the following state, looking for the rest of the <code>name</code>:</p>
<pre>identifier = alnum (space* alnum)*;
                    ↑
main := name sep name;
        ↑
</pre>
<p>The second NFA is in the following state, and it assumes that the <code>name</code> has already ended (and this NFA executes the <code>fName</code> action "prematurely"):</p>
<pre>sep = space* ":" space*;
      ↑
main := name sep name;
             ↑
</pre>
<p>It's obvious to you and it's obvious to me that only the first NFA is correct. But machines created with Ragel only look at one character at a time; they don't look ahead to see which option is correct. The second NFA will eventually encounter an alphanumeric character where it was expecting to see <code>":"</code>, and since that's not allowed, the second NFA will disappear. By then, however, its action has already been executed.</p>
<h2>A look at the Ragel documentation</h2>
<p>Here's the description of <code>%</code>:</p>
<blockquote>
<pre><code>expr % action
</code></pre>
<p>The leaving action operator queues an action for embedding into the transitions that go out of a machine via a final state.</p>
</blockquote>
<p>The action gets executed for transitions that do not necessarily contribute to a successful parse.
See the Ragel guide, Chapter 4, "Controlling Nondeterminism", for more information about nondeterminism in Ragel. The techniques in Chapter 4 won't help you in this particular case, though, since the actions in your machine can only be disambiguated with unbounded lookahead, which finite state machines cannot do.</p>
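<p>To make the hybrid approach from the top of this answer concrete without running Ragel, here is a minimal hand-written sketch in plain C. It is not Ragel output and not the machine above: <code>span</code>, <code>scan_name</code>, <code>name_action</code>, and <code>parse_pair</code> are hypothetical names introduced only for illustration. The end marker is overwritten as often as needed (safe to repeat), while the printing action runs once per name, only after the scanner knows it is past the name.</p>

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type: a half-open [start, end) slice of the input. */
typedef struct {
    const char *start;
    const char *end;
} span;

/* Counterpart of nameAction: meant to run once per name, after it is complete. */
void name_action(const span *s, FILE *out)
{
    fputs("Name = ", out);
    fwrite(s->start, 1, (size_t)(s->end - s->start), out);
    fputc('\n', out);
}

/*
 * Scans space* (alnum+ space*)+ starting at p.
 * Returns the position just past the name, or NULL if no name is present.
 */
const char *scan_name(const char *p, const char *pe, span *out)
{
    while (p < pe && isspace((unsigned char)*p))
        p++;                      /* leading spaces are not part of the name */
    if (p == pe || !isalnum((unsigned char)*p))
        return NULL;
    out->start = p;               /* like markNameStart */
    out->end = p;
    while (p < pe) {
        if (isalnum((unsigned char)*p)) {
            p++;
            out->end = p;         /* like markNameEnd: overwritten often, harmless */
        } else if (isspace((unsigned char)*p)) {
            p++;                  /* inside the name or trailing: no decision yet */
        } else {
            break;
        }
    }
    return p;
}

/* Parses name ":" name; returns 1 on success, 0 otherwise. */
int parse_pair(const char *s, span *left, span *right, FILE *out)
{
    const char *pe = s + strlen(s);
    const char *p = scan_name(s, pe, left);
    if (p == NULL || p == pe || *p != ':')
        return 0;
    name_action(left, out);       /* the ':' shows the first name is over */
    p = scan_name(p + 1, pe, right);
    if (p == NULL || p != pe)
        return 0;
    name_action(right, out);      /* end of input shows the second name is over */
    return 1;
}
```

<p>Calling <code>parse_pair(" Hello world : Goodbye world ", &amp;left, &amp;right, stdout)</code> on this sketch should report each name once, with the inner space kept and the outer spaces trimmed. Unlike the Ragel version, it updates the end marker after every alphanumeric character rather than at the end of each run; the recorded position is the same either way.</p>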
 
