There is a wiki page buried somewhere on Antlr.org that speaks to your question; I cannot seem to find it just now.

In substance, the lexer reads data through a standard InputStream interface, specifically ANTLRInputStream.java. The typical implementation is [ANTLRFileStream.java](https://github.com/antlr/antlr4/blob/master/runtime/Java/src/org/antlr/v4/runtime/ANTLRFileStream.java), which preemptively reads the entire input data file into memory. What you need to do is write your own buffered version ("ANTLRBufferedFileStream.java") that reads from the source file as needed. Or, just set a standard BufferedInputStream/FileInputStream as the data source for the ANTLRInputStream.

One caveat is that Antlr4 has the potential for unbounded lookahead. That is not likely to be a problem with a reasonably sized buffer in normal operation; it is more likely when the parser attempts error recovery. Antlr4 allows the error recovery strategy to be tailored, so the problem is manageable.

Additional detail:

In effect, Antlr implements a pull parser. When you call the first parser rule, the parser requests tokens from the lexer, which in turn requests character data from the input stream. The parser/lexer interface is implemented by a buffered token stream, nominally [BufferedTokenStream](https://github.com/antlr/antlr4/blob/master/runtime/Java/src/org/antlr/v4/runtime/BufferedTokenStream.java).

The parse tree is little more than a tree data structure of tokens. Well, a lot more, but not in terms of data size. Each token is an INT value, typically backed by the fragment of the input stream that matched the token definition. The lexer itself does not require a full copy of the lexed character stream to be kept in memory, and the token text fragments could be zeroed out. Given a buffered file input stream, the critical memory requirement for the lexer is the lookahead scan over the input character stream.

Depending on your needs, the in-memory parse tree can be small even for a 100 GB+ input file.

To help further, you need to explain more about what you are trying to do in Antlr and what defines your minimum critical memory requirement. That will guide which additional strategies can be recommended. For example, if the source data is amenable, you can use multiple lexer/parser runs, each time subselecting in the lexer a different portion of the source data to process. Compared to the file reads and DB writes, even with fast disks, Antlr execution will likely be barely noticeable.
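For concreteness, here is a minimal sketch of the wiring described above: a plain BufferedInputStream/FileInputStream fed to the ANTLRInputStream, the stock CommonTokenStream (a BufferedTokenStream) between lexer and parser, and a tailored error strategy to keep error recovery from triggering deep lookahead. `MyLexer`, `MyParser`, and `startRule` are placeholders for whatever your own grammar generates, and BailErrorStrategy is just one of the stock strategies you could swap in; adapt to your situation.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;

public class BigFileParseSketch {
    public static void main(String[] args) throws Exception {
        // Feed a buffered file stream to the char stream instead of using
        // ANTLRFileStream, which slurps the whole file up front.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            ANTLRInputStream chars = new ANTLRInputStream(in);

            // MyLexer/MyParser/startRule are hypothetical names standing in for
            // the classes and entry rule generated from your grammar.
            MyLexer lexer = new MyLexer(chars);

            // CommonTokenStream is the stock BufferedTokenStream implementation
            // that the parser pulls tokens from.
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            MyParser parser = new MyParser(tokens);

            // Optional: bail on the first syntax error instead of attempting
            // recovery, which is where deep lookahead tends to occur.
            parser.setErrorHandler(new BailErrorStrategy());

            ParserRuleContext tree = parser.startRule();
            // ... walk the tree and write your results to the database here.
        }
    }
}
```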
 
