StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
text
Body
copied!<p>Assuming you aren't going to see anything but fairly ordinary <code>select</code> statements, I don't think you don't need a full-on SQL parser to do what you want. You can reasonably make the assumption that you've got syntactically valid SQL. You need to build a tokenizer (lexical analyzer), though.</p> <p>The lexical analysis needed for Transact SQL is pretty simple. The token list consist of (off the top of my head, since it's been a while since I had to do this):</p> <ul> <li>whitespace</li> <li>two types of comments: <ul> <li><code>--</code>-style comments <code></code>/<em>...</em>/`-style comments</li> </ul></li> <li>three types of quoted literals: <ul> <li>string literals (e.g., `'my string literal'), and</li> <li>two flavors of quoting reserved words for use as column or object names: <ul> <li>ANSI/ISO style, using double quotes (e.g., <code>"table"</code>)</li> <li>Transact-SQL style, using square-brackets (e.g., <code>[table]</code>)</li> </ul></li> </ul></li> <li>hex literals (e.g., <code>0x01A2F</code>)</li> <li>numeric literals (e.g. <code>757</code>, <code>-3218</code>, <code>5.4</code> or <code>-7.6E-32</code>, <code>5.0m</code> , <code>$5.3201</code> etc.)</li> <li>words, reserved or not: a unicode letter, underscore ('<em>'), 'at'-sign ('@') or hash ('#'), followed by zero or more of unicode letters, decimal digits, underscore ('</em>') or the at-, dollar- or hash- signs ('@', '$' or '#').</li> <li>operators, including parentheses.</li> </ul> <p>It can pretty much all be done with regular expressions. If you were using <a href="http://www.perl.org/" rel="nofollow">Perl</a>, you'd be done in a day, easy. It'll probably take a bit longer in C#, though.</p> <p>I would probably treat comments as whitespace and collapse multiple sequences of whitespace and comment into a single whitespace token as it facilitates the recognition of constructs such as <code>order by</code>.</p> <p>The reason you don't need a parser is that you don't really care very much about the parse tree. What you do care about is nested parentheses. So...</p> <ol> <li><p>Once you've gotten a lexical analyzer that emits a stream of tokens, all you need to do is eat and discard tokens counting open/closing parentheses until you see a 'from' keyword at parenthetical depth 0.</p></li> <li><p>Write <code>select count(*)</code> into your StringBuilder.</p></li> <li><p>Start appending tokens (including the <code>from</code>) into the StringBuilder until you see an 'order by' at parenthetical depth 0. You'll need to build a certain amount of look-ahead into your lexer to do this (which see my earlier note regarding the collapsing of sequences of whitespace and/or comments into a single whitespace token.)</p></li> <li><p>At this point, you should be pretty much done. Execute the query.</p></li> </ol> <p><strong>NOTES</strong></p> <ol> <li><p>Parameterized queries likely won't work.</p></li> <li><p>Recursive queries, with a CTE and a <code>with</code> clause will probably get broken.</p></li> <li><p>This will discard anything past the <code>ORDER BY</code> clause: if the query uses query hint, a <code>FOR</code> clause, or <code>COMPUTE</code>/<code>COMPUTE BY</code>, your results will likely differ from the original query (especially with any <code>compute</code> clauses, since those break up the queries result sets).</p></li> <li><p>Bare <code>UNION</code> queries will get broken, since something like</p> <pre><code> select c1,c2 from t1 UNION select c1,c2 from t2 </code></pre> <p>will get turned into</p> <pre><code> select count(*) from t1 UNION select c1,c2 from t2 </code></pre></li> <li><p>All this is completely untested, just my thoughts based on oddball stuff I've had to do over the years.</p></li> </ol>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload