<p>One program I know of that can even distinguish several different languages within the same file is <a href="http://labs.ohloh.net/ohcount" rel="nofollow noreferrer">ohcount</a>. You might get some ideas there, although I don't really know how they do it.</p>
<p>In general, you can look for distinctive patterns:</p>
<ul>
<li>Operators might be an indicator, such as <code>:=</code> for Pascal/Modula/Oberon, or <code>=&gt;</code> or the whole of LINQ in C#.</li>
<li>Keywords would be another one, as probably no two languages have the same set of keywords.</li>
<li>Casing rules for identifiers, assuming the piece of code was written in conformance with best practices. This is probably a very weak rule.</li>
<li>Standard library functions or types. Especially for languages that usually rely heavily on them, such as PHP, you might just use a long list of standard library functions.</li>
</ul>
<p>You can create a set of rules, each of which indicates a possible set of languages if it matches. Intersecting the resulting sets will hopefully leave you with only one language.</p>
<p>The problem with this approach, however, is that you need to tokenize the input and compare tokens (otherwise you can't really know what the operators are, or whether something you found was inside a comment or a string). Tokenizing rules differ for each language as well, though; just splitting everything at whitespace and punctuation will probably not yield a very useful sequence of tokens. You can try several different tokenizing rules (each of which would indicate a certain set of languages as well) and have your rules match against a specific tokenization. For example, trying to find a single-quoted string (to test for Pascal) in a VB snippet with one comment will probably fail, but another tokenizer might have more luck.</p>
<p>But since you want to perform analysis anyway, you probably already have parsers for the languages you support, so you can just try running the snippet through each parser and take the result as an indicator of which language it is (as suggested by OregonGhost as well).</p>
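<p>To make the rule-intersection idea concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration: the names <code>ALL_LANGUAGES</code>, <code>RULES</code> and <code>guess_languages</code> are hypothetical, the language list is tiny, and the patterns are plain regular expressions run over the raw text. That means it deliberately skips the tokenizing step described above, so a match inside a comment or string will mislead it.</p>

```python
import re

# Hypothetical, deliberately tiny universe of candidate languages.
ALL_LANGUAGES = frozenset({"Pascal", "Modula", "Oberon", "C#", "PHP"})

# Each rule pairs a pattern with the set of languages it is assumed to suggest.
# These rules are illustrative guesses, not a vetted detection table.
RULES = [
    # Assignment operator ":=" (operator rule).
    (re.compile(r":="), {"Pascal", "Modula", "Oberon"}),
    # "=>" appears in C# lambdas and in PHP array literals.
    (re.compile(r"=>"), {"C#", "PHP"}),
    # Block keywords (keyword rule).
    (re.compile(r"\b(?:begin|end)\b", re.IGNORECASE), {"Pascal", "Modula", "Oberon"}),
    # "foreach" is a keyword in both C# and PHP.
    (re.compile(r"\bforeach\b"), {"C#", "PHP"}),
    # A few PHP standard library functions (standard library rule).
    (re.compile(r"\b(?:array_map|str_replace|strlen)\s*\("), {"PHP"}),
    # "$name" style variables.
    (re.compile(r"\$\w+"), {"PHP"}),
]


def guess_languages(snippet):
    """Intersect the language sets of every rule that matches the snippet."""
    candidates = set(ALL_LANGUAGES)
    for pattern, languages in RULES:
        if pattern.search(snippet):
            candidates &= languages
    return candidates


if __name__ == "__main__":
    print(guess_languages("x := 1;"))
    print(guess_languages("foreach ($items as $k => $v) { echo strlen($v); }"))
    print(guess_languages("var f = items.Select(x => x + 1);"))
```

<p>The last call is left with both C# and PHP, because only the <code>=&gt;</code> rule fires; more rules, or a real tokenizer in front of them, would be needed to narrow it down. An empty result means the matching rules contradicted each other, and a snippet that matches no rule leaves every language as a candidate.</p>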
 
