<p><a href="https://github.com/github/linguist" rel="nofollow">Linguist</a> might do that for you (it's what GitHub uses to detect the primary languages in a project).</p>
<p>If you're looking to build your own, that would be a good place to start. Here are a few more notes on what else you might have to do to make one.</p>
<p>File extensions are a good cheat. For example:</p>
<ul> <li><code>.rb</code> - almost always Ruby</li> <li><code>.cpp</code> - almost always C++</li> <li><code>.h</code> - could be C or C++</li> </ul>
<p>...and so on. Then read the code line by line. There are usually common keywords, or placements of those keywords within the code, that will tip you off pretty quickly as to what language it's written in. A review of several "getting started" tutorial sites for the languages you want to support should give you a good summary of these things, without needing to actually learn the languages themselves. All you really need is a few things unique to each language that mark a file as definitively one language or another.</p>
<p>You could also use a Bayesian learning filter (there is a Ruby module called <a href="http://classifier.rubyforge.org/" rel="nofollow">Classifier</a> that appears to do this) to train a more flexible engine that identifies code by language on its own. Since programming languages are highly structured text, it shouldn't take very long for the classifier to get extremely good at identifying the language. If you wanted to go totally crazy, you could even train it to identify not only the language, but also the minimum version of the language that the code can be compiled against. For example, Java added generics in version 5 (J2SE 5.0). If you see generics in the code, then you know that the source was written for at least that version of Java, etc.</p>
<p>A little more complex, but not much, are questions like how to handle <code>.erb</code> files. Do you call those "Embedded Ruby", do you call them "Ruby", do you count the lines of HTML vs. Ruby vs. JavaScript and call the file by the most numerous language, or do you just tag the file with ALL the languages found? I suppose that's really more of a design decision.</p>
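<p>To make the extension-then-keywords idea concrete, here is a minimal sketch in Python. The extension table, the keyword patterns, and the function names are all made up for illustration (they are not taken from Linguist or Classifier), and a real detector would need far more entries than this.</p>

```python
import os
import re

# Hypothetical extension table: each extension maps to its candidate languages.
EXTENSION_CANDIDATES = {
    ".rb": ["Ruby"],
    ".cpp": ["C++"],
    ".h": ["C", "C++"],  # ambiguous, so the contents must decide
}

# Hypothetical keyword hints: a few patterns that are typical of each language.
KEYWORD_HINTS = {
    "Ruby": [r"\bdef\b", r"\bend\b", r"\brequire\b"],
    "C++": [r"\bclass\b", r"\bnamespace\b", r"\btemplate\s*<", r"std::"],
    "C": [r"\bmalloc\s*\(", r"\bprintf\s*\(", r"\btypedef\s+struct\b"],
}


def score_languages(source, candidates):
    """Count how many keyword-hint matches each candidate language gets."""
    return {
        lang: sum(len(re.findall(p, source)) for p in KEYWORD_HINTS.get(lang, []))
        for lang in candidates
    }


def guess_language(filename, source):
    """Guess by extension first; fall back to keyword counts when ambiguous."""
    ext = os.path.splitext(filename)[1].lower()
    # Unknown extension: consider every language we have hints for.
    candidates = EXTENSION_CANDIDATES.get(ext, list(KEYWORD_HINTS))
    if len(candidates) == 1:
        return candidates[0]
    scores = score_languages(source, candidates)
    best = max(scores, key=scores.get)
    # No hint matched at all: admit we do not know.
    return best if scores[best] > 0 else None
```

<p>With this sketch, <code>guess_language("app.rb", "")</code> is decided by the extension alone, while a <code>.h</code> file is decided by whichever of C or C++ matches more hints. Returning <code>None</code> when nothing matches is one possible design choice; tagging the file with every language that scored is another.</p>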
 
