**Existing Implementations of Naive Bayes**

You would probably be better off just using one of the existing packages that support document classification using naive Bayes, e.g.:

**Python** - To do this using the Python-based **[Natural Language Toolkit (NLTK)](http://www.nltk.org/)**, see the **[Document Classification](http://nltk.googlecode.com/svn/trunk/doc/book/ch06.html#document-classification)** section in the freely available [NLTK book](http://www.nltk.org/book). (A minimal sketch of this route appears at the end of this answer.)

**Ruby** - If Ruby is more your thing, you can use the **[Classifier](http://classifier.rubyforge.org/)** gem. Here's sample code that detects [whether Family Guy quotes are funny or not funny](http://www.igvita.com/2007/05/23/bayes-classification-in-ruby/).

**Perl** - Perl has the **[Algorithm::NaiveBayes](http://search.cpan.org/dist/Algorithm-NaiveBayes/lib/Algorithm/NaiveBayes.pm)** module, complete with a sample usage snippet in the package [synopsis](http://search.cpan.org/dist/Algorithm-NaiveBayes/lib/Algorithm/NaiveBayes.pm#SYNOPSIS).

**C#** - C# programmers can use **[nBayes](http://nbayes.codeplex.com/)**. The project's home page has sample code for a simple spam/not-spam classifier.

**Java** - Java folks have **[Classifier4J](http://classifier4j.sourceforge.net/)**. You can see a training and scoring code snippet [here](http://classifier4j.sourceforge.net/usage.html#Using_BayesianClassifier).

**Bootstrapping Classification from Keywords**

It sounds like you want to start with a set of keywords that are **known to cue for certain topics** and then use those keywords to [**bootstrap a classifier**](http://en.wikipedia.org/wiki/Bootstrapping_%28machine_learning%29).

This is a reasonably clever idea. Take a look at the paper **[Text Classification by Bootstrapping with Keywords, EM and Shrinkage](http://www.kamalnigam.com/papers/keywordcat-aclws99.pdf)** by McCallum and Nigam (1999). Following this approach, they improved classification accuracy from the 45% achieved by hard-coded keywords alone to 66% with a bootstrapped naive Bayes classifier. For their data, the latter is close to human levels of agreement, as people agreed with each other about document labels 72% of the time.
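To make the NLTK option above concrete, here is a minimal sketch of training and querying `nltk.NaiveBayesClassifier` on bag-of-words features. The tiny corpus and the `finance`/`sports` labels are invented for illustration; real training data would be much larger:

```python
import nltk

def bag_of_words(text):
    # Represent a document as word-presence features, the form
    # nltk.NaiveBayesClassifier expects: {feature_name: value}.
    return {word: True for word in text.lower().split()}

# A made-up toy corpus of (document, label) pairs.
train_docs = [
    ("the quarterly earnings beat market estimates", "finance"),
    ("shares fell sharply after the announcement", "finance"),
    ("the team won the championship game", "sports"),
    ("the striker scored twice in the match", "sports"),
]

classifier = nltk.NaiveBayesClassifier.train(
    [(bag_of_words(text), label) for text, label in train_docs]
)

# Expected: 'finance', since "market" only occurs in finance documents.
print(classifier.classify(bag_of_words("stocks rallied as the market opened")))

# Inspect which word features the model found most predictive.
classifier.show_most_informative_features(5)
```

`show_most_informative_features` is a handy sanity check: it prints the features whose presence most strongly separates the classes.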
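And here is a rough sketch of the bootstrapping idea on top of the same toolkit. This is a simplified single hard-label iteration, not McCallum and Nigam's full EM procedure with soft labels and shrinkage, and the seed keywords and corpus are again invented:

```python
import nltk

# Hand-picked cue words per topic (hypothetical seed set).
SEED_KEYWORDS = {
    "finance": {"market", "shares", "earnings"},
    "sports": {"match", "team", "game"},
}

def bag_of_words(text):
    return {word: True for word in text.lower().split()}

def keyword_label(text):
    # Seed-label a document only if exactly one topic's keywords appear.
    words = set(text.lower().split())
    hits = [topic for topic, kws in SEED_KEYWORDS.items() if words & kws]
    return hits[0] if len(hits) == 1 else None

corpus = [  # unlabeled documents
    "shares slid as earnings disappointed investors",
    "the team clinched the match in extra time",
    "analysts expect the market to rebound",
    "fans packed the stadium for the final",  # no seed keywords hit this one
]

# Step 1: hard-label whatever the keywords can reach.
seeded = [(bag_of_words(doc), lab) for doc in corpus
          if (lab := keyword_label(doc)) is not None]

# Step 2: train an initial naive Bayes model on the seeded subset.
model = nltk.NaiveBayesClassifier.train(seeded)

# Step 3: let the model label every document, including ones the
# keywords missed, then retrain on its own predictions.
relabeled = [(bag_of_words(doc), model.classify(bag_of_words(doc)))
             for doc in corpus]
model = nltk.NaiveBayesClassifier.train(relabeled)

print(model.classify(bag_of_words("the stadium crowd cheered the team")))
```

In practice you would iterate step 3 until the labels stabilize, and, as in the paper, weight documents by the classifier's confidence rather than committing to hard labels each round.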