<p>This can be viewed as a <a href="http://en.wikipedia.org/wiki/Binary_classification" rel="nofollow">binary (<em>yes</em> or <em>no</em>) classification task</a>. You could write either a rule-based model or a statistics-based model to classify the answers.</p>

<p>A rule-based model would look like <code>if answer in ["never", "not at this time", "nope"] then answer is "no"</code>. When spam filters first came out they contained a lot of rules like these.</p>

<p>A statistics-based model would probably be more suitable here, as writing your own rules gets tiresome and does not handle new cases as well.</p>

<p>For this you need to label a <a href="http://en.wikipedia.org/wiki/Training_set" rel="nofollow">training dataset</a>. After a little preprocessing (such as lowercasing all the words, removing punctuation, and maybe even a little stemming) you could get a dataset like</p>

<pre><code>0 | never in a million years
0 | never
1 | yes sir
1 | yep
1 | yes yes yeah
0 | no way
</code></pre>

<p>Now you can run classification algorithms like Naive Bayes or Logistic Regression over this set and learn which words more often belong to which class. First you vectorize the words: either as binary features (is the word present or not), as word counts (the term frequency), or as tf-idf floats (which prevent bias toward longer answers and common words).</p>

<p>In the above example <code>yes</code> would be strongly correlated with a positive answer (1) and <code>never</code> would be strongly correlated with a negative answer (0). You could work with n-grams, so that <code>not no</code> would be treated as a single token in favor of the positive class. This is called the bag-of-words approach.</p>

<p>To combat spelling errors you can add a spellchecker like Aspell to the preprocessing step. You could use a character vectorizer too, so a word like <code>nno</code> would be interpreted as <code>nn</code> and <code>no</code>, and you would catch errors like <code>hellyes</code>. You could also trust your users to repeat spelling errors: if 5 users make the spelling error <code>neve</code> for the word <code>never</code>, then the token <code>neve</code> will automatically start to count for the negative class (if labeled as such).</p>

<p>You could write these algorithms yourself (Naive Bayes is doable; Paul Graham has written a few accessible essays on how to classify spam with Bayes' Theorem, and nearly every ML library has a tutorial on this), or make use of libraries or programs like scikit-learn (MultinomialNB, SGDClassifier, LinearSVC, etc.) or Vowpal Wabbit (logistic regression, quantile loss, etc.).</p>
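<p>Putting the steps above together, here is a minimal sketch using scikit-learn. It assumes the tiny labeled dataset from the example (a real model would need far more training data) and uses word counts with unigrams and bigrams; the other vectorization variants mentioned above are noted in the comments.</p>

```python
# A minimal sketch of the statistics-based approach described above,
# using scikit-learn on the tiny example dataset. A real classifier
# would need far more labeled training data than this.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "never in a million years",
    "never",
    "yes sir",
    "yep",
    "yes yes yeah",
    "no way",
]
labels = [0, 0, 1, 1, 1, 0]  # 0 = negative answer, 1 = positive answer

# Bag-of-words vectorization as word counts, with unigrams + bigrams
# so that a phrase like "not no" becomes a single token. Swap in
# CountVectorizer(binary=True) for binary features, or
# TfidfVectorizer for the tf-idf variant; analyzer="char_wb" would
# give the character-vectorizer approach for catching typos.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

# Classify two unseen answers.
pred = clf.predict(vectorizer.transform(["yes yes", "never ever"]))
print(pred)  # "yes yes" → 1, "never ever" → 0
```

<p>Because <code>never</code> only ever appears in negatively labeled answers and <code>yes</code> only in positive ones, Naive Bayes assigns the new answers to the expected classes even from six examples.</p>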