    <p>Most "beginner" projects aim at reimplementing well-known algorithms, so that the beginner can learn by verifying their results against known solutions. For this, I'd recommend something simple, like an email spam filter. You'd start by creating a training file, i.e. copy the text of several <em>real</em> emails into a CSV file (pipe-delimited here) and manually label each one as spam or not spam, like:</p>
    <pre><code>text|is_spam
hi bob! how are you?|0
what time are you coming over|0
how to buy viagra now!|1
</code></pre>
    <p>Next, you'd create a test file in the same format as the training file, but with different examples.</p>
    <p>Then, you'd create your classifier/spam filter. There are many different ways to implement a spam filter, but the most basic is to count how often each word appears in emails labelled is_spam=0 and in emails labelled is_spam=1. For example, based on the training file above, the word "viagra" appears in 1 spam email but 0 non-spam emails, so future emails containing the word "viagra" are likely to be spam as well. Similarly, the word "how" appears in 1 spam and 1 non-spam email, so it says little about the classification either way.</p>
    <p>You'd then train your classifier on the training file, and calculate its accuracy by running it on the test file.</p>
    <p>If the above method is too simple, you can increase its complexity by counting n-grams (groups of consecutive words), or even by looking at grammatical structure after first tagging each word's part of speech (e.g. a lot of spam is random garbage populated with keywords, whereas non-spam usually makes some sense). You could also implement several different classifiers and compare their accuracy.</p>
    <p>Granted, there's a bit more to it than that, but these methods are well documented on the internet, and it's your project, so it's up to you to research it further. Good luck.</p>
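    <p>As a rough illustration of the word-counting idea described above, here is a minimal Python sketch. The function names (<code>tokenize</code>, <code>load</code>, <code>train</code>, <code>classify</code>, <code>accuracy</code>) and the two test emails are made up for this example, and the decision rule (compare raw spam and non-spam counts, with ties going to non-spam) is a simplifying assumption, not a recommended design.</p>

```python
import csv
import io
import re
from collections import Counter

# Training data copied from the example above (pipe-delimited).
TRAIN = """text|is_spam
hi bob! how are you?|0
what time are you coming over|0
how to buy viagra now!|1
"""

# Hypothetical test data, invented purely for illustration.
TEST = """text|is_spam
are you coming over, bob?|0
buy viagra now|1
"""


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def load(data):
    """Parse pipe-delimited rows into (text, label) pairs."""
    reader = csv.DictReader(io.StringIO(data), delimiter="|")
    return [(row["text"], int(row["is_spam"])) for row in reader]


def train(examples):
    """Count, per label, how many emails each word appears in."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        # set() so a word repeated within one email is counted once
        counts[label].update(set(tokenize(text)))
    return counts


def classify(counts, text):
    """Return 1 (spam) if the words were seen more often in spam."""
    words = tokenize(text)
    spam_score = sum(counts[1][w] for w in words)
    ham_score = sum(counts[0][w] for w in words)
    # Assumption: ties (including all-unknown words) go to non-spam.
    return 1 if spam_score > ham_score else 0


def accuracy(counts, examples):
    """Fraction of examples whose predicted label matches the true one."""
    correct = sum(classify(counts, text) == label for text, label in examples)
    return correct / len(examples)


if __name__ == "__main__":
    model = train(load(TRAIN))
    print("viagra:", model[1]["viagra"], "spam /", model[0]["viagra"], "non-spam")
    print("accuracy on test file:", accuracy(model, load(TEST)))
```

    <p>Raw counts like these are skewed when one class has many more training emails than the other, so a real project would probably normalise them (for example, by turning them into per-class probabilities) before comparing.</p>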