Naive Bayes Computation in Perl / Moose
    <p>Here is some code I wrote to calculate the probability of labels with respect to some observed features using a Naive Bayes classifier. It is intended to compute the Naive Bayes formula without smoothing and to calculate the actual probabilities, so it does use the usually omitted denominator. The problem is that for the example below, the probability of the "good" label is &gt; 1 (1.30612245). Can anyone help me understand what that's about? Is this a byproduct of the naive assumption?</p>
<pre><code>package NaiveBayes;

use Moose;

has class_counts         =&gt; (is =&gt; 'ro', isa =&gt; 'HashRef[Int]',                   default =&gt; sub {{}});
has class_feature_counts =&gt; (is =&gt; 'ro', isa =&gt; 'HashRef[HashRef[HashRef[Num]]]', default =&gt; sub {{}});
has feature_counts       =&gt; (is =&gt; 'ro', isa =&gt; 'HashRef[HashRef[Num]]',          default =&gt; sub {{}});
has total_observations   =&gt; (is =&gt; 'rw', isa =&gt; 'Num');

sub insert {
    my( $self, $class, $data ) = @_;
    $self-&gt;class_counts-&gt;{$class}++;
    $self-&gt;total_observations( ($self-&gt;total_observations||0) + 1 );
    for( keys %$data ){
        $self-&gt;feature_counts-&gt;{$_}-&gt;{$data-&gt;{$_}}++;
        $self-&gt;class_feature_counts-&gt;{$_}-&gt;{$class}-&gt;{$data-&gt;{$_}}++;
    }
    return $self;
}

sub classify {
    my( $self, $data ) = @_;
    my %probabilities;
    my $feature_probability = 1;
    for my $class( keys %{ $self-&gt;class_counts } ) {
        my $class_count = $self-&gt;class_counts-&gt;{$class};
        my $class_probability = $class_count / $self-&gt;total_observations;
        my($feature_probability, $conditional_probability) = (1) x 2;
        my( @feature_probabilities, @conditional_probabilities );
        for( keys %$data ){
            my $feature_count = $self-&gt;feature_counts-&gt;{$_}-&gt;{$data-&gt;{$_}};
            my $class_feature_count = $self-&gt;class_feature_counts-&gt;{$_}-&gt;{$class}-&gt;{$data-&gt;{$_}} || 0;
            next unless $feature_count;
            $feature_probability *= $feature_count / $self-&gt;total_observations;
            $conditional_probability *= $class_feature_count / $class_count;
        }
        $probabilities{$class} = $class_probability * $conditional_probability / $feature_probability;
    }
    return %probabilities;
}

__PACKAGE__-&gt;meta-&gt;make_immutable;
1;
</code></pre>
    <p>Example:</p>
<pre><code>#!/usr/bin/env perl

use Moose;
use NaiveBayes;

my $nb = NaiveBayes-&gt;new;

$nb-&gt;insert('good', {browser =&gt; 'chrome',   host =&gt; 'yahoo',    country =&gt; 'us'});
$nb-&gt;insert('bad',  {browser =&gt; 'chrome',   host =&gt; 'slashdot', country =&gt; 'us'});
$nb-&gt;insert('good', {browser =&gt; 'chrome',   host =&gt; 'slashdot', country =&gt; 'uk'});
$nb-&gt;insert('good', {browser =&gt; 'explorer', host =&gt; 'google',   country =&gt; 'us'});
$nb-&gt;insert('good', {browser =&gt; 'explorer', host =&gt; 'slashdot', country =&gt; 'ca'});
$nb-&gt;insert('good', {browser =&gt; 'opera',    host =&gt; 'google',   country =&gt; 'ca'});
$nb-&gt;insert('good', {browser =&gt; 'firefox',  host =&gt; '4chan',    country =&gt; 'us'});
$nb-&gt;insert('good', {browser =&gt; 'opera',    host =&gt; '4chan',    country =&gt; 'ca'});

my %classes = $nb-&gt;classify({browser =&gt; 'opera', host =&gt; '4chan', country =&gt; 'uk'});

my @classes = sort { $classes{$a} &lt;=&gt; $classes{$b} } keys %classes;

for( @classes ){
    printf( "%-20s : %5.8f\n", $_, $classes{$_} );
}
</code></pre>
    <p>Prints:</p>
<pre><code>bad                  : 0.00000000
good                 : 1.30612245
</code></pre>
    <p>I'm less worried about the 0 probability than about the "probability" of good being &gt; 1. I believe this is an implementation of the classic Naive Bayes definition:</p>
<pre><code>p(C | F_1 ... F_n) = (p(C) p(F_1 | C) ... p(F_n | C)) / (p(F_1) ... p(F_n))
</code></pre>
    <p>How can this be &gt; 1?</p>
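    <p>The reported value can be reproduced by hand from the formula above. The following is a minimal sketch, not part of the original module: the helper <code>naive_ratio</code> and the <code>%counts</code> table are hypothetical names, and the counts were tallied manually from the eight example observations (7 of 8 are "good"; "opera" and "4chan" each occur twice and "uk" once, always under "good").</p>

```perl
use strict;
use warnings;

# Counts tallied by hand from the eight example observations (assumption:
# these mirror what insert() would have accumulated).
my $total      = 8;    # all observations
my $good_count = 7;    # observations labelled 'good'

# For each queried feature value: [count within 'good', count overall].
my %counts = (
    'browser=opera' => [2, 2],
    'host=4chan'    => [2, 2],
    'country=uk'    => [1, 1],
);

# Hypothetical helper: evaluates p(C) * prod p(F_i|C) / prod p(F_i)
# for a single class, exactly as the formula in the question is written.
sub naive_ratio {
    my ($class_count, $n, $feature_counts) = @_;
    my $prior      = $class_count / $n;
    my $likelihood = 1;    # running product of p(F_i | C)
    my $evidence   = 1;    # running product of p(F_i)
    for my $pair (values %$feature_counts) {
        my ($in_class, $overall) = @$pair;
        $likelihood *= $in_class / $class_count;
        $evidence   *= $overall / $n;
    }
    return $prior * $likelihood / $evidence;
}

my $good = naive_ratio($good_count, $total, \%counts);
printf "good : %.8f\n", $good;
```

    <p>With these counts, every factor p(F_i | good) / p(F_i) works out to 8/7 (for example (2/7) / (2/8)), so the expression is 7/8 · (8/7)^3 = 64/49 ≈ 1.30612245, matching the output above. Note that the product p(F_1) ... p(F_n) equals the joint probability p(F_1, ..., F_n) only if the features are also independent without conditioning on the class, so this ratio is not guaranteed to stay at or below 1.</p>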