StackOverflow2013

Weighted slope one algorithm? (porting from Python to R)
<p>I was reading about the <a href="http://en.wikipedia.org/wiki/Slope_One#Slope_one_collaborative_filtering_for_rated_resources" rel="nofollow noreferrer">Weighted slope one algorithm</a> (and more formally <a href="http://www.daniel-lemire.com/fr/documents/publications/lemiremaclachlan_sdm05.pdf" rel="nofollow noreferrer">here (PDF)</a>), which is supposed to take item ratings from different users and, given a user vector containing at least 1 rating and 1 missing value, predict the missing ratings.</p>
<p>I found a <a href="http://www.serpentine.com/wordpress/wp-content/uploads/2006/12/slope_one.py.txt" rel="nofollow noreferrer">Python implementation of the algorithm</a>, but I'm having a hard time porting it to <a href="http://www.r-project.org/" rel="nofollow noreferrer">R</a> (which I'm more comfortable with). Below is my attempt. Any suggestions on how to make it work?</p>
<p>Thanks in advance, folks.</p>
<pre><code># take a 'training' set, tr.set and a vector with some missing ratings, d
pred=function(tr.set,d) {
  tr.set=rbind(tr.set,d)
  n.items=ncol(tr.set)

  # tally frequencies to use as weights
  freqs=sapply(1:n.items, function(i) {
    unlist(lapply(1:n.items, function(j) {
      sum(!(i==j)&!is.na(tr.set[,i])&!is.na(tr.set[,j]))
    }))
  })

  # estimate product-by-product mean differences in ratings
  diffs=array(NA, dim=c(n.items,n.items))
  diffs=sapply(1:n.items, function(i) {
    unlist(lapply(1:n.items, function(j) {
      diffs[j,i]=mean(tr.set[,i]-tr.set[,j],na.rm=T)
    }))
  })

  # create an output vector with NAs for all the items the user has already rated
  pred.out=as.numeric(is.na(d))
  pred.out[!is.na(d)]=NA
  a=which(!is.na(pred.out))
  b=which(is.na(pred.out))

  # calculate the weighted slope one estimate
  pred.out[a]=sapply(a, function(i) {
    sum(unlist(lapply(b,function (j) {
      sum((d[j]+diffs[j,i])*freqs[j,i])/rowSums(freqs)[i]
    })))
  })

  names(pred.out)=colnames(tr.set)
  return(pred.out)
}
# end function

# test, using example from [3]
alice=c(squid=1.0, octopus=0.2, cuttlefish=0.5, nautilus=NA)
bob=c(squid=1.0, octopus=0.5, cuttlefish=NA, nautilus=0.2)
carole=c(squid=0.2, octopus=1.0, cuttlefish=0.4, nautilus=0.4)
dave=c(squid=NA, octopus=0.4, cuttlefish=0.9, nautilus=0.5)
tr.set2=rbind(alice,bob,carole,dave)
lucy2=c(squid=0.4, octopus=NA, cuttlefish=NA, nautilus=NA)
pred(tr.set2,lucy2)
# not correct
# correct(?): {'nautilus': 0.10, 'octopus': 0.23, 'cuttlefish': 0.25}
</code></pre>
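<p>For comparison, here is a minimal sketch of the weighted scheme in plain loops, using the same example data. The function name <code>slope.one.pred</code> and its internal variable names are hypothetical, not taken from the Python source or the paper. It assumes each missing item <code>i</code> is predicted as the sum over the user's rated items <code>j</code> of <code>(d[j] + mean deviation of i from j) * n</code>, divided by the sum of those same <code>n</code>, where <code>n</code> is the number of training users who rated both items. One likely cause of the mismatch in the attempt above is that <code>rowSums(freqs)[i]</code> totals the co-rating counts over all items rather than only over the items the user has rated.</p>

```r
# Sketch only: weighted slope one with explicit loops.
# slope.one.pred is a hypothetical name chosen for this illustration.
slope.one.pred = function(tr.set, d) {
  rated = which(!is.na(d))      # items the user has rated
  missing = which(is.na(d))     # items to predict
  pred.out = rep(NA_real_, ncol(tr.set))
  names(pred.out) = colnames(tr.set)

  for (i in missing) {
    num = 0
    den = 0
    for (j in rated) {
      # training users who rated both item i and item j
      both = !is.na(tr.set[, i]) & !is.na(tr.set[, j])
      n = sum(both)
      if (n > 0) {
        # average deviation of item i from item j over those users
        dev = mean(tr.set[both, i] - tr.set[both, j])
        num = num + (d[[j]] + dev) * n
        den = den + n             # weights come only from the rated items
      }
    }
    if (den > 0) pred.out[i] = num / den
  }
  pred.out
}

alice = c(squid=1.0, octopus=0.2, cuttlefish=0.5, nautilus=NA)
bob = c(squid=1.0, octopus=0.5, cuttlefish=NA, nautilus=0.2)
carole = c(squid=0.2, octopus=1.0, cuttlefish=0.4, nautilus=0.4)
dave = c(squid=NA, octopus=0.4, cuttlefish=0.9, nautilus=0.5)
tr.set2 = rbind(alice, bob, carole, dave)
lucy2 = c(squid=0.4, octopus=NA, cuttlefish=NA, nautilus=NA)

result = slope.one.pred(tr.set2, lucy2)
print(round(result, 2))
# squid is NA (already rated); octopus 0.23, cuttlefish 0.25, nautilus 0.10
```

<p>Unlike the attempt above, this sketch does not append <code>d</code> to the training set before computing deviations. With only one rated item in <code>lucy2</code> that makes no difference here, but it could for users with several ratings.</p>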
