<p>Instead of writing this from scratch, take a look at mahout.apache.org. It has the clustering algorithms you are looking for, as well as recommendation algorithms. It works alongside <a href="http://en.wikipedia.org/wiki/Apache_Hadoop" rel="nofollow">Hadoop</a>, so you can <a href="https://en.wikipedia.org/wiki/Scalability#Horizontal_and_vertical_scaling" rel="nofollow">scale it out</a> easily.</p>
<p>This lets you find similar documents by clustering them on the keywords and/or description of each video.</p>
<p>The page at <a href="https://cwiki.apache.org/MAHOUT/k-means-clustering.html" rel="nofollow">https://cwiki.apache.org/MAHOUT/k-means-clustering.html</a> has a quick tutorial on clustering documents using a <a href="https://en.wikipedia.org/wiki/Reuters_Group" rel="nofollow">Reuters</a> dataset, which is quite similar to what you are trying to achieve. Mahout includes recommendation algorithms such as slope one, user-based and item-based, and it is easy to extend. It also has some useful clustering algorithms that support dimensionality reduction. That helps when your matrix is sparse (that is, when many tags have very few usage stats).</p>
<p>Also take a look at <a href="http://en.wikipedia.org/wiki/Lucene" rel="nofollow">Lucene</a>, whose TF-IDF features you can use to cluster tags and documents, and at <a href="http://en.wikipedia.org/wiki/Apache_Solr" rel="nofollow">Solr</a>. Both are Apache projects.</p>
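<p>To make the TF-IDF idea concrete, here is a minimal sketch in plain Python. It is not Mahout or Lucene code and does not use their APIs; the function names (<code>tfidf_vectors</code>, <code>cosine</code>) and the sample video descriptions are made up for illustration. It only shows the kind of weighting and similarity measure those libraries build on when they group documents by keywords.</p>

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighted as tf * idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword strings for three videos.
videos = [
    "cooking pasta italian recipe",
    "italian pasta sauce recipe",
    "guitar lesson beginner chords",
]
vecs = tfidf_vectors(videos)
sim_food = cosine(vecs[0], vecs[1])   # two cooking videos
sim_mixed = cosine(vecs[0], vecs[2])  # cooking video vs. guitar video
print(sim_food > sim_mixed)
```

<p>A clustering algorithm such as k-means then groups documents whose vectors are close under a measure like this one. With real data you would let Mahout or Lucene handle tokenization, weighting and scaling rather than rolling your own.</p>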
 
