Note that there are some explanatory texts on larger screens.

plurals
  1. POHow to rank items that match keywords by similarity
    text
    copied!<p>I have a table of items linked to a table of keywords. I want to list all items that share keywords and I want to rank the results so that items that are most similar appear highest. So, I figure items that match all keywords will come highest with items that match only one keyword appearing lower in the list. </p> <p>Where items match the same number of keywords I want to then rank the results by the frequency of a keyword, so that items that share an unusual keyword will appear higher than items that share a keyword that appears frequently.</p> <p>Currently my schema is like this (but this could be changed if necessary):</p> <pre><code>Item ItemKeywords Keywords ____ ____________ ________ ID ItemID ID Item KeywordID Word </code></pre> <p>I'm struggling to even start with the sql to get the items that match most keywords. Can anyone help? I'm using Sql Server 2005</p> <p><strong>EDIT - (added for completeness)</strong></p> <p>Using Beth's SQL as a pointer I've come up with the following, which I think is working as I want, just got to do a little more testing:</p> <pre><code> SELECT IK1.ItemID, IK2.ItemID, COUNT(IK2.KeywordID) AS KeywordCount, MIN(WordFrequency) AS WordFrequency FROM ItemKeywords IK1 INNER JOIN ItemKeywords IK2 ON IK1.KeywordID = IK2.KeywordID --Keywords match AND IK1.ItemID &lt; IK2.ItemID --Eliminate pairs (Bill &amp; Ted but not Ted and Bill or Bill and Bill) INNER JOIN (SELECT KeywordID, COUNT(*) WordFrequency FROM dbo.ItemKeywords GROUP BY KeywordID) AS KF ON KF.KeywordID = IK2.KeywordID --Join to sub-query to get frequency of word GROUP BY IK1.ItemID, IK2.ItemID ORDER BY COUNT(IK2.KeywordID) DESC, MIN(WordFrequency) ASC --Order by most matching keywords then by most unusual word </code></pre> <p>I've altered the syntax slightly to avoid the sub-query (doing everything on the join), I've changed <code>IK1.ItemID &lt;&gt; IK2.ItemID</code> to <code>IK1.ItemID &lt; IK2.ItemID</code> to eliminate pairs, and I've added the second order by item to take into account the frequency that a word is used too. But essentially Beth answered the question so I will mark it as the answer (many thanks)</p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload