<p>This is exactly the same problem Twitter is battling with. You might end up with a job there if you crack this ;)</p>
<p>On a more serious note, one could use some crude heuristic-based measures to do something like this, but, as delnan said in the comments, they have a high error rate.</p>
<p>NLP is the safer bet. Note that NLP has some error rate too, but it is far more accurate than any heuristic you would use. If you are using Python I would suggest this toolkit, which I use now and then - <a href="http://www.nltk.org/" rel="nofollow"><strong>NLTK</strong></a>.</p>
<p>For other languages I am sure there are packages which will help you in this regard.</p>
<p><strong>UPDATE1:</strong> If you have a way for the users to tag their messages (like Stack Overflow does), you could approach this problem without NLP. You could simply take the intersection of the tags of both messages to see if they have anything in common, and suggest some top items for the common tags.</p>
<p>But there are other issues you'll have to deal with: you need to make tags mandatory, and you need to be sure that the users are actually entering correct tags, etc. Nevertheless, this greatly simplifies your problem.</p>
<p><strong>UPDATE2:</strong> Since the question has been updated and you are only interested in some specific keywords/phrases, this simplifies things. Take each message, split it into words, then <a href="http://en.wikipedia.org/wiki/Stemming" rel="nofollow"><strong>stem</strong></a> each word. After stemming, intersect this set with the set of keywords you have (stemmed the same way, so the two sets are comparable). You'll get a set S1. Do the same with the second message to get a set S2. Intersect S1 and S2. If the result is non-empty, bingo! Some theme is common between message1 and message2. Otherwise there is nothing in common.</p>
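<p>The steps in UPDATE2 can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the keyword list and the function names (<code>crude_stem</code>, <code>themes</code>, <code>common_themes</code>) are made up for the example, and <code>crude_stem</code> is a deliberately naive suffix stripper standing in for a real stemmer (with NLTK installed, something like its Porter stemmer would be the better choice).</p>

```python
import re

# Hypothetical keyword list; in practice this comes from your application.
KEYWORDS = {"football", "election", "cooking"}


def crude_stem(word):
    """Naive suffix stripper, a stand-in for a real stemmer (an assumption
    made to keep the example dependency-free)."""
    for suffix in ("ing", "ers", "er", "es", "ed", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def themes(message, keywords=KEYWORDS):
    """Return the stemmed keywords that occur in a message (S1 or S2)."""
    words = re.findall(r"[a-z']+", message.lower())
    stems = {crude_stem(w) for w in words}
    # Stem the keywords too, so both sides are in the same form.
    keyword_stems = {crude_stem(k) for k in keywords}
    return stems & keyword_stems


def common_themes(message1, message2, keywords=KEYWORDS):
    """Intersect S1 and S2; an empty set means no common theme."""
    return themes(message1, keywords) & themes(message2, keywords)


if __name__ == "__main__":
    m1 = "I love watching football matches"
    m2 = "Footballs were everywhere after the election"
    print(common_themes(m1, m2))
```

<p>The tag-based idea from UPDATE1 is the same set intersection, just without the stemming step: intersect the two messages' tag sets directly.</p>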
 
