Auto Categorization of Content
<p>I'm developing a script that extracts the messages from the message archive of a particular meetup.com group of which I'm a member - <a href="http://www.meetup.com/opencoffee/messages/archive/" rel="nofollow noreferrer">http://www.meetup.com/opencoffee/messages/archive/</a></p>
<p>The idea is to add these dynamically to a WordPress site so that people can search the messages, have them tagged automatically, and so on.</p>
<p>The issue I have is how best to categorize these messages automatically. I would welcome any thoughts on how to approach this and on the most efficient way of programming it.</p>
<p>Option 1</p>
<p>Use the Delicious API to build a list of tags for each subject area (finance, technology, business, etc.) by looking up the tags related to each subject:</p>
<p><a href="http://delicious.com/tag/finance" rel="nofollow noreferrer">http://delicious.com/tag/finance</a></p>
<p><a href="http://delicious.com/tag/technology" rel="nofollow noreferrer">http://delicious.com/tag/technology</a></p>
<p>If a message contains any of these tags, the message is assigned to the respective category.</p>
<p>I believe this could work, but I'm not sure of the most efficient method of scanning a message for these tags.</p>
<p>Option 2</p>
<p>Find sites that are representative of the categories I need, such as ft.com and The Economist for finance or TechCrunch for technology, then look at which tags people use when they tag these sites. Those tags are taken to reflect how people relate to these sites and their content, and they become the tag list for the category.</p>
<p>Option 3</p>
<p>Pass the message URL to <a href="http://semanticproxy.com/" rel="nofollow noreferrer">http://semanticproxy.com/</a> (part of the Reuters Calais project) or use the Open Calais API. I have tried this, but without much success: the messages vary in length and depth, and there is not always enough content to return a meaningful taxonomy.</p>
<p>Here is an example message that I passed through the Calais API:</p>
<p>Original Message</p>
<p><a href="http://www.meetup.com/opencoffee/messages/6045615/" rel="nofollow noreferrer">http://www.meetup.com/opencoffee/messages/6045615/</a></p>
<p>Calais Result</p>
<p><a href="http://www.mashinteractive.com/opencoffee/calais.php" rel="nofollow noreferrer">http://www.mashinteractive.com/opencoffee/calais.php</a></p>
<p>SUMMARY</p>
<p>So that's about it. I would welcome any thoughts on methodology and tips on how best to approach the message scanning for options 1 and 2.</p>
<p>FYI there are approximately 1,700 messages to date, and I'm guessing I may have 10 categories, with each category defined by 20 or 30 tags.</p>
<p>If anyone would like to help develop a WordPress plugin or class to do this, I would be more than happy to have you on board. Bear in mind I'm not a programmer; I just tinker around the edges and pretend I am one.</p>
<p>Thanks in advance</p>
<p>Jonathan CEO</p>
<p>Crowd People</p>
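<p>To make the scanning step in Option 1 concrete, here is a rough sketch in Python of how I imagine it might work. It is only an illustration under assumptions: the tag lists in CATEGORY_TAGS are made up rather than pulled from Delicious, and the function names build_tag_index and categorize are placeholders of my own. The idea is to invert the category-to-tags mapping once, then look up each word (and each pair of adjacent words, to catch two-word tags) from a message in that index.</p>

```python
import re
from collections import Counter

# Hypothetical tag lists, for illustration only. Real lists would come from
# whichever tag source is chosen (about 10 categories of 20-30 tags each).
CATEGORY_TAGS = {
    "finance": {"investment", "funding", "bank", "equity", "venture capital"},
    "technology": {"software", "startup", "api", "web", "open source"},
}


def build_tag_index(category_tags):
    """Invert the mapping so each tag points at the categories that use it."""
    index = {}
    for category, tags in category_tags.items():
        for tag in tags:
            index.setdefault(tag.lower(), set()).add(category)
    return index


def categorize(message, tag_index, min_hits=1):
    """Return categories ordered by how many distinct tags the message contains."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    # Candidate terms: every single word plus every pair of adjacent words.
    candidates = set(words) | {" ".join(pair) for pair in zip(words, words[1:])}
    scores = Counter()
    for term in candidates:
        for category in tag_index.get(term, ()):
            scores[category] += 1
    return [cat for cat, hits in scores.most_common() if hits >= min_hits]


if __name__ == "__main__":
    index = build_tag_index(CATEGORY_TAGS)
    text = "Seeking venture capital funding and equity investment for a web startup"
    print(categorize(text, index))
```

<p>Each message costs one pass over its words plus dictionary lookups, so 1,700 messages against a few hundred tags should be cheap. Raising min_hits would require more than one matching tag before a category is assigned, which might cut down on false matches from very common words.</p>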
 
