StackOverflow2013


How to extract relationship from text in NLTK
<p>Hi, I'm trying to extract relationships from a string of text, based on the second-to-last example here: <a href="https://web.archive.org/web/20120907184244/http://nltk.googlecode.com/svn/trunk/doc/howto/relextract.html" rel="nofollow">https://web.archive.org/web/20120907184244/http://nltk.googlecode.com/svn/trunk/doc/howto/relextract.html</a></p>
<p>From a string such as "Michael James editor of Publishers Weekly", my desired result is output such as:</p>
<blockquote> <p>[PER: 'Michael James'] ', editor of' [ORG: 'Publishers Weekly']</p> </blockquote>
<p>What is the best way to do this? What format does extract_rels expect, and how do I format my input to meet that requirement?</p>
<hr>
<p>I tried to do it myself, but it didn't work. Here is the code I've adapted from the book. I'm not getting any results printed. What am I doing wrong?</p>
<pre><code>import re

import nltk
from nltk.sem import relextract


class doc():
    pass

doc.headline = ['this is expected by nltk.sem.extract_rels but not used in this script']

def findrelations(text):
    roles = """
    (.*(
    analyst|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """
    ROLES = re.compile(roles, re.VERBOSE)
    tokenizedsentences = nltk.sent_tokenize(text)
    for sentence in tokenizedsentences:
        taggedwords = nltk.pos_tag(nltk.word_tokenize(sentence))
        doc.text = nltk.batch_ne_chunk(taggedwords)
        print doc.text
        for rel in relextract.extract_rels('PER', 'ORG', doc, corpus='ieer', pattern=ROLES):
            print relextract.show_raw_rtuple(rel) # doctest: +ELLIPSIS
</code></pre>
<blockquote> <p>text ="Michael James editor of Publishers Weekly"</p> <p>findrelations(text)</p> </blockquote>
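To check the role pattern on its own, separately from tagging and chunking, the following minimal sketch uses only the standard library. It assumes that the pattern is applied with a match anchored at the start of the text found between two entities (the "filler"); that is an assumption about how extract_rels uses it, not something the question confirms. The helper name `matches_filler` and the sample filler strings are made up for illustration.

```python
import re

# The role pattern from the question, compiled in verbose mode so that
# whitespace and the trailing comment inside the pattern are ignored.
ROLES = re.compile(r"""
    (.*(
    analyst|
    editor|
    librarian).*)|
    researcher|
    spokes(wo)?man|
    writer|
    ,\sof\sthe?\s*  # "X, of (the) Y"
    """, re.VERBOSE)


def matches_filler(filler):
    """Hypothetical helper: True if ROLES matches at the start of `filler`."""
    return ROLES.match(filler) is not None


# Sample fillers (illustrative only): text that might sit between a
# person entity and an organization entity.
for filler in [", editor of", "editor of", ", of the", "works at"]:
    print(repr(filler), matches_filler(filler))
```

If the pattern accepts the filler from the example sentence, as it does for "editor of" here, then missing output is more likely to come from the structure assigned to doc.text than from the regular expression.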
