How to create a custom model using OpenNLP?
<p>I am trying to <strong>extract entities</strong> such as <strong>names and skills</strong> from a document using the <strong>OpenNLP Java API</strong>, but <strong>it is not extracting proper names</strong>. I am using the models available at the <a href="http://opennlp.sourceforge.net/models-1.5/" rel="nofollow">opennlp sourceforge link</a>.</p>
<p>Here is the Java code:</p>
<pre><code>public class tikaOpenIntro {

    public static void main(String[] args) throws IOException, SAXException, TikaException {
        tikaOpenIntro toi = new tikaOpenIntro();
        toi.filest("");                          // truncate the output file
        String cnt = toi.contentEx();            // PDF to plain text
        toi.sentenceD(cnt);                      // sentence boundary detection
        toi.tokenization(cnt);                   // tokenize the full text
        String names = toi.namefind(toi.Tokens); // person name extraction
        toi.files(names);                        // append the names to the output file
    }

    public String Tokens[];

    // Extracts the plain text of the PDF with Apache Tika.
    public String contentEx() throws IOException, SAXException, TikaException {
        InputStream is = new BufferedInputStream(new FileInputStream(new File(
                "/home/rahul/Downloads/rahul.pdf")));
        // URL url=new URL("http://in.linkedin.com/in/rahulkulhari");
        // InputStream is=url.openStream();
        Parser ps = new AutoDetectParser(); // for detect parser related to
        BodyContentHandler bch = new BodyContentHandler();
        ps.parse(is, bch, new Metadata(), new ParseContext());
        return bch.toString();
    }

    // Appends st and a newline to the output file.
    public void files(String st) throws IOException {
        FileWriter fw = new FileWriter("/home/rahul/Documents/extrdata.txt", true);
        BufferedWriter bufferWritter = new BufferedWriter(fw);
        bufferWritter.write(st + "\n");
        bufferWritter.close();
    }

    // Overwrites the output file with st.
    public void filest(String st) throws IOException {
        FileWriter fw = new FileWriter("/home/rahul/Documents/extrdata.txt", false);
        BufferedWriter bufferWritter = new BufferedWriter(fw);
        bufferWritter.write(st);
        bufferWritter.close();
    }

    // Runs the pre-trained person model over the tokens, one found name per line.
    public String namefind(String cnt[]) {
        InputStream is;
        TokenNameFinderModel tnf;
        NameFinderME nf;
        String sd = "";
        try {
            is = new FileInputStream("/home/rahul/opennlp/model/en-ner-person.bin");
            tnf = new TokenNameFinderModel(is);
            nf = new NameFinderME(tnf);
            Span sp[] = nf.find(cnt);
            String a[] = Span.spansToStrings(sp, cnt);
            StringBuilder fd = new StringBuilder();
            int l = a.length;
            for (int j = 0; j &lt; l; j++) {
                fd = fd.append(a[j] + "\n");
            }
            sd = fd.toString();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (InvalidFormatException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return sd;
    }

    // Detects sentences; the result is kept only in the local variable cnt.
    public void sentenceD(String content) {
        String cnt[] = null;
        InputStream om;
        SentenceModel sm;
        SentenceDetectorME sdm;
        try {
            om = new FileInputStream("/home/rahul/opennlp/model/en-sent.bin");
            sm = new SentenceModel(om);
            sdm = new SentenceDetectorME(sm);
            cnt = sdm.sentDetect(content);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Tokenizes the text and stores the result in the Tokens field.
    public void tokenization(String tokens) {
        InputStream is;
        TokenizerModel tm;
        try {
            is = new FileInputStream("/home/rahul/opennlp/model/en-token.bin");
            tm = new TokenizerModel(is);
            Tokenizer tz = new TokenizerME(tm);
            Tokens = tz.tokenize(tokens);
            // System.out.println(Tokens[1]);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
</code></pre>
<p>What I am trying to do is:</p>
<ul>
<li>I use <strong>Apache Tika</strong> to convert the PDF document into plain text.</li>
<li>I pass the plain text to <strong>sentence boundary detection.</strong></li>
<li>After that comes <strong>tokenization.</strong></li>
<li>After that comes <strong>named entity extraction.</strong></li>
</ul>
<p>But it extracts other words along with the names. <strong>It does not extract only proper names.</strong> Also, <strong>how can I create a custom model to extract skills such as Swimming, Programming, etc. from a document?</strong></p>
<p><strong>Any ideas or help would be greatly appreciated!</strong></p>
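<p>For the custom skill model asked about above, the OpenNLP name finder trainer reads plain-text training data with one tokenized sentence per line, where each entity is wrapped in START and END span markers that carry the entity type. The following is a minimal sketch of how such lines could be bootstrapped from a seed list of skills. The class name SkillTrainingData, the annotate method, the seed list, and the "skill" type label are all hypothetical and only serve as illustration; it uses the Java standard library only and does not call OpenNLP itself.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper: marks known skill words in an already tokenized
 * sentence using the <START:type> ... <END> annotation format that the
 * OpenNLP name finder trainer reads (one sentence per line).
 */
class SkillTrainingData {

    // Illustrative seed list (an assumption); a real list would come from your own documents.
    static final Set<String> SKILLS = Set.of("swimming", "programming", "java");

    /** Returns one training line for a whitespace-tokenized sentence. */
    static String annotate(String tokenizedSentence) {
        List<String> out = new ArrayList<>();
        for (String token : tokenizedSentence.trim().split("\\s+")) {
            // Case-insensitive lookup of single-token skills only.
            if (SKILLS.contains(token.toLowerCase(Locale.ROOT))) {
                out.add("<START:skill> " + token + " <END>");
            } else {
                out.add(token);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // prints: Rahul enjoys <START:skill> Swimming <END> and <START:skill> Programming <END> .
        System.out.println(annotate("Rahul enjoys Swimming and Programming ."));
    }
}
```

<p>Lines produced this way would still need manual review before training, since a seed list only catches single-token skills it already knows. The reviewed file could then be passed to OpenNLP's TokenNameFinderTrainer to build a model that is loaded the same way as en-ner-person.bin in the code above.</p>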