<p>Here's some (old) sample code I threw together, with modernized code to follow:</p>

<pre><code>package opennlp;

import opennlp.tools.cmdline.PerformanceMonitor;
import opennlp.tools.cmdline.postag.POSModelLoader;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

import java.io.File;
import java.io.IOException;
import java.io.StringReader;

public class OpenNlpTest {
    public static void main(String[] args) throws IOException {
        // Load the pre-trained POS model from the working directory
        POSModel model = new POSModelLoader().load(new File("en-pos-maxent.bin"));
        PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
        POSTaggerME tagger = new POSTaggerME(model);

        String input = "Can anyone help me dig through OpenNLP's horrible documentation?";
        ObjectStream&lt;String&gt; lineStream =
                new PlainTextByLineStream(new StringReader(input));

        perfMon.start();
        String line;
        while ((line = lineStream.read()) != null) {
            // Tokenize on whitespace, then tag each token
            String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line);
            String[] tags = tagger.tag(whitespaceTokenizerLine);

            POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
            System.out.println(sample.toString());

            perfMon.incrementCounter();
        }
        perfMon.stopAndPrintFinalResult();
    }
}
</code></pre>

<p>The output is:</p>

<pre><code>Loading POS Tagger model ... done (2.045s)
Can_MD anyone_NN help_VB me_PRP dig_VB through_IN OpenNLP's_NNP horrible_JJ documentation?_NN

Average: 76.9 sent/s
Total: 1 sent
Runtime: 0.013s
</code></pre>

<p>This is basically adapted from the POSTaggerTool class included as part of OpenNLP. <code>sample.getTags()</code> returns a <code>String</code> array that holds the tag types themselves.</p>

<p>This requires direct file access to the trained model file, which is really, really lame.</p>

<p>An updated codebase for this is a little different (and probably more useful).</p>

<p>First, a Maven POM:</p>

<pre><code>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"&gt;
    &lt;modelVersion&gt;4.0.0&lt;/modelVersion&gt;

    &lt;groupId&gt;org.javachannel&lt;/groupId&gt;
    &lt;artifactId&gt;opennlp-example&lt;/artifactId&gt;
    &lt;version&gt;1.0-SNAPSHOT&lt;/version&gt;

    &lt;dependencies&gt;
        &lt;dependency&gt;
            &lt;groupId&gt;org.apache.opennlp&lt;/groupId&gt;
            &lt;artifactId&gt;opennlp-tools&lt;/artifactId&gt;
            &lt;version&gt;1.6.0&lt;/version&gt;
        &lt;/dependency&gt;
        &lt;dependency&gt;
            &lt;groupId&gt;org.testng&lt;/groupId&gt;
            &lt;artifactId&gt;testng&lt;/artifactId&gt;
            &lt;version&gt;[6.8.21,)&lt;/version&gt;
            &lt;scope&gt;test&lt;/scope&gt;
        &lt;/dependency&gt;
    &lt;/dependencies&gt;

    &lt;build&gt;
        &lt;plugins&gt;
            &lt;plugin&gt;
                &lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
                &lt;artifactId&gt;maven-compiler-plugin&lt;/artifactId&gt;
                &lt;version&gt;3.1&lt;/version&gt;
                &lt;configuration&gt;
                    &lt;source&gt;1.8&lt;/source&gt;
                    &lt;target&gt;1.8&lt;/target&gt;
                &lt;/configuration&gt;
            &lt;/plugin&gt;
        &lt;/plugins&gt;
    &lt;/build&gt;
&lt;/project&gt;
</code></pre>

<p>And here's the code, written as a test, therefore located in <code>./src/test/java/org/javachannel/opennlp/example</code>:</p>

<pre><code>package org.javachannel.opennlp.example;

import opennlp.tools.cmdline.PerformanceMonitor;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.stream.Stream;

public class POSTest {
    private void download(String url, File destination) throws IOException {
        URL website = new URL(url);
        // try-with-resources closes both the channel and the file stream
        try (ReadableByteChannel rbc = Channels.newChannel(website.openStream());
             FileOutputStream fos = new FileOutputStream(destination)) {
            fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
        }
    }

    @DataProvider
    Object[][] getCorpusData() {
        // Each row carries a single Object[] of sentences for showPOS
        return new Object[][][]{{{
                "Can anyone help me dig through OpenNLP's horrible documentation?"
        }}};
    }

    @Test(dataProvider = "getCorpusData")
    public void showPOS(Object[] input) throws IOException {
        File modelFile = new File("en-pos-maxent.bin");
        if (!modelFile.exists()) {
            System.out.println("Downloading model.");
            download("http://opennlp.sourceforge.net/models-1.5/en-pos-maxent.bin", modelFile);
        }
        POSModel model = new POSModel(modelFile);
        PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
        POSTaggerME tagger = new POSTaggerME(model);

        perfMon.start();
        Stream.of(input).map(line -&gt; {
            // Tokenize on whitespace, then tag each token
            String[] whitespaceTokenizerLine = WhitespaceTokenizer.INSTANCE.tokenize(line.toString());
            String[] tags = tagger.tag(whitespaceTokenizerLine);

            POSSample sample = new POSSample(whitespaceTokenizerLine, tags);
            perfMon.incrementCounter();
            return sample.toString();
        }).forEach(System.out::println);
        perfMon.stopAndPrintFinalResult();
    }
}
</code></pre>

<p>This code doesn't actually <em>test</em> anything - it's a smoke test, if anything - but it should serve as a starting point. Another (potentially) nice thing is that it downloads a model for you if you don't have it downloaded already.</p>
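<p>If all you have is the printed <code>token_TAG</code> line (say, from a log) rather than the <code>POSSample</code> itself, the pairs can be split back apart with plain Java. The following is only a rough sketch: <code>TaggedLineParser</code> and <code>TaggedToken</code> are hypothetical names made up for this example, not part of OpenNLP, and it assumes the format shown in the output above, with the last underscore in each pair separating token from tag. When you do have the sample object, <code>sample.getTags()</code> is the more direct route.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: parses "token_TAG" output lines.
public class TaggedLineParser {

    /** One token paired with its part-of-speech tag. */
    public static final class TaggedToken {
        public final String token;
        public final String tag;

        TaggedToken(String token, String tag) {
            this.token = token;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return token + " -> " + tag;
        }
    }

    /**
     * Splits a whitespace-separated line of "token_TAG" pairs.
     * The last underscore in each pair is treated as the separator,
     * so tokens that contain underscores themselves stay intact.
     */
    public static List<TaggedToken> parse(String line) {
        List<TaggedToken> result = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return result;
        }
        for (String pair : trimmed.split("\\s+")) {
            int sep = pair.lastIndexOf('_');
            // Reject pairs with no separator, an empty token, or an empty tag
            if (sep <= 0 || sep == pair.length() - 1) {
                throw new IllegalArgumentException("Not a token_TAG pair: " + pair);
            }
            result.add(new TaggedToken(pair.substring(0, sep), pair.substring(sep + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Can_MD anyone_NN help_VB me_PRP dig_VB through_IN "
                + "OpenNLP's_NNP horrible_JJ documentation?_NN";
        parse(line).forEach(System.out::println);
    }
}
```

<p>Splitting on the last underscore rather than the first is a deliberate choice here: Penn Treebank style tags such as <code>NN</code> or <code>PRP</code> contain no underscores, while tokens occasionally do.</p>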