Classification of instances in Weka
    <p>I'm trying to use Weka in my C# application. I've used IKVM to bring the Java parts into my .NET application. This seems to be working quite well. However, I am at a loss when it comes to Weka's API. How exactly <em>do</em> I classify instances if they are programmatically passed around in my application and not available as ARFF files?</p>
    <p>Basically, I am trying to integrate a simple co-reference analysis using Weka's classifiers. I've built the classification model in Weka directly and saved it to disk, from where my .NET application opens it and uses the IKVM port of Weka to predict the class value.</p>
    <p>Here is what I've got so far:</p>
<pre><code>// This is the "entry" method for the classification method
public IEnumerable&lt;AttributedTokenDecorator&gt; Execute(IEnumerable&lt;TokenPair&gt; items)
{
    TokenPair[] pairs = items.ToArray();
    Classifier model = ReadModel(); // reads the Weka generated model
    FastVector fv = CreateFastVector(pairs);

    Instances instances = new Instances("licora", fv, pairs.Length);
    CreateInstances(instances, pairs);

    for (int i = 0; i &lt; instances.numInstances(); i++)
    {
        Instance instance = instances.instance(i);
        double classification = model.classifyInstance(instance); // array index out of bounds?
        if (AsBoolean(classification))
            MakeCoreferent(pairs[i]);
    }

    throw new NotImplementedException(); // TODO
}

// This is a helper method to create instances from the internal model files
private static void CreateInstances(Instances instances, IEnumerable&lt;TokenPair&gt; pairs)
{
    instances.setClassIndex(instances.numAttributes() - 1);
    foreach (var pair in pairs)
    {
        var instance = new Instance(instances.numAttributes());
        instance.setDataset(instances);
        for (int i = 0; i &lt; instances.numAttributes(); i++)
        {
            var attribute = instances.attribute(i);
            if (pair.Features.ContainsKey(attribute.name()) &amp;&amp; pair.Features[attribute.name()] != null)
            {
                var value = pair.Features[attribute.name()];
                if (attribute.isNumeric())
                    instance.setValue(attribute, Convert.ToDouble(value));
                else
                    instance.setValue(attribute, value.ToString());
            }
            else
            {
                instance.setMissing(attribute);
            }
        }
        //instance.setClassMissing();
        instances.add(instance);
    }
}

// This creates the data set's attributes vector
private FastVector CreateFastVector(TokenPair[] pairs)
{
    var fv = new FastVector();
    foreach (var attribute in _features)
    {
        Attribute att;
        if (attribute.Type.Equals(ArffType.Nominal))
        {
            var values = new FastVector();
            ExtractValues(values, pairs, attribute.FeatureName);
            att = new Attribute(attribute.FeatureName, values);
        }
        else
            att = new Attribute(attribute.FeatureName);
        fv.addElement(att);
    }
    {
        var classValues = new FastVector(2);
        classValues.addElement("0");
        classValues.addElement("1");
        var classAttribute = new Attribute("isCoref", classValues);
        fv.addElement(classAttribute);
    }
    return fv;
}

// This extracts observed values for nominal attributes
private static void ExtractValues(FastVector values, IEnumerable&lt;TokenPair&gt; pairs, string featureName)
{
    var strings = (from x in pairs
                   where x.Features.ContainsKey(featureName) &amp;&amp; x.Features[featureName] != null
                   select x.Features[featureName].ToString())
                  .Distinct().ToArray();
    foreach (var s in strings)
        values.addElement(s);
}

private Classifier ReadModel()
{
    return (Classifier) SerializationHelper.read(_model);
}

private static bool AsBoolean(double classifyInstance)
{
    return classifyInstance &gt;= 0.5;
}
</code></pre>
    <p>For some reason, Weka throws an <code>IndexOutOfRangeException</code> when I call <code>model.classifyInstance(instance)</code>. I have no idea why, nor can I come up with an idea how to rectify this issue.</p>
    <p>I am hoping someone might know where I went wrong. The only documentation for Weka I found relies on ARFF files for prediction, and I don't really want to go there.</p>
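    <p>One thing that may be worth ruling out (an assumption, since nothing above confirms the cause): the header used for prediction is rebuilt from the nominal values observed in the pairs being classified, so it may list different values, or the same values in a different order, than the header the model was trained on. The sketch below is a toy in plain Java, not Weka's API or its real internals; <code>NominalHeaderDemo</code>, <code>ToyModel</code> and <code>indexIn</code> are hypothetical names. It only illustrates how a value index taken from a mismatched header can run past an array sized for the training header.</p>

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration only: NOT Weka code. All names here are hypothetical.
final class NominalHeaderDemo {

    /** Stand-in for a trained model that keeps one weight per nominal value seen in training. */
    static final class ToyModel {
        private final double[] weightPerValue;

        ToyModel(double[] weightPerValue) {
            this.weightPerValue = weightPerValue.clone();
        }

        /** Looks up the weight by value index; throws if the index exceeds the training header. */
        double classify(int valueIndex) {
            return weightPerValue[valueIndex] >= 0.5 ? 1.0 : 0.0;
        }
    }

    /** Position of a value in a header's value list, or -1 if the header does not contain it. */
    static int indexIn(List<String> headerValues, String value) {
        return headerValues.indexOf(value);
    }

    public static void main(String[] args) {
        // Value list the toy model was "trained" with.
        List<String> trainingHeader = Arrays.asList("NOUN", "PRONOUN");
        ToyModel model = new ToyModel(new double[] {0.2, 0.9});

        // Value list rebuilt from whatever shows up at prediction time.
        List<String> observedHeader = Arrays.asList("VERB", "NOUN", "PRONOUN");

        try {
            // "PRONOUN" has index 2 here, but the model only knows indices 0 and 1.
            model.classify(indexIn(observedHeader, "PRONOUN"));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Mismatched header: " + e.getMessage());
        }

        // Resolving the value against the training header stays in range.
        System.out.println(model.classify(indexIn(trainingHeader, "PRONOUN")));
    }
}
```

    <p>If this turns out to be the problem, one option would be to declare each nominal attribute with the same value list, in the same order, as in the training data, and to mark values that were never seen in training as missing.</p>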