Note that there are some explanatory texts on larger screens.

plurals
  1. POOptimizing Jaro-Winkler algorithm
    primarykey
    data
    text
    <p>I have this code for Jaro-Winkler algorithm taken from <a href="http://www.dcs.shef.ac.uk/~sam/stringmetrics.html#jaro" rel="nofollow noreferrer">this</a> website. I need to run 150,000 times to get distance between differences. It takes a long time, as I run on an Android mobile device.</p> <p>Can it be optimized more?</p> <pre><code>public class Jaro { /** * gets the similarity of the two strings using Jaro distance. * * @param string1 the first input string * @param string2 the second input string * @return a value between 0-1 of the similarity */ public float getSimilarity(final String string1, final String string2) { //get half the length of the string rounded up - (this is the distance used for acceptable transpositions) final int halflen = ((Math.min(string1.length(), string2.length())) / 2) + ((Math.min(string1.length(), string2.length())) % 2); //get common characters final StringBuffer common1 = getCommonCharacters(string1, string2, halflen); final StringBuffer common2 = getCommonCharacters(string2, string1, halflen); //check for zero in common if (common1.length() == 0 || common2.length() == 0) { return 0.0f; } //check for same length common strings returning 0.0f is not the same if (common1.length() != common2.length()) { return 0.0f; } //get the number of transpositions int transpositions = 0; int n=common1.length(); for (int i = 0; i &lt; n; i++) { if (common1.charAt(i) != common2.charAt(i)) transpositions++; } transpositions /= 2.0f; //calculate jaro metric return (common1.length() / ((float) string1.length()) + common2.length() / ((float) string2.length()) + (common1.length() - transpositions) / ((float) common1.length())) / 3.0f; } /** * returns a string buffer of characters from string1 within string2 if they are of a given * distance seperation from the position in string1. * * @param string1 * @param string2 * @param distanceSep * @return a string buffer of characters from string1 within string2 if they are of a given * distance seperation from the position in string1 */ private static StringBuffer getCommonCharacters(final String string1, final String string2, final int distanceSep) { //create a return buffer of characters final StringBuffer returnCommons = new StringBuffer(); //create a copy of string2 for processing final StringBuffer copy = new StringBuffer(string2); //iterate over string1 int n=string1.length(); int m=string2.length(); for (int i = 0; i &lt; n; i++) { final char ch = string1.charAt(i); //set boolean for quick loop exit if found boolean foundIt = false; //compare char with range of characters to either side for (int j = Math.max(0, i - distanceSep); !foundIt &amp;&amp; j &lt; Math.min(i + distanceSep, m - 1); j++) { //check if found if (copy.charAt(j) == ch) { foundIt = true; //append character found returnCommons.append(ch); //alter copied string2 for processing copy.setCharAt(j, (char)0); } } } return returnCommons; } } </code></pre> <p>I mention that in the whole process I make just instance of the script, so only once</p> <pre><code>jaro= new Jaro(); </code></pre> <p>If you are going to test and need examples so not break the script, you will find it <a href="https://stackoverflow.com/questions/2741872/python-performance-improvement-request-for-winkler">here</a>, in another thread for python optimization</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload