Note that there are some explanatory texts on larger screens.

plurals
  1. POOptimisation of recursive algorithm in Java
    primarykey
    data
    text
    <h2>Background</h2> <p>I have an ordered set of data points stored as a <code>TreeSet&lt;DataPoint&gt;</code>. Each data point has a <code>position</code> and a <code>Set</code> of <code>Event</code> objects (<code>HashSet&lt;Event&gt;</code>).</p> <p>There are 4 possible <code>Event</code> objects <code>A</code>, <code>B</code>, <code>C</code>, and <code>D</code>. Every <code>DataPoint</code> has 2 of these, e.g. <code>A</code> and <code>C</code>, except the first and last <code>DataPoint</code> objects in the set, which have <code>T</code> of size 1.</p> <p>My algorithm is to find the probability of a new <code>DataPoint</code> <code>Q</code> at position <code>x</code> having <code>Event</code> <code>q</code> in this set.</p> <p>I do this by calculating a value <code>S</code> for this data set, then adding <code>Q</code> to the set and calculating <code>S</code> again. I then divide the second <code>S</code> by the first to isolate the probability for the new <code>DataPoint</code> <code>Q</code>.</p> <h2>Algorithm</h2> <p>The formula for calculating <code>S</code> is: </p> <p><a href="http://mathbin.net/equations/105225_0.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_0.png</a></p> <p>where </p> <p><a href="http://mathbin.net/equations/105225_1.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_1.png</a></p> <p><a href="http://mathbin.net/equations/105225_2.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_2.png</a></p> <p>for <a href="http://mathbin.net/equations/105225_3.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_3.png</a></p> <p>and</p> <p><a href="http://mathbin.net/equations/105225_4.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_4.png</a></p> <p><a href="http://mathbin.net/equations/105225_5.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_5.png</a> is an expensive probability function that only depends on its arguments and nothing else (and <a href="http://mathbin.net/equations/105225_6.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_6.png</a>), <a href="http://mathbin.net/equations/105225_7.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_7.png</a> is the last <code>DataPoint</code> in the set (righthand node), <a href="http://mathbin.net/equations/105225_8.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_8.png</a> is the first <code>DataPoint</code> (lefthand node), <a href="http://mathbin.net/equations/105225_9.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_9.png</a> is the rightmost <code>DataPoint</code> that isn't the node, <a href="http://mathbin.net/equations/105225_10.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_10.png</a> is a <code>DataPoint</code>,<a href="http://mathbin.net/equations/105225_12.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_12.png</a> is the <code>Set</code> of events for this <code>DataPoint</code>.</p> <p>So the probability for <code>Q</code> with <code>Event</code> <code>q</code> is:</p> <p><a href="http://mathbin.net/equations/105225_11.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_11.png</a></p> <h2>Implementation</h2> <p>I implemented this algorithm in Java like so:</p> <pre><code>public class ProbabilityCalculator { private Double p(DataPoint right, Event rightEvent, DataPoint left, Event leftEvent) { // do some stuff } private Double f(DataPoint right, Event rightEvent, NavigableSet&lt;DataPoint&gt; points) { DataPoint left = points.lower(right); Double result = 0.0; if(left.isLefthandNode()) { result = 0.25 * p(right, rightEvent, left, null); } else if(left.isQ()) { result = p(right, rightEvent, left, left.getQEvent()) * f(left, left.getQEvent(), points); } else { // if M_k for(Event leftEvent : left.getEvents()) result += p(right, rightEvent, left, leftEvent) * f(left, leftEvent, points); } return result; } public Double S(NavigableSet&lt;DataPoint&gt; points) { return f(points.last(), points.last().getRightNodeEvent(), points) } } </code></pre> <p>So to find the probability of <code>Q</code> at <code>x</code> with <code>q</code>:</p> <pre><code>Double S1 = S(points); points.add(Q); Double S2 = S(points); Double probability = S2/S1; </code></pre> <h2>Problem</h2> <p>As the implementation stands at the moment it follows the mathematical algorithm closely. However this turns out not to be a particularly good idea in practice, as <code>f</code> calls itself twice for each <code>DataPoint</code>. So for <a href="http://mathbin.net/equations/105225_9.png" rel="nofollow noreferrer">http://mathbin.net/equations/105225_9.png</a>, <code>f</code> is called twice, then for the <code>n-1</code> <code>f</code> is called twice again for each of the previous calls, and so on and so forth. This leads to a complexity of <code>O(2^n)</code> which is pretty terrible considering there can be over 1000 <code>DataPoints</code> in each <code>Set</code>. Because <code>p()</code> is independent of everything except its parameters I have included a caching function where if <code>p()</code> has already been calculated for these parameters it just returns the previous result, but this doesn't solve the inherent complexity problem. Am I missing something here with regards to repeat computations, or is the complexity unavoidable in this algorithm?</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload