Note that there are some explanatory texts on larger screens.

plurals
  1. POWhy is using a sequence so much slower than using a list in this example
    primarykey
    data
    text
    <p>Background: I have a sequence of contiguous, time-stamped data. The data-sequence has holes in it, some large, others just a single missing value.<br> Whenever the hole is just a single missing value, I want to patch the holes using a dummy-value (larger holes will be ignored).</p> <p>I would like to use lazy generation of the patched sequence, and I am thus using Seq.unfold.</p> <p>I have made two versions of the method to patch the holes in the data. </p> <p>The first consumes the <strong>sequence</strong> of data with holes in it and produces the patched <strong>sequence</strong>. This is what i want, but the methods runs horribly slow when the number of elements in the input sequence rises above 1000, and it gets progressively worse the more elements the input sequence contains.</p> <p>The second method consumes a <strong>list</strong> of the data with holes and produces the patched <strong>sequence</strong> and it runs fast. This is however not what I want, since this forces the instantiation of the entire input-list in memory.</p> <p>I would like to use the (sequence -> sequence) method rather than the (list -> sequence) method, to avoid having the entire input-list in memory at the same time.</p> <p>Questions:</p> <p>1) Why is the first method so slow (getting progressively worse with larger input lists) (I am suspecting that it has to do with repeatedly creating new sequences with Seq.skip 1, but I am not sure)</p> <p>2) How can I make the patching of holes in the data fast, while using an input <strong>sequence</strong> rather than an input <strong>list</strong>?</p> <p>The code:</p> <pre><code>open System // Method 1 (Slow) let insertDummyValuesWhereASingleValueIsMissing1 (timeBetweenContiguousValues : TimeSpan) (values : seq&lt;(DateTime * float)&gt;) = let sizeOfHolesToPatch = timeBetweenContiguousValues.Add timeBetweenContiguousValues // Only insert dummy-values when the gap is twice the normal (None, values) |&gt; Seq.unfold (fun (prevValue, restOfValues) -&gt; if restOfValues |&gt; Seq.isEmpty then None // Reached the end of the input seq else let currentValue = Seq.hd restOfValues if prevValue.IsNone then Some(currentValue, (Some(currentValue), Seq.skip 1 restOfValues )) // Only happens to the first item in the seq else let currentTime = fst currentValue let prevTime = fst prevValue.Value let timeDiffBetweenPrevAndCurrentValue = currentTime.Subtract(prevTime) if timeDiffBetweenPrevAndCurrentValue = sizeOfHolesToPatch then let dummyValue = (prevTime.Add timeBetweenContiguousValues, 42.0) // 42 is chosen here for obvious reasons, making this comment superfluous Some(dummyValue, (Some(dummyValue), restOfValues)) else Some(currentValue, (Some(currentValue), Seq.skip 1 restOfValues))) // Either the two values were contiguous, or the gap between them was too large to patch // Method 2 (Fast) let insertDummyValuesWhereASingleValueIsMissing2 (timeBetweenContiguousValues : TimeSpan) (values : (DateTime * float) list) = let sizeOfHolesToPatch = timeBetweenContiguousValues.Add timeBetweenContiguousValues // Only insert dummy-values when the gap is twice the normal (None, values) |&gt; Seq.unfold (fun (prevValue, restOfValues) -&gt; match restOfValues with | [] -&gt; None // Reached the end of the input list | currentValue::restOfValues -&gt; if prevValue.IsNone then Some(currentValue, (Some(currentValue), restOfValues )) // Only happens to the first item in the list else let currentTime = fst currentValue let prevTime = fst prevValue.Value let timeDiffBetweenPrevAndCurrentValue = currentTime.Subtract(prevTime) if timeDiffBetweenPrevAndCurrentValue = sizeOfHolesToPatch then let dummyValue = (prevTime.Add timeBetweenContiguousValues, 42.0) Some(dummyValue, (Some(dummyValue), currentValue::restOfValues)) else Some(currentValue, (Some(currentValue), restOfValues))) // Either the two values were contiguous, or the gap between them was too large to patch // Test data let numbers = {1.0..10000.0} let contiguousTimeStamps = seq { for n in numbers -&gt; DateTime.Now.AddMinutes(n)} let dataWithOccationalHoles = Seq.zip contiguousTimeStamps numbers |&gt; Seq.filter (fun (dateTime, num) -&gt; num % 77.0 &lt;&gt; 0.0) // Has a gap in the data every 77 items let timeBetweenContiguousValues = (new TimeSpan(0,1,0)) // The fast sequence-patching (method 2) dataWithOccationalHoles |&gt; List.of_seq |&gt; insertDummyValuesWhereASingleValueIsMissing2 timeBetweenContiguousValues |&gt; Seq.iter (fun pair -&gt; printfn "%f %s" (snd pair) ((fst pair).ToString())) // The SLOOOOOOW sequence-patching (method 1) dataWithOccationalHoles |&gt; insertDummyValuesWhereASingleValueIsMissing1 timeBetweenContiguousValues |&gt; Seq.iter (fun pair -&gt; printfn "%f %s" (snd pair) ((fst pair).ToString())) </code></pre>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload