<p>I ran some tests, and here is the code I tested:</p>
<pre><code>delegate(float[] inout)
{ // My Original Code
    float[][] tempbuf = new float[2][];
    int length = inout.Length / 2;
    for (int c = 0; c &lt; 2; c++)
    {
        tempbuf[c] = new float[length];
        for (int i = 0, offset = c; i &lt; tempbuf[c].Length; i++, offset += 2)
            tempbuf[c][i] = inout[offset];
    }
}

delegate(float[] inout)
{ // jerryjvl's recommendation: loop unrolling
    float[][] tempbuf = new float[2][];
    int length = inout.Length / 2;
    for (int c = 0; c &lt; 2; c++)
        tempbuf[c] = new float[length];
    for (int ix = 0, i = 0; ix &lt; length; ix++)
    {
        tempbuf[0][ix] = inout[i++];
        tempbuf[1][ix] = inout[i++];
    }
}

delegate(float[] inout)
{ // Unsafe Code
    unsafe
    {
        float[][] tempbuf = new float[2][];
        int length = inout.Length / 2;
        fixed (float* buffer = inout)
            for (int c = 0; c &lt; 2; c++)
            {
                tempbuf[c] = new float[length];
                float* offset = buffer + c;
                fixed (float* buffer2 = tempbuf[c])
                {
                    float* p = buffer2;
                    for (int i = 0; i &lt; length; i++, offset += 2)
                        *p++ = *offset;
                }
            }
    }
}

delegate(float[] inout)
{ // Modifying my original code to see if the compiler is not as smart as I think it is.
    float[][] tempbuf = new float[2][];
    int length = inout.Length / 2;
    for (int c = 0; c &lt; 2; c++)
    {
        float[] buf = tempbuf[c] = new float[length];
        for (int i = 0, offset = c; i &lt; buf.Length; i++, offset += 2)
            buf[i] = inout[offset];
    }
}
</code></pre>
<p>And the results (buffer size = 2^17, 200 timed iterations per test):</p>
<pre><code>Average for test #1: 0.001286 seconds +/- 0.000026
Average for test #2: 0.001193 seconds +/- 0.000025
Average for test #3: 0.000686 seconds +/- 0.000009
Average for test #4: 0.000847 seconds +/- 0.000008

Average for test #1: 0.001210 seconds +/- 0.000012
Average for test #2: 0.001048 seconds +/- 0.000012
Average for test #3: 0.000690 seconds +/- 0.000009
Average for test #4: 0.000883 seconds +/- 0.000011

Average for test #1: 0.001209 seconds +/- 0.000015
Average for test #2: 0.001060 seconds +/- 0.000013
Average for test #3: 0.000695 seconds +/- 0.000010
Average for test #4: 0.000861 seconds +/- 0.000009
</code></pre>
<p>I got similar results on every run. Obviously the unsafe code is the fastest, but I was surprised to see that the JIT compiler couldn't figure out that it can drop the bounds checks when dealing with a jagged array. Maybe someone can think of more ways to optimize these tests.</p>
<p>Edit: I tried loop unrolling with the unsafe code and it had no effect. I also tried optimizing the loop-unrolling method:</p>
<pre><code>delegate(float[] inout)
{
    float[][] tempbuf = new float[2][];
    int length = inout.Length / 2;
    float[] tempbuf0 = tempbuf[0] = new float[length];
    float[] tempbuf1 = tempbuf[1] = new float[length];
    for (int ix = 0, i = 0; ix &lt; length; ix++)
    {
        tempbuf0[ix] = inout[i++];
        tempbuf1[ix] = inout[i++];
    }
}
</code></pre>
<p>The results are hit-or-miss against test #4, differing by about 1%. Test #4 is my best option so far.</p>
<p>As I told jerryjvl, the problem is getting the JIT compiler to skip the bounds check on the input buffer, since adding a second condition to the loop (&amp;&amp; offset &lt; inout.Length) slows it down.</p>
<p>Edit 2: I ran the earlier tests inside the IDE, so here are the results from running outside it. Note that the tests are numbered differently here; the legend is at the bottom.</p>
<pre><code>2^17 items, repeated 200 times
******************************************
Average for test #1: 0.000533 seconds +/- 0.000017
Average for test #2: 0.000527 seconds +/- 0.000016
Average for test #3: 0.000407 seconds +/- 0.000008
Average for test #4: 0.000374 seconds +/- 0.000008
Average for test #5: 0.000424 seconds +/- 0.000009

2^17 items, repeated 200 times
******************************************
Average for test #1: 0.000547 seconds +/- 0.000016
Average for test #2: 0.000732 seconds +/- 0.000020
Average for test #3: 0.000423 seconds +/- 0.000009
Average for test #4: 0.000360 seconds +/- 0.000008
Average for test #5: 0.000406 seconds +/- 0.000008

2^18 items, repeated 200 times
******************************************
Average for test #1: 0.001295 seconds +/- 0.000036
Average for test #2: 0.001283 seconds +/- 0.000020
Average for test #3: 0.001085 seconds +/- 0.000027
Average for test #4: 0.001035 seconds +/- 0.000025
Average for test #5: 0.001130 seconds +/- 0.000025

2^18 items, repeated 200 times
******************************************
Average for test #1: 0.001234 seconds +/- 0.000026
Average for test #2: 0.001319 seconds +/- 0.000023
Average for test #3: 0.001309 seconds +/- 0.000025
Average for test #4: 0.001191 seconds +/- 0.000026
Average for test #5: 0.001196 seconds +/- 0.000022

Test #1 = My Original Code
Test #2 = Optimized safe loop unrolling
Test #3 = Unsafe code - loop unrolling
Test #4 = Unsafe code
Test #5 = My Optimized Code
</code></pre>
<p>It looks like loop unrolling doesn't pay off. My optimized code is still my best option, at only about 10% slower than the unsafe code. If only I could tell the compiler that (i &lt; buf.Length) implies (offset &lt; inout.Length), it could drop the bounds check on inout[offset] and I would essentially get unsafe-code performance.</p>
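<p>The timing harness behind these numbers isn't shown. As a rough illustration only, here is a minimal sketch of how results in this format could be produced. The names <code>DeinterleaveBench</code>, <code>Deinterleave</code> and <code>Time</code> are hypothetical, and the sketch assumes that "+/-" is the standard error of the mean over the timed iterations, which the answer does not actually state.</p>
<pre><code>using System;
using System.Diagnostics;

static class DeinterleaveBench
{
    // "My Optimized Code" (test #5 in Edit 2), changed to return its result
    // so the output can be checked instead of being discarded.
    public static float[][] Deinterleave(float[] inout)
    {
        float[][] tempbuf = new float[2][];
        int length = inout.Length / 2;
        for (int c = 0; c &lt; 2; c++)
        {
            // Cache the row in a local so the loop bound is buf.Length.
            float[] buf = tempbuf[c] = new float[length];
            for (int i = 0, offset = c; i &lt; buf.Length; i++, offset += 2)
                buf[i] = inout[offset];
        }
        return tempbuf;
    }

    // Hypothetical harness: calls `test` once untimed (so JIT compilation of the
    // target is not measured), then times `iterations` calls (must be at least 2)
    // and reports the mean and its standard error, in seconds.
    public static void Time(Func&lt;float[], float[][]&gt; test, float[] input, int iterations,
                            out double mean, out double stdErr)
    {
        test(input);
        double[] samples = new double[iterations];
        for (int n = 0; n &lt; iterations; n++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            test(input);
            sw.Stop();
            samples[n] = sw.Elapsed.TotalSeconds;
        }

        mean = 0;
        foreach (double s in samples) mean += s;
        mean /= iterations;

        double sumSq = 0;
        foreach (double s in samples) sumSq += (s - mean) * (s - mean);
        double stdDev = Math.Sqrt(sumSq / (iterations - 1)); // sample standard deviation
        stdErr = stdDev / Math.Sqrt(iterations);
    }

    static void Main()
    {
        float[] input = new float[1 &lt;&lt; 17]; // 2^17 interleaved samples
        for (int i = 0; i &lt; input.Length; i++) input[i] = i;

        double mean, err;
        Time(Deinterleave, input, 200, out mean, out err);
        Console.WriteLine("Average: {0:F6} seconds +/- {1:F6}", mean, err);
    }
}
</code></pre>
<p>Returning the split buffers, unlike the <code>delegate(float[] inout)</code> versions above, makes correctness easy to check: for the input {0, 1, 2, 3, 4, 5}, <code>Deinterleave</code> yields {0, 2, 4} and {1, 3, 5}.</p>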