  1. Using Multiply Accumulate Instruction Inline Assembly in C++
<p>I am implementing a FIR filter on an ARM9 processor and am trying to use the SMLAL instruction.</p>
<p>Initially I had the following filter implemented, and it worked perfectly, except that this method uses too much processing power to be used in our application.</p>
<pre><code>uint32_t DDPDataAcq::filterSample_8k(uint32_t sample)
{
    // This routine is based on the fir_double_z routine outlined by Grant R Griffin
    // - www.dspguru.com/sw/opendsp/alglib.htm
    int i = 0;
    int64_t accum = 0;
    const int32_t *p_h = hCoeff_8K;
    const int32_t *p_z = zOut_8K + filterState_8K;

    /* Cast the sample to a signed 32 bit int
     * We need to preserve the signedness of the number, so if the 24 bit
     * sample is negative we need to move the sign bit up to the MSB and pad the number
     * with 1's to preserve two's complement.
     */
    int32_t s = sample;
    if (s &amp; 0x800000)
        s |= ~0xffffff;

    // store input sample at the beginning of the delay line as well as ntaps more
    zOut_8K[filterState_8K] = zOut_8K[filterState_8K+NTAPS_8K] = s;

    for (i = 0; i &lt; NTAPS_8K; ++i)
    {
        accum += (int64_t)(*p_h++) * (int64_t)(*p_z++);
    }

    // convert the 64 bit accumulator back down to 32 bits
    int32_t a = (int32_t)(accum &gt;&gt; 9);

    // decrement state, wrapping if below zero
    if ( --filterState_8K &lt; 0 )
        filterState_8K += NTAPS_8K;

    return a;
}
</code></pre>
<p>I have been attempting to replace the multiply accumulate with inline assembly, since GCC is not using a MAC instruction even with optimization turned on. I replaced the for loop with the following:</p>
<pre><code>uint32_t accum_low = 0;
int32_t accum_high = 0;

for (i = 0; i &lt; NTAPS_4K; ++i)
{
    __asm__ __volatile__("smlal %0,%1,%2,%3;"
        :"+r"(accum_low),"+r"(accum_high)
        :"r"(*p_h++),"r"(*p_z++));
}

accum = (int64_t)accum_high &lt;&lt; 32 | (accum_low);
</code></pre>
<p>The output I now get using the SMLAL instruction is not the filtered data I was expecting. I have been getting seemingly random values that have no apparent pattern or connection to either the original signal or the data I am expecting.</p>
<p>I have a feeling I am doing something wrong when splitting the 64 bit accumulator into the high and low registers for the instruction, or I am putting them back together incorrectly. Either way, I am not sure why I cannot get the correct output by swapping the C code for the inline assembly.</p>
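<p>To check whether the split and recombine arithmetic itself is at fault, one option is a portable model of the accumulate step that can be run on a host machine and compared against the plain 64 bit loop. The following is only a sketch under assumptions: <code>smlal_model</code>, <code>join_halves</code>, <code>mac_reference</code> and <code>mac_split</code> are hypothetical helper names that are not part of the original code, and the model mirrors only the documented arithmetic of SMLAL (a signed 32x32 multiply added to a 64 bit value held in two registers), not register allocation or constraint behaviour in GCC. It rebuilds the accumulator through unsigned types so that a negative high word is never left-shifted.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: rebuild a signed 64 bit value from its two halves.
// The shift is done on an unsigned type so a negative `hi` is never shifted.
inline int64_t join_halves(uint32_t lo, int32_t hi) {
    uint64_t bits = (static_cast<uint64_t>(static_cast<uint32_t>(hi)) << 32) | lo;
    return static_cast<int64_t>(bits);
}

// Hypothetical helper: portable model of one SMLAL step,
// (hi:lo) += (int64)a * (int64)b, with wrap-around 64 bit addition.
inline void smlal_model(uint32_t& lo, int32_t& hi, int32_t a, int32_t b) {
    uint64_t acc = (static_cast<uint64_t>(static_cast<uint32_t>(hi)) << 32) | lo;
    acc += static_cast<uint64_t>(static_cast<int64_t>(a) * static_cast<int64_t>(b));
    lo = static_cast<uint32_t>(acc);                              // low word
    hi = static_cast<int32_t>(static_cast<uint32_t>(acc >> 32));  // high word
}

// Reference: plain 64 bit accumulation, as in the original C loop.
inline int64_t mac_reference(const int32_t* h, const int32_t* z, std::size_t n) {
    assert(h != nullptr && z != nullptr);
    int64_t accum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        accum += static_cast<int64_t>(h[i]) * static_cast<int64_t>(z[i]);
    }
    return accum;
}

// Same sum, but carried in two 32 bit halves and joined at the end.
inline int64_t mac_split(const int32_t* h, const int32_t* z, std::size_t n) {
    assert(h != nullptr && z != nullptr);
    uint32_t lo = 0;
    int32_t hi = 0;
    for (std::size_t i = 0; i < n; ++i) {
        smlal_model(lo, hi, h[i], z[i]);
    }
    return join_halves(lo, hi);
}
```

<p>If <code>mac_split</code> and <code>mac_reference</code> agree for the real coefficient and delay-line data, the split and join logic is probably not the source of the bad output, and the remaining differences between the two loops (for example the loop bound, <code>NTAPS_4K</code> versus <code>NTAPS_8K</code>) would be the next thing to compare.</p>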