<p>For many reasons.</p>
<p>First, Fortran compilers are highly optimized, and the language lets them be. C and C++ are very loose about array handling (e.g. two pointers may refer to the same memory area). This means the compiler cannot know in advance what is safe to do, and is forced to generate generic code. In Fortran, the cases are more constrained, so the compiler has better control over what happens and can optimize more (e.g. by keeping values in registers).</p>
<p>Second, Fortran stores arrays column-wise (column-major), while C stores them row-wise (row-major). I haven't checked your code, but be careful how you perform the product. In C you must scan row-wise: that way you walk the array along contiguous memory, which reduces cache misses. Cache misses are a primary source of inefficiency.</p>
<p>Third, it depends on the BLAS implementation you are using. Some implementations may be written in assembler and optimized for the specific processor you are using. The netlib version is written in Fortran 77.</p>
<p>Also, you are doing a lot of operations, most of them repeated and redundant. All those multiplications to compute the index are detrimental to performance.
I don't really know how this is done in BLAS, but there are a lot of tricks to avoid expensive operations.</p>
<p>For example, you could rework your code this way:</p>
<pre><code>#include &lt;cstring&gt;    // std::memset
#include &lt;stdexcept&gt;  // std::runtime_error

// Column-major product C = A * B.
// A is ADim1 x ADim2, B is BDim1 x BDim2, C is ADim1 x BDim2.
template&lt;class ValT&gt;
void mmult(const ValT* A, int ADim1, int ADim2, const ValT* B, int BDim1, int BDim2, ValT* C)
{
    if ( ADim2!=BDim1 ) throw std::runtime_error("Error sizes off");

    std::memset((void*)C,0,sizeof(ValT)*ADim1*BDim2);
    int cc2,cc1,cr1, a1,a2,a3;
    for ( cc2=0 ; cc2&lt;BDim2 ; ++cc2 ) {
        a1 = cc2*ADim1;  // offset of column cc2 in C (its column stride is ADim1)
        a3 = cc2*BDim1;  // offset of column cc2 in B
        for ( cc1=0 ; cc1&lt;ADim2 ; ++cc1 ) {
            a2 = cc1*ADim1;       // offset of column cc1 in A
            ValT b = B[a3+cc1];   // loaded once, reused by the whole inner loop
            for ( cr1=0 ; cr1&lt;ADim1 ; ++cr1 ) {
                C[a1+cr1] += A[a2+cr1]*b;
            }
        }
    }
}
</code></pre>
<p>Try it, I am sure you will save something.</p>
<p>On your question #1, the reason is that matrix multiplication scales as O(n^3) if you use the naive algorithm. There are algorithms that <a href="http://en.wikipedia.org/wiki/Coppersmith%E2%80%93Winograd_algorithm" rel="nofollow noreferrer">scale better asymptotically</a>.</p>
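<p>To illustrate the row-wise scanning point for C-style storage, here is a minimal sketch of the same product for row-major arrays. The function name <code>mmult_row_major</code>, the use of <code>std::vector&lt;double&gt;</code>, and the i-k-j loop order are assumptions made for this example, not something taken from your code or from BLAS. The idea is only that the innermost loop walks one row of B and one row of C through contiguous memory.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: row-major product C = A * B.
// A is n x m, B is m x p, C is n x p; element (r, c) of an
// R x Ccols row-major matrix lives at index r * Ccols + c.
std::vector<double> mmult_row_major(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    std::size_t n, std::size_t m, std::size_t p)
{
    assert(A.size() == n * m && B.size() == m * p);
    std::vector<double> C(n * p, 0.0);

    for (std::size_t i = 0; i < n; ++i) {
        double* c_row = C.data() + i * p;            // row i of C
        for (std::size_t k = 0; k < m; ++k) {
            const double a = A[i * m + k];           // loaded once per inner loop
            const double* b_row = B.data() + k * p;  // row k of B
            // Innermost loop: both b_row and c_row are scanned contiguously.
            for (std::size_t j = 0; j < p; ++j) {
                c_row[j] += a * b_row[j];
            }
        }
    }
    return C;
}
```

<p>Calling it with a 2 x 3 matrix A and a 3 x 2 matrix B, both stored row by row, returns the 2 x 2 product in row-major order. Whether this loop order actually beats the alternatives depends on your compiler and matrix sizes, so measure it.</p>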