StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POExpression templates and C++11
text
Body
copied!<p>Let's look at one particular benefit of expression templates: ETs can be used to avoid vector-sized temporaries in memory which occur in overloaded operators like:</p> <pre><code>template<typename T> std::vector<T> operator+(const std::vector<T>& a, const std::vector<T>& b) { std::vector<T> tmp; // vector-sized temporary for_each(...); return tmp; } </code></pre> <p>In C++11 the return statement of this function applies move semantics. No copy of the vector. That's a win.</p> <p>However, if I look at a simple expression like</p> <pre><code>d = a + b + c; </code></pre> <p>I see that the above function gets called twice (for both <code>operator+</code>) while the final assignment can be done with move semantics.</p> <p>In total 2 loops are executed. Means that I put out a temporary and read it back in right after. For big vectors this falls out of cache. That's worse than expression templates. They can do the whole thing in just 1 loop. ETs can execute the above code equivalent to:</p> <pre><code>for(int i=0 ; i < vec_length ; ++i) d[i] = a[i] + b[i] + c[i]; </code></pre> <p>I was wondering whether lambdas together with move semantics or any other new feature can do as good as ETs. Any thoughts?</p> <p><strong>Edit:</strong></p> <p>Basically, using the ET technique the compiler builds a parse tree that resembles the algebraic expression with it's type system. This tree consists of inner nodes and leaf nodes. The inner nodes represent operations (addition, multiplication, etc.) and the leaf nodes represent references to the data objects.</p> <p>I tried to think of the whole computation process in the fashion of a stack machine: Take an operation from an operation stack and pull the next arguments from the argument stack and evaluate the operation. Put the result back on the stack waiting for the operation.</p> <p>To represent these two different objects (operation stack and data leaf stack) I bundled together a <code>std::tuple</code> for the operations and a <code>std::tuple</code> for the data leaves into a <code>std::pair<></code>. Initially I used a <code>std:vector</code> but that resulted in runtime overhead.</p> <p>The whole process goes in two phases: Stack machine initialisation where the operation and argument stack are initialised. And the evaluation phase which is triggered by assigning the paired containers to the vector.</p> <p>I created a class <code>Vec</code> which holds a private <code>array<int,5></code> (the payload) and which features an overloaded assignment operator that takes the "expression".</p> <p>The global <code>operator*</code> is overloaded for all combinations of taking <code>Vec</code> and "expression" to enable the correct handling also in the case where we have more than just <code>a*b</code>. (Notice, I switched for this educational example to the multiplication - basically to quickly spot the <code>imull</code> in the assembler.)</p> <p>What is done first before the evaluation starts is "extracting" the values out of the involved <code>Vec</code> objects and initializing the argument stack. That was necessary to not have different kinds of objects lying around: Indexable vectors and non-indexable results. This is what the <code>Extractor</code> is for. Nice thing again: Variadic templates are used which in this case results in no run-time overhead (all this is done at compile time).</p> <p>The whole thing works. The expression is nicely evaluated (I also added the addition, but that is left out here to fit the code). Below you can see the assembler output. Just raw compuation, exactly as you want it to be: En-par with ET technique.</p> <p>Upshot. The new language features of C++11 offer the variadic templates which (along with template meta-programming) open up the area of compile time computation. I showed here how the benefits of variadic templates can be used to produce code as good as with the traditional ET technique.</p> <pre><code>#include<algorithm> #include<iostream> #include<vector> #include<tuple> #include<utility> #include<array> template<typename Target,typename Tuple, int N, bool end> struct Extractor { template < typename ... Args > static Target index(int i,const Tuple& t, Args && ... args) { return Extractor<Target, Tuple, N+1, std::tuple_size<Tuple>::value == N+1>:: index(i, t , std::forward<Args>(args)..., std::get<N>(t).vec[i] ); } }; template < typename Target, typename Tuple, int N > struct Extractor<Target,Tuple,N,true> { template < typename ... Args > static Target index(int i,Tuple const& t, Args && ... args) { return Target(std::forward<Args>(args)...); } }; template < typename ... Vs > std::tuple<typename std::remove_reference<Vs>::type::type_t...> extract(int i , const std::tuple<Vs...>& tpl) { return Extractor<std::tuple<typename std::remove_reference<Vs>::type::type_t...>, std::tuple<Vs...>, 0, std::tuple_size<std::tuple<Vs...> >::value == 0>::index(i,tpl); } struct Vec { std::array<int,5> vec; typedef int type_t; template<typename... OPs,typename... VALs> Vec& operator=(const std::pair< std::tuple<VALs...> , std::tuple<OPs...> >& e) { for( int i = 0 ; i < vec.size() ; ++i ) { vec[i] = eval( extract(i,e.first) , e.second ); } } }; template<int OpPos,int ValPos, bool end> struct StackMachine { template<typename... OPs,typename... VALs> static void eval_pos( std::tuple<VALs...>& vals , const std::tuple<OPs...> & ops ) { std::get<ValPos+1>( vals ) = std::get<OpPos>(ops).apply( std::get<ValPos>( vals ) , std::get<ValPos+1>( vals ) ); StackMachine<OpPos+1,ValPos+1,sizeof...(OPs) == OpPos+1>::eval_pos(vals,ops); } }; template<int OpPos,int ValPos> struct StackMachine<OpPos,ValPos,true> { template<typename... OPs,typename... VALs> static void eval_pos( std::tuple<VALs...>& vals , const std::tuple<OPs...> & ops ) {} }; template<typename... OPs,typename... VALs> int eval( const std::tuple<VALs...>& vals , const std::tuple<OPs...> & ops ) { StackMachine<0,0,false>::eval_pos(const_cast<std::tuple<VALs...>&>(vals),ops); return std::get<sizeof...(OPs)>(vals); } struct OpMul { static int apply(const int& lhs,const int& rhs) { return lhs*rhs; } }; std::pair< std::tuple< const Vec&, const Vec& > , std::tuple<OpMul> > operator*(const Vec& lhs,const Vec& rhs) { return std::make_pair( std::tuple< const Vec&, const Vec& >( lhs , rhs ) , std::tuple<OpMul>( OpMul() ) ); } template<typename... OPs,typename... VALs> std::pair< std::tuple< const Vec&, VALs... > , std::tuple<OPs...,OpMul> > operator*(const Vec& lhs,const std::pair< std::tuple< VALs... > , std::tuple<OPs...> >& rhs) { return std::make_pair( std::tuple_cat( rhs.first , std::tuple< const Vec& >(lhs) ) , std::tuple_cat( rhs.second , std::tuple<OpMul>( OpMul() ) ) ); } template<typename... OPs,typename... VALs> std::pair< std::tuple< const Vec&, VALs... > , std::tuple<OPs...,OpMul> > operator*(const std::pair< std::tuple< VALs... > , std::tuple<OPs...> >& lhs, const Vec& rhs) { return std::make_pair( std::tuple_cat( lhs.first , std::tuple< const Vec& >(rhs) ) , std::tuple_cat( lhs.second , std::tuple<OpMul>( OpMul() ) ) ); } int main() { Vec d,c,b,a; for( int i = 0 ; i < d.vec.size() ; ++i ) { a.vec[i] = 10+i; b.vec[i] = 20+i; c.vec[i] = 30+i; d.vec[i] = 0; } d = a * b * c * a; for( int i = 0 ; i < d.vec.size() ; ++i ) std::cout << d.vec[i] << std::endl; } </code></pre> <p>Assembler generated with <code>g++-4.6 -O3</code> (I had to put some runtime dependence into the vector initialization so that the compiler doesn't calculate the whole thing at compile time and you actually see the <code>imull</code> instaructions.)</p> <pre><code>imull %esi, %edx imull 32(%rsp), %edx imull %edx, %esi movl 68(%rsp), %edx imull %ecx, %edx movl %esi, (%rsp) imull 36(%rsp), %edx imull %ecx, %edx movl 104(%rsp), %ecx movl %edx, 4(%rsp) movl 72(%rsp), %edx imull %ecx, %edx imull 40(%rsp), %edx imull %ecx, %edx movl 108(%rsp), %ecx movl %edx, 8(%rsp) movl 76(%rsp), %edx imull %ecx, %edx imull 44(%rsp), %edx imull %ecx, %edx movl 112(%rsp), %ecx movl %edx, 12(%rsp) movl 80(%rsp), %edx imull %ecx, %edx imull %eax, %edx imull %ecx, %edx movl %edx, 16(%rsp) </code></pre>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload