Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    primarykey
    data
    text
    <p>You gain on speed but lose some on readability and maintenance. Instead of a if-then-else tree, if a then fun_a(), else if b then fun_b(), else if c then fun_c() else fun_default(), and having to do that every time, instead if a then fun=fun_a, else if b then fun=fun_b, etc and you do that one time, from then on just call fun(). Much faster. As pointed out you cannot inline, which is another speed trick but inlining on the if-then-else tree doesnt necessarily make it faster than without inlining and is generall not as fast as the function pointer. </p> <p>You lose a little readability and maintenance because you have to figure out where fun() is set, how often it changes if ever, insure you dont call it before it is setup, but it is still a single searchable name you can use to find and maintain all the places it is used.</p> <p>It is basically a speed trick to avoid if-then-else trees every time you want to perform a function. If performance is not critical, if nothing else fun() could be static and have the if-then-else tree in it. </p> <p>EDIT Adding some examples to explain what I was talking about.</p> <pre> extern unsigned int fun1 ( unsigned int a, unsigned int b ); unsigned int (*funptr)(unsigned int, unsigned int); void have_fun ( unsigned int x, unsigned int y, unsigned int z ) { unsigned int j; funptr=fun1; j=fun1(z,5); j=funptr(y,6); } </pre> <p>Compiling gives this:</p> <pre> have_fun: stmfd sp!, {r3, r4, r5, lr} .save {r3, r4, r5, lr} ldr r4, .L2 mov r5, r1 mov r0, r2 mov r1, #5 ldr r2, .L2+4 str r2, [r4, #0] bl fun1 ldr r3, [r4, #0] mov r0, r5 mov r1, #6 blx r3 ldmfd sp!, {r3, r4, r5, pc} </pre> <p>What I assume Clifford was talking about is that a direct call, if near enough (depending on the architecture), is one instruction</p> <pre> bl fun1 </pre> <p>Where a function pointer, is going to cost you at least two</p> <pre> ldr r3, [r4, #0] blx r3 </pre> <p>I had also mentioned the difference between direct and indirect was the extra load you incur.</p> <p>Before moving on it is worth mentioning the pros and cons of inlining. In the case of ARM which is what these examples are using, the calling convention uses r0-r3 for incoming parameters to a function and r0 to return. So entry into have_fun() with three parameters means r0-r3 have content. With ARM it is also assumed that a function can destroy r0-r3, so have_fun() needs to preserve the inputs and then place the two inputs to fun1() in r0 and r1, so a bit of a register dance happens.</p> <pre> mov r5, r1 mov r0, r2 mov r1, #5 ldr r2, .L2+4 str r2, [r4, #0] bl fun1 </pre> <p>The compiler was smart enough to see that we never needed the first input to the have_fun() function, so r0 was discarded and allowed to be changed right away. Also the compiler was smart enough to know that we would never need the third parameter, z (r2), after sending it to fun1() on the first call, so it didnt need to save it in a high register. R1 though, the second parameter to have_fun() does need to be preserved so it is put in a regsiter that wont get destroyed by fun1().</p> <p>You can see the same kind of thing happen for the second function call.</p> <p>Assuming fun1() is this simple function:</p> <pre> inline unsigned int fun1 ( unsigned int a, unsigned int b ) { return(a+b); } </pre> <p>When you inline fun1() you get something like this:</p> <pre> stmfd sp!, {r4, lr} mov r0, r1 mov r1, #6 add r4, r2, #5 </pre> <p>The compiler does not need to shuffle the lower registers about to prepare for a call. Likewise what you may have noticed is that r4 and lr are preserved on the stack when we enter hello_fun(). With this ARM calling convention a function can destroy r0-r3 but must preserve all the other registers, since have_fun() in this case needed more than four registers to do its thing it saved the contents of r4 on the stack so that it could use it. Likewise this function as I compiled it did call another function, the bl/blx instruction uses/destroys the lr register (r14) so in order for have_fun() to return we also have to preserve lr on the stack. The simplified example for fun1() did not show this but another savings you get from inlining is that on entry the function called does not have to set up a stack frame and preserve registers, it really is as if you took the code from the function and shoved it inline with the calling function.</p> <p>Why wouldnt you inline all the time? Well first it can and will use more registers and that can lead to more stack use, and stack is slow relative to registers. Most important though is that it increases the size of your binary, if fun1() was a good sized function and you called it 20 times in have_fun() your binary would be considerably larger. For modern computers with gigabytes of ram, a few hundred or few dozen thousand bytes is no big deal, but for embedded with limited resources this can make or break you. On a modern gigahertz multicore desktop, how often do you need to shave an instruction or five anyway? Sometimes yes but not all the time for every function. So just because you probably can get away with it on a desktop you probably should not.</p> <p>Back to function pointers. So the point I was trying to make with my answer is, what situations would you likely want to use a function pointer anyway, what are the use cases and in those use cases how much does it help or hurt?</p> <p>The kinds of cases I was thinking of are plugins, or code specific to a calling parameter or generic code reacting to specific hardware detected. For example, a hypothetical tar program may want to output to a tape drive, file system, or other and you may choose to write the code with generic functions called using function pointers. Upon entry to the program the command line parameters indicate the output and at that point you set the function pointers to the device specific functions.</p> <pre> if(outdev==OUTDEV_TAPE) data_out=data_out_tape; else if(outdev==OUTDEV_FILE) { //open the file, etc data_out=data_out_file; } ... </pre> <p>Or perhaps you dont know if you are running on a processor with an fpu or which fpu type you have but you know that a floating point divide you want to do can run much faster using the fpu:</p> <pre> if(fputype==FPU_FPA) fdivide=fdivide_fpa; else if(fputype==FPU_VFP) fdivide=fdivide_vfp; else fdivide=fdivide_soft; </pre> <p>And absolutely you can use a case statement instead of an if-then-else tree, pros and cons to each, some compilers turn a case statement int an if-then-else tree anyway, so it doesnt always matter. The point I was trying to make is if you do this one time:</p> <pre> if(fputype==FPU_FPA) fdivide=fdivide_fpa; else if(fputype==FPU_VFP) fdivide=fdivide_vfp; else fdivide=fdivide_soft; </pre> <p>And do this every where else in the program:</p> <pre> a=fdivide(b,c); </pre> <p>Compared to a non-function-pointer alternative where you do this every where you want to divide:</p> <pre> if(fputype==FPU_FPA) a=fdivide_fpa(b,c); else if(fputype==FPU_VFP) a=fdivide_vfp(b,c); else a=fdivide_soft(b,c); </pre> <p>The function pointer approach, even though it costs you an extra ldr on each call, is a lot cheaper than the many instructions required for the if-then-else tree. You pay a little up front to setup the fdivide pointer one time then pay an extra ldr on each instance, but overall it is faster than this:</p> <pre> unsigned int fun1 ( unsigned int a, unsigned int b ); unsigned int fun2 ( unsigned int a, unsigned int b ); unsigned int fun3 ( unsigned int a, unsigned int b ); unsigned int (*funptr)(unsigned int, unsigned int); unsigned int have_fun ( unsigned int x, unsigned int y, unsigned int z ) { unsigned int j; switch(x) { default: case 1: j=fun1(y,z); break; case 2: j=fun2(y,z); break; case 3: j=fun3(y,z); break; } return(j); } unsigned int more_fun ( unsigned int x, unsigned int y, unsigned int z ) { unsigned int j; j=funptr(y,z); return(j); } </pre> <p>gives us this:</p> <pre> cmp r0, #2 beq .L3 cmp r0, #3 beq .L4 mov r0, r1 mov r1, r2 b fun1 .L3: mov r0, r1 mov r1, r2 b fun2 .L4: mov r0, r1 mov r1, r2 b fun3 </pre> <p>instead of this</p> <pre> mov r0, r1 ldr r3, .L7 mov r1, r2 blx r3 </pre> <p>For the default case the if-then-else tree burns two compares and two beq's before calling the function directly. Basically sometimes the if-then-else tree will be faster and sometimes the function pointer is faster.</p> <p>Another comment I made is what if you used inlining to make that if-then-else tree faster, instead of a function pointer, inlining is always faster right?</p> <pre> unsigned int fun1 ( unsigned int a, unsigned int b ) { return(a+b); } unsigned int fun2 ( unsigned int a, unsigned int b ) { return(a-b); } unsigned int fun3 ( unsigned int a, unsigned int b ) { return(a&b); } unsigned int have_fun ( unsigned int x, unsigned int y, unsigned int z ) { unsigned int j; switch(x) { default: case 1: j=fun1(y,z); break; case 2: j=fun2(y,z); break; case 3: j=fun3(y,z); break; } return(j); } </pre> <p>gives</p> <pre> have_fun: cmp r0, #2 rsbeq r0, r2, r1 bxeq lr cmp r0, #3 addne r0, r2, r1 andeq r0, r2, r1 bx lr </pre> <p>LOL, ARM got me on that one. That is nice. You can imagine though for a generic processor you would get something like</p> <pre> cmp r0, #2 beq .L3 cmp r0, #3 beq .L4 and r0,r1,r2 bx lr .L3: sub r0,r1,r2 bx lr .L4: add r0,r1,r2 bx lr </pre> <p>You still burn the compares, the more cases you have the longer the if-then-else tree. It doesnt take much for the average case to take longer than a function pointer solution.</p> <pre> mov r0, r1 ldr r1, .L7 ldr r3,[r1] mov r1, r2 blx r3 </pre> <p>Then I also mentioned readability and maintenance, using the function pointer approach you need to always be aware of whether or not the function pointer has been assigned before using it. You cannot always just grep for that function name and find what you are looking for in someone elses code, ideally you find one place where that pointer is assigned, then you can grep for the real function names.</p> <p>Yes there are many other use cases for function pointers, and the ones I have described can be solved in many other ways, efficient or not. I was trying to give the poster some ideas on how to think through different scenarios.</p> <p>I think the most important answer to this interview question is not that there is a right or wrong answer, because I think there is not. But to see what the interviewee knows about what compilers do or dont do, the kinds of things I described above. The interview question to me is a few questions, do you understand what the compiler actually does, what instructions it generates. Do you understand that fewer or more instructions is not necessarily faster. do you understand these differences across different processors, or do you at least have a working knowledge for at least one processor. Then it goes on to readability and maintenance. That is another stream of questions that has to do with your experience in reading other peoples code, and then maintaining your own code or other peoples code. It is a cleverly designed question in my opinion.</p>
    singulars
    1. This table or related slice is empty.
    plurals
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. This table or related slice is empty.
    1. VO
      singulars
      1. This table or related slice is empty.
    2. VO
      singulars
      1. This table or related slice is empty.
    3. VO
      singulars
      1. This table or related slice is empty.
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload