StackOverflow2013


plurals

PO
primarykey
Id
16818938
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2013-05-29T16:09:44.710
FavoriteCount
0
LastActivityDate
2013-05-29T17:51:58.797
LastEditDate
2017-05-23T11:44:29.997
LastEditorUserId
-1
OwnerUserId
912144
ParentId
16807766
PostTypeId
2
Score
5
ViewCount
0
LastEditorDisplayName
text
Body
I think <a href="https://stackoverflow.com/a/16667228/912144">this answer</a> describes the reason sufficiently, but I'll expand on it a bit here. First, however, here's <a href="http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/C-Dialect-Options.html#C-Dialect-Options" rel="nofollow noreferrer">gcc 4.8's documentation on <code>-fopenmp</code></a>: <blockquote> <code>-fopenmp</code> Enable handling of OpenMP directives #pragma omp in C/C++ and !$omp in Fortran. When -fopenmp is specified, the compiler generates parallel code according to the OpenMP Application Program Interface v3.0 <a href="http://www.openmp.org/" rel="nofollow noreferrer">http://www.openmp.org/</a>. This option implies -pthread, and thus is only supported on targets that have support for -pthread. </blockquote>
Note that it doesn't mention disabling any features. Indeed, there is no reason for gcc to disable any optimization. The reason OpenMP with 1 thread still has overhead compared to no OpenMP is that the compiler has to transform the code, moving the parallel region into a separate function so that it is ready for the case of more than one thread. Let's look at a simple example:
<pre><code>int *b = ...
int *c = ...
int a = 0;
int i;
#pragma omp parallel for reduction(+:a)
for (i = 0; i < 100; ++i)
    a += b[i] + c[i];
</code></pre>
This code is converted into something like this:
<pre><code>struct __omp_func1_data {
    int start;
    int end;
    int *b;
    int *c;
    int a;
};

void *__omp_func1(void *data)
{
    struct __omp_func1_data *d = data;
    int i;
    d->a = 0;
    for (i = d->start; i < d->end; ++i)
        d->a += d->b[i] + d->c[i];
    return NULL;
}
...
for (t = 1; t < nthreads; ++t) {
    /* create_thread with the __omp_func1 function */
}
/* for the master thread, don't create a thread */
struct __omp_func1_data md = { .start = /*...*/, .end = /*...*/, .b = b, .c = c };
__omp_func1(&md);
a += md.a;
for (t = 1; t < nthreads; ++t) {
    /* join with thread */
    /* add thread_data->a to a */
}
</code></pre>
Now if we run this with <code>nthreads==1</code>, the code effectively reduces to:
<pre><code>struct __omp_func1_data {
    int start;
    int end;
    int *b;
    int *c;
    int a;
};

void *__omp_func1(void *data)
{
    struct __omp_func1_data *d = data;
    int i;
    d->a = 0;
    for (i = d->start; i < d->end; ++i)
        d->a += d->b[i] + d->c[i];
    return NULL;
}
...
struct __omp_func1_data md = { .start = 0, .end = 100, .b = b, .c = c };
__omp_func1(&md);
a += md.a;
</code></pre>
So what are the differences between the non-OpenMP version and the single-threaded OpenMP version?
One difference is the extra glue code. The variables that the OpenMP-generated function needs have to be packed together into a single argument, so there is some overhead in preparing for the function call (and later in retrieving the results).
More importantly, however, the code is no longer in one piece. Cross-function optimization is not very advanced yet, and most optimizations are done within each function. Smaller functions mean fewer opportunities to optimize.
<hr>
To finish this answer, I'd like to show you exactly how <code>-fopenmp</code> affects <code>gcc</code>'s options.
(Note: I'm on an old computer now, so I have gcc 4.4.3) Running <code>gcc -Q -v some_file.c</code> gives this (relevant) output: <pre><code>GGC heuristics: --param ggc-min-expand=98 --param ggc-min-heapsize=128106 options passed: -v a.c -D_FORTIFY_SOURCE=2 -mtune=generic -march=i486 -fstack-protector options enabled: -falign-loops -fargument-alias -fauto-inc-dec -fbranch-count-reg -fcommon -fdwarf2-cfi-asm -fearly-inlining -feliminate-unused-debug-types -ffunction-cse -fgcse-lm -fident -finline-functions-called-once -fira-share-save-slots -fira-share-spill-slots -fivopts -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-debug-strings -fmove-loop-invariants -fpcc-struct-return -fpeephole -fsched-interblock -fsched-spec -fsched-stalled-insns-dep -fsigned-zeros -fsplit-ivs-in-unroller -fstack-protector -ftrapping-math -ftree-cselim -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops= -ftree-reassoc -ftree-scev-cprop -ftree-switch-conversion -ftree-vect-loop-version -funit-at-a-time -fvar-tracking -fvect-cost-model -fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double -maccumulate-outgoing-args -malign-stringops -mfancy-math-387 -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mno-red-zone -mno-sse4 -mpush-args -msahf -mtls-direct-seg-refs </code></pre> and running <code>gcc -Q -v -fopenmp some_file.c</code> gives this (relevant) output: <pre><code>GGC heuristics: --param ggc-min-expand=98 --param ggc-min-heapsize=128106 options passed: -v -D_REENTRANT a.c -D_FORTIFY_SOURCE=2 -mtune=generic -march=i486 -fopenmp -fstack-protector options enabled: -falign-loops -fargument-alias -fauto-inc-dec -fbranch-count-reg -fcommon -fdwarf2-cfi-asm -fearly-inlining -feliminate-unused-debug-types -ffunction-cse -fgcse-lm -fident -finline-functions-called-once -fira-share-save-slots -fira-share-spill-slots -fivopts -fkeep-static-consts -fleading-underscore -fmath-errno -fmerge-debug-strings -fmove-loop-invariants -fpcc-struct-return 
-fpeephole -fsched-interblock -fsched-spec -fsched-stalled-insns-dep -fsigned-zeros -fsplit-ivs-in-unroller -fstack-protector -ftrapping-math -ftree-cselim -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize -ftree-parallelize-loops= -ftree-reassoc -ftree-scev-cprop -ftree-switch-conversion -ftree-vect-loop-version -funit-at-a-time -fvar-tracking -fvect-cost-model -fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double -maccumulate-outgoing-args -malign-stringops -mfancy-math-387 -mfp-ret-in-387 -mfused-madd -mglibc -mieee-fp -mno-red-zone -mno-sse4 -mpush-args -msahf -mtls-direct-seg-refs </code></pre>
Taking a diff, we can see that the only difference is that with <code>-fopenmp</code>, <code>-D_REENTRANT</code> is defined (and, of course, <code>-fopenmp</code> itself is enabled). So, rest assured, gcc doesn't produce worse code. It just has to add preparation code for the case where the number of threads is greater than 1, and that has some overhead.
<hr>
Update: I really should have tested this with optimization enabled. Anyway, with gcc 4.7.3, running the same commands with <code>-O3</code> added gives the same difference. So, even with <code>-O3</code>, no optimizations are disabled.
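The outlining described above can be imitated by hand. The following is a minimal sketch in plain C99, not what gcc actually emits: the names sum_plain, omp_func1_data, omp_func1 and sum_outlined are hypothetical, and instead of creating threads the driver runs each chunk one after another on the calling thread. It only shows the glue (packing the arguments into a struct, splitting the iteration range, combining partial sums the way reduction(+:a) does); with nthreads == 1 it degenerates into the single call of the reduced version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: these are not gcc's internal symbols
   (names with leading double underscores are reserved for the
   implementation, so the sketch drops them). */

/* The loop as it would be compiled without -fopenmp: all in one function. */
int sum_plain(const int *b, const int *c, int n)
{
    int a = 0;
    for (int i = 0; i < n; ++i)
        a += b[i] + c[i];
    return a;
}

/* Everything the outlined loop body needs, packed into a single argument. */
struct omp_func1_data {
    int start;      /* first index of this chunk (inclusive) */
    int end;        /* last index of this chunk (exclusive) */
    const int *b;
    const int *c;
    int a;          /* private partial sum for this chunk */
};

/* The outlined loop body; void * in and out to match a thread-start signature. */
static void *omp_func1(void *data)
{
    struct omp_func1_data *d = data;
    d->a = 0;
    for (int i = d->start; i < d->end; ++i)
        d->a += d->b[i] + d->c[i];
    return NULL;
}

/* Splits [0, n) into nthreads chunks, runs them sequentially on the calling
   thread (no threads are created), and adds up the partial sums as
   reduction(+:a) would. With nthreads == 1 this is one call over [0, n). */
int sum_outlined(const int *b, const int *c, int n, int nthreads)
{
    assert(nthreads >= 1);
    int a = 0;
    for (int t = 0; t < nthreads; ++t) {
        struct omp_func1_data d = {
            .start = (int)((long long)n * t / nthreads),
            .end   = (int)((long long)n * (t + 1) / nthreads),
            .b = b,
            .c = c,
        };  /* d.a is zero-initialized; omp_func1 resets it anyway */
        omp_func1(&d);
        a += d.a;
    }
    return a;
}
```

Both functions should return the same sum for any nthreads >= 1. In this sketch omp_func1 is called directly, so the compiler is free to inline it; in real OpenMP code the outlined function is also handed to the runtime library (libgomp, for gcc) through a function pointer, which is why it typically remains a separate function and limits the per-function optimizations discussed above.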
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POMay compiler optimizations be inhibited by multi-threading?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USShahbaz
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. POMay compiler optimizations be inhibited by multi-threading?
 singulars
 PostTypePostTypeId
 PTQuestion
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTAcceptedByOriginator
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
