<p>I didn't actually run your code, but I see an immediate mistake on <code>p</code>, which should be <code>private</code>, not <code>shared</code>. The parallel invocations of <code>qs</code>, <code>qs(v, p.first, p.second);</code>, will race on <code>p</code>, resulting in unpredictable behavior. The local variables in <code>qs</code> should be okay because every thread has its own stack. However, the overall approach is good. You're on the right track.</p>

<hr>

<p>Here are my general comments on implementing parallel quicksort. Quicksort itself is <em>embarrassingly parallel</em>, meaning no synchronization is needed: the recursive calls of <code>qs</code> on a partitioned array are independent of each other.</p>

<p>However, this parallelism is exposed in a <em>recursive</em> form. If you simply use <em>nested</em> parallelism in OpenMP, you will end up with thousands of threads within a second, and no speedup will be gained. So, you mostly need to turn the recursive algorithm into an iterative one and then implement a sort of work queue. This is your approach, and it's not easy.</p>

<p>For your approach, there is a good benchmark: OmpSCR. You can download it at <a href="http://sourceforge.net/projects/ompscr/" rel="noreferrer">http://sourceforge.net/projects/ompscr/</a></p>

<p>The benchmark includes several versions of OpenMP-based quicksort, most of them similar to yours. However, to increase parallelism, one must minimize contention on the global queue (in your code, it's <code>s</code>), so there are possible optimizations such as having per-thread local queues. Although the algorithm itself is purely parallel, the implementation may require synchronization artifacts. And, most of all, it's very hard to gain speedups.</p>

<hr>

<p>However, you can still directly use recursive parallelism in OpenMP in two ways: (1) throttling the total number of threads, and (2) using OpenMP 3.0's <code>task</code>.</p>

<p>Here is pseudocode for the first approach (based on OmpSCR's benchmark):</p>

<pre><code>void qsort_omp_recursive(int* begin, int* end)
{
  if (begin != end) {
    // Partition
    ...

    // Throttling
    if (...) {
      qsort_omp_recursive(begin, middle);
      qsort_omp_recursive(++middle, ++end);
    } else {
      #pragma omp parallel sections
      {
        #pragma omp section
        qsort_omp_recursive(begin, middle);
        #pragma omp section
        qsort_omp_recursive(++middle, ++end);
      }
    }
  }
}
</code></pre>

<p>In order to run this code, you need to call <code>omp_set_nested(1)</code> and <code>omp_set_num_threads(2)</code>. The code is really simple: we spawn two threads on each division of the work, but insert a simple throttling logic to prevent an excessive number of threads. Note that my experimentation showed decent speedups for this approach.</p>

<hr>

<p>Finally, you may use OpenMP 3.0's <code>task</code>, where a task is a logically concurrent unit of work. In all the OpenMP approaches above, each parallel construct spawns two <em>physical</em> threads; you may say there is a hard 1-to-1 mapping between a task and a worker thread. <code>task</code>, however, separates logical tasks from workers.</p>

<p>Because OpenMP 3.0 is not yet popular, I will use <em>Cilk Plus</em> instead, which is great for expressing this kind of nested and recursive parallelism. In Cilk Plus, the parallelization is extremely easy:</p>

<pre><code>void qsort(int* begin, int* end)
{
  if (begin != end) {
    --end;
    int* middle = std::partition(begin, end,
                      std::bind2nd(std::less&lt;int&gt;(), *end));
    std::swap(*end, *middle);
    cilk_spawn qsort(begin, middle);
    qsort(++middle, ++end);
    // cilk_sync; Only necessary at the final stage.
  }
}
</code></pre>

<p>I copied this code from Cilk Plus' example code. You will see that a single keyword, <code>cilk_spawn</code>, is everything needed to parallelize quicksort. I'm skipping the explanation of Cilk Plus and the spawn keyword, but it's easy to understand: the two recursive calls are declared as logically concurrent tasks. Whenever the recursion takes place, logical tasks are created, and the Cilk Plus runtime (which implements an efficient work-stealing scheduler) handles all the dirty work: it queues the parallel tasks and maps them to the worker threads.</p>

<p>Note that OpenMP 3.0's <code>task</code> is essentially similar to Cilk Plus's approach. My experimentation shows that pretty nice speedups are feasible: I got a 3–4x speedup on an 8-core machine, and the speedup scaled. Cilk Plus' absolute speedups were greater than those of OpenMP 3.0.</p>

<p>The approach of Cilk Plus (and OpenMP 3.0) and your approach are essentially the same: the separation of parallel tasks from workload assignment. However, it's very difficult to implement efficiently; for example, you must reduce contention and may need lock-free data structures.</p>
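<p>Since OpenMP 3.0's <code>task</code> is mentioned above but no code is shown for it, here is a minimal sketch of what a task-based quicksort could look like. The function names, the cutoff value, and the use of <code>std::sort</code> below the cutoff are my own assumptions, not part of the original answer or of OmpSCR:</p>

```cpp
#include <algorithm>
#include <utility>

// Assumed cutoff (not from the answer): below it, sorting serially
// avoids the overhead of creating tiny tasks.
static const long CUTOFF = 1024;

static void qsort_task(int* begin, int* end)
{
    if (end - begin < 2)
        return;
    if (end - begin <= CUTOFF) {      // throttle: small ranges go serial
        std::sort(begin, end);
        return;
    }
    --end;                            // use the last element as the pivot
    int* middle = std::partition(begin, end,
                                 [end](int x) { return x < *end; });
    std::swap(*end, *middle);         // put the pivot into its final slot
    #pragma omp task                  // left half becomes a logical task
    qsort_task(begin, middle);
    qsort_task(middle + 1, end + 1);  // right half stays on this thread
    #pragma omp taskwait              // join both halves before returning
}

void qsort_omp_task(int* begin, int* end)
{
    #pragma omp parallel              // create the worker team once
    #pragma omp single nowait         // one thread seeds the root task
    qsort_task(begin, end);
}
```

<p>As with <code>cilk_spawn</code>, the two recursive calls are merely declared logically concurrent; the OpenMP runtime decides how they map onto the worker threads, and the cutoff plays the same throttling role as in the first pseudocode above. Without OpenMP enabled the pragmas are ignored and the code still sorts correctly, just serially.</p>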