StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POParallelizing nested loops with dependencies
primarykey
Id
1512396
data
AcceptedAnswerId
1512503
AnswerCount
3
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2009-10-03T00:25:01.053
FavoriteCount
1
LastActivityDate
2009-10-03T02:14:53.613
LastEditDate
2009-10-03T02:14:53.613
LastEditorUserId
171187
OwnerUserId
171187
ParentId
0
PostTypeId
1
Score
1
ViewCount
1273
LastEditorDisplayName
text
Body
My question is about how best to structure my (C++) code to support parallelizing a time consuming computation. The pseudocode in question has the following structure: <pre><code>for a_1(y_1) in A_1 for a_2(y_2) in A_2(a_1) ... for a_n(y_n) in A_n(a_1, ..., a_{n-1}) y_n = f(a_1, ..., a_n) y_{n-1} = g_n(Y_n) ... y_1 = g_2(Y_2) . </code></pre> Roughly speaking, each loop iterates over elements in a set <code>A_i</code>, the successive elements of which are dependent on feedback <code>y_i</code> from previous iterations. In other words, to determine the next <code>a_i</code>, we must have finished all computations on the current <code>a_i</code>. Furthermore, interior sets depend on the outer iterations. Written in a recursive form: <pre><code>Iterate(A_i, a_1, ..., a_{i-1}): for a_i(h_i) in A_i Y_i += Iterate(A_{i+1}, a_1, ..., a_i) return g(Y_i) Iterate(any, a_1, ..., a_n): return f(a_1, ..., a_n) Iterate(A_1) </code></pre> Assume that f(...) is a time-consuming computation, and that the feedback functions g(...) are simple (fast). Now, if all the sets <code>A_i</code> are "large", then the problem is embarrassingly parallelizable. Currently, I have a thread pool and just toss the computations of the inner-most loop into the pool. The problem is, very often the inner-most loop is an iteration over a singleton, so the thread pool only ever has one running thread in it. I have thought about using futures to return the values to outer loops, but that would require futures-of-futures, etc. and it gets pretty messy (I think). I realize that the structure I have listed above is pretty complicated, so there are a number of simplifying cases I am also interested in: <ol> <li><code>a_i(h_i) = a_i</code> ; independent of h_i</li> <li><code>A_i(a_1, ..., a_{i-1}) = A_i</code> ; independent of a_1, ... a_{i-1}</li> <li><code>g_i = 0</code> ; independent of H_{i+1}</li> <li>All outer loops are "large"; the number of elements in those sets is much greater than the number of cores.</li> </ol> Now, in practice, n <= 3, and item 1 holds for all outer loops, and items 2-4 all hold, so particular solutions for that case are sufficient. But since I am bothering to ask the question here, I am interested in getting ideas for how to deal with the additional complexity for more general problems if possible. <hr> Edit: Cleaned up the first pseudocode block to make it consistent with the other. Since people cannot understand my mathematical notation, here is a more concrete simple example: <pre><code>#include <cmath> #include <iostream> #include <vector> using namespace std; double f(double a1, int a2, double a3){ // Very slow function cout << a1 << ", " << a2 << ", " << a3 << endl; return pow(a1*a3, a2) + a1 + a2 + a3; // just some contrived example } int g2(const vector<double> &Y3){ // average-ish double sum = 0; for(int i = 0; i < Y3.size(); ++i){ sum += Y3[i]; } return int(sum / (Y3.size() + 1)); } double g1(const vector<int> &Y2){ // return 1/(min(Y2)+1.0) int imin = 0; int minval = 0; for(int i = 1; i < Y2.size(); ++i){ if(Y2[i] < minval){ imin = i; minval = Y2[imin]; } } return 1.0/double(minval+1.0); } int main(){ for(double a1 = 0.0, h1 = 10.0; a1 < 1.0; a1 += h1){ // for a1 in A1 vector<int> Y2; for(int a2 = 2, h2 = 1; a2 <= (int)(5*a1+10); a2 += h2){ // for a2 in A2(a1) vector<double> Y3; for(double a3 = 6.0, h3 = 1.0; a3 >= (a1+a2); a3 -= 0.5*h3){ // for a3 in A2(a1, a2) h3 = f(a1, a2, a3); Y3.push_back(h3); } h2 = g2(Y3); Y2.push_back(h2); } h1 = g1(Y2); } return 0; } </code></pre> I picked the values randomly, and it turns out <code>f</code> is only evaluated 3 times. Note that the above code is NOT parallelizable. Assume that it is possible to query if a loop's incrementation depends on higher up loops. I should clarify what I'm after as well. When I originally said structure, I perhaps should have said parallelization methodology or something like that. For example, my first attempt to parallelize was to throw the inner-most calls to <code>f</code> into a thread pool and join at the end of the inner-most loop. As mentioned above, this doesn't work when the inner-most loop iterates only over one element. This didn't really require restructuring the existing code significantly, and I would like to avoid it if possible.
Tags
<multithreading><parallel-processing>
Title
Parallelizing nested loops with dependencies
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USVictor Liu
UserOwnerUserId
1. USVictor Liu
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
2. PO
 singulars
 PostTypePostTypeId
 PTAnswer
3. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POParallelizing nested loops with dependencies
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POParallelizing nested loops with dependencies
 UserUserId
 USBobbyShaftoe
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
1. This table or related slice is empty.

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.