StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

PO
primarykey
Id
5694432
data
AcceptedAnswerId
0
AnswerCount
0
ClosedDate
CommentCount
3
CommunityOwnedDate
CreationDate
2011-04-17T15:36:45.410
FavoriteCount
0
LastActivityDate
2016-08-06T21:57:23.847
LastEditDate
2017-05-23T12:26:22.437
LastEditorUserId
-1
OwnerUserId
576911
ParentId
4475996
PostTypeId
2
Score
68
ViewCount
0
LastEditorDisplayName
text
Body
About a year ago I was working this area for <a href="http://libcxx.llvm.org/" rel="noreferrer">libc++</a> while implementing the unordered (hash) containers for C++11. I thought I would share my experiences here. This experience supports <a href="https://stackoverflow.com/questions/4475996/given-prime-number-n-compute-the-next-prime/4476015#4476015">marcog's accepted answer</a> for a reasonable definition of "brute force". <blockquote> That means that even a simple brute force will be fast enough in most circumstances, taking O(ln(p)*sqrt(p)) on average. </blockquote> I developed several implementations of <code>size_t next_prime(size_t n)</code> where the spec for this function is: <blockquote> Returns: The smallest prime that is greater than or equal to <code>n</code>. </blockquote> Each implementation of <code>next_prime</code> is accompanied by a helper function <code>is_prime</code>. <code>is_prime</code> should be considered a private implementation detail; not meant to be called directly by the client. Each of these implementations was of course tested for correctness, but also tested with the following performance test: <pre><code>int main() { typedef std::chrono::high_resolution_clock Clock; typedef std::chrono::duration<double, std::milli> ms; Clock::time_point t0 = Clock::now(); std::size_t n = 100000000; std::size_t e = 100000; for (std::size_t i = 0; i < e; ++i) n = next_prime(n+1); Clock::time_point t1 = Clock::now(); std::cout << e/ms(t1-t0).count() << " primes/millisecond\n"; return n; } </code></pre> I should stress that this is a performance test, and does not reflect typical usage, which would look more like: <pre><code>// Overflow checking not shown for clarity purposes n = next_prime(2*n + 1); </code></pre> All performance tests were compiled with: <pre><code>clang++ -stdlib=libc++ -O3 main.cpp </code></pre> Implementation 1 There are seven implementations. The purpose for displaying the first implementation is to demonstrate that if you fail to stop testing the candidate prime <code>x</code> for factors at <code>sqrt(x)</code> then you have failed to even achieve an implementation that could be classified as brute force. This implementation is brutally slow. <pre><code>bool is_prime(std::size_t x) { if (x < 2) return false; for (std::size_t i = 2; i < x; ++i) { if (x % i == 0) return false; } return true; } std::size_t next_prime(std::size_t x) { for (; !is_prime(x); ++x) ; return x; } </code></pre> For this implementation only I had to set <code>e</code> to 100 instead of 100000, just to get a reasonable running time: <pre><code>0.0015282 primes/millisecond </code></pre> Implementation 2 This implementation is the slowest of the brute force implementations and the only difference from implementation 1 is that it stops testing for primeness when the factor surpasses <code>sqrt(x)</code>. <pre><code>bool is_prime(std::size_t x) { if (x < 2) return false; for (std::size_t i = 2; true; ++i) { std::size_t q = x / i; if (q < i) return true; if (x % i == 0) return false; } return true; } std::size_t next_prime(std::size_t x) { for (; !is_prime(x); ++x) ; return x; } </code></pre> Note that <code>sqrt(x)</code> isn't directly computed, but inferred by <code>q < i</code>. This speeds things up by a factor of thousands: <pre><code>5.98576 primes/millisecond </code></pre> and validates marcog's prediction: <blockquote> ... this is well within the constraints of most problems taking on the order of a millisecond on most modern hardware. </blockquote> Implementation 3 One can nearly double the speed (at least on the hardware I'm using) by avoiding use of the <code>%</code> operator: <pre><code>bool is_prime(std::size_t x) { if (x < 2) return false; for (std::size_t i = 2; true; ++i) { std::size_t q = x / i; if (q < i) return true; if (x == q * i) return false; } return true; } std::size_t next_prime(std::size_t x) { for (; !is_prime(x); ++x) ; return x; } 11.0512 primes/millisecond </code></pre> Implementation 4 So far I haven't even used the common knowledge that 2 is the only even prime. This implementation incorporates that knowledge, nearly doubling the speed again: <pre><code>bool is_prime(std::size_t x) { for (std::size_t i = 3; true; i += 2) { std::size_t q = x / i; if (q < i) return true; if (x == q * i) return false; } return true; } std::size_t next_prime(std::size_t x) { if (x <= 2) return 2; if (!(x & 1)) ++x; for (; !is_prime(x); x += 2) ; return x; } 21.9846 primes/millisecond </code></pre> Implementation 4 is probably what most people have in mind when they think "brute force". Implementation 5 Using the following formula you can easily choose all numbers which are divisible by neither 2 nor 3: <pre><code>6 * k + {1, 5} </code></pre> where k >= 1. The following implementation uses this formula, but implemented with a cute xor trick: <pre><code>bool is_prime(std::size_t x) { std::size_t o = 4; for (std::size_t i = 5; true; i += o) { std::size_t q = x / i; if (q < i) return true; if (x == q * i) return false; o ^= 6; } return true; } std::size_t next_prime(std::size_t x) { switch (x) { case 0: case 1: case 2: return 2; case 3: return 3; case 4: case 5: return 5; } std::size_t k = x / 6; std::size_t i = x - 6 * k; std::size_t o = i < 2 ? 1 : 5; x = 6 * k + o; for (i = (3 + o) / 2; !is_prime(x); x += i) i ^= 6; return x; } </code></pre> This effectively means that the algorithm has to check only 1/3 of the integers for primeness instead of 1/2 of the numbers and the performance test shows the expected speed up of nearly 50%: <pre><code>32.6167 primes/millisecond </code></pre> Implementation 6 This implementation is a logical extension of implementation 5: It uses the following formula to compute all numbers which are not divisible by 2, 3 and 5: <pre><code>30 * k + {1, 7, 11, 13, 17, 19, 23, 29} </code></pre> It also unrolls the inner loop within is_prime, and creates a list of "small primes" that is useful for dealing with numbers less than 30. <pre><code>static const std::size_t small_primes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 }; static const std::size_t indices[] = { 1, 7, 11, 13, 17, 19, 23, 29 }; bool is_prime(std::size_t x) { const size_t N = sizeof(small_primes) / sizeof(small_primes[0]); for (std::size_t i = 3; i < N; ++i) { const std::size_t p = small_primes[i]; const std::size_t q = x / p; if (q < p) return true; if (x == q * p) return false; } for (std::size_t i = 31; true;) { std::size_t q = x / i; if (q < i) return true; if (x == q * i) return false; i += 6; q = x / i; if (q < i) return true; if (x == q * i) return false; i += 4; q = x / i; if (q < i) return true; if (x == q * i) return false; i += 2; q = x / i; if (q < i) return true; if (x == q * i) return false; i += 4; q = x / i; if (q < i) return true; if (x == q * i) return false; i += 2; q = x / i; if (q < i) return true; if (x == q * i) return false; i += 4; q = x / i; if (q < i) return true; if (x == q * i) return false; i += 6; q = x / i; if (q < i) return true; if (x == q * i) return false; i += 2; } return true; } std::size_t next_prime(std::size_t n) { const size_t L = 30; const size_t N = sizeof(small_primes) / sizeof(small_primes[0]); // If n is small enough, search in small_primes if (n <= small_primes[N-1]) return *std::lower_bound(small_primes, small_primes + N, n); // Else n > largest small_primes // Start searching list of potential primes: L * k0 + indices[in] const size_t M = sizeof(indices) / sizeof(indices[0]); // Select first potential prime >= n // Known a-priori n >= L size_t k0 = n / L; size_t in = std::lower_bound(indices, indices + M, n - k0 * L) - indices; n = L * k0 + indices[in]; while (!is_prime(n)) { if (++in == M) { ++k0; in = 0; } n = L * k0 + indices[in]; } return n; } </code></pre> This is arguably getting beyond "brute force" and is good for boosting the speed another 27.5% to: <pre><code>41.6026 primes/millisecond </code></pre> Implementation 7 It is practical to play the above game for one more iteration, developing a formula for numbers not divisible by 2, 3, 5 and 7: <pre><code>210 * k + {1, 11, ...}, </code></pre> The source code isn't shown here, but is very similar to implementation 6. This is the implementation I chose to actually use for the unordered containers of <a href="https://github.com/llvm-mirror/libcxx/blob/master/src/hash.cpp" rel="noreferrer">libc++</a>, and that source code is open source (found at the link). This final iteration is good for another 14.6% speed boost to: <pre><code>47.685 primes/millisecond </code></pre> Use of this algorithm assures that clients of <a href="http://libcxx.llvm.org/" rel="noreferrer">libc++</a>'s hash tables can choose any prime they decide is most beneficial to their situation, and the performance for this application is quite acceptable.
Tags
Title
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. POGiven Prime Number N, Compute the Next Prime?
 singulars
 PostTypePostTypeId
 PTQuestion
PostTypePostTypeId
1. PTAnswer
UserLastEditorUserId
1. USCommunity
UserOwnerUserId
1. USHoward Hinnant
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. This table or related slice is empty.
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
3. VO
 singulars
 PostPostId
 PO
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COSince the divide instruction on just about any CPU architecture produces the remainder as well as the quotient, the fact that implementation 3 doubled the speed over implementation 2 indicates that your compiler did not optimize very well and used two divide instructions in implementation 2. On GCC 4.7.1 implementation 2 is indeed 4 % faster than implementation 3, since there is no need for the additional multiply. Your compiler might do better if you switch the two if clauses around. In fact the `return false` case is more likely so it's worth switching for that reason alone (1 % speed-up).
 singulars
 PostPostId
 PO
 UserUserId
 UShan

Querying!

Guidance

A row detail

Detail views are divided into sections. All the information in the data section comes from columns in the selected row. The other sections display data from other, related rows.

Related data can be related in a to-one or a to-many fashion. Captions of data related in a to-many fashion link to a list view showing a filtered view of the table.

Try moving around until you find a non-empty to-many entry and click on the label to get to one. You can move back to the root by clicking on the database name in the header.