StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POEfficient unsigned-to-signed cast avoiding implementation-defined behavior
text
Body
copied!<p>I want to define a function that takes an <code>unsigned int</code> as argument and returns an <code>int</code> congruent modulo UINT_MAX+1 to the argument.</p> <p>A first attempt might look like this:</p> <pre><code>int unsigned_to_signed(unsigned n) { return static_cast<int>(n); } </code></pre> <p>But as any language lawyer knows, casting from unsigned to signed for values larger than INT_MAX is implementation-defined.</p> <p>I want to implement this such that (a) it only relies on behavior mandated by the spec; and (b) it compiles into a no-op on any modern machine and optimizing compiler.</p> <p>As for bizarre machines... If there is no signed int congruent modulo UINT_MAX+1 to the unsigned int, let's say I want to throw an exception. If there is more than one (I am not sure this is possible), let's say I want the largest one.</p> <p>OK, second attempt:</p> <pre><code>int unsigned_to_signed(unsigned n) { int int_n = static_cast<int>(n); if (n == static_cast<unsigned>(int_n)) return int_n; // else do something long and complicated } </code></pre> <p>I do not much care about the efficiency when I am not on a typical twos-complement system, since in my humble opinion that is unlikely. And if my code becomes a bottleneck on the omnipresent sign-magnitude systems of 2050, well, I bet someone can figure that out and optimize it then.</p> <p>Now, this second attempt is pretty close to what I want. Although the cast to <code>int</code> is implementation-defined for some inputs, the cast back to <code>unsigned</code> is guaranteed by the standard to preserve the value modulo UINT_MAX+1. So the conditional does check exactly what I want, and it will compile into nothing on any system I am likely to encounter.</p> <p>However... I am still casting to <code>int</code> without first checking whether it will invoke implementation-defined behavior. On some hypothetical system in 2050 it could do who-knows-what. So let's say I want to avoid that.</p> <p>Question: What should my "third attempt" look like?</p> <p>To recap, I want to:</p> <ul> <li>Cast from unsigned int to signed int</li> <li>Preserve the value mod UINT_MAX+1</li> <li>Invoke only standard-mandated behavior</li> <li>Compile into a no-op on a typical twos-complement machine with optimizing compiler</li> </ul> <p>[Update]</p> <p>Let me give an example to show why this is not a trivial question.</p> <p>Consider a hypothetical C++ implementation with the following properties:</p> <ul> <li><code>sizeof(int)</code> equals 4</li> <li><code>sizeof(unsigned)</code> equals 4</li> <li><code>INT_MAX</code> equals 32767</li> <li><code>INT_MIN</code> equals -2<sup>32</sup> + 32768</li> <li><code>UINT_MAX</code> equals 2<sup>32</sup> - 1</li> <li>Arithmetic on <code>int</code> is modulo 2<sup>32</sup> (into the range <code>INT_MIN</code> through <code>INT_MAX</code>)</li> <li><code>std::numeric_limits<int>::is_modulo</code> is true</li> <li>Casting unsigned <code>n</code> to int preserves the value for 0 <= n <= 32767 and yields <em>zero</em> otherwise</li> </ul> <p>On this hypothetical implementation, there is exactly one <code>int</code> value congruent (mod UINT_MAX+1) to each <code>unsigned</code> value. So my question would be well-defined.</p> <p>I claim that this hypothetical C++ implementation fully conforms to the C++98, C++03, and C++11 specifications. I admit I have not memorized every word of all of them... But I believe I have read the relevant sections carefully. So if you want me to accept your answer, you either must (a) cite a spec that rules out this hypothetical implementation or (b) handle it correctly.</p> <p>Indeed, a correct answer must handle <em>every</em> hypothetical implementation permitted by the standard. That is what "invoke only standard-mandated behavior" means, by definition.</p> <p>Incidentally, note that <code>std::numeric_limits<int>::is_modulo</code> is utterly useless here for multiple reasons. For one thing, it can be <code>true</code> even if unsigned-to-signed casts do not work for large unsigned values. For another, it can be <code>true</code> even on one's-complement or sign-magnitude systems, if arithmetic is simply modulo the entire integer range. And so on. If your answer depends on <code>is_modulo</code>, it's wrong.</p> <p>[Update 2]</p> <p><a href="https://stackoverflow.com/a/13208789/768469">hvd's answer</a> taught me something: My hypothetical C++ implementation for integers is <em>not</em> permitted by modern C. The C99 and C11 standards are very specific about the representation of signed integers; indeed, they only permit twos-complement, ones-complement, and sign-magnitude (section 6.2.6.2 paragraph (2); ).</p> <p>But C++ is not C. As it turns out, this fact lies at the very heart of my question.</p> <p>The original C++98 standard was based on the much older C89, which says (section 3.1.2.5):</p> <blockquote> <p>For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements. The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same.</p> </blockquote> <p>C89 says nothing about only having one sign bit or only allowing twos-complement/ones-complement/sign-magnitude.</p> <p>The C++98 standard adopted this language nearly verbatim (section 3.9.1 paragraph (3)):</p> <blockquote> <p>For each of the signed integer types, there exists a corresponding (but different) <em>unsigned integer type</em>: "<code>unsigned char</code>", "<code>unsigned short int</code>", "<code>unsigned int</code>", and "<code>unsigned long int</code>", each of which occupies the same amount of storage and has the same alignment requirements (3.9) as the corresponding signed integer type ; that is, each <em>signed integer</em> type has the same object representation as its corresponding <em>unsigned integer</em> type. The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same.</p> </blockquote> <p>The C++03 standard uses essentially identical language, as does C++11.</p> <p>No standard C++ spec constrains its signed integer representations to any C spec, as far as I can tell. And there is nothing mandating a single sign bit or anything of the kind. All it says is that <em>non-negative</em> signed integers must be a subrange of the corresponding unsigned.</p> <p>So, again I claim that INT_MAX=32767 with INT_MIN=-2<sup>32</sup>+32768 is permitted. If your answer assumes otherwise, it is incorrect unless you cite a <strong>C++</strong> standard proving me wrong.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload