<p>Given say...</p> <pre><code>std::string x = "hello";
</code></pre> <h2>Getting a <code>char *</code> or <code>const char*</code> from a <code>string</code></h2> <p><strong>How to get a character pointer that's valid while <code>x</code> remains in scope and isn't modified further</strong></p> <p><strong>C++11</strong> simplifies things; the following all give access to the same internal string buffer:</p> <pre><code>const char* p_c_str = x.c_str();
const char* p_data  = x.data();
const char* p_x0    = &amp;x[0];

char* p_x0_rw = &amp;x[0];  // compiles iff x is not const...
</code></pre> <p>All the above pointers will hold the <em>same value</em> - the address of the first character in the buffer. Even an empty string has a "first character in the buffer", because C++11 guarantees to always keep an extra NUL/0 terminator character after the explicitly assigned string content (e.g. <code>std::string("this\0that", 9)</code> will have a buffer holding <code>"this\0that\0"</code>).</p> <p>Given any of the above pointers:</p> <pre><code>char c = p[n];   // valid for n &lt;= x.size()
                 // i.e. you can safely read the NUL at p[x.size()]
</code></pre> <p>Only for the non-<code>const</code> pointer from <code>&amp;x[0]</code>:</p> <pre><code>p_x0_rw[n] = c;  // valid for n &lt;= x.size() - 1
                 // i.e. don't overwrite the implementation-maintained NUL
</code></pre> <p>Writing a NUL elsewhere in the string does <em>not</em> change the <code>string</code>'s <code>size()</code>; <code>string</code>s are allowed to contain any number of NULs - they are given no special treatment by <code>std::string</code> (same in C++03).</p> <p>In <strong>C++03</strong>, things were considerably more complicated (key differences <strong><em>highlighted</em></strong>):</p> <ul> <li><p><code>x.data()</code></p> <ul> <li>returns <code>const char*</code> to the string's internal buffer <strong><em>which wasn't required by the Standard to conclude with a NUL</em></strong> (i.e.
might be <code>['h', 'e', 'l', 'l', 'o']</code> followed by uninitialised or garbage values, with accidental accesses thereto having <em>undefined behaviour</em>). <ul> <li><code>x.size()</code> characters are safe to read, i.e. <code>x[0]</code> through <code>x[x.size() - 1]</code></li> <li>for empty strings, you're guaranteed some non-NULL pointer to which 0 can be safely added (hurray!), but you shouldn't dereference that pointer.</li> </ul></li> </ul></li> <li><p><code>&amp;x[0]</code></p> <ul> <li><strong><em>for empty strings this has undefined behaviour</em></strong> (21.3.4) <ul> <li>e.g. given <code>f(const char* p, size_t n) { if (n == 0) return; ...whatever... }</code> you mustn't call <code>f(&amp;x[0], x.size());</code> when <code>x.empty()</code> - just use <code>f(x.data(), ...)</code>.</li> </ul></li> <li>otherwise, as per <code>x.data()</code> but: <ul> <li>for non-<code>const</code> <code>x</code> this yields a non-<code>const</code> <code>char*</code> pointer; you can overwrite string content</li> </ul></li> </ul></li> <li><p><code>x.c_str()</code></p> <ul> <li>returns <code>const char*</code> to an ASCIIZ (NUL-terminated) representation of the value (i.e. ['h', 'e', 'l', 'l', 'o', '\0']).</li> <li>although few if any implementations chose to do so, the C++03 Standard was worded to allow the string implementation the freedom to create a <strong><em>distinct NUL-terminated buffer</em></strong> <em>on the fly</em>, from the potentially non-NUL terminated buffer "exposed" by <code>x.data()</code> and <code>&amp;x[0]</code></li> <li><code>x.size()</code> + 1 characters are safe to read.</li> <li>guaranteed safe even for empty strings (['\0']).</li> </ul></li> </ul> <h2>Consequences of accessing outside legal indices</h2> <p>Whichever way you get a pointer, you must not access memory further along from the pointer than the characters guaranteed present in the descriptions above. 
Attempts to do so have <em>undefined behaviour</em>, with a very real chance of application crashes and garbage results even for reads, and additionally wholesale data and stack corruption and/or security vulnerabilities for writes.</p> <h2>When do those pointers get invalidated?</h2> <p>If you call some <code>string</code> member function that modifies the <code>string</code> or reserves further capacity, any pointer values returned beforehand by any of the above methods are <em>invalidated</em>. You can use those methods again to get another pointer. (The rules are the same as for iterators into <code>string</code>s.)</p> <p>See also <em>How to get a character pointer valid even after <code>x</code> leaves scope or is modified further</em> below.</p> <h2>So, which is <em>better</em> to use?</h2> <p>From C++11, use <code>.c_str()</code> for ASCIIZ data, and <code>.data()</code> for "binary" data (explained further below).</p> <p>In C++03, use <code>.c_str()</code> unless certain that <code>.data()</code> is adequate, and prefer <code>.data()</code> over <code>&amp;x[0]</code> as it's safe for empty strings....</p> <p><em>...try to understand the program enough to use <code>data()</code> when appropriate, or you'll probably make other mistakes...</em></p> <p>The ASCII NUL '\0' character guaranteed by <code>.c_str()</code> is used by many functions as a sentinel value denoting the end of relevant and safe-to-access data.
This applies to both C++-only functions like, say, <code>fstream::fstream(const char* filename, ...)</code> and shared-with-C functions like <code>strchr()</code> and <code>printf()</code>.</p> <p>Given that C++03's <code>.c_str()</code> guarantees about the returned buffer are a superset of <code>.data()</code>'s, you can always safely use <code>.c_str()</code>, but people sometimes don't because:</p> <ul> <li>using <code>.data()</code> communicates to other programmers reading the source code that the data is not ASCIIZ (rather, you're using the string to store a block of data (which sometimes isn't even really textual)), or that you're passing it to another function that treats it as a block of "binary" data. This can be a crucial insight in ensuring that other programmers' code changes continue to handle the data properly.</li> <li>C++03 only: there's a slight chance that your <code>string</code> implementation will need to do some extra memory allocation and/or data copying in order to prepare the NUL-terminated buffer.</li> </ul> <p>As a further hint, if a function's parameters require the (<code>const</code>) <code>char*</code> but don't insist on getting <code>x.size()</code>, the function <em>probably</em> needs an ASCIIZ input, so <code>.c_str()</code> is a good choice (the function needs to know where the text terminates somehow, so if it's not a separate parameter it can only be a convention like a length-prefix or sentinel or some fixed expected length).</p> <h2>How to get a character pointer valid even after <code>x</code> leaves scope or is modified further</h2> <p>You'll need to <strong><em>copy</em></strong> the contents of the <code>string</code> <code>x</code> to a new memory area outside <code>x</code>. This external buffer could be in many places such as another <code>string</code> or character array variable; it may or may not have a different lifetime than <code>x</code> due to being in a different scope (e.g.
namespace, global, static, heap, shared memory, memory-mapped file).</p> <p>To copy the text from <code>std::string x</code> into an independent character array:</p> <pre><code>// USING ANOTHER STRING - AUTO MEMORY MANAGEMENT, EXCEPTION SAFE
std::string old_x = x;
// - old_x will not be affected by subsequent modifications to x...
// - you can use `&amp;old_x[0]` to get a writable char* to old_x's textual content
// - you can use resize() to reduce/expand the string
//   - resizing isn't possible from within a function passed only the char* address

std::string old_x = x.c_str(); // old_x will terminate early if x embeds NUL
// Copies ASCIIZ data but could be less efficient as it needs to scan memory to
// find the NUL terminator indicating string length before allocating that amount
// of memory to copy into, or more efficient if it ends up allocating/copying a
// lot less content.
// Example, x == "ab\0cd" -&gt; old_x == "ab".

// USING A VECTOR OF CHAR - AUTO, EXCEPTION SAFE, HINTS AT BINARY CONTENT, GUARANTEED CONTIGUOUS EVEN IN C++03
std::vector&lt;char&gt; old_x(x.data(), x.data() + x.size());       // without the NUL
std::vector&lt;char&gt; old_x(x.c_str(), x.c_str() + x.size() + 1);  // with the NUL

// USING STACK WHERE MAXIMUM SIZE OF x IS KNOWN TO BE COMPILE-TIME CONSTANT "N"
// (a bit dangerous, as "known" things are sometimes wrong and often become wrong)
char y[N + 1];
strcpy(y, x.c_str());

// USING STACK WHERE UNEXPECTEDLY LONG x IS TRUNCATED (e.g. Hello\0-&gt;Hel\0)
char y[N + 1];
strncpy(y, x.c_str(), N); // copy at most N, zero-padding if shorter
y[N] = '\0';              // ensure NUL terminated

// USING THE STACK TO HANDLE x OF UNKNOWN (BUT SANE) LENGTH
// (alloca() is non-standard and returns void*, so C++ needs the cast)
char* y = static_cast&lt;char*&gt;(alloca(x.size() + 1));
strcpy(y, x.c_str());

// USING THE STACK TO HANDLE x OF UNKNOWN LENGTH (NON-STANDARD GCC EXTENSION)
char y[x.size() + 1];
strcpy(y, x.c_str());

// USING new/delete HEAP MEMORY, MANUAL DEALLOC, NO INHERENT EXCEPTION SAFETY
char* y = new char[x.size() + 1];
strcpy(y, x.c_str());
//     or as a one-liner: char* y = strcpy(new char[x.size() + 1], x.c_str());
// use y...
delete[] y; // make sure no break, return, throw or branching bypasses this

// USING new/delete HEAP MEMORY, SMART POINTER DEALLOCATION, EXCEPTION SAFE
// see boost shared_array usage in Johannes Schaub's answer

// USING malloc/free HEAP MEMORY, MANUAL DEALLOC, NO INHERENT EXCEPTION SAFETY
char* y = strdup(x.c_str()); // strdup() is POSIX rather than standard C++
// use y...
free(y);
</code></pre> <h2>Other reasons to want a <code>char*</code> or <code>const char*</code> generated from a <code>string</code></h2> <p>So, above you've seen how to get a (<code>const</code>) <code>char*</code>, and how to make a copy of the text independent of the original <code>string</code>, but what can you <em>do</em> with it? A random smattering of examples...</p> <ul> <li>give "C" code access to the C++ <code>string</code>'s text, as in <code>printf("x is '%s'", x.c_str());</code></li> <li>copy <code>x</code>'s text to a buffer specified by your function's caller (e.g. <code>strncpy(callers_buffer, x.c_str(), callers_buffer_size)</code>, then NUL-terminate in case of truncation), or volatile memory used for device I/O (e.g. <code>for (const char* p = x.c_str(); *p; ++p) *p_device = *p;</code>)</li> <li>append <code>x</code>'s text to a character array already containing some ASCIIZ text (e.g.
<code>strcat(other_buffer, x.c_str())</code>) - be careful not to overrun the buffer (in many situations you may need to use <code>strncat</code>)</li> <li>return a <code>const char*</code> or <code>char*</code> from a function (perhaps for historical reasons - clients using your existing API - or for C compatibility you don't want to return a <code>std::string</code>, but do want to copy your <code>string</code>'s data somewhere for the caller) <ul> <li>be careful not to return a pointer that may be dereferenced by the caller after a local <code>string</code> variable to which that pointer pointed has left scope</li> <li>some projects with shared objects compiled/linked for different <code>std::string</code> implementations (e.g. STLport and compiler-native) may pass data as ASCIIZ to avoid conflicts</li> </ul></li> </ul>
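<p>The C++11 guarantees described at the top of this answer can be checked directly. This is a minimal sketch that assumes C++11 or later. <code>check_buffer_access</code> and <code>size_of_this_nul_that</code> are hypothetical names chosen for illustration. The sketch asserts four things: <code>c_str()</code>, <code>data()</code> and <code>&amp;x[0]</code> yield the same address; <code>p[x.size()]</code> reads the NUL; a write through the non-<code>const</code> pointer lands in <code>x</code>; and embedded NULs count towards <code>size()</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch (assumes C++11 or later): checks the buffer-access guarantees
// for a non-const std::string.
void check_buffer_access(std::string& x) {
    const char* p_c_str = x.c_str();
    const char* p_data = x.data();
    const char* p_x0 = &x[0];  // well-defined even when x is empty (C++11)
    char* p_x0_rw = &x[0];     // writable pointer: x is not const

    // All four pointers hold the same address.
    assert(p_c_str == p_data);
    assert(p_data == p_x0);
    assert(p_x0 == p_x0_rw);

    // Reading p[n] is valid for n <= x.size(); p[x.size()] is the NUL.
    assert(p_c_str[x.size()] == '\0');

    // Writing is valid only for n <= x.size() - 1, so never for an empty string.
    if (!x.empty()) {
        p_x0_rw[0] = 'J';
        assert(x[0] == 'J');  // the write went straight into x's buffer
    }
}

// Embedded NULs are ordinary characters and count towards size().
std::size_t size_of_this_nul_that() {
    const std::string s("this\0that", 9);
    return s.size();  // 9, not 4
}
```

<p>Calling <code>check_buffer_access</code> on <code>"hello"</code> leaves it as <code>"Jello"</code>. Calling it on an empty string only performs the reads.</p>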
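<p>The <code>f(const char* p, size_t n)</code> pattern from the C++03 discussion, together with the choice between <code>.data()</code> and <code>.c_str()</code>, can be sketched as follows. <code>count_nuls</code>, <code>sentinel_length</code> and <code>demo_data_vs_c_str</code> are hypothetical helpers. The first treats its input as a pointer-plus-length "binary" block. The second relies on the NUL sentinel. Because of that difference, the two helpers disagree about a string with an embedded NUL.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical pointer+length consumer, shaped like f() in the text: it never
// dereferences p when n == 0, so passing x.data() is safe even if x is empty.
// Embedded NULs are treated as ordinary bytes.
std::size_t count_nuls(const char* p, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] == '\0') ++count;
    }
    return count;
}

// A NUL-sentinel consumer only "sees" the text up to the first NUL.
std::size_t sentinel_length(const std::string& x) {
    return std::strlen(x.c_str());
}

// Usage: binary-safe vs sentinel-based views of the same string.
void demo_data_vs_c_str() {
    const std::string bin("ab\0cd", 5);
    assert(count_nuls(bin.data(), bin.size()) == 1);  // sees all 5 chars
    assert(sentinel_length(bin) == 2);                // stops at the NUL

    const std::string empty;
    assert(count_nuls(empty.data(), empty.size()) == 0);  // safe for ""
}
```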
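<p>The invalidation rule means a pointer should be fetched again after the last modifying call instead of being reused from earlier. Here is a minimal sketch that assumes C++11 or later. <code>append_then_c_str</code> and <code>demo_refresh_pointer</code> are hypothetical names. The <code>reserve()</code> call is included only as an example of a capacity-changing operation.</p>

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Sketch: modify x, then return a pointer valid for its new contents.
// Any pointer obtained from x before these calls must be treated as
// invalidated (the buffer may have been reallocated), so c_str() is
// called again after the last modification.
const char* append_then_c_str(std::string& x, const std::string& suffix) {
    x.reserve(x.size() + suffix.size());  // may reallocate: invalidates old pointers
    x += suffix;                          // may also reallocate
    return x.c_str();                     // fresh pointer, valid until x changes again
}

void demo_refresh_pointer() {
    std::string s = "hello";
    const char* p = append_then_c_str(s, ", world");
    assert(std::strcmp(p, "hello, world") == 0);
    assert(p == s.c_str());  // still valid: s hasn't been modified since
}
```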
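<p>The automatically managed copies in the list above (another <code>string</code>, or a <code>vector&lt;char&gt;</code> with or without the NUL) can be compared side by side. The sketch below uses the hypothetical struct <code>Copies</code> and the helper <code>make_copies</code>. It shows that each copy survives later changes to <code>x</code>. It also shows that only the <code>c_str()</code>-based <code>string</code> copy is cut short by an embedded NUL.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical holder for copies that remain valid after x changes or dies.
struct Copies {
    std::string full;          // all x.size() chars, embedded NULs included
    std::string ascii;         // built from c_str(): stops at the first NUL
    std::vector<char> bytes;   // "binary" copy without the terminator
    std::vector<char> asciiz;  // copy including the terminating NUL
};

Copies make_copies(const std::string& x) {
    Copies c;
    c.full = x;
    c.ascii = x.c_str();
    c.bytes.assign(x.data(), x.data() + x.size());
    c.asciiz.assign(x.c_str(), x.c_str() + x.size() + 1);
    return c;
}

void demo_copies() {
    std::string x("ab\0cd", 5);
    const Copies c = make_copies(x);
    x = "changed";  // modifying x does not affect the copies
    assert(c.full == std::string("ab\0cd", 5));
    assert(c.ascii == "ab");
    assert(c.bytes.size() == 5);
    assert(c.asciiz.size() == 6 && c.asciiz.back() == '\0');
}
```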
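<p>The fixed-size-buffer cases covered above are a truncating copy into a caller's buffer and a bounded append with <code>strncat</code>. Both can be sketched as below. <code>copy_to_buffer</code>, <code>append_to_buffer</code> and <code>demo_buffers</code> are hypothetical helpers, and the sketch assumes the caller passes the buffer's total size in bytes. Note that <code>strncpy</code> takes (destination, source, count), and it does not NUL-terminate when it truncates.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies x's text into a caller-supplied buffer of buf_size bytes, truncating
// if necessary and always NUL-terminating the result.
void copy_to_buffer(char* buf, std::size_t buf_size, const std::string& x) {
    if (buf_size == 0) return;                   // no room even for the NUL
    std::strncpy(buf, x.c_str(), buf_size - 1);  // at most buf_size - 1 chars
    buf[buf_size - 1] = '\0';                    // strncpy won't terminate on truncation
}

// Appends x's text to a buffer that already holds ASCIIZ text, never writing
// more than buf_size bytes in total (strncat always adds the NUL itself).
void append_to_buffer(char* buf, std::size_t buf_size, const std::string& x) {
    const std::size_t used = std::strlen(buf);
    if (used + 1 >= buf_size) return;            // already full
    std::strncat(buf, x.c_str(), buf_size - used - 1);
}

void demo_buffers() {
    char small[4];
    copy_to_buffer(small, sizeof small, "Hello");   // truncated: Hello -> Hel
    assert(std::strcmp(small, "Hel") == 0);

    char buf[8] = "ab";
    append_to_buffer(buf, sizeof buf, "cdefghij");  // only 5 more chars fit
    assert(std::strcmp(buf, "abcdefg") == 0);
}
```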
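<p>The heap cases can be sketched in two ways. The first uses the standard <code>std::unique_ptr&lt;char[]&gt;</code>, swapped in here for the Boost <code>shared_array</code> approach mentioned above rather than taken from that answer. The second is a <code>malloc</code>-based copy suitable for returning to C callers, who release it with <code>free()</code>. Unlike <code>strcpy</code>, the <code>memcpy</code> used here also copies any embedded NULs. All function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Exception-safe heap copy: delete[] runs automatically however the owning
// scope is left (return, break or throw).
std::unique_ptr<char[]> heap_copy(const std::string& x) {
    std::unique_ptr<char[]> y(new char[x.size() + 1]);
    std::memcpy(y.get(), x.c_str(), x.size() + 1);  // all chars plus the NUL
    return y;
}

// Copy for returning through a C-style API: allocated with std::malloc so the
// caller (possibly C code) releases it with free(). Returns nullptr on failure.
char* c_api_copy(const std::string& x) {
    char* y = static_cast<char*>(std::malloc(x.size() + 1));
    if (y != nullptr) {
        std::memcpy(y, x.c_str(), x.size() + 1);
    }
    return y;
}

void demo_heap_copies() {
    std::string x = "hello";
    const std::unique_ptr<char[]> a = heap_copy(x);
    char* b = c_api_copy(x);
    x = "changed";  // the copies are independent of x
    assert(std::strcmp(a.get(), "hello") == 0);
    assert(b != nullptr && std::strcmp(b, "hello") == 0);
    std::free(b);
}
```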