
TLS variable lookup speed
<p>Consider the following program:</p>
<pre><code>#include &lt;pthread.h&gt;

static int final_value = 0;

#ifdef TLS_VAR
static int __thread tls_var;   /* one copy per thread */
#else
static int tls_var;            /* one copy shared by all threads */
#endif

void __attribute__ ((noinline)) modify_tls(void) {
    tls_var++;
}

void *thread_function(void *unused) {
    const int iteration_count = 1 &lt;&lt; 25;

    tls_var = 0;

    for (int i = 0; i &lt; iteration_count; i++) {
        modify_tls();
    }

    final_value += tls_var;

    return NULL;
}

int main() {
    const int thread_count = 1 &lt;&lt; 7;

    pthread_t thread_ids[thread_count];

    for (int i = 0; i &lt; thread_count; i++) {
        pthread_create(&amp;thread_ids[i], NULL, thread_function, NULL);
    }

    for (int i = 0; i &lt; thread_count; i++) {
        pthread_join(thread_ids[i], NULL);
    }

    return 0;
}
</code></pre>
<p>On my i7, it takes 1.308 seconds to run with <code>TLS_VAR</code> defined and 8.392 seconds with it undefined, and I can't account for such a large difference.</p>
<p>The assembly for <code>modify_tls</code> looks like this (I've shown only the parts that differ):</p>
<pre><code>;; !defined(TLS_VAR)
movl  tls_var(%rip), %eax
addl  $1, %eax
movl  %eax, tls_var(%rip)

;; defined(TLS_VAR)
movl  %fs:tls_var@tpoff, %eax
addl  $1, %eax
movl  %eax, %fs:tls_var@tpoff
</code></pre>
<p>The TLS access is understandable: it goes through the <code>%fs</code> segment, which points at the thread's TCB. But why is the <code>tls_var</code> load in the first case relative to <code>%rip</code>? Why can't it be a direct memory address that gets relocated by the loader? Is this <code>%rip</code>-relative load responsible for the slowness? If so, why?</p>
<p>Compile flags: <code>gcc -O3 -std=c99 -Wall -Werror -lpthread</code></p>
 
