
memory allocation and access on NUMA hardware
I am developing a scientific computing tool in Python that should be capable of distributing work over multiple cores in a NUMA shared-memory environment, and I am looking into the most efficient way of doing this.

Threads are, unfortunately, out of the game because of Python's global interpreter lock, which leaves fork as my only option. For inter-process communication I suppose my options are pipes, sockets, or mmap. Please point it out if anything is missing from this list.

My application will require quite a lot of communication between processes, as well as access to some amount of common data. My main concern is latency.

My questions: when I fork a process, will its memory be located near the core it is assigned to? Since fork on *nix copies on write, I suppose this cannot initially be the case. Do I want to force a copy for faster memory access, and if so, what is the best way to do that? If I use mmap for communication, can that memory still be distributed over the cores, or will it be located on a single one? Is there a process that transparently relocates data to optimize access? Is there a way to have direct control over physical allocation, or a way to request information about the allocation to aid optimization?

On a higher level: which of these things are determined by the hardware and which by the operating system? I am in the process of buying a high-end multi-socket machine and am deciding between AMD Opteron and Intel Xeon. What are the implications of specific hardware on any of the questions above?
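To make the mmap option concrete, here is a minimal sketch of sharing memory between forked processes on Linux/Unix. An anonymous mapping created before the fork is `MAP_SHARED`, so writes in the child are visible to the parent without pipes, sockets, or pickling; the page size and the value written are arbitrary illustrations, not part of the original question.

```python
import mmap
import os
import struct

# Create an anonymous shared mapping BEFORE forking. Unlike ordinary
# heap memory (which becomes copy-on-write in the child), pages mapped
# MAP_SHARED remain a single physical region visible to both processes.
shm = mmap.mmap(-1, mmap.PAGESIZE)  # -1 => anonymous mapping

pid = os.fork()
if pid == 0:
    # Child: write a result directly into the shared page.
    shm.seek(0)
    shm.write(struct.pack("d", 3.14159))
    os._exit(0)

# Parent: wait for the child, then read what it wrote into shared memory.
os.waitpid(pid, 0)
shm.seek(0)
value = struct.unpack("d", shm.read(8))[0]
print(value)
shm.close()
```

Regarding placement: on Linux the default NUMA policy is first-touch, so a page of such a mapping ends up on the node of the CPU that first writes it, which means different parts of one shared mapping can land on different nodes. Explicit control is available through `numactl` and the libnuma API (`mbind`, `set_mempolicy`, `move_pages`), which could be reached from Python via `ctypes`; whether the effort pays off for this workload would need measuring.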