
Concurrently searching a graph in Python 3
<p>I'd like to create a small p2p application that concurrently processes incoming data from other known / trusted nodes (it mostly stores it in an SQLite database). In order to recognize these nodes, upon connecting, each node introduces itself, and my application then needs to check whether it knows this node directly or indirectly through another node. Hence, I need to do a graph search, which obviously needs processing time and which I'd like to outsource to a separate process (or even multiple worker processes? See my 2nd question below). Also, in some cases it is necessary to adjust the graph, i.e. add new edges or vertices.</p>

<p>Let's say I have <strong>4 worker processes</strong> accepting and handling incoming connections via asynchronous I/O. <strong>What's the best way for them to access (read / modify) the graph?</strong> A single queue obviously doesn't do the trick for read access because I need to pass the search results back somehow.</p>

<p>Hence, one way to do it would be a second queue that is filled by the graph-searching process and that I could add to the event loop. The event loop could then pass the results to a handler. However, this event/callback-based approach would make it necessary to always pass the corresponding sockets to the callbacks and thus to the queue – which is <a href="https://stackoverflow.com/questions/16386678/too-many-open-files-python-multiprocess-tcp-server">nasty</a> because sockets are not picklable. (Let alone the fact that callbacks lead to spaghetti code.)</p>

<p>Another idea that's just crossed my mind would be to create a pipe to the graph process for each incoming connection and then, on the graph's side, do asynchronous I/O as well. However, in order to avoid callbacks, if I understand correctly, I would need an async I/O library making use of <code>yield from</code> (i.e. <a href="http://www.python.org/dev/peps/pep-3156/" rel="nofollow noreferrer">tulip / PEP 3156</a>). Are there other options?</p>

<p>Regarding async I/O on <strong>the graph's side</strong>: This is certainly the best way to handle many incoming requests at once, but doing graph lookups is a CPU-intensive task and <strong>could therefore profit from using multiple worker threads or processes</strong>. <strong>The problem is</strong>: Multiple threads allow shared data, but Python's GIL somewhat negates the performance benefit. Multiple processes, on the other hand, don't have this problem, but how can I share and synchronize data between them? (To me it seems quite impossible to split up a graph.) Is there a way to solve this nicely? Also, does it make sense in terms of performance to mix asynchronous I/O with multithreading / multiprocessing?</p>
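One way to sidestep the "sockets are not picklable" problem described above is a variant of the second-queue idea: the I/O process keeps a plain <code>dict</code> mapping a request id to the socket, and only the id travels across the process boundary with the query and the result. A minimal sketch of that pattern (names and the toy graph are hypothetical, and the graph process here owns the only copy of the graph, so writes need no cross-process synchronization):

```python
# Sketch: graph lookups in a dedicated process; requests are tagged with an
# id so the I/O process can map results back to sockets it kept locally.
import multiprocessing as mp
from collections import deque

def graph_worker(tasks, results, graph):
    """Owns the graph; answers reachability queries via BFS."""
    while True:
        req_id, start, target = tasks.get()
        if req_id is None:              # sentinel -> shut down
            break
        seen, frontier = {start}, deque([start])
        found = (start == target)
        while frontier and not found:
            node = frontier.popleft()
            for neighbour in graph.get(node, ()):
                if neighbour == target:
                    found = True
                    break
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append(neighbour)
        results.put((req_id, found))    # I/O process maps req_id -> socket

if __name__ == "__main__":
    # Hypothetical trust graph: "me" trusts alice, alice trusts bob.
    graph = {"me": ["alice"], "alice": ["bob"], "bob": []}
    tasks, results = mp.Queue(), mp.Queue()
    worker = mp.Process(target=graph_worker, args=(tasks, results, graph))
    worker.start()
    # The I/O side would keep {1: socket_of_bob}; only the id is enqueued.
    tasks.put((1, "me", "bob"))
    req_id, trusted = results.get()
    print(req_id, trusted)              # 1 True
    tasks.put((None, None, None))
    worker.join()
```

On the I/O side, the event loop can poll <code>results</code> (or a reader end of a pipe) and dispatch each <code>(req_id, found)</code> pair to the waiting connection, so no socket ever has to be pickled.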