Embarrassingly Parallel DB Update Using Python (PostGIS/PostgreSQL)
    <p>I need to update every record in a spatial database that contains a data set of points overlaying a data set of polygons. For each point feature I want to assign a key that relates it to the polygon feature it lies within. For example, if my point 'New York City' lies within the polygon USA, and the USA polygon has 'GID = 1', I will assign 'gid_fkey = 1' to my point New York City.</p>
    <p>To do this I have created the following query:</p>
    <pre><code>procQuery = 'UPDATE city SET gid_fkey = gid FROM country WHERE ST_within((SELECT the_geom FROM city WHERE wp_id = %s), country.the_geom) AND city_id = %s' % (cityID, cityID)</code></pre>
    <p>At present I get the cityID values from another query that selects every cityID where gid_fkey is NULL. Essentially I just need to loop through these and run the query above for each one. Because the query only reads static information from the other table, in theory all of these updates can run at once. I have implemented the threaded version below, but I can't work out how to migrate it to multiprocessing:</p>
    <pre><code>import psycopg2, threading, Queue

queue = Queue.Queue()

pyConn = psycopg2.connect("dbname='geobase_1' host='localhost'")
pyConn.set_isolation_level(0)  # 0 = autocommit

# Only fetch the cities that still need a key
pyCursor1 = pyConn.cursor()
getGID = 'SELECT cityID FROM city WHERE gid_fkey IS NULL'
pyCursor1.execute(getGID)
gidList = pyCursor1.fetchall()

# Placeholders are filled in by psycopg2 at execute time
procQuery = ('UPDATE city SET gid_fkey = gid FROM country '
             'WHERE ST_within((SELECT the_geom FROM city WHERE wp_id = %s), country.the_geom) '
             'AND city_id = %s')

class threadClass(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            gid = self.queue.get()  # each item is a one-element row tuple
            try:
                pyCursor2 = pyConn.cursor()
                pyCursor2.execute(procQuery, (gid[0], gid[0]))
                print(gid[0])
            finally:
                self.queue.task_done()  # without this, queue.join() never returns

def main():
    for i in range(4):
        t = threadClass(queue)
        t.daemon = True
        t.start()
    for gid in gidList:
        queue.put(gid)
    queue.join()
    print('Done')

main()
</code></pre>
    <p>I'm not even sure the multithreading is optimal, but it is definitely faster than going through the cities one by one.</p>
    <p>The machine I will be using has four cores and runs a minimal Linux OS (no GUI) with PostgreSQL, PostGIS and Python, if that makes a difference.</p>
    <p>What do I need to change to get this painfully easy task running with multiprocessing?</p>
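<p>A minimal multiprocessing sketch, written for Python 3 (the post's code is Python 2) and psycopg2. It assumes, without confirmation from the post, that <code>city_id</code> is the key column and that the geometry subquery on <code>wp_id</code> can be replaced by a direct reference to <code>city.the_geom</code>. The names <code>chunk_ids</code>, <code>update_chunk</code>, <code>fetch_pending_ids</code> and <code>run_parallel</code> are hypothetical. The key idea is that every worker process opens its own connection, because psycopg2 connections must not be shared across processes.</p>

```python
"""Sketch: spread the per-city UPDATE over several worker processes.

Assumptions (not from the original post): city_id is the key column, and
ST_Within can use city.the_geom directly instead of the wp_id subquery.
"""
import multiprocessing

# Connection settings taken from the question; adjust as needed.
DSN = "dbname='geobase_1' host='localhost'"

# Parameterized query: psycopg2 substitutes %s safely at execute time.
UPDATE_SQL = (
    "UPDATE city SET gid_fkey = country.gid FROM country "
    "WHERE ST_Within(city.the_geom, country.the_geom) AND city.city_id = %s"
)


def chunk_ids(ids, n_chunks):
    """Split ids round-robin into n_chunks lists of roughly equal size."""
    return [ids[i::n_chunks] for i in range(n_chunks)]


def fetch_pending_ids():
    """Return the city_id of every row that still has no gid_fkey."""
    import psycopg2  # lazy import: the helpers above work without psycopg2

    conn = psycopg2.connect(DSN)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT city_id FROM city WHERE gid_fkey IS NULL")
            # fetchall() yields 1-tuples; unwrap them to plain ids
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()


def update_chunk(city_ids):
    """Worker: update one chunk over a private connection; return rows changed."""
    import psycopg2  # lazy import, as above

    conn = psycopg2.connect(DSN)  # never reuse a connection from the parent
    updated = 0
    try:
        with conn.cursor() as cur:
            for city_id in city_ids:
                cur.execute(UPDATE_SQL, (city_id,))
                updated += cur.rowcount  # rows changed by this UPDATE
        conn.commit()  # one transaction per chunk
    finally:
        conn.close()  # closing without commit rolls back on error
    return updated


def run_parallel(city_ids, n_workers=4):
    """Run update_chunk in n_workers processes; return total rows changed."""
    with multiprocessing.Pool(n_workers) as pool:
        return sum(pool.map(update_chunk, chunk_ids(city_ids, n_workers)))
```

<p>To use it, call <code>run_parallel(fetch_pending_ids(), n_workers=4)</code> inside an <code>if __name__ == "__main__":</code> guard, which multiprocessing requires with the spawn and forkserver start methods. Because the spatial work runs inside PostgreSQL, where each connection is served by its own backend process, what matters most is having several connections open at once; threads that each open their own connection should parallelise in much the same way. If the per-row loop is not needed at all, a single set-based <code>UPDATE city SET gid_fkey = country.gid FROM country WHERE ST_Within(city.the_geom, country.the_geom) AND city.gid_fkey IS NULL</code> lets the database perform the whole join in one statement.</p>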