Python multiprocessing design
<p>I have written an algorithm that takes geospatial data and performs a number of steps. The input data are a shapefile of polygons and covariate rasters for a large raster study area (~150 million pixels). The steps are as follows:</p>
<ol>
<li>Sample points from within polygons of the shapefile</li>
<li>For each sampling point, extract values from the covariate rasters</li>
<li>Build a predictive model on the sampling points</li>
<li>Extract covariates for target grid points</li>
<li>Apply predictive model to target grid</li>
<li>Write predictions to a set of output grids</li>
</ol>
<p>The whole process needs to be iterated a number of times (say 100), but each iteration currently takes more than an hour when processed in series. For each iteration, the most time-consuming parts are steps 4 and 5. Because the target grid is so large, I've been processing it one block (say 1000 rows) at a time.</p>
<p>I have a 6-core CPU with 32 GB of RAM, so within each iteration, I had a go at using Python's <code>multiprocessing</code> module with a <code>Pool</code> object to process a number of blocks simultaneously (steps 4 and 5) and then write the output (the predictions) to the common set of output grids using a callback function that calls a global output-writing function. This seems to work, but it is no faster (actually, it's probably slower) than processing each block in series.</p>
<p>So my question is: is there a more efficient way to do it? I'm interested in the multiprocessing module's <code>Queue</code> class, but I'm not really sure how it works. For example, I'm wondering if it's more efficient to have a queue that carries out steps 4 and 5 and then passes the results to another queue that carries out step 6. Or is this even what <code>Queue</code> is for?</p>
<p>Any pointers would be appreciated.</p>
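<p>To make the layout described above concrete, here is a minimal sketch of block-wise processing with a <code>Pool</code>, where worker processes handle steps 4 and 5 and only the parent process performs step 6. It is an illustration under stated assumptions, not the original code: <code>block_ranges</code>, <code>predict_block</code>, <code>run_iteration</code>, the toy grid size, and the placeholder arithmetic standing in for covariate extraction and the model are all hypothetical. It also uses <code>Pool.imap_unordered</code> rather than the callback approach mentioned in the question.</p>

```python
import multiprocessing as mp

# Toy dimensions (assumed); the real grid is ~150 million pixels, ~1000 rows per block.
N_ROWS, N_COLS, BLOCK_ROWS = 10, 4, 3


def block_ranges(n_rows, block_rows):
    """Yield (start, stop) row ranges that together cover the whole grid."""
    for start in range(0, n_rows, block_rows):
        yield start, min(start + block_rows, n_rows)


def predict_block(rows):
    """Steps 4 and 5 for one block, run inside a worker process.

    Placeholder arithmetic only: the 'covariate' is row + col and the
    'prediction' is twice that. A real version would read the rasters and
    apply the fitted model here.
    """
    start, stop = rows
    preds = [[2.0 * (r + c) for c in range(N_COLS)] for r in range(start, stop)]
    return start, preds


def run_iteration(processes=2):
    """Compute blocks in parallel; write results (step 6) in the parent only."""
    output = [None] * N_ROWS  # stands in for the output grids
    with mp.Pool(processes=processes) as pool:
        # imap_unordered hands back each block as soon as it is finished,
        # so the parent can write while the workers keep computing.
        for start, preds in pool.imap_unordered(
            predict_block, block_ranges(N_ROWS, BLOCK_ROWS)
        ):
            output[start:start + len(preds)] = preds
    return output
```

<p>Calling <code>run_iteration()</code> from under an <code>if __name__ == "__main__":</code> guard fills the output one block at a time. Each worker receives only a pair of row indices and returns only its predictions, which keeps the amount of data pickled between processes small; whether this actually beats serial processing depends on how expensive steps 4 and 5 are relative to that transfer and to disk I/O.</p>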
 
