While you can't get much faster than PIL's crop in a single thread, you *can* use multiple cores to speed everything up! :)

I ran the code below on my 8-core i7 machine as well as my 7-year-old, two-core, barely-2GHz laptop. Both saw significant improvements in run time. As you would expect, the improvement depended on the number of cores available.

The core of your code is the same; I just separated the looping from the actual computation, so that the function could be applied to a list of values in parallel.

So, this:

```python
for i in range(0, num_images):
    t = time.time()
    im = Image.open('%03i.png' % i)
    w, h = im.size
    imc = im.crop((w-50, h-50, w+50, h+50))
    print('Time to open: %.4f seconds' % (time.time() - t))

    # convert them to numpy arrays
    data = np.array(imc)
```

became:

```python
def convert(filename):
    im = Image.open(filename)
    w, h = im.size
    imc = im.crop((w-50, h-50, w+50, h+50))
    return numpy.array(imc)
```

The key to the speedup is the `Pool` feature of the `multiprocessing` library. It makes it trivial to run things across multiple processors.

## Full code:

```python
import os
import time

import numpy
from PIL import Image
from multiprocessing import Pool

# Path to where my test images are stored
img_folder = os.path.join(os.getcwd(), 'test_images')

# Collect all of the filenames for the images I want to process
images = [os.path.join(img_folder, f)
          for f in os.listdir(img_folder) if '.jpeg' in f]

# Your code, but wrapped up in a function
def convert(filename):
    im = Image.open(filename)
    w, h = im.size
    imc = im.crop((w-50, h-50, w+50, h+50))
    return numpy.array(imc)

def main():
    # This is the hero of the code. It creates a pool of
    # worker processes across which you can "map" a function.
    pool = Pool()

    t = time.time()
    # We run it normally (single core) first
    np_arrays = list(map(convert, images))
    print('Time to open %i images in single thread: %.4f seconds'
          % (len(images), time.time() - t))

    t = time.time()
    # Now we run the same thing, but this time leveraging the worker pool
    np_arrays = pool.map(convert, images)
    print('Time to open %i images with multiple threads: %.4f seconds'
          % (len(images), time.time() - t))

if __name__ == '__main__':
    main()
```

Pretty basic. Only a few extra lines of code, and a little refactoring to move the conversion into its own function. The results speak for themselves:

## Results:

### 8-core i7

```
Time to open 858 images in single thread: 6.0040 seconds
Time to open 858 images with multiple threads: 1.4800 seconds
```

### 2-core Intel Duo

```
Time to open 858 images in single thread: 8.7640 seconds
Time to open 858 images with multiple threads: 4.6440 seconds
```

So there ya go! Even if you have a super-old two-core machine, you can halve the time you spend opening and processing your images.

### Caveats

Memory. If you're processing thousands of images, you're probably going to exhaust Python's available memory at some point. To get around this, you'll just have to process the data in chunks. You can still leverage all of the multiprocessing goodness, just in smaller bites. Something like:

```python
for i in range(0, len(images), chunk_size):
    results = pool.map(convert, images[i : i+chunk_size])
    # rest of code.
```
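If you'd rather not manage chunk boundaries by hand, `Pool.imap` gives you the same memory behaviour lazily: it hands results back one at a time, in input order, as the workers finish, so only a handful of arrays ever live in memory at once. Here's a minimal sketch of that approach, reusing the `convert` function and `images` list from the full code above; the `converted` output folder, the `.npy` filenames, and the `chunksize` value are just illustrative choices, not anything from the original benchmark.

```python
import os
import numpy
from multiprocessing import Pool

def main():
    out_folder = 'converted'  # illustrative output location
    os.makedirs(out_folder, exist_ok=True)

    # On Python 3, Pool doubles as a context manager, so the
    # worker processes are cleaned up when the block exits.
    with Pool() as pool:
        # imap yields results one at a time (in input order) as the
        # workers finish, instead of building one giant list.
        for i, arr in enumerate(pool.imap(convert, images, chunksize=32)):
            numpy.save(os.path.join(out_folder, '%05i.npy' % i), arr)

if __name__ == '__main__':
    main()
```

The `chunksize` argument just controls how many filenames get shipped to each worker per round trip; bigger values mean less inter-process chatter, at the cost of coarser scheduling.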
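And since the full code computes `np_arrays` twice, once serially and once through the pool, it's worth knowing that `Pool.map` preserves input order, so the two runs are directly comparable. A quick sanity check you could bolt on, again assuming the `convert` and `images` defined above:

```python
import numpy
from multiprocessing import Pool

def check_results_match():
    serial = list(map(convert, images))
    with Pool() as pool:
        parallel = pool.map(convert, images)  # results come back in input order
    # Every cropped array from the pool should equal its serial counterpart
    assert all(numpy.array_equal(a, b) for a, b in zip(serial, parallel))
```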