
Python: multiprocessing pool memory leak
<p>I have an array of time courses that is 8640 x 400.<br>
EDIT: The 0th dim holds locations and the 1st dim is the time course for that location.</p>

<p>I need to compute the cross-spectral coherence for each pair of locations, and these can all be computed independently.</p>

<p>Thus I started trying to use the multiprocessing module:</p>

<pre><code>from itertools import product
from multiprocessing import Pool

import numpy as np
from matplotlib.mlab import cohere
from scipy.signal import detrend

# this is a module of classes that I wrote myself
from MyTools import SignalProcessingTools as sig

sample_rate = 1 / 1.65  # was missing here; same value as in the updated code below


def compute_coherence(args):
    rowA = roi[args[0], :]
    rowB = roi[args[1], :]
    coh, _ = cohere(rowA, rowB, NFFT=64, Fs=sample_rate,
                    noverlap=32, sides='onesided')
    # TODO: use the freq return and only average the freqs in a particular range...
    return np.sqrt(coh.mean())


### start here ###
roi = np.random.rand(8640, 386)  # in reality this is a load from disk

# I detrend the data for linear features
roi = detrend(data=roi, axis=1, type='linear')
# and normalize it to std. Very simple method, uses x.std() in a loop
roi = sig.normalize_std(roi)

length = roi.shape[0]
indices = np.arange(length)

# this gives me all combinations of indices i and j,
# since I want the cross-spectral coherence of the array
args = product(indices, indices)  # note: args is an iterator object

pool = Pool(processes=20)
coh = pool.map(compute_coherence, args)
</code></pre>

<p>This program uses over 20 GB of RAM, and I can't find an obvious memory leak. There are a lot of Google results on the topic, but I don't really understand how to track this down.</p>

<p>EDIT: Big mistake: the roi array is NOT 8640 x 8640 x 400, it is only 8640 x 400. Sorry... :| Long day.</p>

<p>Perhaps there's a mistake that I'm missing...?</p>

<p>Thanks for your thoughts in advance...</p>

<p><strong>[update]</strong> After modifying the code and playing around with commenting out sections, I believe I have narrowed the memory problem down to the cohere() method. Running the code and just returning arrays of zeros works fine.</p>

<p>Here's an updated version:</p>

<pre><code>from os import path

from itertools import product
from multiprocessing import Pool

import gc

import numpy as np
from matplotlib import mlab
from scipy.signal import detrend
from pympler import tracker

import Tools

tr = tracker.SummaryTracker()


def call_back():
    gc.collect()


def call_compute(arg):
    start, stop = arg
    ind_pairs = indice_combos[start:stop]
    coh = np.zeros(len(ind_pairs), dtype=float)
    # tr.print_diff()
    for i, ind in enumerate(ind_pairs):
        row1 = ind[0]
        row2 = ind[1]
        mag, _ = mlab.cohere(roi[row1, :], roi[row2, :], NFFT=128,
                             Fs=sample_rate, noverlap=64, sides='onesided')
        coh[i] = np.sqrt(mag.mean())
        # tr.print_diff()
    # tr.print_diff()
    return coh


### start here ###
imagetools = Tools.ImageTools()
sigtools = Tools.SignalProcess()
HOME = Tools.HOME
sample_rate = 1 / 1.65

mask_obj = imagetools.load_image(path.join(HOME, 'python_conn/Rat/Inputs/rat_gm_rs.nii.gz'))
mask_data = mask_obj.get_data()
rs_obj = imagetools.load_image(path.join(HOME, 'python_conn/Rat/Inputs/rs_4D.nii.gz'))
rs_data = rs_obj.get_data()

# logical index
ind = mask_data &gt; 0
roi = rs_data[ind, :]

# normalize with std
roi = sigtools.normalize_nd(roi)
# detrend linear
roi = detrend(data=roi, axis=1, type='linear')
# filter
roi = sigtools.butter_bandpass(lowcut=0.002, highcut=0.1,
                               sample_rate=sample_rate, data=roi, order=5)
# drop frames for steady state and filter noise
roi = roi[:, 16:]

################
### testing ####
roi = roi[0:5000, :]
################

length = roi.shape[0]

# set up row and col vectors of indices
indices = np.arange(length)
temp = product(indices, indices)  # iterator over all possible combinations
indice_combos = [i for i in temp]  # turn the iterator into a list

num_cores = 10
chunk_size = len(indice_combos) // num_cores  # divide the combo list across cores
grps = np.arange(0, len(indice_combos) + chunk_size, chunk_size)

# make the final list of args, where each item is a [start, stop) pair
# (slice stops are exclusive, so grps[i+1] is used directly)
args = [[grps[i], grps[i + 1]] for i in range(0, len(grps) - 1)]

# deallocate some memory
grps = None

# multi-core; note that pool.map() takes a chunksize, not a callback,
# so call_back() can't be passed here
pool = Pool(num_cores)
coh = np.hstack(pool.map(call_compute, args))
coh = coh.ravel()

out_path = path.join(HOME, 'python_conn/Rat/coh.npy')
np.save(out_path, coh)

coh_map = np.zeros_like(mask_data)
coh_map[ind] = coh.sum(0)
out_path = path.join(HOME, 'python_conn/Rat/coherence_map.nii.gz')
imagetools.save_new_image(coh_map, out_path, rs_obj.coordmap)
</code></pre>

<p><strong>[update]</strong></p>

<p>It's not cohere's fault... my bad... I hope the developer doesn't see this... :| I changed the code a lot, so I'm afraid this thread is probably not valid anymore.</p>

<p>What helped:</p>

<ol>
<li>Only use iterators</li>
<li>Send processes more than one pair of i,j to work on</li>
</ol>

<p>There's a lot of overhead, but the memory doesn't actually go up that much. I feel like I've abused SO a little... but it's always hard to be precise here when you're learning something new... I'm surprised no one has hated on me yet. I'll post my own solution tomorrow.</p>
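The two fixes the poster lists — iterating over index pairs lazily instead of materializing the full product, and handing each worker a whole batch of pairs rather than one at a time — can be sketched as follows. This is a minimal illustration, not the poster's actual solution: `scipy.signal.coherence` stands in for `matplotlib.mlab.cohere`, the array is shrunk to a toy size so it runs quickly, and the names `batch_coherence`, `batched`, and `init_worker` are made up for this sketch.

```python
from itertools import combinations, islice
from multiprocessing import Pool

import numpy as np
from scipy.signal import coherence

SAMPLE_RATE = 1 / 1.65  # value taken from the updated code in the post


def init_worker(shared_roi):
    # Hand each worker the array once at startup instead of
    # pickling rows into every single task.
    global roi
    roi = shared_roi


def batch_coherence(pairs):
    # One task = one batch of (i, j) pairs, returning a single array,
    # so per-task overhead and result bookkeeping stay small.
    out = np.empty(len(pairs))
    for k, (i, j) in enumerate(pairs):
        _, coh = coherence(roi[i], roi[j], fs=SAMPLE_RATE, nperseg=64)
        out[k] = np.sqrt(coh.mean())
    return out


def batched(iterable, n):
    # Lazily yield lists of up to n items; the full pair list
    # is never materialized in the parent process.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((50, 128))  # toy size; the post uses 8640 x 400

    # coherence(x, y) == coherence(y, x), so combinations() halves the work
    pairs = combinations(range(len(data)), 2)

    with Pool(processes=4, initializer=init_worker, initargs=(data,),
              maxtasksperchild=50) as pool:
        # imap() consumes the batch generator lazily, keeping memory bounded;
        # maxtasksperchild recycles workers if they do slowly accumulate memory.
        coh = np.concatenate(list(pool.imap(batch_coherence, batched(pairs, 500))))

    print(coh.shape)  # (1225,) = 50 choose 2
```

The key design point is the same one the poster arrived at: `pool.map` over 8640² single-pair tasks forces millions of tiny pickled tasks and results to exist at once, whereas a lazy generator of batches fed through `imap` keeps only a few batches in flight at any moment.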
 
