
Python: multiprocessing pool memory leak
<p>I have an array of time courses that is 8640 x 400.<br>
EDIT: The 0th dim holds locations and the 1st dim is the time course for that location.</p>

<p>I need to compute the cross-spectral coherence for each pair of locations, and these can all be computed independently.</p>

<p>Thus I started trying to use the multiprocessing module:</p>

<pre><code>from itertools import product
from multiprocessing import Pool

import numpy as np
from matplotlib.mlab import cohere
from scipy.signal import detrend

# this is a module of classes that I wrote myself
from MyTools import SignalProcessingTools as sig

sample_rate = 1 / 1.65  # was missing here; same value as in the updated code below


def compute_coherence(args):
    rowA = roi[args[0], :]
    rowB = roi[args[1], :]
    coh, _ = cohere(rowA, rowB, NFFT=64, Fs=sample_rate,
                    noverlap=32, sides='onesided')
    # TODO: use the freq return and only average the freqs in a particular range...
    return np.sqrt(coh.mean())


### start here ###
roi = np.random.rand(8640, 386)  # in reality this is a load from disk

# I detrend the data for linear features
roi = detrend(data=roi, axis=1, type='linear')
# and normalize it to std. Very simple method, uses x.std() in a loop
roi = sig.normalize_std(roi)

length = roi.shape[0]
indices = np.arange(length)

# this gives me all combinations of indices i and j,
# since I want the cross-spectral coherence of the array
args = product(indices, indices)  # note: args is an iterator object

pool = Pool(processes=20)
coh = pool.map(compute_coherence, args)
</code></pre>

<p>This program uses over 20 GB of RAM, and I can't find an obvious memory leak. There are a lot of Google results on the topic, but I don't really understand how to track this down.</p>

<p>EDIT: Big mistake: the roi array is NOT 8640 x 8640 x 400, it is only 8640 x 400. Sorry... :| Long day.</p>

<p>Perhaps there's a mistake that I'm missing...?</p>

<p>Thanks for your thoughts in advance...</p>

<p><strong>[update]</strong> After modifying the code and playing around with commenting out sections, I believe I have narrowed the memory problem down to the cohere() method. Running the code and just returning arrays of zeros works fine.</p>

<p>Here's an updated version:</p>

<pre><code>from os import path

from itertools import product
from multiprocessing import Pool

import gc

import numpy as np
from matplotlib import mlab
from scipy.signal import detrend
from pympler import tracker

import Tools

tr = tracker.SummaryTracker()


def call_back():
    gc.collect()


def call_compute(arg):
    start, stop = arg
    ind_pairs = indice_combos[start:stop]
    coh = np.zeros(len(ind_pairs), dtype=float)
    # tr.print_diff()
    for i, ind in enumerate(ind_pairs):
        row1 = ind[0]
        row2 = ind[1]
        mag, _ = mlab.cohere(roi[row1, :], roi[row2, :], NFFT=128,
                             Fs=sample_rate, noverlap=64, sides='onesided')
        coh[i] = np.sqrt(mag.mean())
        # tr.print_diff()
    # tr.print_diff()
    return coh


### start here ###
imagetools = Tools.ImageTools()
sigtools = Tools.SignalProcess()
HOME = Tools.HOME
sample_rate = 1 / 1.65

mask_obj = imagetools.load_image(path.join(HOME, 'python_conn/Rat/Inputs/rat_gm_rs.nii.gz'))
mask_data = mask_obj.get_data()
rs_obj = imagetools.load_image(path.join(HOME, 'python_conn/Rat/Inputs/rs_4D.nii.gz'))
rs_data = rs_obj.get_data()

# logical index
ind = mask_data &gt; 0
roi = rs_data[ind, :]

# normalize with std
roi = sigtools.normalize_nd(roi)
# detrend linear
roi = detrend(data=roi, axis=1, type='linear')
# filter
roi = sigtools.butter_bandpass(lowcut=0.002, highcut=0.1,
                               sample_rate=sample_rate, data=roi, order=5)
# drop frames for steady state and filter noise
roi = roi[:, 16:]

################
### testing ####
roi = roi[0:5000, :]
################

length = roi.shape[0]

# set up row and col vectors of indices
indices = np.arange(length)
temp = product(indices, indices)  # iterator over all possible combinations
indice_combos = [i for i in temp]  # turn the iterator into a list

num_cores = 10
chunk_size = len(indice_combos) // num_cores  # divide the combo list across cores
grps = np.arange(0, len(indice_combos) + chunk_size, chunk_size)

# make the final list of args, where each item is a [start, stop) pair
# (slice stops are exclusive, so grps[i+1] is used directly)
args = [[grps[i], grps[i + 1]] for i in range(0, len(grps) - 1)]

# deallocate some memory
grps = None

# multi-core; note that pool.map() takes a chunksize, not a callback,
# so call_back() can't be passed here
pool = Pool(num_cores)
coh = np.hstack(pool.map(call_compute, args))
coh = coh.ravel()

out_path = path.join(HOME, 'python_conn/Rat/coh.npy')
np.save(out_path, coh)

coh_map = np.zeros_like(mask_data)
coh_map[ind] = coh.sum(0)
out_path = path.join(HOME, 'python_conn/Rat/coherence_map.nii.gz')
imagetools.save_new_image(coh_map, out_path, rs_obj.coordmap)
</code></pre>

<p><strong>[update]</strong></p>

<p>It's not cohere's fault... my bad... I hope the developer doesn't see this... :| I changed the code a lot, so I'm afraid this thread is probably not valid anymore.</p>

<p>What helped:</p>

<ol>
<li>Only use iterators</li>
<li>Send processes more than one pair of i,j to work on</li>
</ol>

<p>There's a lot of overhead, but the memory doesn't actually go up that much. I feel like I've abused SO a little... but it's always hard to be precise here when you're learning something new... I'm surprised no one has hated on me yet. I'll post my own solution tomorrow.</p>
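The two fixes the poster lists — iterating over index pairs lazily instead of materializing the full product, and handing each worker a whole batch of pairs rather than one at a time — can be sketched as follows. This is a minimal illustration, not the poster's actual solution: `scipy.signal.coherence` stands in for `matplotlib.mlab.cohere`, the array is shrunk to a toy size so it runs quickly, and the names `batch_coherence`, `batched`, and `init_worker` are made up for this sketch.

```python
from itertools import combinations, islice
from multiprocessing import Pool

import numpy as np
from scipy.signal import coherence

SAMPLE_RATE = 1 / 1.65  # value taken from the updated code in the post


def init_worker(shared_roi):
    # Hand each worker the array once at startup instead of
    # pickling rows into every single task.
    global roi
    roi = shared_roi


def batch_coherence(pairs):
    # One task = one batch of (i, j) pairs, returning a single array,
    # so per-task overhead and result bookkeeping stay small.
    out = np.empty(len(pairs))
    for k, (i, j) in enumerate(pairs):
        _, coh = coherence(roi[i], roi[j], fs=SAMPLE_RATE, nperseg=64)
        out[k] = np.sqrt(coh.mean())
    return out


def batched(iterable, n):
    # Lazily yield lists of up to n items; the full pair list
    # is never materialized in the parent process.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((50, 128))  # toy size; the post uses 8640 x 400

    # coherence(x, y) == coherence(y, x), so combinations() halves the work
    pairs = combinations(range(len(data)), 2)

    with Pool(processes=4, initializer=init_worker, initargs=(data,),
              maxtasksperchild=50) as pool:
        # imap() consumes the batch generator lazily, keeping memory bounded;
        # maxtasksperchild recycles workers if they do slowly accumulate memory.
        coh = np.concatenate(list(pool.imap(batch_coherence, batched(pairs, 500))))

    print(coh.shape)  # (1225,) = 50 choose 2
```

The key design point is the same one the poster arrived at: `pool.map` over 8640² single-pair tasks forces millions of tiny pickled tasks and results to exist at once, whereas a lazy generator of batches fed through `imap` keeps only a few batches in flight at any moment.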
 
