The easiest thing to do is to linearize the problem. You're using a non-linear, iterative method which will be slower than a linear least squares solution.

Basically, you have:

```
y = height * exp(-(x - mu)^2 / (2 * sigma^2))
```

To make this a linear equation, take the (natural) log of both sides:

```
ln(y) = ln(height) - (x - mu)^2 / (2 * sigma^2)
```

Expanding the square, this simplifies to a polynomial in x:

```
ln(y) = -x^2 / (2 * sigma^2) + x * mu / sigma^2 - mu^2 / (2 * sigma^2) + ln(height)
```

We can recast this in a bit simpler form:

```
ln(y) = A * x^2 + B * x + C
```

where:

```
A = -1 / (2 * sigma^2)
B = mu / sigma^2
C = -mu^2 / (2 * sigma^2) + ln(height)
```

However, there's one catch: this becomes unstable in the presence of noise in the "tails" of the distribution, because taking the log greatly amplifies the relative noise where y is small.

Therefore, we need to use only the data near the "peak" of the distribution. It's easy enough to include only data that falls above some threshold in the fitting. In this example, I'm only including data that's greater than 20% of the maximum observed value for the given gaussian curve we're fitting.

Once we've done this, though, it's rather fast. Solving for 262144 different gaussian curves takes only ~1 minute (be sure to remove the plotting portion of the code if you run it on something that large...). It's also quite easy to parallelize, if you want...

```python
import numpy as np
import matplotlib.pyplot as plt
import itertools

def main():
    x, data = generate_data(256, 6)
    model = [invert(x, y) for y in data.T]
    sigma, mu, height = [np.array(item) for item in zip(*model)]
    prediction = gaussian(x, sigma, mu, height)

    plot(x, data, linestyle='none', marker='o')
    plot(x, prediction, linestyle='-')
    plt.show()

def invert(x, y):
    # Use only data within the "peak" (20% of the max value...)
    key_points = y > (0.2 * y.max())
    x = x[key_points]
    y = y[key_points]

    # Fit a 2nd order polynomial to the log of the observed values
    A, B, C = np.polyfit(x, np.log(y), 2)

    # Solve for the desired parameters...
    sigma = np.sqrt(-1 / (2.0 * A))
    mu = B * sigma**2
    height = np.exp(C + 0.5 * mu**2 / sigma**2)
    return sigma, mu, height

def generate_data(numpoints, numcurves):
    np.random.seed(3)
    x = np.linspace(0, 500, numpoints)

    height = 100 * np.random.random(numcurves)
    mu = 200 * np.random.random(numcurves) + 200
    sigma = 100 * np.random.random(numcurves) + 0.1
    data = gaussian(x, sigma, mu, height)

    noise = 5 * (np.random.random(data.shape) - 0.5)
    return x, data + noise

def gaussian(x, sigma, mu, height):
    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)
    return height * np.exp(data)

def plot(x, ydata, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    colorcycle = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])
    for y, color in zip(ydata.T, colorcycle):
        ax.plot(x, y, color=color, **kwargs)

main()
```

![Noisy data points with fitted gaussian curves](https://i.stack.imgur.com/3h9Db.png)
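If you want to convince yourself that the `A`, `B`, `C` relations above are right, here's a quick standalone sanity check (my own sketch, not part of the recipe): fit one noise-free curve and confirm the parameters come back out.

```python
import numpy as np

# Sanity check: recover (sigma, mu, height) from a single noise-free
# gaussian using the A, B, C relations above.
x = np.linspace(0, 10, 200)
sigma, mu, height = 1.5, 4.0, 10.0
y = height * np.exp(-(x - mu)**2 / (2 * sigma**2))

# For noise-free data, ln(y) is an exact quadratic in x
A, B, C = np.polyfit(x, np.log(y), 2)

# Invert A = -1/(2*sigma^2), B = mu/sigma^2, C = ln(height) - mu^2/(2*sigma^2)
sigma_est = np.sqrt(-1 / (2 * A))
mu_est = B * sigma_est**2
height_est = np.exp(C + 0.5 * mu_est**2 / sigma_est**2)

print(sigma_est, mu_est, height_est)  # ~1.5, ~4.0, ~10.0
```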
The only thing we'd need to change for a parallel version is the main function. (We also need a dummy function because `multiprocessing.Pool.imap` can't supply additional arguments to its function...) It would look something like this:

```python
def parallel_main():
    import multiprocessing
    p = multiprocessing.Pool()
    x, data = generate_data(256, 262144)
    args = zip(itertools.repeat(x), data.T)
    model = p.imap(parallel_func, args, chunksize=500)
    sigma, mu, height = [np.array(item) for item in zip(*model)]
    prediction = gaussian(x, sigma, mu, height)

def parallel_func(args):
    return invert(*args)
```

**Edit:** In cases where the simple polynomial fitting isn't working well, try weighting the problem by the y-values, [as mentioned in the link/paper](http://scipy-central.org/item/28/2/fitting-a-gaussian-to-noisy-data-points) that @tslisten shared (and Stefan van der Walt implemented, though my implementation is a bit different).

```python
import numpy as np
import matplotlib.pyplot as plt
import itertools

def main():
    def run(x, data, func, threshold=0):
        model = [func(x, y, threshold=threshold) for y in data.T]
        sigma, mu, height = [np.array(item) for item in zip(*model)]
        prediction = gaussian(x, sigma, mu, height)

        plt.figure()
        plot(x, data, linestyle='none', marker='o', markersize=4)
        plot(x, prediction, linestyle='-', lw=2)

    x, data = generate_data(256, 6, noise=100)
    threshold = 50

    run(x, data, weighted_invert, threshold=threshold)
    plt.title('Weighted by Y-Value')

    run(x, data, invert, threshold=threshold)
    plt.title('Un-weighted Linear Inverse')

    plt.show()

def invert(x, y, threshold=0):
    mask = y > threshold
    x, y = x[mask], y[mask]

    # Fit a 2nd order polynomial to the log of the observed values
    A, B, C = np.polyfit(x, np.log(y), 2)

    # Solve for the desired parameters...
    sigma, mu, height = poly_to_gauss(A, B, C)
    return sigma, mu, height

def poly_to_gauss(A, B, C):
    sigma = np.sqrt(-1 / (2.0 * A))
    mu = B * sigma**2
    height = np.exp(C + 0.5 * mu**2 / sigma**2)
    return sigma, mu, height

def weighted_invert(x, y, weights=None, threshold=0):
    mask = y > threshold
    x, y = x[mask], y[mask]
    if weights is None:
        weights = y
    else:
        weights = weights[mask]

    d = np.log(y)
    G = np.ones((x.size, 3), dtype=float)
    G[:, 0] = x**2
    G[:, 1] = x

    model, _, _, _ = np.linalg.lstsq((G.T * weights**2).T, d * weights**2,
                                     rcond=None)
    return poly_to_gauss(*model)

def generate_data(numpoints, numcurves, noise=None):
    np.random.seed(3)
    x = np.linspace(0, 500, numpoints)

    height = 7000 * np.random.random(numcurves)
    mu = 1100 * np.random.random(numcurves)
    sigma = 100 * np.random.random(numcurves) + 0.1
    data = gaussian(x, sigma, mu, height)

    if noise is None:
        noise = 0.1 * height.max()
    noise = noise * (np.random.random(data.shape) - 0.5)
    return x, data + noise

def gaussian(x, sigma, mu, height):
    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)
    return height * np.exp(data)

def plot(x, ydata, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    colorcycle = itertools.cycle(plt.rcParams['axes.prop_cycle'].by_key()['color'])
    for y, color in zip(ydata.T, colorcycle):
        ax.plot(x, y, color=color, **kwargs)

main()
```

![Weighted fit: predictions track the noisy curves](https://i.stack.imgur.com/NdQvA.png)
![Un-weighted linear inverse for comparison](https://i.stack.imgur.com/3jbvd.png)
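Incidentally, `np.polyfit` accepts per-point weights through its `w` argument (applied to the unsquared residuals), so a minimal equivalent sketch of `weighted_invert` can lean on it directly. `weighted_invert_polyfit` is just an illustrative name of mine, and it reuses `poly_to_gauss` from the listing above:

```python
import numpy as np

def weighted_invert_polyfit(x, y, weights=None, threshold=0):
    # Illustrative sketch: np.polyfit's `w` multiplies the unsquared
    # residuals, so passing w=weights**2 reproduces the
    # (G.T * weights**2).T scaling used in weighted_invert above.
    mask = y > threshold
    x, y = x[mask], y[mask]
    weights = y if weights is None else weights[mask]
    A, B, C = np.polyfit(x, np.log(y), 2, w=weights**2)
    return poly_to_gauss(A, B, C)  # helper from the listing above
```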
If that's still giving you trouble, then try iteratively re-weighting the least-squares problem (the final "best" recommended method in the link @tslisten mentioned). Keep in mind that this will be considerably slower, however.

```python
def iterative_weighted_invert(x, y, threshold=0, numiter=5):
    # Refit repeatedly, weighting each pass by the previous pass's
    # model values rather than by the noisy observations themselves.
    last_y = y
    for _ in range(numiter):
        model = weighted_invert(x, y, weights=last_y, threshold=threshold)
        last_y = gaussian(x, *model)
    return model
```
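A quick usage sketch (illustrative only; it assumes `generate_data`, `gaussian`, and `weighted_invert` from the listings above):

```python
# Fit the same noisy test data with the iterative version.
x, data = generate_data(256, 6, noise=100)
model = [iterative_weighted_invert(x, y, threshold=50, numiter=5)
         for y in data.T]
sigma, mu, height = [np.array(item) for item in zip(*model)]
prediction = gaussian(x, sigma, mu, height)
```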