StackOverflow2013

Note that there are some explanatory texts on larger screens.

plurals

POPython Pandas How to assign groupby operation results back to columns in parent dataframe?
text
Body
copied!<p>I have the following data frame in IPython, where each row is a single stock:</p> <pre><code>In [261]: bdata Out[261]: <class 'pandas.core.frame.DataFrame'> Int64Index: 21210 entries, 0 to 21209 Data columns: BloombergTicker 21206 non-null values Company 21210 non-null values Country 21210 non-null values MarketCap 21210 non-null values PriceReturn 21210 non-null values SEDOL 21210 non-null values yearmonth 21210 non-null values dtypes: float64(2), int64(1), object(4) </code></pre> <p>I want to apply a groupby operation that computes cap-weighted average return across everything, per each date in the "yearmonth" column.</p> <p>This works as expected:</p> <pre><code>In [262]: bdata.groupby("yearmonth").apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum()) Out[262]: yearmonth 201204 -0.109444 201205 -0.290546 </code></pre> <p>But then I want to sort of "broadcast" these values back to the indices in the original data frame, and save them as constant columns where the dates match.</p> <pre><code>In [263]: dateGrps = bdata.groupby("yearmonth") In [264]: dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum()) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /mnt/bos-devrnd04/usr6/home/espears/ws/Research/Projects/python-util/src/util/<ipython-input-264-4a68c8782426> in <module>() ----> 1 dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum()) TypeError: 'DataFrameGroupBy' object does not support item assignment </code></pre> <p>I realize this naive assignment should not work. But what is the "right" Pandas idiom for assigning the result of a groupby operation into a new column on the parent dataframe?</p> <p>In the end, I want a column called "MarketReturn" than will be a repeated constant value for all indices that have matching date with the output of the groupby operation.</p> <p>One hack to achieve this would be the following:</p> <pre><code>marketRetsByDate = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum()) bdata["MarketReturn"] = np.repeat(np.NaN, len(bdata)) for elem in marketRetsByDate.index.values: bdata["MarketReturn"][bdata["yearmonth"]==elem] = marketRetsByDate.ix[elem] </code></pre> <p>But this is slow, bad, and unPythonic.</p>

Querying!

Guidance

An individual column

Larger individual text columns get their own page to allow for proper reading.

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload