StackOverflow2013

plurals

POInsert 0-values for missing dates within MultiIndex
primarykey
Id
14856941
data
AcceptedAnswerId
0
AnswerCount
2
ClosedDate
CommentCount
2
CommunityOwnedDate
CreationDate
2013-02-13T15:26:11.823
FavoriteCount
2
LastActivityDate
2016-12-22T03:05:49.100
LastEditDate
2013-02-14T13:58:25.290
LastEditorUserId
942591
OwnerUserId
942591
ParentId
0
PostTypeId
1
Score
4
ViewCount
2824
LastEditorDisplayName
text
Body
Let's assume I have a MultiIndex consisting of the date and some categories (just one, for simplicity, in the example below), and for each category I have a time series with the values of some process. I only have a value when there was an observation, and I now want to add a "0" whenever there was no observation on that date. I found a way that seems very inefficient (stacking and unstacking, which creates a huge number of columns when there are millions of categories).
<pre><code>import datetime as dt
import pandas as pd

days = 4
# List of all dates that should be in the index
all_dates = [dt.date(2013, 2, 13) - dt.timedelta(days=x) for x in range(days)]

df = pd.DataFrame([
    (dt.date(2013, 2, 10), 1, 4),
    (dt.date(2013, 2, 10), 2, 7),
    (dt.date(2013, 2, 11), 2, 7),
    (dt.date(2013, 2, 13), 1, 2),
    (dt.date(2013, 2, 13), 2, 3)],
    columns=['date', 'category', 'value'])
df.set_index(['date', 'category'], inplace=True)

print(df)
print(df.unstack().reindex(all_dates).fillna(0).stack())  # insert 0 values for missing dates
print(all_dates)

                     value
date       category
2013-02-10 1             4
           2             7
2013-02-11 2             7
2013-02-13 1             2
           2             3
                     value
           category
2013-02-13 1             2
           2             3
2013-02-12 1             0
           2             0
2013-02-11 1             0
           2             7
2013-02-10 1             4
           2             7
[datetime.date(2013, 2, 13), datetime.date(2013, 2, 12), datetime.date(2013, 2, 11), datetime.date(2013, 2, 10)]
</code></pre>
Does anybody know a smarter way to achieve the same?
EDIT: I found another way to achieve the same, this time with two category levels:
<pre><code>import datetime as dt
import pandas as pd

days = 4
# List of all dates that should be in the index
all_dates = [dt.date(2013, 2, 13) - dt.timedelta(days=x) for x in range(days)]

df = pd.DataFrame([(dt.date(2013, 2, 10), 1, 4, 5),
                   (dt.date(2013, 2, 10), 2, 1, 7),
                   (dt.date(2013, 2, 10), 2, 2, 7),
                   (dt.date(2013, 2, 11), 2, 3, 7),
                   (dt.date(2013, 2, 13), 1, 4, 2),
                   (dt.date(2013, 2, 13), 2, 4, 3)],
                  columns=['date', 'category', 'cat2', 'value'])

date_col = 'date'
other_index = ['category', 'cat2']
index = [date_col] + other_index
df.set_index(index, inplace=True)

# Reindex each (category, cat2) group separately against the full date list
grouped = df.groupby(level=other_index)
df_list = []
for i, group in grouped:
    df_list.append(group.reset_index(level=other_index).reindex(all_dates).fillna(0))

print(pd.concat(df_list).set_index(other_index, append=True))

                          value
           category cat2
2013-02-13 1        4         2
2013-02-12 0        0         0
2013-02-11 0        0         0
2013-02-10 1        4         5
2013-02-13 0        0         0
2013-02-12 0        0         0
2013-02-11 0        0         0
2013-02-10 2        1         7
2013-02-13 0        0         0
2013-02-12 0        0         0
2013-02-11 0        0         0
2013-02-10 2        2         7
2013-02-13 0        0         0
2013-02-12 0        0         0
2013-02-11 2        3         7
2013-02-10 0        0         0
2013-02-13 2        4         3
2013-02-12 0        0         0
2013-02-11 0        0         0
2013-02-10 0        0         0
</code></pre>
Tags
<pandas>
Title
Insert 0-values for missing dates within MultiIndex
singulars
PostAcceptedAnswerId
1. This table or related slice is empty.
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USArthur G
UserOwnerUserId
1. USArthur G
plurals
PostLinksPostIdRelatedPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostLinksRelatedPostIdPostId
1. This table or related slice is empty.
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POInsert 0-values for missing dates within MultiIndex
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
CommentsPostId
1. COI like your stack/unstack method. I am not sure if there is a better way to add rows. If you know all the categories, maybe you could make a DF with all the dates/categories and merge it with your data-containing DF. That would leave NAs that you could fill with zeros. I don't know if that would be faster though.....
 singulars
 PostPostId
 POInsert 0-values for missing dates within MultiIndex
 UserUserId
 USzach
2. COthe version that iterates through the group does not throw a memoryerror for my local dataset (the stack/unstack version does)
 singulars
 PostPostId
 POInsert 0-values for missing dates within MultiIndex
 UserUserId
 USArthur G
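
The idea in the first comment (build a frame with every date/category pair, merge the data onto it, then fill the gaps with zeros) could be sketched roughly as follows. This is only an illustration on the question's sample data, not a tested solution for the large dataset: the names `categories`, `full` and `filled` are made up for the example, it assumes every category of interest already appears in the data, and whether it is faster or lighter on memory than the other two versions is not established here.

```python
import datetime as dt
import itertools

import pandas as pd

days = 4
# List of all dates that should be in the index (same as in the question)
all_dates = [dt.date(2013, 2, 13) - dt.timedelta(days=x) for x in range(days)]

df = pd.DataFrame(
    [
        (dt.date(2013, 2, 10), 1, 4),
        (dt.date(2013, 2, 10), 2, 7),
        (dt.date(2013, 2, 11), 2, 7),
        (dt.date(2013, 2, 13), 1, 2),
        (dt.date(2013, 2, 13), 2, 3),
    ],
    columns=["date", "category", "value"],
)

# Assumption: the set of categories is taken from the data itself.
categories = sorted(df["category"].unique())

# Scaffold frame with one row per (date, category) pair.
full = pd.DataFrame(
    list(itertools.product(all_dates, categories)),
    columns=["date", "category"],
)

# A left merge keeps every scaffold row; pairs without an observation
# get NaN in "value", which is then replaced by 0.
filled = full.merge(df, on=["date", "category"], how="left")
filled["value"] = filled["value"].fillna(0).astype(int)
filled = filled.set_index(["date", "category"])

print(filled)
```

Because only the `value` column is filled, the category labels of the inserted rows stay intact, and the frame stays in long form, so no column per category is created.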
