Insert 0 values for missing dates within a MultiIndex
<p>Let's assume I have a MultiIndex consisting of a date and some categories (just one in the example below, for simplicity), and for each category I have a time series with the values of some process. I only have a value when there was an observation, and I now want to add a "0" whenever there was no observation on that date. I found a way that seems very inefficient: stacking and unstacking, which creates one column per category and therefore an enormous number of columns when there are millions of categories.</p>
<pre><code>import datetime as dt
import pandas as pd

days = 4
# List of all dates that should be in the index
all_dates = [dt.date(2013, 2, 13) - dt.timedelta(days=x) for x in range(days)]

df = pd.DataFrame([
    (dt.date(2013, 2, 10), 1, 4),
    (dt.date(2013, 2, 10), 2, 7),
    (dt.date(2013, 2, 11), 2, 7),
    (dt.date(2013, 2, 13), 1, 2),
    (dt.date(2013, 2, 13), 2, 3)],
    columns=['date', 'category', 'value'])
df.set_index(['date', 'category'], inplace=True)

print(df)
# Insert 0 values for missing dates
print(df.unstack().reindex(all_dates).fillna(0).stack())
print(all_dates)

                    value
date       category
2013-02-10 1            4
           2            7
2013-02-11 2            7
2013-02-13 1            2
           2            3

                    value
           category
2013-02-13 1            2
           2            3
2013-02-12 1            0
           2            0
2013-02-11 1            0
           2            7
2013-02-10 1            4
           2            7

[datetime.date(2013, 2, 13), datetime.date(2013, 2, 12), datetime.date(2013, 2, 11), datetime.date(2013, 2, 10)]
</code></pre>
<p>Does anybody know a smarter way to achieve the same?</p>
<p>EDIT: I found another way to achieve the same:</p>
<pre><code>import datetime as dt
import pandas as pd

days = 4
# List of all dates that should be in the index
all_dates = [dt.date(2013, 2, 13) - dt.timedelta(days=x) for x in range(days)]

df = pd.DataFrame([
    (dt.date(2013, 2, 10), 1, 4, 5),
    (dt.date(2013, 2, 10), 2, 1, 7),
    (dt.date(2013, 2, 10), 2, 2, 7),
    (dt.date(2013, 2, 11), 2, 3, 7),
    (dt.date(2013, 2, 13), 1, 4, 2),
    (dt.date(2013, 2, 13), 2, 4, 3)],
    columns=['date', 'category', 'cat2', 'value'])

date_col = 'date'
other_index = ['category', 'cat2']
index = [date_col] + other_index
df.set_index(index, inplace=True)

grouped = df.groupby(level=other_index)
df_list = []
for i, group in grouped:
    # Move the non-date levels into columns, reindex on all dates, fill gaps with 0
    df_list.append(group.reset_index(level=other_index).reindex(all_dates).fillna(0))
print(pd.concat(df_list).set_index(other_index, append=True))

                         value
           category cat2
2013-02-13 1        4        2
2013-02-12 0        0        0
2013-02-11 0        0        0
2013-02-10 1        4        5
2013-02-13 0        0        0
2013-02-12 0        0        0
2013-02-11 0        0        0
2013-02-10 2        1        7
2013-02-13 0        0        0
2013-02-12 0        0        0
2013-02-11 0        0        0
2013-02-10 2        2        7
2013-02-13 0        0        0
2013-02-12 0        0        0
2013-02-11 2        3        7
2013-02-10 0        0        0
2013-02-13 2        4        3
2013-02-12 0        0        0
2013-02-11 0        0        0
2013-02-10 0        0        0
</code></pre>
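<p>A possible alternative, sketched under some assumptions: build the complete target index once, as the cross product of <code>all_dates</code> with the distinct combinations of the non-date levels that actually occur in the data, and then call <code>reindex(..., fill_value=0)</code> on the original frame. This avoids the wide intermediate frame created by <code>unstack()</code> and the Python loop over groups. The helper name <code>fill_missing_dates</code> is hypothetical, the sketch assumes the (date, category, ...) index entries are unique, and <code>merge(how="cross")</code> requires pandas 1.2 or later. Unlike the loop in the EDIT, it also keeps the <code>category</code> and <code>cat2</code> labels on the inserted rows; in the output of that loop, <code>fillna(0)</code> replaces them with 0.</p>

```python
import datetime as dt

import pandas as pd


def fill_missing_dates(df, all_dates, date_level="date"):
    """Reindex df so every observed key combination has a row for every date.

    Hypothetical helper, not a pandas API. Assumes the index entries of df are
    unique and that merge(how="cross") is available (pandas >= 1.2).
    Rows that were missing are filled with 0.
    """
    other_levels = [name for name in df.index.names if name != date_level]
    # Put the date level first so it lines up with the index built below.
    df = df.reorder_levels([date_level] + other_levels)
    # Distinct combinations of the non-date levels that actually occur in df.
    keys = df.index.droplevel(date_level).unique().to_frame(index=False)
    dates = pd.DataFrame({date_level: all_dates})
    # Cross join: every date paired with every observed key combination.
    full_index = pd.MultiIndex.from_frame(dates.merge(keys, how="cross"))
    return df.reindex(index=full_index, fill_value=0).sort_index()


days = 4
all_dates = [dt.date(2013, 2, 13) - dt.timedelta(days=x) for x in range(days)]

# Data from the EDIT: two key levels (category, cat2).
df = pd.DataFrame(
    [
        (dt.date(2013, 2, 10), 1, 4, 5),
        (dt.date(2013, 2, 10), 2, 1, 7),
        (dt.date(2013, 2, 10), 2, 2, 7),
        (dt.date(2013, 2, 11), 2, 3, 7),
        (dt.date(2013, 2, 13), 1, 4, 2),
        (dt.date(2013, 2, 13), 2, 4, 3),
    ],
    columns=["date", "category", "cat2", "value"],
).set_index(["date", "category", "cat2"])

result = fill_missing_dates(df, all_dates)
print(result)  # 5 observed (category, cat2) pairs x 4 dates = 20 rows

# Data from the first example: a single key level (category).
df1 = pd.DataFrame(
    [
        (dt.date(2013, 2, 10), 1, 4),
        (dt.date(2013, 2, 10), 2, 7),
        (dt.date(2013, 2, 11), 2, 7),
        (dt.date(2013, 2, 13), 1, 2),
        (dt.date(2013, 2, 13), 2, 3),
    ],
    columns=["date", "category", "value"],
).set_index(["date", "category"])

result1 = fill_missing_dates(df1, all_dates)
```

<p>With a single key level, as in the first example, the index built here contains the same entries as <code>pd.MultiIndex.from_product([all_dates, categories], names=['date', 'category'])</code> over the observed categories; the cross join only matters when several key levels should be restricted to the combinations that actually occur.</p>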