StackOverflow2013

Split series containing lists of strings into multiple columns
<p>I'm using pandas to perform some string matching on a Twitter dataset.</p>
<p>I've imported a CSV of Tweets and indexed it by date. I've then created a new column containing text matches:</p>
<pre><code>In [1]: import pandas as pd
        indata = pd.read_csv('tweets.csv')
        indata.index = pd.to_datetime(indata["Date"])
        indata["matches"] = indata.Tweet.str.findall("rudd|abbott")
        only_results = pd.Series(indata["matches"])
        only_results.head(10)

Out[1]:
Date
2013-08-06 16:03:17          []
2013-08-06 16:03:12          []
2013-08-06 16:03:10          []
2013-08-06 16:03:09          []
2013-08-06 16:03:08          []
2013-08-06 16:03:07          []
2013-08-06 16:03:07    [abbott]
2013-08-06 16:03:06          []
2013-08-06 16:03:02          []
2013-08-06 16:03:00      [rudd]
Name: matches, dtype: object
</code></pre>
<p>What I want to end up with is a dataframe, grouped by day/month, with the different search terms as columns, which I can then plot.</p>
<p>I came across what looks like the perfect solution in another SO answer (<a href="https://stackoverflow.com/a/16637607/2034487">https://stackoverflow.com/a/16637607/2034487</a>), but when I try to apply it to this series, I get an exception:</p>
<pre><code>In [2]: only_results.apply(lambda x: pd.Series(1,index=x)).fillna(0)

Out [2]: Exception - Traceback (most recent call last)
...
Exception: Reindexing only valid with uniquely valued Index objects
</code></pre>
<p>I really want to be able to make these changes within the dataframe so that I can apply and reapply groupby conditions and produce the plots efficiently - and I would love to learn more about how the .apply() method works.</p>
<p>Thanks in advance.</p>
<p><strong>UPDATE AFTER SUCCESSFUL ANSWER</strong></p>
<p>The issue was with duplicates in the "matches" column that I hadn't seen. I iterated through that column to remove duplicates and then used the original solution from @Jeff linked above. This was successful, and I can now .groupby() on the resultant series to see daily, hourly, etc. trends. Here's an example of the resultant plot:</p>
<pre><code>In [3]: successful_run = only_results.apply(lambda x: pd.Series(1,index=x)).fillna(0)

In [4]: successful_run.groupby([successful_run.index.day,successful_run.index.hour]).sum().plot()

Out [4]: &lt;matplotlib.axes.AxesSubplot at 0x110b51650&gt;
</code></pre>
<p><img src="https://i.stack.imgur.com/wSP1w.png" alt="Plot grouped by day and hour"></p>
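<p>For anyone hitting the same exception, here is a minimal sketch of one way to get the indicator columns without tripping over repeated matches. It is not the approach from the linked answer: it swaps the .apply() call for str.join() followed by str.get_dummies(), which yields a single 0/1 column per term even when a tweet mentions the same term more than once. The sample tweets and timestamps are made up for illustration and stand in for the real tweets.csv.</p>

```python
import pandas as pd

# Hypothetical sample data standing in for tweets.csv (not the real dataset).
index = pd.to_datetime([
    "2013-08-06 16:03:07",
    "2013-08-06 16:03:07",
    "2013-08-06 16:03:00",
    "2013-08-07 09:15:00",
])
tweets = pd.Series(
    [
        "abbott and abbott again",  # same term twice: the kind of duplicate that caused trouble
        "no names here",
        "rudd versus abbott",
        "rudd speaks",
    ],
    index=index,
    name="Tweet",
)

# Same matching step as in the question: each row becomes a list of matches.
matches = tweets.str.findall("rudd|abbott")

# Join each list into one "|"-separated string, then expand it into
# one 0/1 indicator column per distinct term. Repeated terms in a row
# collapse into a single 1, and empty lists become all-zero rows.
indicators = matches.str.join("|").str.get_dummies(sep="|")

# Group by day and hour of the timestamp index, as in the question's update.
grouped = indicators.groupby(
    [indicators.index.day, indicators.index.hour]
).sum()

print(indicators)
print(grouped)
```

<p>Calling .plot() on <code>grouped</code> should then give a chart along the lines of the one above, assuming matplotlib is installed.</p>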
