StackOverflow2013

plurals

Selecting a subset of a Pandas DataFrame indexed by DatetimeIndex with a list of TimeStamps
primarykey
Id
11991627
data
AcceptedAnswerId
11994944
AnswerCount
1
ClosedDate
CommentCount
0
CommunityOwnedDate
CreationDate
2012-08-16T16:27:44.560
FavoriteCount
4
LastActivityDate
2017-01-05T00:35:23.883
LastEditDate
2017-01-05T00:35:23.883
LastEditorUserId
2336654
OwnerUserId
1135883
ParentId
0
PostTypeId
1
Score
11
ViewCount
34955
LastEditorDisplayName
text
Body
I have a large Pandas <code>DataFrame</code>
<pre><code><class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3425100 entries, 2011-12-01 00:00:00 to 2011-12-31 23:59:59
Data columns:
sig_qual    3425100  non-null values
heave       3425100  non-null values
north       3425099  non-null values
west        3425097  non-null values
dtypes: float64(4)
</code></pre>
I select a subset of that <code>DataFrame</code> using <code>.ix[start_datetime:end_datetime]</code> and pass it to a <a href="https://gist.github.com/1178136" rel="nofollow noreferrer">peakdetect function</a>, which returns the index and value of the local maxima and minima in two separate lists. I extract the index positions of the maxima and, using <code>DataFrame.index</code>, get a list of pandas TimeStamps. I then attempt to extract the relevant subset of the large DataFrame by passing the list of TimeStamps to <code>.ix[]</code>, but it always seems to return an empty <code>DataFrame</code>. I can loop over the list of TimeStamps and get the relevant rows from the <code>DataFrame</code>, but this is a lengthy process and I thought that <code>ix[]</code> should accept a list of values according to <a href="http://pandas.sourceforge.net/indexing.html" rel="nofollow noreferrer">the docs</a>?
(Although I see that the example for Pandas 0.7 uses a <code>numpy.ndarray</code> of <code>numpy.datetime64</code>.)

Update: A small 8 second subset of the DataFrame is selected below; the # lines show some of the values:
<pre><code>y = raw_disp['heave'].ix[datetime(2011,12,30,0,0,0):datetime(2011,12,30,0,0,8)]
#csv representation of y time-series
2011-12-30 00:00:00,-310.0
2011-12-30 00:00:01,-238.0
2011-12-30 00:00:01.500000,-114.0
2011-12-30 00:00:02.500000,60.0
2011-12-30 00:00:03,185.0
2011-12-30 00:00:04,259.0
2011-12-30 00:00:04.500000,231.0
2011-12-30 00:00:05.500000,139.0
2011-12-30 00:00:06.500000,55.0
2011-12-30 00:00:07,-49.0
2011-12-30 00:00:08,-144.0

index = y.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-12-30 00:00:00, ..., 2011-12-30 00:00:08]
Length: 11, Freq: None, Timezone: None

#_max returned from the peakdetect function, one local maximum for this 8 second period
_max = [[5, 259.0]]
indexes = [x[0] for x in _max]
#[5]
timestamps = [index[z] for z in indexes]
#[<Timestamp: 2011-12-30 00:00:04>]
print raw_disp.ix[timestamps]
#Empty DataFrame
#Columns: array([sig_qual, heave, north, west, extrema], dtype=object)
#Index: <class 'pandas.tseries.index.DatetimeIndex'>
#Length: 0, Freq: None, Timezone: None

for timestamp in timestamps:
    print raw_disp.ix[timestamp]
#sig_qual      0
#heave       259
#north        27
#west        132
#extrema       0
#Name: 2011-12-30 00:00:04
</code></pre>
Update 2: I <a href="https://gist.github.com/3377772" rel="nofollow noreferrer">created a gist</a>, which actually works, because when the data is loaded in from csv the index column of timestamps is stored as a numpy array of objects which appear to be strings. In my own code, by contrast, the index is of type <code><class 'pandas.tseries.index.DatetimeIndex'></code> and each element is of type <code><class 'pandas.lib.Timestamp'></code>. I thought passing a list of <code>pandas.lib.Timestamp</code> would work the same as passing individual timestamps. Would this be considered a bug?
If I create the original <code>DataFrame</code> with the index as a list of strings, querying with a list of strings works fine. It does increase the byte size of the DataFrame significantly though.

Update 3: The error only appears to occur with very large DataFrames. I reran the code on DataFrames of varying sizes (some detail in a comment below) and it appears to occur on a DataFrame above 2.7 million records. Using strings as opposed to TimeStamps resolves the issue but increases memory usage.

Fixed in latest github master (18/09/2012), see comment from Wes at bottom of page.
Tags
<python><time-series><pandas>
Title
Selecting a subset of a Pandas DataFrame indexed by DatetimeIndex with a list of TimeStamps
singulars
PostAcceptedAnswerId
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
PostParentId
1. This table or related slice is empty.
PostTypePostTypeId
1. PTQuestion
UserLastEditorUserId
1. USpiRSquared
UserOwnerUserId
1. USseumas
plurals
PostLinksPostIdRelatedPostId
1. This table or related slice is empty.
PostLinksRelatedPostIdPostId
1. PL
 singulars
 LinkTypeLinkTypeId
 LTLinked
PostsAcceptedAnswerId
1. This table or related slice is empty.
PostsParentIdCreationDate
1. PO
 singulars
 PostTypePostTypeId
 PTAnswer
VotesPostIdCreationDate
1. VO
 singulars
 PostPostId
 POSelecting a subset of a Pandas DataFrame indexed by DatetimeIndex with a list of TimeStamps
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTUpMod
2. VO
 singulars
 PostPostId
 POSelecting a subset of a Pandas DataFrame indexed by DatetimeIndex with a list of TimeStamps
 UserUserId
 This table or related slice is empty.
 VoteTypeVoteTypeId
 VTApproveEditSuggestion
3. VO
 singulars
 PostPostId
 POSelecting a subset of a Pandas DataFrame indexed by DatetimeIndex with a list of TimeStamps
 UserUserId
 USMarkus W
 VoteTypeVoteTypeId
 VTFavorite
CommentsPostId
1. This table or related slice is empty.
