Most efficient way to convert an int64 series to datetime?
<p>Setting the scene: I've got a Series object of dtype int64 holding Unix timestamps. I need to convert these to datetime objects that carry just the date (without the hours and seconds).</p>
<p>What I've got so far to work with...</p>
<p>foo.head() = </p>
<pre><code>0    1382400000
1    1382400000
2    1382054400
3    1381708800
4    1380758400
Name: da_0, dtype: int64
</code></pre>
<p>This function:</p>
<pre><code>def convert_stamp_to_date(stamp):
    try:
        d = datetime.datetime.utcfromtimestamp(stamp)
    except:
        # fall back to the epoch if the stamp cannot be converted
        d = datetime.datetime.utcfromtimestamp(0)
    # drop the time of day, keeping only year, month and day
    d = datetime.datetime(d.year, d.month, d.day)
    return d
</code></pre>
<p>When I'm processing the Series in question, I call:</p>
<pre><code>foo = foo.apply(lambda x: convert_stamp_to_date(x))
</code></pre>
<p>which gives me the right result:</p>
<pre><code>0   2013-10-22 00:00:00
1   2013-10-22 00:00:00
2   2013-10-18 00:00:00
3   2013-10-14 00:00:00
4   2013-10-03 00:00:00
Name: da_0, dtype: datetime64[ns]
</code></pre>
<p>This gives me what I want, but I find it pretty slow (as it should be, right? since it's just the naive way of doing the job).</p>
<p>For a small Series object of length ~5000, the conversion takes ~27ms on average. Not <strong><em>bad</em></strong>... however, I can easily have Series objects that grow to millions of rows, and for those I see conversion times in the 1-2 minute range. Compared to other things that I do with Series and DataFrames of the same size, this seems way too slow.</p>
<p>My first idea was to try to pseudo-vectorize the function using <code>np.vectorize</code>. However, this actually makes the conversion about 10 times <strong>slower</strong>.</p>
<pre><code>vconvert_stamp_to_date = np.vectorize(convert_stamp_to_date)
foo = foo.apply(lambda x: vconvert_stamp_to_date(x))
</code></pre>
<p>While this still gives me the right answer, it bumps the conversion time for the smaller Series objects up to about 350ms, and for the larger Series that I work with, I had to Ctrl+C out of the script because it was taking too long.</p>
<p>It seems a bit ridiculous to me that converting a timestamp to a datetime object would be the bottleneck of my program :( I have to believe that there's a more efficient way to do this somewhere. Can anyone please point me in the right direction? For the moment, I've exhausted all my pandas mana. If you've read all the way down here, I am very grateful.</p>
<p>Thank you.</p>
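For comparison, here is a minimal sketch of one possible vectorized route, not necessarily the fastest one. It assumes the integers are seconds since the Unix epoch in UTC, as the sample values suggest, and uses `pd.to_datetime` with `unit="s"` followed by `.dt.normalize()` to drop the time of day. The helper name `stamps_to_dates` and the epoch fallback via `errors="coerce"` plus `fillna` are illustrative choices meant to mirror the `try`/`except` in the function above.

```python
import pandas as pd

def stamps_to_dates(stamps: pd.Series) -> pd.Series:
    """Convert integer Unix timestamps (assumed seconds, UTC) to midnight datetimes."""
    # Convert the whole Series at once; unconvertible values become NaT.
    converted = pd.to_datetime(stamps, unit="s", errors="coerce")
    # Mirror the original fallback: replace failures with the epoch.
    converted = converted.fillna(pd.Timestamp(0))
    # Set the time of day to 00:00:00, keeping only the date part.
    return converted.dt.normalize()

# Sample values taken from foo.head() in the question.
foo = pd.Series(
    [1382400000, 1382400000, 1382054400, 1381708800, 1380758400],
    name="da_0",
)
dates = stamps_to_dates(foo)
print(dates)
```

Because the conversion runs on the whole Series in one call instead of invoking a Python function per element, it avoids the per-row overhead of `apply`; actual timings depend on the data and the pandas version.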