Funny results with pandas argsort
    <p>I think I have hit on a bug in pandas. I was hoping to get some help either verifying the bug or helping me figure out where my logic error is located in my code.</p>
    <p>My code is as follows:</p>
    <pre><code>import pandas, numpy, StringIO

def sq_fixer(sr):
    sr = sr.where(sr != '20200229')
    ranks = sr.argsort().astype(float)
    ranks[ranks == -1] = numpy.nan
    return ','.join(ranks.astype(numpy.str))

def correct_date(sr):
    date_fixer = lambda x: pandas.datetime(x.year - 100, x.month, x.day) if x &gt; pandas.datetime.now() else x
    sr = pandas.to_datetime(sr).apply(date_fixer).astype(pandas.datetime)
    return sr

txt = '''ID,RUN_START_DATE,PUSHUP_START_DATE,SITUP_START_DATE,PULLUP_START_DATE
1,2013-01-24,2013-01-02,,2013-02-03
2,2013-01-30,2013-01-21,2013-01-13,2013-01-06
3,2013-01-29,2013-01-28,2013-01-01,2013-01-29
4,2013-02-16,2013-02-12,2013-01-04,2013-02-11
5,2013-01-06,2013-02-07,2013-02-25,2013-02-12
6,2013-01-26,2013-01-28,2013-02-12,2013-01-10
7,2013-01-26,,2013-01-12,2013-01-30
8,2013-01-03,2013-01-24,2013-01-19,2013-01-02
9,2013-01-22,2013-01-13,2013-02-03,
10,2013-02-06,2013-01-16,2013-02-07,2013-01-11
3347,,2008-02-27,2008-04-10,2008-02-13
3588,2004-09-12,,2004-11-06,2004-09-06
3784,2003-02-22,,2003-06-21,2003-02-19
593,2009-04-03,,2009-06-01,2009-04-01
4148,2003-03-21,2002-09-20,2003-04-01,2003-01-01
4299,2004-05-24,2004-07-23,,2004-04-22
4590,2005-05-05,2005-12-05,2005-04-05,
4830,2001-06-12,2000-10-12,2001-07-28,2001-01-28
4941,2006-11-08,2006-12-19,2006-07-19,2007-02-24
1416,2004-04-03,2004-05-19,2004-02-06,
1580,2008-12-20,,2009-03-19,2008-12-19
1661,2005-10-03,2005-10-26,2005-09-12,2006-02-19
1759,2001-10-18,,2002-01-17,2001-10-17
1858,2003-04-14,2003-05-17,,2002-12-17
1972,2003-06-01,2003-07-14,2002-12-14,
5905,2000-11-18,2001-01-13,,2000-11-04
2052,2002-06-11,,2002-08-23,2001-12-12
2165,2006-10-01,,2007-02-27,2006-09-30
2218,2007-09-19,,2008-02-06,2007-09-09
2350,2000-08-08,,2000-09-22,2000-01-08
2432,2001-08-22,,2001-09-25,2000-12-16
2611,2005-05-07,,2005-06-05,2005-03-26
2612,2005-05-06,,2005-05-26,2005-04-11
7378,2009-08-07,2009-01-30,2010-01-20,2009-06-08
7550,2006-04-08,,2006-06-01,2006-04-01
'''

df = pandas.read_csv(StringIO.StringIO(txt))
sequence_array = ['RUN_START_DATE', 'PUSHUP_START_DATE', 'SITUP_START_DATE', 'PULLUP_START_DATE']
xsequence_array = ['X_RUN_START_DATE', 'X_PUSHUP_START_DATE', 'X_SITUP_START_DATE', 'X_PULLUP_START_DATE']

df[sequence_array] = df[sequence_array].apply(correct_date, axis=1)

fix_day = lambda x: x if x &gt; 0 else 29
fix_month = lambda x: x if x &gt; 0 else 02
fix_year = lambda x: x if x &gt; 0 else 2020

for col in sequence_array:
    xcol = 'X_{0}'.format(col)
    df[xcol] = ['{0:04d}{1:02d}{2:02d}'.format(fix_year(c.year), fix_month(c.month), fix_day(c.day)) for c in df[col]]

df['X_AS_SEQUENCE'] = df[xsequence_array].apply(sq_fixer, axis=1)
</code></pre>
    <p>When I run the code most of the results are correct. Take for example index 6:</p>
    <pre><code>In [31]: df.ix[6]
Out[31]:
ID                                       7
RUN_START_DATE         2013-01-26 00:00:00
PUSHUP_START_DATE                      NaN
SITUP_START_DATE       2013-01-12 00:00:00
PULLUP_START_DATE      2013-01-30 00:00:00
X_RUN_START_DATE                  20130126
X_PUSHUP_START_DATE               20200229
X_SITUP_START_DATE                20130112
X_PULLUP_START_DATE               20130130
X_AS_SEQUENCE              1.0,nan,0.0,2.0
</code></pre>
    <p>However, certain indices seem to throw pandas.argsort() for a loop. Take for example index 10:</p>
    <pre><code>In [32]: df.ix[10]
Out[32]:
ID                                    3347
RUN_START_DATE                         NaN
PUSHUP_START_DATE      2008-02-27 00:00:00
SITUP_START_DATE       2008-04-10 00:00:00
PULLUP_START_DATE      2008-02-13 00:00:00
X_RUN_START_DATE                  20200229
X_PUSHUP_START_DATE               20080227
X_SITUP_START_DATE                20080410
X_PULLUP_START_DATE               20080213
X_AS_SEQUENCE              nan,2.0,0.0,1.0
</code></pre>
    <p>The argsort should return <code>nan,1.0,2.0,0.0</code> instead of <code>nan,2.0,0.0,1.0</code>.</p>
    <p>I have been on this for three days. At this point I am not sure if it is me or a bug. I am not sure how to backtrace it to get an answer. Any help would be most appreciated!</p>
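One possible reading of the index 10 result, offered as a sketch rather than a confirmed diagnosis: `argsort` returns the positions that would sort the values, not the rank of each value. The two happen to coincide for some orderings (index 6 is one of them), which may be why most rows look right. The example below uses plain NumPy on the three non-missing keys from index 10. It assumes, as the `ranks == -1` line in the question suggests, that the pandas version in use sets the missing entry aside and argsorts the rest. The variable names are illustrative.

```python
import numpy as np

# Non-missing date keys from index 10 (PUSHUP, SITUP, PULLUP), in column order.
values = np.array([20080227, 20080410, 20080213])

# argsort answers: "which position holds the k-th smallest value?"
order = np.argsort(values)

# A rank answers: "where does each value fall in sorted order?"
# That is the inverse permutation, i.e. the argsort of the argsort.
ranks = np.argsort(order)

print(order.tolist())  # [2, 0, 1] -> the sequence the question reports
print(ranks.tolist())  # [1, 2, 0] -> the sequence the question expected
```

If ranks are what is wanted, `Series.rank()` is the usual pandas route. On numeric data it is 1-based and leaves NaN in place by default, so subtracting 1 from its result should give the expected `nan,1.0,2.0,0.0`.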
 
