How can I efficiently save a python pandas dataframe in hdf5 and open it as a dataframe in R?
<p>I think the title covers the issue, but to elucidate:</p>

<p>The <a href="http://pandas.pydata.org" rel="noreferrer">pandas</a> python package has a DataFrame data type for holding table data in python. It also has a convenient interface to the <a href="http://www.hdfgroup.org/HDF5/" rel="noreferrer">hdf5</a> file format, so pandas DataFrames (and other data) can be saved using a simple dict-like interface (assuming you have <a href="http://pytables.org" rel="noreferrer">pytables</a> installed):</p>

<pre><code>import pandas
import numpy

d = pandas.HDFStore('data.h5')
d['testdata'] = pandas.DataFrame({'N': numpy.random.randn(5)})
d.close()
</code></pre>

<p>So far so good. However, if I then try to load that same hdf5 into R I see things aren't so simple:</p>

<pre><code>&gt; library(hdf5)
&gt; hdf5load('data.h5')
NULL
&gt; testdata
$block0_values
         [,1]      [,2]      [,3]       [,4]      [,5]
[1,] 1.498147 0.8843877 -1.081656 0.08717049 -1.302641
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"

$block0_items
[1] "N"
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
attr(,"kind")
[1] "string"
attr(,"name")
[1] "N."

$axis1
[1] 0 1 2 3 4
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
attr(,"kind")
[1] "integer"
attr(,"name")
[1] "N."

$axis0
[1] "N"
attr(,"CLASS")
[1] "ARRAY"
attr(,"VERSION")
[1] "2.3"
attr(,"TITLE")
[1] ""
attr(,"FLAVOR")
[1] "numpy"
attr(,"kind")
[1] "string"
attr(,"name")
[1] "N."

attr(,"TITLE")
[1] ""
attr(,"CLASS")
[1] "GROUP"
attr(,"VERSION")
[1] "1.0"
attr(,"ndim")
[1] 2
attr(,"axis0_variety")
[1] "regular"
attr(,"axis1_variety")
[1] "regular"
attr(,"nblocks")
[1] 1
attr(,"block0_items_variety")
[1] "regular"
attr(,"pandas_type")
[1] "frame"
</code></pre>

<p>Which brings me to my question: ideally I would be able to save back and forth from R to pandas. I can obviously write a wrapper from pandas to R (I think... though I think if I use a pandas <a href="http://pandas.sourceforge.net/indexing.html" rel="noreferrer">MultiIndex</a> that might become trickier), but I don't think I can easily then use that data back in pandas. Any suggestions?</p>

<p>Bonus: what I <em>really</em> want to do is use the <a href="http://datatable.r-forge.r-project.org/" rel="noreferrer">data.table</a> package in R with a pandas dataframe (the keying approach is suspiciously similar in both packages). Any help on that one greatly appreciated.</p>
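<p>To make the wrapper idea concrete, here is a minimal sketch in plain Python of how the pieces in the dump above (<code>block0_values</code>, <code>block0_items</code>, <code>axis1</code>) might be stitched back into named columns. It assumes the single-block layout and the orientation exactly as printed in the R output (one row of values per item); the function name <code>frame_from_blocks</code> is hypothetical and not part of pandas or any R package, and a MultiIndex or multiple blocks are not handled.</p>

```python
def frame_from_blocks(block_values, block_items, row_labels):
    """Rebuild {column name: {row label: value}} from one dumped block.

    Assumption: block_values holds one sequence per item (column), as in
    the R printout above, and row_labels corresponds to the axis1 entry.
    """
    if len(block_values) != len(block_items):
        raise ValueError("expected one row of values per item name")

    columns = {}
    for name, values in zip(block_items, block_values):
        if len(values) != len(row_labels):
            raise ValueError("column %r does not match the row labels" % name)
        # Pair each row label with its value for this column.
        columns[name] = dict(zip(row_labels, values))
    return columns


# Values copied from the R dump in the question.
block0_values = [[1.498147, 0.8843877, -1.081656, 0.08717049, -1.302641]]
block0_items = ["N"]
axis1 = [0, 1, 2, 3, 4]

frame = frame_from_blocks(block0_values, block0_items, axis1)
print(frame["N"])
```

<p>A nested dict is used here only to keep the sketch free of dependencies; the same pairing of item names, values and row labels is what a wrapper on the R side would have to do before building a data frame.</p>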