ENH/BUG/DOC: HDFStore.append() on axis other than DataFrame.index or Panel.major_axis #8392

seth-p · 2014-09-25T18:08:51Z

For DataFrames, HDFStore.append() works only when "appending" along the index direction, i.e. it expects the columns to be the same. See the sample below. This doesn't appear to be documented. If it can only append along a single axis, it would be nice to be able to specify which.

Similarly for Panels, it appears to work only along the major_axis. I haven't checked with Panel4D.

Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 2.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from pandas import DataFrame, HDFStore

In [2]: df1 = DataFrame(1., index=range(2), columns=['A','B'])

In [3]: df2 = DataFrame(2., index=range(2,4), columns=['A','B'])

In [4]: store = HDFStore('bar.h5')

In [5]: store.append('df', df1)

In [6]: store.append('df', df2)

In [7]: store.append('df_transpose', df1.transpose())

In [8]: store.append('df_transpose', df2.transpose())
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-1d42b87002f5> in <module>()
----> 1 store.append('df_transpose', df2.transpose())

C:\Python34\lib\site-packages\pandas\io\pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
    909         kwargs = self._validate_format(format, kwargs)
    910         self._write_to_group(key, value, append=append, dropna=dropna,
--> 911                              **kwargs)
    912
    913     def append_to_multiple(self, d, value, selector, data_columns=None,

C:\Python34\lib\site-packages\pandas\io\pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1268
   1269         # write the object
-> 1270         s.write(obj=value, append=append, complib=complib, **kwargs)
   1271
   1272         if s.is_table and index:

C:\Python34\lib\site-packages\pandas\io\pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3603         self.create_axes(axes=axes, obj=obj, validate=append,
   3604                          min_itemsize=min_itemsize,
-> 3605                          **kwargs)
   3606
   3607         if not self.is_exists:

C:\Python34\lib\site-packages\pandas\io\pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3246                         "cannot match existing table structure for [%s] on "
   3247                         "appending data" % ','.join(com.pprint_thing(item) for
-> 3248                                                     item in items))
   3249             blocks = new_blocks
   3250             blk_items = new_blk_items

ValueError: cannot match existing table structure for [0,1] on appending data

In [9]: from pandas import show_versions

In [10]: show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.1
nose: 1.3.4
Cython: 0.21
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.7
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None


jreback · 2014-09-25T22:20:44Z

This is by design: the stores are row-oriented (because PyTables is).
A bit of explanation exists for Panel4D (and Panel).
You can specify the axes parameter to control the orientation (it's in the docstring).

If you would like to add a small doc section, that would be OK.
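A minimal sketch of working with the row orientation rather than against it, assuming a recent pandas with PyTables installed (Panel and Panel4D no longer exist there, so only the DataFrame case is shown). If the data grows along what you think of as columns (the df_transpose case above), store it so that the growing axis is the index, and transpose after reading. The file name and keys are illustrative only.

```python
import os
import tempfile

import pandas as pd  # HDFStore additionally requires the PyTables ("tables") package

# The same two blocks as in the report; each adds new index labels, not new columns.
df1 = pd.DataFrame(1.0, index=range(2), columns=["A", "B"])
df2 = pd.DataFrame(2.0, index=range(2, 4), columns=["A", "B"])

# Illustrative scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "bar.h5")
with pd.HDFStore(path, mode="w") as store:
    # Store in row orientation: each append adds rows, the columns stay fixed.
    store.append("df", df1)
    store.append("df", df2)
    # Recover the "wide" (transposed) layout only after reading it back.
    wide = store.select("df").T

print(wide)
```

The transpose is cheap for small frames, but it does require the whole selection in memory; for large data it is usually better to keep working in the stored orientation.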

jreback · 2014-09-25T22:22:24Z

http://pandas-docs.github.io/pandas-docs-travis/io.html#experimental

seth-p · 2014-09-26T18:31:33Z

I think the documentation for {DataFrame,Panel,Panel4D}.append() should indicate along which axis it can append, and any restrictions on the other axes. (Do the other axes have to be identical to those of the stored object? a subset? I'm not sure.)

jreback · 2014-09-26T18:33:27Z

@seth-p It will raise if the non-index axes are not identical. Yes, I suppose the documentation could be improved in that regard (e.g. a small section on the axes parameter would be good).
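A small sketch of that rule, assuming a recent pandas with PyTables installed and an illustrative file name: appending new rows with the stored columns succeeds, while a frame whose columns differ from the stored ones is rejected with a ValueError before anything is written.

```python
import os
import tempfile

import pandas as pd  # HDFStore additionally requires the PyTables ("tables") package

path = os.path.join(tempfile.mkdtemp(), "check.h5")
with pd.HDFStore(path, mode="w") as store:
    store.append("df", pd.DataFrame(1.0, index=range(2), columns=["A", "B"]))

    # Same columns, new rows: accepted.
    store.append("df", pd.DataFrame(2.0, index=range(2, 4), columns=["A", "B"]))

    # Column axis differs from the stored one (extra column "C"): rejected.
    try:
        store.append(
            "df", pd.DataFrame(3.0, index=range(4, 6), columns=["A", "B", "C"])
        )
        rejected = False
    except ValueError as err:
        rejected = True
        print("ValueError:", err)

    # The stored table still holds only the rows from the two accepted appends.
    nrows = len(store.select("df"))

print(rejected, nrows)
```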

seth-p · 2014-09-26T18:50:43Z

So if I want to add a column to an existing DataFrame in an HDF5 store, my only option is to load the entire stored DataFrame into memory, add the column in memory, and then re-put the whole DataFrame anew into the store? (Am not being critical. Just want to make sure I understand.)

jreback · 2014-09-26T18:54:21Z

There are 3 options:

1. As you describe.
2. Make a new table indexed like the old one and use select_as_multiple when you want to retrieve. Taken to extremes, this can form a column-store-like table (which is very efficient at deleting/adding columns, but row operations become less efficient).
3. Use bcolz and actually create a column store (see ENH: allow column oriented table storage in HDFStore #4454 for more commentary).

Ultimately the user should be able to decide which form of store they need/want (and potentially migrate between them), but this is a much bigger and more complicated issue. See https://github.com/ContinuumIO/blaze for the real answer.
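The first two options might look roughly like this, assuming a recent pandas with PyTables installed; the file name, keys, and the derived column are illustrative only. Option 1 rewrites the whole node with put(); option 2 leaves the original table untouched and uses select_as_multiple() to join an extra table at read time. Option 3 is not shown, since bcolz is a separate project.

```python
import os
import tempfile

import pandas as pd  # HDFStore additionally requires the PyTables ("tables") package

path = os.path.join(tempfile.mkdtemp(), "cols.h5")
base = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

with pd.HDFStore(path, mode="w") as store:
    store.append("df", base)

    # Option 1: read everything, add the column in memory, rewrite the node.
    # put() replaces the existing "df" node; format="table" keeps it appendable.
    full = store.select("df")
    full["C"] = full["A"] + full["B"]
    store.put("df", full, format="table")

    # Option 2: keep the new column in its own table, indexed like "df",
    # and stitch the pieces together (side by side) when reading.
    extra = pd.DataFrame({"D": [7.0, 8.0, 9.0]}, index=base.index)
    store.append("df_extra", extra)
    combined = store.select_as_multiple(["df", "df_extra"], selector="df")

print(list(full.columns))      # ['A', 'B', 'C']
print(list(combined.columns))  # ['A', 'B', 'C', 'D']
```

select_as_multiple() requires every table to be in table format with the same number of rows, which is why the extra table is built on the same index as the original.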

jreback added Enhancement IO HDF5 read_hdf, HDFStore labels Sep 26, 2014

jreback added this to the Someday milestone Sep 26, 2014

mroeschke removed this from the Someday milestone Oct 13, 2022
