ENH/BUG/DOC: HDFStore.append() on axis other than DataFrame.index or Panel.major_axis #8392

Open
seth-p opened this issue Sep 25, 2014 · 6 comments
Labels
Enhancement IO HDF5 read_hdf, HDFStore

Comments

@seth-p
Contributor

seth-p commented Sep 25, 2014

For DataFrames, HDFStore.append() works only when "appending" along the index direction, i.e. it expects the columns to be the same. See the sample below. This doesn't appear to be documented. If it can only append along a single axis, it would be nice to be able to specify which one.

Similarly for Panels, it appears to work only along the major_axis. I haven't checked with Panel4D.

Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 2.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from pandas import DataFrame, HDFStore

In [2]: df1 = DataFrame(1., index=range(2), columns=['A','B'])

In [3]: df2 = DataFrame(2., index=range(2,4), columns=['A','B'])

In [4]: store = HDFStore('bar.h5')

In [5]: store.append('df', df1)

In [6]: store.append('df', df2)

In [7]: store.append('df_transpose', df1.transpose())

In [8]: store.append('df_transpose', df2.transpose())
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-1d42b87002f5> in <module>()
----> 1 store.append('df_transpose', df2.transpose())

C:\Python34\lib\site-packages\pandas\io\pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
    909         kwargs = self._validate_format(format, kwargs)
    910         self._write_to_group(key, value, append=append, dropna=dropna,
--> 911                              **kwargs)
    912
    913     def append_to_multiple(self, d, value, selector, data_columns=None,

C:\Python34\lib\site-packages\pandas\io\pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1268
   1269         # write the object
-> 1270         s.write(obj=value, append=append, complib=complib, **kwargs)
   1271
   1272         if s.is_table and index:

C:\Python34\lib\site-packages\pandas\io\pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3603         self.create_axes(axes=axes, obj=obj, validate=append,
   3604                          min_itemsize=min_itemsize,
-> 3605                          **kwargs)
   3606
   3607         if not self.is_exists:

C:\Python34\lib\site-packages\pandas\io\pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3246                         "cannot match existing table structure for [%s] on "
   3247                         "appending data" % ','.join(com.pprint_thing(item) for
-> 3248                                                     item in items))
   3249             blocks = new_blocks
   3250             blk_items = new_blk_items

ValueError: cannot match existing table structure for [0,1] on appending data

In [9]: from pandas import show_versions

In [10]: show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.1.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.14.1
nose: 1.3.4
Cython: 0.21
numpy: 1.9.0
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.2.0
sphinx: 1.2.3
patsy: 0.3.0
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.7
bottleneck: 0.8.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.7
pymysql: None
psycopg2: None
@jreback
Contributor

jreback commented Sep 25, 2014

This is by design: the stores are row-oriented (because PyTables is).
A bit of explanation exists for Panel4D (and Panel).
You can specify the axes parameter to control orientation (it's in the docstring).

If you would like to add a small doc section, that would be OK.

@jreback jreback added Enhancement IO HDF5 read_hdf, HDFStore labels Sep 26, 2014
@jreback jreback added this to the Someday milestone Sep 26, 2014
@seth-p
Contributor Author

seth-p commented Sep 26, 2014

I think the documentation for {DataFrame,Panel,Panel4D}.append() should indicate along which axis it can append, and any restrictions on the other axes. (Do the other axes have to be identical to those of the stored object? a subset? I'm not sure.)

@jreback
Contributor

jreback commented Sep 26, 2014

@seth-p it will raise if the non-index axes are not identical. Yes, I suppose the documentation could be improved in that regard (e.g. a small section on the axes parameter would be good).
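
To make the "it will raise" behaviour concrete, here is a minimal sketch that wraps HDFStore.append() and reports a structure mismatch instead of propagating it. The helper name try_append is hypothetical (it is not part of pandas), and the sketch assumes the mismatch surfaces as a ValueError, as in the traceback in the original report.

```python
import pandas as pd


def try_append(store, key, df):
    """Append `df` to the table at `key`; return (ok, error_message).

    Hypothetical helper, not a pandas API. It relies on HDFStore.append()
    raising ValueError when the non-index axes (for a DataFrame, the
    columns) differ from those of the table already stored under `key`.
    """
    try:
        store.append(key, df)
    except ValueError as exc:
        # Nothing is written when the structure check fails.
        return False, str(exc)
    return True, None
```

With an open store that already holds a table with columns A and B, appending another frame with columns A and B should succeed, while a frame with columns A and C should come back as (False, message).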

@seth-p
Contributor Author

seth-p commented Sep 26, 2014

So if I want to add a column to an existing DataFrame in an HDF5 store, my only option is to load the entire stored DataFrame into memory, add the column in memory, and then re-put the whole DataFrame into the store? (I'm not being critical. I just want to make sure I understand.)
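
The read-modify-rewrite approach described in this question could look roughly like the following. This is a minimal sketch, assuming the whole stored table fits in memory; the function name add_column_by_rewrite and its parameters are made up for illustration.

```python
import pandas as pd


def add_column_by_rewrite(path, key, name, values):
    """Load the stored frame, add one column, and overwrite the stored table.

    Hypothetical helper: `values` must have one entry per stored row
    (or be a Series aligned with the stored index).
    """
    with pd.HDFStore(path) as store:
        df = store[key]                     # reads the entire table into memory
        df[name] = values                   # add the column in memory
        store.put(key, df, format="table")  # put() replaces the existing node
    return df
```

The cost is a full read and a full write of the table for every added column, which is what makes this unattractive for large stores.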

@jreback
Contributor

jreback commented Sep 26, 2014

There are 3 options:

  1. as you describe
  2. make a new table indexed like the old one and use select_as_multiple when you want to retrieve. Taken to extremes, this can form a column-store-looking table (which is very efficient at deleting/adding columns, but row ops become less efficient)
  3. use bcolz and actually create a column store (see ENH: allow column oriented table storage in HDFStore #4454 for more commentary)

Ultimately the user should be able to decide which form of store they need/want (and potentially migrate between them). But this is a much bigger and more complicated issue. See https://github.com/ContinuumIO/blaze for the real answer.
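
Option 2 above can be sketched as follows: the new columns go into a second table that shares the original table's index, and HDFStore.select_as_multiple() stitches the two back together on read. The helper names store_extra_columns and read_joined, and any key names passed to them, are hypothetical; the sketch assumes both tables have the same number of rows in the same order, which select_as_multiple requires.

```python
import pandas as pd


def store_extra_columns(store, extra_key, extra):
    """Write `extra`, a DataFrame indexed like the base table, under its own key."""
    # append() writes in table format, so the rows stay selectable by coordinate.
    store.append(extra_key, extra)


def read_joined(store, keys, where=None):
    """Read several row-aligned tables back as one wide DataFrame.

    The first key acts as the selector table: any `where` condition is
    evaluated against it, and the matching rows are pulled from every table.
    """
    return store.select_as_multiple(keys, where=where, selector=keys[0])
```

Adding or dropping a column then touches only the small side table, at the price of one extra table read per group of columns when retrieving rows.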

@mroeschke mroeschke removed this from the Someday milestone Oct 13, 2022