HDF5 files not compatible between python 2 and 3 ? #4260

Closed
hadim opened this issue Jul 16, 2013 · 10 comments

hadim commented Jul 16, 2013

I have a problem loading an h5 file written by Python 3 into Python 2, and vice versa. I can't post the code that generates the h5 file because it is part of a big project. I'll try to upload the h5 file directly later.

Here is the error when I load FROM python 3.3 an h5 file created BY python 2.7:

Traceback (most recent call last):
  File "/home/hadim/local/virtualenvs/st3/src/master/build/lib.linux-x86_64-3.3/pandas/core/index.py", line 1539, in _get_level_number
    level = self.names.index(level)
ValueError: 'side' is not in list

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    meta = SimuIO().read(results_file)
  File "../kt_simul/io/simuio.py", line 210, in read
    KD.spbL.traj = spbs.xs('A', level='side').values.T[0]
  File "/home/hadim/local/virtualenvs/st3/src/master/build/lib.linux-x86_64-3.3/pandas/core/frame.py", line 2335, in xs
    loc, new_ax = labels.get_loc_level(key, level=level)
  File "/home/hadim/local/virtualenvs/st3/src/master/build/lib.linux-x86_64-3.3/pandas/core/index.py", line 2322, in get_loc_level
    level = self._get_level_number(level)
  File "/home/hadim/local/virtualenvs/st3/src/master/build/lib.linux-x86_64-3.3/pandas/core/index.py", line 1542, in _get_level_number
    raise Exception('Level %s not found' % str(level))
Exception: Level side not found
Closing remaining open files: simu.h5... done 

Here is the error when I load FROM python 2.7 an h5 file created BY python 3.3:

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    meta = SimuIO().read(results_file)
  File "/home/hadim/.phd/dev/kt_simul/kt_simul/io/simuio.py", line 197, in read
    param_root = build_tree(store['params'])
  File "/home/hadim/local/virtualenvs/st/src/master/pandas/io/pytables.py", line 289, in __getitem__
    return self.get(key)
  File "/home/hadim/local/virtualenvs/st/src/master/pandas/io/pytables.py", line 422, in get
    return self._read_group(group)
  File "/home/hadim/local/virtualenvs/st/src/master/pandas/io/pytables.py", line 930, in _read_group
    return s.read(**kwargs)
  File "/home/hadim/local/virtualenvs/st/src/master/pandas/io/pytables.py", line 2194, in read
    values = self.read_array('block%d_values' % i)
  File "/home/hadim/local/virtualenvs/st/src/master/pandas/io/pytables.py", line 1776, in read_array
    data = node[:]
  File "/home/hadim/local/virtualenvs/st/local/lib/python2.7/site-packages/tables/vlarray.py", line 661, in __getitem__
    return self.read(start, stop, step)
  File "/home/hadim/local/virtualenvs/st/local/lib/python2.7/site-packages/tables/vlarray.py", line 801, in read
    outlistarr = [atom.fromarray(arr) for arr in listarr]
  File "/home/hadim/local/virtualenvs/st/local/lib/python2.7/site-packages/tables/atom.py", line 1151, in fromarray
    return cPickle.loads(array.tostring())
ValueError: unsupported pickle protocol: 3
Closing remaining open files: simu.h5... done

Version command run:

import sys
print(sys.version)
import numpy
print(numpy.__version__)
import tables
print(tables.__version__)
import pandas
print(pandas.__version__)

Python 3 shell:

3.3.1 (default, Apr 17 2013, 22:30:32) 
[GCC 4.7.3]
1.7.1
3.0.0
0.12.0.dev-404dfab

Python 2 shell:

2.7.4 (default, Apr  9 2013, 18:05:19) 
[GCC 4.7.3]
1.7.1
3.0.0
0.12.0.dev-4c2d050
Author

hadim commented Jul 16, 2013

Contributor

jreback commented Jul 16, 2013

Try storing as tables rather than in the storer format (e.g. use append): http://pandas.pydata.org/pandas-docs/dev/io.html#storer-format
Cross-version pickles generally don't work that well, and pickling is how PyTables stores certain data types in the storer format; tables store everything as basic types (e.g. floats, unicode).

here's an example of using 3.3 reading a 2.7 file (storing as a table)::

Python 3.3.0 (default, Feb  1 2013, 08:25:35) 

In [2]: pd.HDFStore('../../test.h5')
Out[2]: 
<class 'pandas.io.pytables.HDFStore'>
File path: ../../test.h5
/df            frame_table  (typ->appendable_multi,nrows->100000,ncols->5,indexers->[index],dc->[two,one])
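
A minimal sketch of the "store as a table" suggestion, assuming a reasonably recent pandas with PyTables installed. The frame contents and the file name demo.h5 are made up for illustration; only the key 'spbs' and the level name 'side' come from this thread. HDFStore.append writes the table format, whereas put defaults to the storer (now called "fixed") format.

```python
import os
import tempfile

import pandas as pd

# Illustrative data: a small MultiIndex frame shaped loosely like the
# 'spbs' frame discussed in the thread (the values are invented).
index = pd.MultiIndex.from_product([[0, 10], ["A", "B"]], names=["time", "side"])
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]}, index=index)

# Hypothetical file location for the demo.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# append() stores the frame in table format (basic column types),
# instead of the storer/fixed format that put() uses by default.
with pd.HDFStore(path, mode="w") as store:
    store.append("spbs", df)

# Read it back and select one side, as the original code does with xs().
with pd.HDFStore(path, mode="r") as store:
    restored = store.select("spbs")

side_a = restored.xs("A", level="side")["x"].tolist()
print(side_a)
```

Whether this is enough for a given file to cross between Python 2 and 3 still depends on the dtypes involved, so it is worth checking each stored key from both interpreters.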

Author

hadim commented Jul 16, 2013

I figured out half of the issue :-)

When I load in Python 3 an h5 file created by Python 2, string type problems can occur because Python 2 and Python 3 do not use the same default string type, and the h5 file keeps whichever string type the writer used.

After adding from __future__ import unicode_literals to the Python 2 code that writes the file, Python 3 can now load the h5 file it creates.
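
The string-type mismatch can be illustrated in plain Python 3, without pandas. This sketch assumes the level names written by the Python 2 code came back as byte strings; the thread does not confirm that this is exactly what happened, so treat it as one plausible reading of the "'side' is not in list" error.

```python
# Assumption: names written as Python 2 str are read back as bytes in Python 3.
stored_names = [b"side"]

# In Python 3 a text string never compares equal to a byte string,
# so looking up the level by its text name fails.
try:
    stored_names.index("side")
    lookup_failed = False
except ValueError:
    lookup_failed = True

# Decoding the names to text makes the lookup succeed again.
decoded_names = [
    name.decode("utf-8") if isinstance(name, bytes) else name
    for name in stored_names
]
position = decoded_names.index("side")

print(lookup_failed, position)
```

Writing text (unicode) names from the Python 2 side, which is what unicode_literals does for string literals, avoids the mismatch at the source.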

Contributor

jreback commented Jul 16, 2013

hmmm good to know about py2.....let me see if I can reproduce that

fyi...here's py3 reading your py2 files....seems ok to me

In [8]: pd.HDFStore('../../simu_created_by_python_2.h5')
Out[8]: 
<class 'pandas.io.pytables.HDFStore'>
File path: ../../simu_created_by_python_2.h5
/kts                   frame        (shape->[1200,1])
/measures              frame        (shape->[1,7])   
/params                frame        (shape->[1,7])   
/plug_sites            frame        (shape->[3600,2])
/spbs                  frame        (shape->[400,1]) 

In [9]: pd.HDFStore('../../simu_created_by_python_3.h5')
Out[9]: 
<class 'pandas.io.pytables.HDFStore'>
File path: ../../simu_created_by_python_3.h5
/kts                   frame        (shape->[1200,1])
/measures              frame        (shape->[1,7])   
/params                frame        (shape->[1,7])   
/plug_sites            frame        (shape->[3600,2])
/spbs                  frame        (shape->[400,1]) 

In [10]: pd.HDFStore('../../simu_created_by_python_3.h5').select('kts')
Out[10]: 
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1200 entries, (0, 0, kt, A) to (1990, 2, kt, B)
Data columns (total 1 columns):
x    1200  non-null values
dtypes: float64(1)

In [11]: pd.HDFStore('../../simu_created_by_python_2.h5').select('kts')
Out[11]: 
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1200 entries, (0, 0, kt, A) to (1990, 2, kt, B)
Data columns (total 1 columns):
x    1200  non-null values
dtypes: float64(1)

Contributor

jreback commented Jul 16, 2013

reading in python 2.7.3 works ok as well

In [7]: pd.HDFStore('simu_created_by_python_3.h5').select('kts')
Out[7]: 
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1200 entries, (0, 0, kt, A) to (1990, 2, kt, B)
Data columns (total 1 columns):
x    1200  non-null values
dtypes: float64(1)

In [8]: pd.HDFStore('simu_created_by_python_2.h5').select('kts')
Out[8]: 
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1200 entries, (0, 0, kt, A) to (1990, 2, kt, B)
Data columns (total 1 columns):
x    1200  non-null values
dtypes: float64(1)

Contributor

jreback commented Jul 16, 2013

fyi...your first error seems unrelated to reading anything

and here's my 2c....you said you are transitioning from 2 to 3...great.....I would just do a switch and not try to interoperate with the data files, this can be very tricky (as you can see)..

maybe you can structure things so that shared files are only written with, say, 2.7, while reads can be done with both versions

Author

hadim commented Jul 16, 2013

Can you try:

pd.HDFStore('simu_created_by_python_3.h5')['params']

params contains unicode, so it's again related to strings :-)

Author

hadim commented Jul 16, 2013

Thank you for your advice. I will still try to get something that can interoperate; it shouldn't be so difficult.

It no longer seems to be a pandas or PyTables issue, so thank you for your time and your help!

Bye

hadim closed this as completed Jul 16, 2013
Contributor

jreback commented Jul 16, 2013

As I said, try using a table. The problem is that certain of your dtypes are pickled; pickle from 3->2 doesn't work in my experience.
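
The "unsupported pickle protocol: 3" error above comes from the pickle protocol version, which can be shown with the standard library alone. This is a sketch run under Python 3; the params dictionary is invented, and protocol_of is a hypothetical helper. Whether PyTables lets you choose the protocol it uses for pickled objects is not covered in this thread, so this only illustrates the mismatch.

```python
import pickle

# Invented example object standing in for pickled frame contents.
params = {"name": "simu", "dt": 0.5}

# Python 3 defaults to protocol 3 or higher, which Python 2 cannot read.
default_blob = pickle.dumps(params)

# Protocol 2 is the highest protocol that Python 2.7 understands.
compat_blob = pickle.dumps(params, protocol=2)


def protocol_of(blob):
    """Return the protocol recorded in a pickle (hypothetical helper).

    Pickles using protocol 2 or later start with the PROTO opcode
    (byte 0x80) followed by one byte holding the protocol number.
    """
    return blob[1] if blob[:1] == b"\x80" else 0


print(protocol_of(default_blob), protocol_of(compat_blob))
```

Even with a readable protocol, text and byte strings map differently between the two interpreters, so basic types remain the safer interchange format.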

What you are attempting is non-trivial; it may be best to write out CSVs during your transition period.
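
A minimal sketch of the CSV fallback, assuming pandas is available in both interpreters. The data and the file name spbs.csv are illustrative; the level names have to be passed back to read_csv because a CSV file does not record which columns formed the index.

```python
import os
import tempfile

import pandas as pd

# Illustrative frame with a two-level index (values are invented).
index = pd.MultiIndex.from_product([[0, 10], ["A", "B"]], names=["time", "side"])
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0]}, index=index)

# Hypothetical file location for the demo.
path = os.path.join(tempfile.mkdtemp(), "spbs.csv")

# The index levels are written as ordinary leading columns.
df.to_csv(path)

# Rebuild the MultiIndex by naming the index columns explicitly.
restored = pd.read_csv(path, index_col=["time", "side"])

side_a = restored.xs("A", level="side")["x"].tolist()
print(side_a)
```

CSV loses dtype information and is slower for large frames, so it is a stopgap for the transition rather than a replacement for HDF5.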

Author

hadim commented Jul 16, 2013

Ok thank you. I think my code can survive if reading from py3 to py2 does not work :-)
