Description
I am trying to append chunks of data to an (initially empty) HDF5 frame with pd.HDFStore. The chunks arrive out of order, and certain orderings produce segfaults. This script seems to segfault consistently after loading file 31 (update: see the comments for a better example). Judging by the timestamps the script prints, the crash appears to happen when an append fills a gap in data that has already been written. I can produce more files that trigger segfaults if necessary.
I've managed to narrow it down to the following line in pytables.py:
> /home/user/env/lib/python3.4/site-packages/pandas/io/pytables.py(3738)write_data_chunk()
3737 self.table.append(rows)
-> 3738 self.table.flush()
3739 except Exception as detail:
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.16.1
nose: 1.3.3
Cython: 0.22
numpy: 1.9.2
scipy: None
statsmodels: None
IPython: 2.1.0
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.2.0
numexpr: 2.4
matplotlib: None
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: 0.9
apiclient: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: 2.6 (dt dec pq3 ext lo64)
HDF5 Versions:
~/user$ dpkg -l | grep "hdf5"
ii hdf5-helpers 1.8.13+docs-15 amd64 Hierarchical Data Format 5 (HDF5) - Helper tools
ii hdf5-tools 1.8.13+docs-15 amd64 Hierarchical Data Format 5 (HDF5) - Runtime tools
rc libhdf5-7:amd64 1.8.12+docs-1 amd64 Hierarchical Data Format 5 (HDF5) - runtime files - serial version
ii libhdf5-8:amd64 1.8.13+docs-15 amd64 Hierarchical Data Format 5 (HDF5) - runtime files - serial version
rc libhdf5-cpp-7:amd64 1.8.12+docs-1 amd64 Hierarchical Data Format 5 (HDF5) - C++ libraries
ii libhdf5-cpp-8:amd64 1.8.13+docs-15 amd64 Hierarchical Data Format 5 (HDF5) - C++ libraries
ii libhdf5-dev 1.8.13+docs-15 amd64 Hierarchical Data Format 5 (HDF5) - development files - serial version
A script to reproduce:
import os
import pandas as pd

_dir = 'test_files'
_file = './hdf5_test.h5'


def write_to_file(series):
    store = pd.HDFStore(_file, 'a')
    frame = pd.DataFrame(series)
    print("Appending data from {0} to {1}".format(
        frame.index[0], frame.index[-1]))
    store.append('test', frame)
    store.close()


if __name__ == "__main__":
    if os.path.isfile(_file):
        os.remove(_file)
    files = os.listdir(_dir)
    for f in files:
        series = pd.read_pickle(os.path.join(_dir, f))
        print("Writing data: %s" % f)
        write_to_file(series)
Zipped pickle files for the test: http://s000.tinyupload.com/?file_id=60238823358379433453
Here is a pastebin, in CSV format, of the data being appended when the segfault occurs in the example:
http://pastebin.com/FRsygCUG
Note: you may need the actual files to reproduce this, but as the pastebin shows, the data isn't malformed.
The append is trying to fill the following gap in the data already written:
2011-01-04 17:55:00 to 2011-01-05 22:15:00
and that append results in a segfault.
Script output:
Writing data: 00
Appending data from 2011-01-03 13:40:00 to 2011-01-04 17:55:00
Writing data: 01
Appending data from 2011-01-05 22:15:00 to 2011-01-07 02:30:00
Writing data: 02
Appending data from 2011-01-04 18:00:00 to 2011-01-05 22:15:00
Segmentation fault
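In case the uploaded pickles become unavailable, the following is a rough sketch that rebuilds three chunks with the same start and end timestamps as the output above and appends them in the same order. Several things here are assumptions rather than facts from the report: the 5-minute spacing (guessed from 17:55 being followed by 18:00), the float values, the column name `value`, and the output file name. It has not been confirmed that synthetic data triggers the segfault, since the crash may depend on the real values.

```python
import numpy as np
import pandas as pd

# (start, end) pairs copied from the script output, in the order written.
CHUNK_RANGES = [
    ("2011-01-03 13:40:00", "2011-01-04 17:55:00"),  # file 00
    ("2011-01-05 22:15:00", "2011-01-07 02:30:00"),  # file 01
    ("2011-01-04 18:00:00", "2011-01-05 22:15:00"),  # file 02, fills the gap
]


def make_chunks(freq="5min"):
    """Build one Series per range; the frequency and values are assumed."""
    chunks = []
    for start, end in CHUNK_RANGES:
        index = pd.date_range(start, end, freq=freq)
        values = np.arange(len(index), dtype="float64")
        chunks.append(pd.Series(values, index=index))
    return chunks


def append_out_of_order(chunks, path="./hdf5_synthetic_test.h5", key="test"):
    """Append each chunk the way the original script does (needs PyTables)."""
    for series in chunks:
        store = pd.HDFStore(path, "a")
        try:
            print("Appending data from {0} to {1}".format(
                series.index[0], series.index[-1]))
            store.append(key, series.to_frame("value"))
        finally:
            store.close()
```

Calling `append_out_of_order(make_chunks())` performs the three appends. Note that, going by the printed timestamps, the last row of file 02 and the first row of file 01 share the timestamp 2011-01-05 22:15:00, so the gap-filling chunk overlaps the existing data by one index value.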