Segfault when writing data out of order to pd.HDFStore via append #10180

@bastianlb

I am trying to append chunks of data to an (initially empty) HDF5 frame with pd.HDFStore. The chunks arrive out of order, and certain orderings produce a segfault. The script below consistently segfaults after loading file 31 (update: see the comments for a better example). Judging by the timestamps the script prints, the crash appears to happen when HDF5 tries to fill a gap in the data already written. I can provide more files that trigger segfaults if necessary.

I've managed to narrow it down to the following line in pytables.py:

> /home/user/env/lib/python3.4/site-packages/pandas/io/pytables.py(3738)write_data_chunk()
   3737                 self.table.append(rows)
-> 3738                 self.table.flush()
   3739         except Exception as detail:
INSTALLED VERSIONS
------------------
commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.1
nose: 1.3.3
Cython: 0.22
numpy: 1.9.2
scipy: None
statsmodels: None
IPython: 2.1.0
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.2.0
numexpr: 2.4
matplotlib: None
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: 0.9
apiclient: None
sqlalchemy: 0.9.8
pymysql: None
psycopg2: 2.6 (dt dec pq3 ext lo64)

HDF5 Versions:
~/user$ dpkg -l | grep "hdf5"
ii  hdf5-helpers                   1.8.13+docs-15              amd64        Hierarchical Data Format 5 (HDF5) - Helper tools
ii  hdf5-tools                     1.8.13+docs-15              amd64        Hierarchical Data Format 5 (HDF5) - Runtime tools
rc  libhdf5-7:amd64                1.8.12+docs-1               amd64        Hierarchical Data Format 5 (HDF5) - runtime files - serial version
ii  libhdf5-8:amd64                1.8.13+docs-15              amd64        Hierarchical Data Format 5 (HDF5) - runtime files - serial version
rc  libhdf5-cpp-7:amd64            1.8.12+docs-1               amd64        Hierarchical Data Format 5 (HDF5) - C++ libraries
ii  libhdf5-cpp-8:amd64            1.8.13+docs-15              amd64        Hierarchical Data Format 5 (HDF5) - C++ libraries
ii  libhdf5-dev                    1.8.13+docs-15              amd64        Hierarchical Data Format 5 (HDF5) - development files - serial version

A script to reproduce:

import os
import pandas as pd

_dir = 'test_files'
_file = './hdf5_test.h5'


def write_to_file(series):
    store = pd.HDFStore(_file, 'a')
    frame = pd.DataFrame(series)
    print("Appending data from {0} to {1}".format(
        frame.index[0], frame.index[-1]))
    store.append('test', frame)
    store.close()


if __name__ == "__main__":
    if os.path.isfile(_file):
        os.remove(_file)
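    # Sort the names so the files are processed in a reproducible order
    # (os.listdir returns entries in arbitrary order).
    files = sorted(os.listdir(_dir))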
    for f in files:
        series = pd.read_pickle(os.path.join(_dir, f))
        print("Writing data: %s" % f)
        write_to_file(series)

Zipped pickled files for the test: http://s000.tinyupload.com/?file_id=60238823358379433453

Here is a pastebin of the data where the segfault occurs in the example, in CSV format:
http://pastebin.com/FRsygCUG
Note: you may need the actual files to reproduce this, but as the pastebin shows, the data isn't malformed.

It is trying to fill the following gap in the original data:
2011-01-04 17:55:00 to 2011-01-05 22:15:00
with an append, which results in a segfault.

Script output:

Writing data: 00
Appending data from 2011-01-03 13:40:00 to 2011-01-04 17:55:00
Writing data: 01
Appending data from 2011-01-05 22:15:00 to 2011-01-07 02:30:00
Writing data: 02
Appending data from 2011-01-04 18:00:00 to 2011-01-05 22:15:00
Segmentation fault
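
To make the out-of-order arrival explicit, here is a minimal sketch that parses the "Appending data from ... to ..." lines printed above and flags any chunk that starts before the latest end timestamp seen so far. It uses only the standard library. The helper names (`parse_chunks`, `out_of_order`) are hypothetical, introduced for illustration only; they are not part of pandas or PyTables, and this only checks the ordering of the chunks, not the cause of the crash.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
PREFIX = "Appending data from "

# Output copied from the reproduction script above.
SCRIPT_OUTPUT = """\
Writing data: 00
Appending data from 2011-01-03 13:40:00 to 2011-01-04 17:55:00
Writing data: 01
Appending data from 2011-01-05 22:15:00 to 2011-01-07 02:30:00
Writing data: 02
Appending data from 2011-01-04 18:00:00 to 2011-01-05 22:15:00
"""


def parse_chunks(output):
    """Return (start, end) datetime pairs from the 'Appending data' lines."""
    chunks = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        start_text, end_text = line[len(PREFIX):].split(" to ")
        chunks.append((datetime.strptime(start_text, FMT),
                       datetime.strptime(end_text, FMT)))
    return chunks


def out_of_order(chunks):
    """Return indices of chunks that start before the latest end seen so far."""
    flagged = []
    latest_end = None
    for i, (start, end) in enumerate(chunks):
        if latest_end is not None and start < latest_end:
            flagged.append(i)
        if latest_end is None or end > latest_end:
            latest_end = end
    return flagged


chunks = parse_chunks(SCRIPT_OUTPUT)
print(out_of_order(chunks))  # prints [2]: the third append goes back in time
```

The flagged index corresponds to file 02, the append that fills the gap between the first two chunks and is followed by the segmentation fault.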
