add initial pandas HDF fileformat #23

Merged
merged 9 commits into from
Feb 3, 2022
2 changes: 2 additions & 0 deletions CHANGES.md
@@ -1,6 +1,8 @@
0.6 (unreleased)
----------------

- Add ability to compare to Pandas DataFrames and store them as HDF5 files [#23]

0.5 (2022-01-12)
----------------

4 changes: 3 additions & 1 deletion README.rst
@@ -15,7 +15,8 @@ in cases where the arrays are too large to conveniently hard-code them
in the tests (e.g. ``np.testing.assert_allclose(x, [1, 2, 3])``).

The basic idea is that you can write a test that generates a Numpy array (or
other related objects depending on the format). You can then either run the
other related objects depending on the format, e.g. pandas DataFrame).
You can then either run the
tests in a mode to **generate** reference files from the arrays, or you can run
the tests in **comparison** mode, which will compare the results of the tests to
the reference ones within some tolerance.
@@ -25,6 +26,7 @@ At the moment, the supported file formats for the reference files are:
- A plain text-based format (based on Numpy ``loadtxt`` output)
- The FITS format (requires `astropy <http://www.astropy.org>`__). With this
format, tests can return either a Numpy array or a FITS HDU object.
- A pandas HDF5 format using the pandas HDFStore

For more information on how to write tests to do this, see the **Using**
section below.
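
The generate and comparison modes described in this README hunk can be illustrated with a minimal, standard-library-only sketch. It is not the plugin's implementation: `check_against_reference` is a hypothetical helper, and JSON stands in for the real reference formats purely for illustration.

```python
import json
import math
import os
import tempfile


def check_against_reference(values, reference_path, generate=False,
                            rtol=1e-7, atol=0.0):
    """Hypothetical helper: write a reference file or compare against one."""
    if generate:
        # Generate mode: store the test result as the new reference.
        with open(reference_path, "w") as fh:
            json.dump(list(values), fh)
        return True, ""

    # Comparison mode: load the stored reference and compare element-wise
    # within the given tolerances.
    with open(reference_path) as fh:
        reference = json.load(fh)
    if len(reference) != len(values):
        return False, "length mismatch: {0} != {1}".format(len(values),
                                                           len(reference))
    for i, (got, want) in enumerate(zip(values, reference)):
        if not math.isclose(got, want, rel_tol=rtol, abs_tol=atol):
            return False, "element {0} differs: {1} != {2}".format(i, got, want)
    return True, ""


with tempfile.TemporaryDirectory() as tmpdir:
    ref = os.path.join(tmpdir, "reference.json")
    generated = check_against_reference([1.0, 2.0, 3.0], ref, generate=True)
    same = check_against_reference([1.0, 2.0, 3.0], ref)
    changed = check_against_reference([1.0, 2.5, 3.0], ref)
```

The first call writes the reference, the second compares an identical result against it, and the third reports the first element that falls outside the tolerance.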
34 changes: 34 additions & 0 deletions pytest_arraydiff/plugin.py
@@ -137,9 +137,43 @@ def write(filename, data, **kwargs):
        return np.savetxt(filename, data, **kwargs)


class PDHDFDiff(BaseDiff):

    extension = 'h5'

    @staticmethod
    def read(filename):
        import pandas as pd
        return pd.read_hdf(filename)

    @staticmethod
    def write(filename, data, **kwargs):
        import pandas as pd
        key = os.path.basename(filename).replace('.h5', '')
        return data.to_hdf(filename, key, **kwargs)

    @classmethod
    def compare(cls, reference_file, test_file, atol=None, rtol=None):
        import pandas.testing as pdt
        import pandas as pd

        ref_data = pd.read_hdf(reference_file)
        test_data = pd.read_hdf(test_file)
        try:
            pdt.assert_frame_equal(ref_data, test_data)
        except AssertionError as exc:
            message = "\n\na: {0}".format(test_file) + '\n'
            message += "b: {0}".format(reference_file) + '\n'
            message += exc.args[0]
            return False, message
        else:
            return True, ""


FORMATS = {}
FORMATS['fits'] = FITSDiff
FORMATS['text'] = TextDiff
FORMATS['pd_hdf'] = PDHDFDiff


def _download_file(url):
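
The pattern visible in this hunk is a class with an `extension` attribute plus `read`, `write` and `compare` methods, registered by name in `FORMATS`, where `compare` returns a `(passed, message)` tuple. A minimal sketch of that pattern follows, using only the standard library. `JSONDiff` and the `'json'` key are hypothetical and not part of the plugin, and the `BaseDiff` here is an empty stand-in because the real base class is not shown in this diff.

```python
import json
import os
import tempfile


class BaseDiff:
    """Empty stand-in; the plugin's real BaseDiff is not shown in this diff."""
    extension = None


class JSONDiff(BaseDiff):
    # Hypothetical format, used only to illustrate the interface.
    extension = 'json'

    @staticmethod
    def read(filename):
        with open(filename) as fh:
            return json.load(fh)

    @staticmethod
    def write(filename, data, **kwargs):
        with open(filename, 'w') as fh:
            json.dump(data, fh, **kwargs)

    @classmethod
    def compare(cls, reference_file, test_file, atol=None, rtol=None):
        # atol and rtol are accepted for interface parity but unused here,
        # as in PDHDFDiff.compare above.
        ref_data = cls.read(reference_file)
        test_data = cls.read(test_file)
        if ref_data == test_data:
            return True, ""
        message = "\n\na: {0}\nb: {1}\n".format(test_file, reference_file)
        message += "contents differ"
        return False, message


# Registry lookup by format name, mirroring the FORMATS dict in the diff.
FORMATS = {}
FORMATS['json'] = JSONDiff

with tempfile.TemporaryDirectory() as tmpdir:
    diff = FORMATS['json']
    ref_path = os.path.join(tmpdir, 'ref.' + diff.extension)
    good_path = os.path.join(tmpdir, 'good.' + diff.extension)
    bad_path = os.path.join(tmpdir, 'bad.' + diff.extension)
    diff.write(ref_path, [1, 2, 3])
    diff.write(good_path, [1, 2, 3])
    diff.write(bad_path, [1, 2, 4])
    ok_result = diff.compare(ref_path, good_path)
    fail_result = diff.compare(ref_path, bad_path)
```

Returning a tuple instead of raising lets the caller decide how to report a mismatch, which is the same choice `PDHDFDiff.compare` makes when it catches the `AssertionError`.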
2 changes: 2 additions & 0 deletions setup.cfg
@@ -36,6 +36,8 @@ install_requires =
[options.extras_require]
test =
    astropy
    pandas
    tables

[options.entry_points]
pytest11 =
Binary file added tests/baseline/test_succeeds_func_pdhdf.h5
Binary file not shown.
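
Going by the `write` method in `plugin.py`, the key inside a baseline file like this one is derived from the file name by deleting every `.h5` substring. The sketch below is not part of this PR. It compares that expression with `os.path.splitext`, which strips only the final extension. `key_by_replace` and `key_by_splitext` are hypothetical helper names, and the second file name is a made-up edge case.

```python
import os


def key_by_replace(filename):
    # Mirrors the expression used in PDHDFDiff.write.
    return os.path.basename(filename).replace('.h5', '')


def key_by_splitext(filename):
    # Alternative: strip only the trailing extension.
    return os.path.splitext(os.path.basename(filename))[0]


typical = os.path.join('tests', 'baseline', 'test_succeeds_func_pdhdf.h5')
unusual = os.path.join('tests', 'baseline', 'run.h5.backup.h5')  # made-up name

typical_keys = (key_by_replace(typical), key_by_splitext(typical))
unusual_keys = (key_by_replace(unusual), key_by_splitext(unusual))
```

For ordinary names such as the baseline added here, both approaches give the same key. They differ only when `.h5` also appears elsewhere in the name.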
7 changes: 7 additions & 0 deletions tests/test_pytest_arraydiff.py
@@ -18,6 +18,13 @@ def test_succeeds_func_text():
    return np.arange(3 * 5).reshape((3, 5))


@pytest.mark.array_compare(file_format='pd_hdf', reference_dir=reference_dir)
def test_succeeds_func_pdhdf():
    pd = pytest.importorskip('pandas')
    return pd.DataFrame(data=np.arange(20, dtype='int64'),
                        columns=['test_data'])


@pytest.mark.array_compare(file_format='fits', reference_dir=reference_dir)
def test_succeeds_func_fits():
    return np.arange(3 * 5).reshape((3, 5)).astype(np.int64)