Commit 5552817

Support for reading SAS7BDAT files
Working version except for compression
reorganized directory structure
Added license file for Jared Hobbs code
RLE decompression
use ndarray instead of bytes
RDC decompression
Fix byte order swapping
fix rebase errors in test_xport
Use filepath_or_buffer io function
Handle alignment correction
Revamped testing
Add test with unicode strings
Add minimal encoding detection
Refactor row-processing
Add missing test file
Unclobber test files
Try again to revert accidental changes to test data files
Minor changes in response to code review
Add SAS benchmarks to ASV
Stash changes before rebase
refactor following code review
Updated io and whatsnew
Updates following code review
Remove local test modifications
Minor changes following code review
Remove unwanted test data file
Mostly formatting changes following code review
Remove two unneeded files
Add __init__.py
1 parent 49f99a6 commit 5552817

39 files changed: +1325, -89 lines changed

LICENSES/SAS7BDAT_LICENSE

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+Copyright (c) 2015 Jared Hobbs
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

asv_bench/benchmarks/packers.py

Lines changed: 19 additions & 1 deletion
@@ -318,6 +318,24 @@ def remove(self, f):
             pass
 
 
+class packers_read_sas7bdat(object):
+
+    def setup(self):
+        self.f = 'data/test1.sas7bdat'
+
+    def time_packers_read_sas7bdat(self):
+        pd.read_sas(self.f, format='sas7bdat')
+
+
+class packers_read_xport(object):
+
+    def setup(self):
+        self.f = 'data/paxraw_d_short.xpt'
+
+    def time_packers_read_xport(self):
+        pd.read_sas(self.f, format='xport')
+
+
 class packers_write_csv(object):
     goal_time = 0.2
 
@@ -854,4 +872,4 @@ def remove(self, f):
         try:
             os.remove(self.f)
         except:
-            pass
+            pass

doc/source/io.rst

Lines changed: 16 additions & 13 deletions
@@ -4564,24 +4564,25 @@ easy conversion to and from pandas.
 
 .. _io.sas_reader:
 
-SAS Format
-----------
+SAS Formats
+-----------
 
 .. versionadded:: 0.17.0
 
-The top-level function :func:`read_sas` currently can read (but
-not write) SAS xport (.XPT) format files. Pandas cannot currently
-handle SAS7BDAT files.
+The top-level function :func:`read_sas` can read (but not write) SAS
+`xport` (.XPT) and `SAS7BDAT` (.sas7bdat) format files.
 
-XPORT files only contain two value types: ASCII text and double
-precision numeric values. There is no automatic type conversion to
-integers, dates, or categoricals. By default the whole file is read
-and returned as a ``DataFrame``.
+SAS files only contain two value types: ASCII text and floating point
+values (usually 8 bytes but sometimes truncated). For xport files,
+there is no automatic type conversion to integers, dates, or
+categoricals. For SAS7BDAT files, the format codes may allow date
+variables to be automatically converted to dates. By default the
+whole file is read and returned as a ``DataFrame``.
 
-Specify a ``chunksize`` or use ``iterator=True`` to obtain an
-``XportReader`` object for incrementally reading the file. The
-``XportReader`` object also has attributes that contain additional
-information about the file and its variables.
+Specify a ``chunksize`` or use ``iterator=True`` to obtain reader
+objects (``XportReader`` or ``SAS7BDATReader``) for incrementally
+reading the file. The reader objects also have attributes that
+contain additional information about the file and its variables.
 
 Read a SAS XPORT file:
 
@@ -4602,6 +4603,8 @@ web site.
 
 .. _specification: https://support.sas.com/techsup/technote/ts140.pdf
 
+No official documentation is available for the SAS7BDAT format.
+
 .. _io.perf:
 
 Performance Considerations
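
Note on the io.rst change: the new text says SAS numeric values are floating point, "usually 8 bytes but sometimes truncated". As a minimal sketch of what decoding such a value can involve, the helper below is hypothetical (it is not part of pandas or of this commit) and assumes the stored bytes are the high-order bytes of an IEEE 754 double, with the dropped low-order bytes restored as zeros. XPORT files use an IBM floating point layout, which this sketch does not cover.

```python
import struct


def decode_truncated_double(raw, byteorder="little"):
    """Decode an IEEE 754 double that was stored in fewer than 8 bytes.

    Hypothetical helper for illustration only. Assumption: truncation
    dropped the low-order (least significant) bytes, so they are
    restored here as zero bytes.
    """
    width = len(raw)
    if not 1 <= width <= 8:
        raise ValueError("expected 1 to 8 bytes, got %d" % width)
    padding = b"\x00" * (8 - width)
    if byteorder == "big":
        # High-order bytes come first, so the zeros go at the end.
        return struct.unpack(">d", raw + padding)[0]
    if byteorder == "little":
        # High-order bytes come last, so the zeros go at the front.
        return struct.unpack("<d", padding + raw)[0]
    raise ValueError("byteorder must be 'little' or 'big'")


# Keep only the three high-order bytes of 1.0 and decode them again.
stored = struct.pack(">d", 1.0)[:3]
print(decode_truncated_double(stored, "big"))
```

Values such as 1.0 survive truncation exactly because their low-order bytes are already zero; a value like 0.1 comes back only approximately, which is the precision cost of the shorter storage width.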

doc/source/whatsnew/v0.18.0.txt

Lines changed: 7 additions & 0 deletions
@@ -403,6 +403,13 @@ For example, if you have a jupyter notebook you plan to convert to latex using n
 Options ``display.latex.escape`` and ``display.latex.longtable`` have also been added to the configuration and are used automatically by the ``to_latex``
 method. See the :ref:`options documentation<options>` for more info.
 
+SAS7BDAT files
+^^^^^^^^^^^^^^
+
+Pandas can now read SAS7BDAT files, including compressed files. The
+files can be read in their entirety, or incrementally. For full details see
+:ref:`here <io.sas_reader>`. (:issue:`4052`)
+
 .. _whatsnew_0180.enhancements.other:
 
 Other enhancements
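
Note on the whatsnew entry: it says files can be read whole or incrementally, and the io.rst text ties that to ``chunksize``. The sketch below shows one way the incremental path might be used. Both helper functions are hypothetical names introduced for illustration, the extension-to-format mapping is an assumption based on the extensions named in io.rst, and iterating directly over the returned reader is assumed to yield ``DataFrame`` chunks. The file path is the one used by the ASV benchmark in this commit.

```python
import os

# Assumed mapping from file extension to the ``format`` argument of
# ``pd.read_sas``; the two values are the ones used in the benchmarks.
_FORMAT_BY_EXTENSION = {".xpt": "xport", ".sas7bdat": "sas7bdat"}


def guess_sas_format(path):
    """Hypothetical helper: pick a ``format`` value from the extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return _FORMAT_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError("unrecognized SAS file extension: %r" % ext)


def count_rows_incrementally(path, chunksize=10000):
    """Hypothetical helper: count rows without loading the whole file."""
    import pandas as pd  # imported lazily; only needed when a file is read

    # With ``chunksize`` set, read_sas returns a reader object rather
    # than a DataFrame (assumption: the reader can be iterated directly).
    reader = pd.read_sas(path, format=guess_sas_format(path),
                         chunksize=chunksize)
    total = 0
    for chunk in reader:
        total += len(chunk)
    return total


print(guess_sas_format("data/test1.sas7bdat"))
```

Calling ``count_rows_incrementally("data/test1.sas7bdat")`` would then process the file one chunk at a time, keeping memory use bounded by ``chunksize`` rows rather than by the size of the file.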

pandas/io/api.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@
 from pandas.io.json import read_json
 from pandas.io.html import read_html
 from pandas.io.sql import read_sql, read_sql_table, read_sql_query
-from pandas.io.sas import read_sas
+from pandas.io.sas.sasreader import read_sas
 from pandas.io.stata import read_stata
 from pandas.io.pickle import read_pickle, to_pickle
 from pandas.io.packers import read_msgpack, to_msgpack

pandas/io/sas/__init__.py

Whitespace-only changes.