From 57e2d9919d46bf29ce7776e533e98943091a4672 Mon Sep 17 00:00:00 2001
From: Kerby Shedden
Date: Fri, 6 Mar 2015 07:33:00 -0500
Subject: [PATCH 1/2] Fix several stata doc issues

---
 doc/source/io.rst | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/doc/source/io.rst b/doc/source/io.rst
index 1b88a5ba3ba98..28b53c2cdeddb 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -3779,15 +3779,15 @@ into a .dta file. The format version of this file is always 115 (Stata 12).
    df = DataFrame(randn(10, 2), columns=list('AB'))
    df.to_stata('stata.dta')
 
-*Stata* data files have limited data type support; only strings with 244 or
-fewer characters, ``int8``, ``int16``, ``int32``, ``float32` and ``float64``
-can be stored
-in ``.dta`` files. Additionally, *Stata* reserves certain values to represent
-missing data. Exporting a non-missing value that is outside of the
-permitted range in Stata for a particular data type will retype the variable
-to the next larger size. For example, ``int8`` values are restricted to lie
-between -127 and 100 in Stata, and so variables with values above 100 will
-trigger a conversion to ``int16``. ``nan`` values in floating points data
+*Stata* data files have limited data type support; only strings with
+244 or fewer characters, ``int8``, ``int16``, ``int32``, ``float32``
+and ``float64`` can be stored in ``.dta`` files. Additionally,
+*Stata* reserves certain values to represent missing data. Exporting a
+non-missing value that is outside of the permitted range in Stata for
+a particular data type will retype the variable to the next larger
+size. For example, ``int8`` values are restricted to lie between -127
+and 100 in Stata, and so variables with values above 100 will trigger
+a conversion to ``int16``. ``nan`` values in floating points data
 types are stored as the basic missing data type (``.`` in *Stata*).
 
 .. note::
@@ -3810,7 +3810,7 @@ outside of this range, the variable is cast to ``int16``.
 
 .. warning::
 
-   :class:`~pandas.io.stata.StataWriter`` and
+   :class:`~pandas.io.stata.StataWriter` and
    :func:`~pandas.core.frame.DataFrame.to_stata` only support fixed width
    strings containing up to 244 characters, a limitation imposed by the version
    115 dta file format. Attempting to write *Stata* dta files with strings
@@ -3836,9 +3836,10 @@ Specifying a ``chunksize`` yields a
 read ``chunksize`` lines from the file at a time. The ``StataReader``
 object can be used as an iterator.
 
-  reader = pd.read_stata('stata.dta', chunksize=1000)
-  for df in reader:
-    do_something(df)
+.. ipython:: python
+   reader = pd.read_stata('stata.dta', chunksize=3)
+   for df in reader:
+       print(df.shape)
 
 For more fine-grained control, use ``iterator=True`` and specify
 ``chunksize`` with each call to
@@ -3847,8 +3848,8 @@ For more fine-grained control, use ``iterator=True`` and specify
 .. ipython:: python
 
    reader = pd.read_stata('stata.dta', iterator=True)
-   chunk1 = reader.read(10)
-   chunk2 = reader.read(20)
+   chunk1 = reader.read(5)
+   chunk2 = reader.read(5)
 
 Currently the ``index`` is retrieved as a column.
 
@@ -3869,7 +3870,7 @@ formats 104, 105, 108, 113-115 (Stata 10-12) and 117 (Stata 13+).
 .. note::
 
    Setting ``preserve_dtypes=False`` will upcast to the standard pandas data types:
-   ``int64`` for all integer types and ``float64`` for floating poitn data. By default,
+   ``int64`` for all integer types and ``float64`` for floating point data. By default,
    the Stata data types are preserved when importing.
 
 .. ipython:: python

From 08297e62945626c25413257d5a38c366d047aea6 Mon Sep 17 00:00:00 2001
From: Kerby Shedden
Date: Fri, 6 Mar 2015 07:59:45 -0500
Subject: [PATCH 2/2] Further minor doc fixes

---
 doc/source/io.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/source/io.rst b/doc/source/io.rst
index 28b53c2cdeddb..d49e88c953b27 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -3837,6 +3837,7 @@ read ``chunksize`` lines from the file at a time. The ``StataReader``
 object can be used as an iterator.
 
 .. ipython:: python
+
    reader = pd.read_stata('stata.dta', chunksize=3)
    for df in reader:
        print(df.shape)
@@ -3862,7 +3863,7 @@ The parameter ``convert_missing`` indicates whether missing value
 representations in Stata should be preserved. If ``False`` (the default),
 missing values are represented as ``np.nan``. If ``True``, missing values are
 represented using ``StataMissingValue`` objects, and columns containing missing
-values will have ```object`` data type.
+values will have ``object`` data type.
 
 :func:`~pandas.read_stata` and :class:`~pandas.io.stata.StataReader` supports .dta
 formats 104, 105, 108, 113-115 (Stata 10-12) and 117 (Stata 13+).