
Commit 53e61c6

Merge pull request #5601 from jreback/na_values
TST/API: test the list of NA values in the csv parser. add N/A, #NA as independent default values (GH5521)
2 parents: d057fc9 + 3989060

4 files changed: 30 additions, 3 deletions

doc/source/io.rst

Lines changed: 1 addition & 1 deletion
@@ -564,7 +564,7 @@ the corresponding equivalent values will also imply a missing value (in this cas
 ``[5.0,5]`` are recognized as ``NaN``.
 
 To completely override the default values that are recognized as missing, specify ``keep_default_na=False``.
-The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', 'NA',
+The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A','N/A', 'NA',
 '#NA', 'NULL', 'NaN', 'nan']``.
 
 .. code-block:: python
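
The matching rule in the io.rst paragraph can be modelled in plain Python. This is a minimal sketch, assuming exact string comparison against the default list as of this commit (copied from `_NA_VALUES` in `pandas/io/parsers.py`, which also contains the empty string). `is_missing` and `DEFAULT_NA_VALUES` are hypothetical names, not pandas APIs, and the sketch ignores the numeric equivalence (`5` vs `5.0`) that pandas also applies.

```python
# Default NA strings as of this commit, copied from _NA_VALUES in pandas/io/parsers.py.
DEFAULT_NA_VALUES = frozenset([
    '-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
    '#N/A', 'N/A', 'NA', '#NA', 'NULL', 'NaN',
    'nan', '',
])


def is_missing(token, na_values=None, keep_default_na=True):
    """Hypothetical helper: report whether a raw CSV field counts as missing.

    Only exact string matching is modelled; pandas' numeric equivalence
    (e.g. treating 5.0 like 5) is deliberately left out.
    """
    user_values = set(na_values or [])
    if keep_default_na:
        # Defaults and user-supplied values are both active.
        active = DEFAULT_NA_VALUES | user_values
    else:
        # keep_default_na=False: only the user-supplied values remain.
        active = user_values
    return token in active


print(is_missing('N/A'))                                                    # True
print(is_missing('N/A', keep_default_na=False))                             # False
print(is_missing('missing', na_values=['missing'], keep_default_na=False))  # True
```

In this model, `keep_default_na=False` is what "completely override" means: only the strings passed in `na_values` are treated as missing.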

doc/source/release.rst

Lines changed: 3 additions & 1 deletion
@@ -126,7 +126,7 @@ Improvements to existing features
   (:issue:`4039`) with improved validation for all (:issue:`4039`,
   :issue:`4794`)
 - A Series of dtype ``timedelta64[ns]`` can now be divided/multiplied
-  by an integer series (:issue`4521`)
+  by an integer series (:issue:`4521`)
 - A Series of dtype ``timedelta64[ns]`` can now be divided by another
   ``timedelta64[ns]`` object to yield a ``float64`` dtyped Series. This
   is frequency conversion; astyping is also supported.
@@ -410,6 +410,8 @@ API Changes
 
 - raise/warn ``SettingWithCopyError/Warning`` exception/warning when setting of a
   copy thru chained assignment is detected, settable via option ``mode.chained_assignment``
+- test the list of ``NA`` values in the csv parser. add ``N/A``, ``#NA`` as independent default
+  na values (:issue:`5521`)
 
 Internal Refactoring
 ~~~~~~~~~~~~~~~~~~~~

pandas/io/parsers.py

Lines changed: 1 addition & 1 deletion
@@ -438,7 +438,7 @@ def read_fwf(filepath_or_buffer, colspecs='infer', widths=None, **kwds):
 # no longer excluding inf representations
 # '1.#INF','-1.#INF', '1.#INF000000',
 _NA_VALUES = set(['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
-                  '#N/A N/A', 'NA', '#NA', 'NULL', 'NaN',
+                  '#N/A','N/A', 'NA', '#NA', 'NULL', 'NaN',
                   'nan', ''])
 
 
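
The net effect of the `_NA_VALUES` change can be checked with plain set arithmetic. The sketch below copies the set before and after this commit from the hunk above (the variable names `OLD_NA_VALUES` and `NEW_NA_VALUES` are illustrative): the single token `'#N/A N/A'` is replaced by the two independent tokens `'#N/A'` and `'N/A'`, while `'#NA'` is present in both versions.

```python
# _NA_VALUES before and after this commit, copied from the parsers.py hunk.
OLD_NA_VALUES = {'-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
                 '#N/A N/A', 'NA', '#NA', 'NULL', 'NaN',
                 'nan', ''}
NEW_NA_VALUES = {'-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
                 '#N/A', 'N/A', 'NA', '#NA', 'NULL', 'NaN',
                 'nan', ''}

# Tokens that became default NA values, and tokens that stopped being one.
added = NEW_NA_VALUES - OLD_NA_VALUES
removed = OLD_NA_VALUES - NEW_NA_VALUES

print("added:", sorted(added))      # added: ['#N/A', 'N/A']
print("removed:", sorted(removed))  # removed: ['#N/A N/A']
```

So after this commit a field containing exactly `#N/A N/A` is no longer in the default set, whereas `#N/A` and `N/A` on their own are.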

pandas/io/tests/test_parsers.py

Lines changed: 25 additions & 0 deletions
@@ -683,6 +683,31 @@ def test_non_string_na_values(self):
         tm.assert_frame_equal(result6,good_compare)
         tm.assert_frame_equal(result7,good_compare)
 
+    def test_default_na_values(self):
+        _NA_VALUES = set(['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
+                          '#N/A','N/A', 'NA', '#NA', 'NULL', 'NaN',
+                          'nan', ''])
+
+        nv = len(_NA_VALUES)
+        def f(i, v):
+            if i == 0:
+                buf = ''
+            elif i > 0:
+                buf = ''.join([','] * i)
+
+            buf = "{0}{1}".format(buf,v)
+
+            if i < nv-1:
+                buf = "{0}{1}".format(buf,''.join([','] * (nv-i-1)))
+
+            return buf
+
+        data = StringIO('\n'.join([ f(i, v) for i, v in enumerate(_NA_VALUES) ]))
+
+        expected = DataFrame(np.nan,columns=range(nv),index=range(nv))
+        df = self.read_csv(data, header=None)
+        tm.assert_frame_equal(df, expected)
+
     def test_custom_na_values(self):
         data = """A,B,C
 ignore,this,row
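
`test_default_na_values` builds an `nv`-by-`nv` CSV in which row `i` holds the `i`-th NA token in column `i` and empty fields everywhere else; since the empty string is itself a default NA value, every cell is expected to parse as `NaN`. The sketch below reproduces only the row construction, without pandas: `f` is a copy of the test's helper with `nv` passed explicitly, and `diagonal_row` is a hypothetical, more compact equivalent built with a comprehension. It iterates a list rather than the test's set so that the row order is deterministic.

```python
def f(i, v, nv):
    # Copy of the helper from test_default_na_values, with nv passed in explicitly.
    if i == 0:
        buf = ''
    elif i > 0:
        buf = ''.join([','] * i)
    buf = "{0}{1}".format(buf, v)
    if i < nv - 1:
        buf = "{0}{1}".format(buf, ''.join([','] * (nv - i - 1)))
    return buf


def diagonal_row(i, v, nv):
    # Hypothetical compact equivalent: v in column i, empty fields elsewhere.
    return ','.join(v if j == i else '' for j in range(nv))


# Same tokens as the test's _NA_VALUES, in a fixed order.
na_values = ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN',
             '#N/A', 'N/A', 'NA', '#NA', 'NULL', 'NaN', 'nan', '']
nv = len(na_values)

rows = [diagonal_row(i, v, nv) for i, v in enumerate(na_values)]
# Sanity check: both helpers build identical rows.
assert rows == [f(i, v, nv) for i, v in enumerate(na_values)]
csv_text = '\n'.join(rows)
```

Iterating the set in the original test gives an arbitrary row order, which is harmless there because every row is expected to be entirely `NaN`.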
