Description
I am posting this as a "Question" because I am quite new to pandas, so I may be missing something, and the "problem" I describe may be by design, for good reasons.
I am using pandas 1.2.4 with Python 3.9.4 on 64-bit Windows 10.
As a user, I would expect pandas to check the number of fields per row when importing a CSV file. But IMHO it does not do so in all cases.
Example 1
Here is a CSV file without a header, but with the `names=` argument set to three column names. So pandas should know how many fields/columns each row of the CSV file is supposed to have. The second row contains 4 fields instead of 3.
```python
import pandas
import io

csv_without_header = io.StringIO(
    'A;B;C\n'
    'D;E;X;Y\n'
    'F;G;H'
)
df = pandas.read_csv(csv_without_header, encoding='utf-8', sep=';',
                     header=None,
                     names=['First', 'Second', 'Third'])
```
pandas imports this without warnings or errors. The 4th field in the 2nd row is simply ignored.
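To see which rows would be silently truncated, the raw text can be scanned before it is parsed. Below is a minimal sketch using only the standard library `csv` module. The helper name `rows_with_wrong_field_count` is hypothetical (it is not a pandas API), and the sketch assumes the whole file fits in memory and that `csv`'s default quoting rules are close enough to the file's format:

```python
import csv
import io


def rows_with_wrong_field_count(text, sep, expected):
    """Return (line_number, field_count) pairs for rows whose field count != expected.

    Hypothetical helper, not part of pandas. line_number is the physical line
    on which the record ends (csv.reader.line_num), so it matches the file as
    long as no quoted field contains a newline.
    """
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    problems = []
    for row in reader:
        if not row:
            # Blank line: read_csv skips these by default (skip_blank_lines=True).
            continue
        if len(row) != expected:
            problems.append((reader.line_num, len(row)))
    return problems


# Same data as Example 1; the expected count is taken from the names list.
names = ['First', 'Second', 'Third']
data = 'A;B;C\nD;E;X;Y\nF;G;H'
print(rows_with_wrong_field_count(data, ';', len(names)))  # [(2, 4)]
```

Unlike the silent truncation above, this reports the offending line, and because it compares with `!=` it also flags rows that are too short.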
Example 2
I added a header line with three fields to the CSV file, so again pandas should know how many fields/columns each row is supposed to have.
And again, the second data row contains 4 fields instead of 3.
```python
csv_with_header = io.StringIO(
    'First;Second;Third\n'
    'A;B;C\n'
    'D;E;X;Y\n'
    'F;G;H'
)
df = pandas.read_csv(csv_with_header, encoding='utf-8', sep=';')
```
Here an error occurs, as I expect:

```
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
```
Example 3
There are fewer than 3 fields in the 2nd row. Again there is no warning or error; the missing field is set to `NaN`. And here it does not matter whether the number of expected fields is given by a header line in the CSV or by the `names=` argument.
```python
csv_with_header = io.StringIO(
    'First;Second;Third\n'
    'A;B;C\n'
    'D;Y\n'
    'F;G;H'
)
csv_without_header = io.StringIO(
    'A;B;C\n'
    'D;Y\n'
    'F;G;H'
)
df_a = pandas.read_csv(csv_with_header, encoding='utf-8', sep=';')
df_b = pandas.read_csv(csv_without_header, encoding='utf-8', sep=';',
                       names=['First', 'Second', 'Third'])
```
What I want is to import CSV files and be informed if any row has more or fewer fields than the expected number.
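One possible workaround is to validate the raw text first and only then hand it to `pandas.read_csv`. Below is a minimal sketch, again using only the standard library. `validate_field_counts` and `FieldCountError` are hypothetical names, not pandas APIs. The expected count comes from `names` when it is given; otherwise the first non-blank row is treated as the header, similar to what `read_csv` does by default when `names` is omitted. The sketch does not reproduce other `read_csv` options such as `skiprows` or comment handling:

```python
import csv
import io


class FieldCountError(ValueError):
    """Hypothetical exception: a row has more or fewer fields than expected."""


def validate_field_counts(text, sep=',', names=None):
    """Raise FieldCountError on the first row whose field count is wrong.

    If names is given, every row is data and len(names) fields are expected.
    Otherwise the first non-blank row is treated as the header and defines
    the expected count. Returns None if every row is fine.
    """
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    expected = len(names) if names is not None else None
    for row in reader:
        if not row:
            continue  # blank lines are skipped, as read_csv does by default
        if expected is None:
            expected = len(row)  # header row fixes the expected field count
            continue
        if len(row) != expected:
            # Message format modelled on the ParserError shown in Example 2.
            raise FieldCountError(
                f'Expected {expected} fields in line {reader.line_num}, '
                f'saw {len(row)}'
            )


# Example 3 data, once with a header line and once with names=.
with_header = 'First;Second;Third\nA;B;C\nD;Y\nF;G;H'
without_header = 'A;B;C\nD;Y\nF;G;H'
for text, names in [(with_header, None),
                    (without_header, ['First', 'Second', 'Third'])]:
    try:
        validate_field_counts(text, sep=';', names=names)
    except FieldCountError as exc:
        # First iteration prints: Expected 3 fields in line 3, saw 2
        # Second iteration prints: Expected 3 fields in line 2, saw 2
        print(exc)
```

If no exception is raised, the same string can be wrapped in `io.StringIO` and passed to `pandas.read_csv` as in the examples above. `csv.reader` and the pandas C parser do not share every rule (quoting, escape characters, comment lines), so this is only a pre-check, not a replacement for the parser's own validation.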