QST: handle lines with less separators than main lines in read_csv #48728
Comments
Hi, thanks for your report.

@phofl I also experience this problem. I searched for similar open issues, but I couldn't find one either. This missing feature is the reason I still need AWK to pre-process my CSV files:

```awk
{
    if (NF != 11) next;  # skip records whose field count is not 11
}
```

This feature is a must in a tool that processes CSV.
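The same pre-filtering can be done in Python before handing the data to pandas. This is a minimal sketch with made-up inline data, using a naive delimiter count taken from the header line (note that this, like the AWK filter, would miscount quoted fields containing commas):

```python
import io
import pandas as pd

# Hypothetical CSV where the second data line is missing a field.
raw = "a,b,c\n1,2,3\n4,5\n6,7,8\n"

# Keep only lines whose delimiter count matches the header's,
# mirroring the AWK `NF != 11` filter above.
lines = raw.splitlines()
expected = lines[0].count(",")
filtered = "\n".join(line for line in lines if line.count(",") == expected)

# The short line "4,5" never reaches the parser.
df = pd.read_csv(io.StringIO(filtered))
```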
Contributions are welcome.
The standard library's csv.DictReader has an approach for this: I've used it to detect rows that have too few columns. Perhaps something like that could easily be added to pandas?
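The comment's snippet was not included; a sketch of the idea, assuming the detection relies on `csv.DictReader`'s default behavior of filling missing trailing fields with `restval` (which is `None` by default), might look like this:

```python
import csv
import io

# Hypothetical CSV where the second data line is missing a field.
raw = "a,b,c\n1,2,3\n4,5\n"

# DictReader fills absent trailing fields with restval (None by
# default), so a short row can be detected by checking its values.
good_rows = []
for row in csv.DictReader(io.StringIO(raw)):
    if None in row.values():
        continue  # row has fewer fields than the header
    good_rows.append(row)
```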
Research

- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.

Link to question on StackOverflow:
https://stackoverflow.com/questions/73820090/make-pandas-read-csv-to-not-add-lines-with-less-columns-delimiters-than-the-main
Question about pandas

The `on_bad_lines='warn'` option prompted my question. The pandas team added functionality to directly handle lines with *more* separators than the main lines, so it would not seem strange to me if some other pandas option could handle lines with *fewer* separators than the main lines in the same way.

Using `pandas.read_csv` with the `on_bad_lines='warn'` option, the case where there are too many column delimiters works well: the bad lines are not loaded, and their line numbers are reported on stderr.

But when the number of delimiters (here `sep=','`) is less than in the main lines, the line is added, with `NaN` filling the missing values.

Is there a way to make `read_csv` not add lines with fewer column delimiters than the main lines?
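One post-hoc workaround, a sketch that only holds under the assumption that legitimate rows never contain missing values, is to load the file as-is and then drop the rows that were padded with `NaN`:

```python
import io
import pandas as pd

# Hypothetical CSV where the second data line is missing a field.
raw = "a,b,c\n1,2,3\n4,5\n6,7,8\n"

df = pd.read_csv(io.StringIO(raw))
# The short line "4,5" is parsed as (4, 5, NaN); assuming valid rows
# never have missing values, dropping NaN rows removes it.
df = df.dropna()
```

This does not distinguish genuinely missing data from short lines, which is exactly why a dedicated `read_csv` option would be useful.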