QST: handle lines with fewer separators than main lines in read_csv

### Research

- [X] I have searched the [[pandas] tag](https://stackoverflow.com/questions/tagged/pandas) on StackOverflow for similar questions.

- [X] I have asked my usage related question on [StackOverflow](https://stackoverflow.com).


### Link to question on StackOverflow

https://stackoverflow.com/questions/73820090/make-pandas-read-csv-to-not-add-lines-with-less-columns-delimiters-than-the-main

### Question about pandas

The `on_bad_lines='warn'` option puzzles me. The pandas team added functionality to directly handle lines with more separators than the main lines, so it would not seem strange to me if some other pandas option could handle *lines with fewer separators than the main lines* in the same way.

Using `pandas.read_csv` with the `on_bad_lines='warn'` option, the case where there are too many
column delimiters works well: bad lines are not loaded, and stderr reports the bad line
numbers:

```python
import pandas as pd
from io import StringIO

# Input lines 4 and 6 have more fields than the 3-column header.
data = StringIO("""
nom,f,nb
bat,F,52
cat,M,66,
caw,F,15
dog,M,66,,
fly,F,61
ant,F,21""")
df = pd.read_csv(data, sep=',', on_bad_lines='warn')

# Output captured from stderr:
# b'Skipping line 4: expected 3 fields, saw 4\nSkipping line 6: expected 3 fields, saw 5\n'

df.head(10)
#    nom  f  nb
# 0  bat  F  52
# 1  caw  F  15
# 2  fly  F  61
# 3  ant  F  21
```

But when the number of delimiters (here `sep=','`) is lower than in the main lines, the line
is still loaded, with the missing fields filled with `NaN`:

```python
import pandas as pd
from io import StringIO

# Input lines 4 and 6 have fewer fields than the 3-column header.
data = StringIO("""
nom,f,nb
bat,F,52
catM66,
caw,F,15
dog,M66
fly,F,61
ant,F,21""")
df = pd.read_csv(data, sep=',', on_bad_lines='warn', dtype=str)
df.head(10)

#       nom    f   nb
# 0     bat    F   52
# 1  catM66  NaN  NaN            <==
# 2     caw    F   15
# 3     dog  M66  NaN            <==
# 4     fly    F   61
# 5     ant    F   21
```

Is there a way to make `read_csv` skip lines with fewer column delimiters than
the main lines?
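
To make the desired behavior concrete, here is a minimal sketch of a pre-filtering workaround, not a built-in pandas option. The helper name `read_csv_strict` is hypothetical. It assumes the first non-empty line is the header, and it returns every value as a string (comparable to `dtype=str`), because the rows are parsed with the standard `csv` module rather than by `read_csv`:

```python
import csv
from io import StringIO

import pandas as pd


def read_csv_strict(text, sep=","):
    """Hypothetical helper: keep only rows whose field count matches the header."""
    # csv.reader yields an empty list for blank lines, so `if row` drops them.
    rows = [row for row in csv.reader(StringIO(text, newline=""), delimiter=sep) if row]
    header, body = rows[0], rows[1:]
    # Rows with too few (or too many) fields are discarded.
    kept = [row for row in body if len(row) == len(header)]
    return pd.DataFrame(kept, columns=header)


raw = """
nom,f,nb
bat,F,52
catM66,
caw,F,15
dog,M66
fly,F,61
ant,F,21"""

df_strict = read_csv_strict(raw)
# Only the rows for bat, caw, fly and ant remain.
```

Unlike `on_bad_lines='warn'`, this sketch drops the short lines silently; reporting their line numbers would need extra bookkeeping.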


