Permit covariance of key type in read_csv converters argument #450

Merged
merged 2 commits into from
Nov 26, 2022

Conversation

gandhis1
Contributor

  • Tests added: Please use assert_type() to assert the type of any return value

If a parameter accepts a Dict[int | str, ...], it should also accept a Dict with either key type alone, that is, Dict[int, ...] or Dict[str, ...].

@@ -1278,6 +1279,43 @@ def test_read_csv() -> None:
pd.DataFrame,
)

# Allow a variety of dict types for the converters parameter
converters1 = {"A": lambda x: str, "B": lambda x: str}
Contributor Author

Originally I wanted this dictionary to be {"A": str, "B": str}. But the type of str is Type[str], which apparently isn't compatible with Callable[[str], Any]. Any idea if this is fixable?

Collaborator

Might be an issue with using a lambda function, which is untyped. So if you did

def convert_to_str(a: object) -> str:
    return str(a)

and then

converters1 = {"A": convert_to_str, "B": convert_to_str}

that might work

Contributor Author

Oh, the current version works fine because it's actually a Callable. What isn't working is passing Type[str] directly. That makes the dict a Dict[str, Type[str]]. That's what I was wondering about.

Collaborator

I guess the type checkers know that str could be either a type or a callable.
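
To make the thread above concrete, here is a minimal sketch of why passing `{"A": str, "B": str}` is rejected. It assumes the cause is that `dict` is invariant in its value type, so the inferred `dict[str, type[str]]` does not match `dict[str, Callable[[str], Any]]` even though `str` itself can be called with a single string. The names `Converter`, `apply_dict`, and `apply_mapping` are hypothetical and are not part of pandas or pandas-stubs.

```python
from typing import Any, Callable, Dict, Mapping

Converter = Callable[[str], Any]


def apply_dict(converters: Dict[str, Converter], row: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper whose parameter is an invariant Dict."""
    return {key: converters[key](value) for key, value in row.items()}


def apply_mapping(converters: Mapping[str, Converter], row: Dict[str, str]) -> Dict[str, Any]:
    """Same logic, but Mapping is covariant in its value type."""
    return {key: converters[key](value) for key, value in row.items()}


type_converters = {"A": str, "B": str}  # inferred as dict[str, type[str]]
row = {"A": "1", "B": "2"}

# A static type checker is expected to reject this call, because
# dict[str, type[str]] is not a dict[str, Callable[[str], Any]].
# It still runs, since annotations are not enforced at runtime.
via_dict = apply_dict(type_converters, row)  # type: ignore[arg-type]

# This call is expected to type-check: a Mapping of type[str] values is
# accepted where a Mapping of Callable[[str], Any] values is required.
via_mapping = apply_mapping(type_converters, row)
```

Both calls behave identically at runtime; the difference is only in what the type checker accepts.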

Collaborator

@Dr-Irv Dr-Irv left a comment

Try avoiding the lambda if you can. (or at least add tests without it)

- converters: dict[int | str, Callable[[str], Any]] | None = ...,
+ converters: dict[int | str, Callable[[str], Any]]
+ | dict[int, Callable[[str], Any]]
+ | dict[str, Callable[[str], Any]]
Collaborator

Can you try Mapping[int | str, Callable[[str], Any]] instead of the union?

Contributor Author

I tried that first. But it seems Mapping isn't covariant over the key, only the value: python/typing#445

Contributor Author

But Mapping may work for the Type[str] issue, since that is the value... will try that.

I also wonder if creating a covariant TypeVar and then a Mapping[T_co, ...] will work...I will also try that.

@gandhis1
Contributor Author

Ok, the version I just pushed should fix all of that, and also fix the other parameters too - na_values and parse_dates.

This is something that is probably going to come up over and over again throughout this code base. I think as a rule of thumb all Dict inputs need to be a Mapping if we want value covariance, and we need to do unions if we want key covariance. And honestly I would guess the overwhelming majority of places we want both.

…ovariance, and add tests for na_values and parse_dates
@Dr-Irv
Collaborator

Dr-Irv commented Nov 26, 2022

> This is something that is probably going to come up over and over again throughout this code base. I think as a rule of thumb all Dict inputs need to be a Mapping if we want value covariance, and we need to do unions if we want key covariance. And honestly I would guess the overwhelming majority of places we want both.

I think we have to look at all of them and see. Probably dict[str, int] is OK, but if we have two or more possibilities for the value, then we should switch to Mapping, and if we have two or more possibilities for the key, then we have to use the pattern you used here by including each possible key with union.
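
The rule of thumb discussed here could be sketched roughly as follows. This is an illustration only: `ConvertersArg`, `describe_converters`, and `to_upper` are hypothetical names, and the alias is not copied from the merged pandas-stubs signature. It uses `Mapping` so that values are covariant, and spells out each key type in a union because `Mapping` is invariant in its key type.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional, Union

Converter = Callable[[str], Any]

# Hypothetical alias: one Mapping per allowed key type, plus the mixed case.
ConvertersArg = Union[
    Mapping[int, Converter],
    Mapping[str, Converter],
    Mapping[Union[int, str], Converter],
]


def describe_converters(converters: Optional[ConvertersArg] = None) -> List[str]:
    """Return a sorted 'keytype:key' label for every converter key."""
    if converters is None:
        return []
    return sorted(f"{type(key).__name__}:{key}" for key in converters)


def to_upper(value: str) -> str:
    return value.upper()


by_position = {0: to_upper}               # inferred as dict[int, ...]
by_name = {"A": to_upper, "B": to_upper}  # inferred as dict[str, ...]
mixed: Dict[Union[int, str], Converter] = {0: to_upper, "B": to_upper}

# Each call matches a different member of the union.
position_labels = describe_converters(by_position)
name_labels = describe_converters(by_name)
mixed_labels = describe_converters(mixed)
```

With only `Mapping[Union[int, str], Converter]` in the signature, the first two calls would be expected to fail type checking, which is the key-invariance problem raised earlier in the review.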

Collaborator

@Dr-Irv Dr-Irv left a comment

thanks @gandhis1

@Dr-Irv Dr-Irv merged commit dc5ea0e into pandas-dev:main Nov 26, 2022
@gandhis1 gandhis1 deleted the mapping branch November 28, 2022 13:59