Cleanup philosophy doc #634

Merged 5 commits on Apr 6, 2023
26 changes: 13 additions & 13 deletions docs/philosophy.md
@@ -1,12 +1,12 @@
# pandas-stubs Type Checking Philosophy

The goal of the pandas-stubs project is to provide type stubs for the public API
that represent the recommended ways of using pandas. This differs from the typing
philosophy within the pandas source, described [here](https://pandas.pydata.org/docs/development/contributing_codebase.html?highlight=typing#type-hints), which
is to assist with the development of the pandas source code by ensuring type safety within
that source.

Due to the methodology used by Microsoft to develop the original stubs, there are internal
classes, methods and functions whose annotations in the pandas-stubs project
are incorrect with respect to the pandas source. These errors have no effect on type
checking user code that calls the public API.
@@ -27,12 +27,12 @@ s = pd.Series([1, 2, 3])
lt = s < 3
```

In the pandas source, `lt` is a `Series` with a `dtype` of `bool`. In the pandas-stubs,
the type of `lt` is `Series[bool]`. This allows further type checking to occur in other
pandas methods. Note that in the above example, `s` is typed as `Series[Any]` because
its type cannot be statically inferred.

This also allows type checking for operations on series that contain date/time data. Consider
the following example that creates two series of datetimes with corresponding arithmetic.

```python
@@ -74,22 +74,22 @@ interval of `Timestamp`s.
A set of (most likely incomplete) tests for testing the type stubs is in the pandas-stubs
repository in the `tests` directory. The tests are used with `mypy` and `pyright` to
validate correct typing, and also with `pytest` to validate that the provided code
actually executes. Python 3.11 added `assert_type()`, which is also available in
`typing_extensions` version 4.2 and later. This makes it easier
to validate the return types of functions and methods. Future work
is intended to expand the use of `assert_type()` in the test code.
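
As a minimal sketch of how `assert_type()` can serve both purposes, the snippet below reuses the `s < 3` comparison from earlier in this document. It assumes pandas-stubs is installed when a type checker is run. The string form `"pd.Series[bool]"` is used here because it is never evaluated at runtime. This is an illustration only, not a copy of a test from the repository.

```python
import sys

import pandas as pd

# assert_type() lives in typing from Python 3.11, and in typing_extensions before that.
if sys.version_info >= (3, 11):
    from typing import assert_type
else:
    from typing_extensions import assert_type

s = pd.Series([1, 2, 3])
lt = s < 3

# Static check: with pandas-stubs, a type checker verifies that `lt` is Series[bool].
# At runtime assert_type() simply returns its first argument unchanged.
result = assert_type(lt, "pd.Series[bool]")

# Runtime check (what pytest would exercise): the code executes and yields booleans.
assert result.dtype == bool
```

The static assertion and the runtime assertion are independent: the first is only seen by `mypy` or `pyright`, the second only by the interpreter.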

## Narrow vs. Wide Arguments

A consideration in creating stubs is to make the set of type annotations for
function arguments "just right", i.e.,
not too narrow and not too wide. A type annotation for an argument of a function or
method is too narrow if it disallows valid arguments, and too wide if
it allows invalid arguments. Testing for type annotations that are too narrow is rather
straightforward: create an example for which the type checker wrongly reports
the argument as incorrect, fix the appropriate stub, and add the example to the set of
tests in the pandas-stubs repository. However, testing for type annotations that
are too wide is a bit more complicated.
In this case, the test will fail when using `pytest`, but it is also desirable to
have type checkers report errors for code that is expected to fail type checking.
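
The distinction between too narrow and too wide can be sketched without pandas. The three functions below are hypothetical and exist only for illustration; they share the same body and differ only in how the argument is annotated.

```python
from typing import Any, Iterable, List


def mean_too_narrow(values: List[float]) -> float:
    # Too narrow: a tuple works at runtime, but a type checker rejects it.
    data = list(values)
    return sum(data) / len(data)


def mean_too_wide(values: Any) -> float:
    # Too wide: mean_too_wide(3) passes type checking but fails at runtime.
    data = list(values)
    return sum(data) / len(data)


def mean_just_right(values: Iterable[float]) -> float:
    # Accepts lists, tuples and other iterables of numbers, and nothing else.
    data = list(values)
    return sum(data) / len(data)


print(mean_just_right((1.0, 2.0)))

try:
    mean_too_wide(3)  # no type error is reported for this call
except TypeError as exc:
    print(f"runtime failure that the wide annotation hid: {exc}")
```

A too-narrow annotation shows up as a false error from the type checker on working code. A too-wide annotation shows up only at runtime, which is why it needs a different testing strategy.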
@@ -108,9 +108,9 @@ Here is an example that illustrates this concept, from `tests/test_interval.py`:
In this particular example, the stubs consider that `i1` will have the type
`pd.Interval[pd.Timestamp]`. It is incorrect code to add a `Timestamp` to a
time-based interval. Without the `if TYPE_CHECKING_INVALID_USAGE` construct, the
code would fail at runtime. Further, type checkers should report an error for this
incorrect code. Placing the `# type: ignore[operator] # pyright: ignore[reportGeneralTypeIssues]`
comment on the line tells type checkers to ignore the type error. To ensure that the
pandas-stubs annotations are not too wide (i.e., that they do not allow adding a `Timestamp` to a
time-based interval), mypy and pyright are configured to report unused ignore
statements. If the stubs accepted this code, the ignore comments would be unused and
the type checkers would report them.
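
The pattern described above can be sketched as follows. This is not the code from `tests/test_interval.py`: the flag is defined locally as a stand-in (how the real test suite defines and configures it is not shown here), the interval endpoints are made up, and the ignore codes are copied from the text above and may differ between type checker versions.

```python
import pandas as pd

# Assumption: a local stand-in for the flag named in the text. It is False so that
# the guarded code is never executed, while type checkers still analyse it.
TYPE_CHECKING_INVALID_USAGE: bool = False

# A time-based interval; with pandas-stubs this is seen as pd.Interval[pd.Timestamp].
i1 = pd.Interval(
    pd.Timestamp("2000-01-01"), pd.Timestamp("2000-01-03"), closed="both"
)

if TYPE_CHECKING_INVALID_USAGE:
    # Invalid usage: adding a Timestamp to a time-based interval fails at runtime.
    # If the stubs wrongly accepted it, the ignore comments below would be unused,
    # and checkers configured to flag unused ignores would report them.
    i1 + pd.Timestamp("2000-03-03")  # type: ignore[operator] # pyright: ignore[reportGeneralTypeIssues]
```

Running this file under `pytest` or the interpreter never reaches the guarded line, so the only signal about the invalid addition comes from the type checkers.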