[Python][Types] Type stub improvements for better coverage with Arrow IPC and compute operations #48711

@rustyconover

Description

While integrating pyarrow-stubs into a project using Arrow IPC streaming, I encountered several type annotation gaps that required workarounds (# type: ignore comments or cast() calls). This issue documents these gaps to help improve stub coverage.

Environment:

  • pyarrow-stubs version: 17.11
  • pyarrow version: 19.0.1
  • mypy version: 1.14.1
  • Python version: 3.12

Issues Found

  1. pa.PythonFile constructor doesn't accept standard file-like objects

Problem: The stub signature for the PythonFile constructor is too restrictive: it doesn't accept IO[bytes] or io.BufferedIOBase objects without an explicit cast.

Workaround required:

import io
import subprocess
import sys
from typing import cast

import pyarrow as pa

# Placeholder subprocess so the snippet is self-contained:
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE)

# This requires a cast:
stdin_sink = pa.PythonFile(cast(io.IOBase, proc.stdin))

# Similarly for stdout:
pa.PythonFile(cast(io.IOBase, sys.stdout.buffer), mode="w")

Expected: PythonFile.__init__ should accept IO[bytes], io.BufferedIOBase, or typing.BinaryIO.
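
A stub signature along these lines would cover the common cases (a sketch only; the handle and mode parameter names follow the pyarrow documentation and may not match the current stubs exactly):

from typing import IO

# Fragment of the pyarrow stub; NativeFile is defined in the same module.
class PythonFile(NativeFile):
    def __init__(self, handle: IO[bytes], mode: str | None = None) -> None: ...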


  2. pa.BufferReader incompatible with pa.ipc.read_schema()

Problem: When passing a BufferReader to ipc.read_schema(), mypy reports an argument type error.

Workaround required:

import pyarrow as pa

output_schema_bytes: bytes = ...  # placeholder: serialized schema bytes
output_schema = pa.ipc.read_schema(pa.BufferReader(output_schema_bytes))  # type: ignore[arg-type]

Expected: ipc.read_schema() should accept BufferReader (or its parent NativeFile) as a valid input type.
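
In the stub, the parameter type could be widened roughly like this (a sketch, not the stubs' actual code; the optional dictionary_memo argument from the runtime signature is omitted for brevity):

from typing import IO

from pyarrow import Buffer, NativeFile, Schema

def read_schema(obj: Buffer | bytes | NativeFile | IO[bytes]) -> Schema: ...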


  3. pa.schema() field list typing is overly restrictive

Problem: Creating a schema from a list of tuples [("name", pa.string())] or pa.Field objects causes type errors.

Workaround required:

from typing import Any

import pyarrow as pa

def make_schema(fields: list[Any]) -> pa.Schema:
    """Helper to avoid mypy errors with field lists."""
    return pa.schema(fields)

# Usage:
schema = make_schema([("x", pa.int64()), ("y", pa.string())])
schema = make_schema([pa.field("x", pa.int64())])

Expected: pa.schema() should accept any of the following (a possible stub signature is sketched after the list):

  • list[tuple[str, DataType]]
  • list[Field]
  • Iterable[tuple[str, DataType] | Field]
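
A single signature can cover all three forms (a sketch, not the stubs' actual code; the metadata parameter follows the runtime signature of pa.schema):

from collections.abc import Iterable

from pyarrow import DataType, Field, Schema

def schema(
    fields: Iterable[Field | tuple[str, DataType]],
    metadata: dict[bytes | str, bytes | str] | None = None,
) -> Schema: ...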

  4. pyarrow.compute.filter() missing RecordBatch overload

Problem: pc.filter() works with RecordBatch at runtime, but the stubs only define overloads for Array and ChunkedArray.

Workaround required:

import pyarrow as pa
import pyarrow.compute as pc

batch: pa.RecordBatch = ...  # placeholder
mask: pa.BooleanArray = ...  # placeholder
result = pc.filter(batch, mask)  # type: ignore[call-overload]

Expected: Add overload for RecordBatch:

from typing import Literal, overload

from pyarrow import Array, ChunkedArray, RecordBatch

@overload
def filter(
    values: RecordBatch,
    selection_filter: Array | ChunkedArray,
    /,
    null_selection_behavior: Literal["drop", "emit_null"] = ...,
) -> RecordBatch: ...
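
At runtime the call already succeeds, which is easy to confirm (a minimal demonstration, independent of the stubs):

import pyarrow as pa
import pyarrow.compute as pc

batch = pa.RecordBatch.from_pydict({"x": [1, 2, 3]})
mask = pa.array([True, False, True])

filtered = pc.filter(batch, mask)
print(filtered.num_rows)  # 2 -- RecordBatch is accepted at runtime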

  5. pa.Scalar generic requires TYPE_CHECKING import pattern

Problem: Using pa.Scalar[T] in an annotation that is evaluated at runtime raises a TypeError, because Scalar is generic in the stubs but not subscriptable at runtime.

Current pattern required:

from __future__ import annotations  # defer annotation evaluation so the guarded import is enough

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from pyarrow import Scalar

# Then use as:
positional: tuple[Scalar[Any] | None, ...] = ()
named: dict[str, Scalar[Any]] = {}

This is a minor issue but worth noting for documentation.
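
Quoting the annotations achieves the same effect without the __future__ import (a minimal sketch):

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from pyarrow import Scalar

positional: "tuple[Scalar[Any] | None, ...]" = ()
named: "dict[str, Scalar[Any]]" = {}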

Component(s)

Python
