Description
While integrating pyarrow-stubs into a project that uses Arrow IPC streaming, I encountered several type annotation gaps that required workarounds (`# type: ignore` comments or `cast()` calls). This issue documents these gaps to help improve stub coverage.
Environment:
- pyarrow-stubs version: 17.11
- pyarrow version: 19.0.1
- mypy version: 1.14.1
- Python version: 3.12
Issues Found
`pa.PythonFile` constructor doesn't accept standard file-like objects
Problem: The `PythonFile` constructor signature in the stubs is too restrictive: it doesn't accept `IO[bytes]` or `BufferedIOBase` objects without explicit casting.
Workaround required:
```python
import io
import sys
from typing import cast

import pyarrow as pa

# proc is an already-started subprocess.Popen with stdin=subprocess.PIPE
# This requires a cast:
stdin_sink = pa.PythonFile(cast(io.IOBase, proc.stdin))

# Similarly for stdout:
pa.PythonFile(cast(io.IOBase, sys.stdout.buffer), mode="w")
```
Expected: `PythonFile.__init__` should accept `IO[bytes]`, `BufferedIOBase`, or a `typing.BinaryIO` union.
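A possible widening, sketched below as a hypothetical `.pyi`-style signature (the `mode` parameter and its default mirror the runtime constructor; this is my assumption, not the shipped stub):

```python
# Hypothetical stub sketch -- not the current pyarrow-stubs definition
from typing import IO

from pyarrow import NativeFile

class PythonFile(NativeFile):
    def __init__(self, handle: IO[bytes], mode: str | None = None) -> None: ...
```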
`pa.BufferReader` incompatible with `pa.ipc.read_schema()`
Problem: When passing a `BufferReader` to `ipc.read_schema()`, mypy reports an argument type error.
Workaround required:
```python
import pyarrow as pa

output_schema_bytes: bytes = ...  # serialized schema bytes (elided)
output_schema = pa.ipc.read_schema(pa.BufferReader(output_schema_bytes))  # type: ignore[arg-type]
```
Expected: `ipc.read_schema()` should accept `BufferReader` (or its parent class `NativeFile`) as a valid input type.
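For comparison, a widened stub could look roughly like the following sketch (extra parameters such as `dictionary_memo` are omitted; since `BufferReader` subclasses `NativeFile`, the `NativeFile` member covers the reported case):

```python
# Hypothetical widened stub for ipc.read_schema (sketch)
from pyarrow import Buffer, NativeFile, Schema

def read_schema(obj: Buffer | bytes | NativeFile) -> Schema: ...
```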
`pa.schema()` field list typing is overly restrictive
Problem: Creating a schema from a list of tuples (`[("name", pa.string())]`) or from `pa.Field` objects causes type errors.
Workaround required:
```python
from typing import Any

import pyarrow as pa

def make_schema(fields: list[Any]) -> pa.Schema:
    """Helper to avoid mypy errors with field lists."""
    return pa.schema(fields)

# Usage:
schema = make_schema([("x", pa.int64()), ("y", pa.string())])
schema = make_schema([pa.field("x", pa.int64())])
```
Expected: `pa.schema()` should accept:
- `list[tuple[str, DataType]]`
- `list[Field]`
- `Iterable[tuple[str, DataType] | Field]`
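A widened stub signature might look like this sketch (the `Mapping` input and the `metadata` typing are my assumptions based on what `pa.schema()` accepts at runtime):

```python
# Hypothetical widened stub for pa.schema (sketch)
from collections.abc import Iterable, Mapping

from pyarrow import DataType, Field, Schema

def schema(
    fields: Iterable[Field | tuple[str, DataType]] | Mapping[str, DataType],
    metadata: Mapping[bytes | str, bytes | str] | None = None,
) -> Schema: ...
```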
`pyarrow.compute.filter()` missing `RecordBatch` overload
Problem: `pc.filter()` works with `RecordBatch` at runtime, but the stubs only define overloads for `Array` and `ChunkedArray`.
Workaround required:
```python
import pyarrow as pa
import pyarrow.compute as pc

batch: pa.RecordBatch = ...
mask: pa.BooleanArray = ...
result = pc.filter(batch, mask)  # type: ignore[call-overload]
```
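For reference, a minimal runnable check confirming the call succeeds at runtime:

```python
# Runtime check: pc.filter accepts a RecordBatch even though the stubs reject it
import pyarrow as pa
import pyarrow.compute as pc

batch = pa.record_batch({"x": [1, 2, 3]})
mask = pa.array([True, False, True])
assert pc.filter(batch, mask).num_rows == 2
```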
Expected: add an overload for `RecordBatch`:
```python
@overload
def filter(
    values: RecordBatch,
    selection_filter: Array | ChunkedArray,
    /,
    null_selection_behavior: Literal["drop", "emit_null"] = ...,
) -> RecordBatch: ...
```
`pa.Scalar` generic requires `TYPE_CHECKING` import pattern
Problem: Using `pa.Scalar[T]` as a type annotation raises errors at runtime, because `Scalar` is not subscriptable at runtime.
Current pattern required:
```python
from __future__ import annotations  # defers annotation evaluation so Scalar need not exist at runtime

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from pyarrow import Scalar

# Then use as:
positional: tuple[Scalar[Any] | None, ...] = ()
named: dict[str, Scalar[Any]] = {}
```
This is a minor issue, but worth noting in the documentation.
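If the `__future__` import is undesirable, quoting the annotations works too, since string annotations are never evaluated at runtime (sketch):

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from pyarrow import Scalar

positional: "tuple[Scalar[Any] | None, ...]" = ()
named: "dict[str, Scalar[Any]]" = {}
```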
Component(s)
Python