Python code that writes the file:
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
import polars as pl
pl.DataFrame({'text': "this is some text".split()}).write_ipc("data.arrow")
Polars can read this file:
>>> import polars as pl
>>> pl.read_ipc("data.arrow")
shape: (4, 1)
┌──────┐
│ text │
│ ---  │
│ str  │
╞══════╡
│ this │
│ is   │
│ some │
│ text │
└──────┘
>>>
Arrow.jl reads garbage:
julia> import Pkg; Pkg.status()
Status `~/tmp/Project.toml`
[69666777] Arrow v2.8.0
[a93c6f00] DataFrames v1.7.0
julia> using DataFrames; import Arrow
julia> DataFrame(Arrow.Table("./data.arrow"))
4×1 DataFrame
 Row │ text
     │ String?
─────┼──────────
   1 │ W1\0\0
   2 │ \xf2\xff
   3 │ \v\0\b\0
   4 │ \b\0\b\0
julia>
Issue: this is not at all what Polars wrote to the file.
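One detail in the output above may help narrow this down: each garbage value has the same byte length as the word that was written. The sketch below only checks that observation, using the values copied from the two outputs above. What it suggests (string lengths decoded correctly, string bytes taken from the wrong place) is a guess, not a confirmed diagnosis.

```python
# Words written by the Polars script above.
written = "this is some text".split()

# Values Arrow.jl returned, transcribed from the Julia output above
# (Julia's \0, \v, \b and \xNN escapes mean the same bytes in Python).
read_back = [b"W1\0\0", b"\xf2\xff", b"\v\0\b\0", b"\b\0\b\0"]

written_lengths = [len(word.encode("utf-8")) for word in written]
read_lengths = [len(value) for value in read_back]

# Print the per-row byte lengths side by side.
for word, value, n_written, n_read in zip(written, read_back, written_lengths, read_lengths):
    print(f"{word!r:8} {n_written}  {value!r:18} {n_read}")

print("lengths match:", written_lengths == read_lengths)
```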
Other data types are read properly:
> cat arrow_bug.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = ["polars<=1.21.0"]
# ///
from datetime import date
import polars as pl
pl.DataFrame({
    'text': "this is some text".split(),
    'date': [date(2025, 1, i + 1) for i in range(4)],
    'float': [float(i) for i in range(4)],
    'int': list(range(4)),
}).write_ipc("dates.arrow")
> ./arrow_bug.py
> julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.3 (2025-01-21)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
julia> using DataFrames; import Arrow
julia> DataFrame(Arrow.Table("dates.arrow"))
4×4 DataFrame
 Row │ text      date        float     int
     │ String?   Date?       Float64?  Int64?
─────┼────────────────────────────────────────
   1 │ W1\0\0    2025-01-01       0.0       0
   2 │ \xf2\xff  2025-01-02       1.0       1
   3 │ \v\0\b\0  2025-01-03       2.0       2
   4 │ \b\0\b\0  2025-01-04       3.0       3
julia>