Refactor decimal conversion in PyArrow tables to use direct casting (#544)
This PR replaces the previous implementation of convert_decimals_in_arrow_table() with a more efficient approach that uses PyArrow's native casting operation instead of going through pandas conversion and array creation.
- Remove conversion to pandas DataFrame via to_pandas() and apply() methods
- Remove intermediate steps of creating array from decimal column and setting it back
- Replace with direct type casting using PyArrow's cast() method
- Build a new table with transformed columns rather than modifying the original table
- Create a new schema based on the modified fields
The new approach is faster because it avoids the pandas conversion overhead. The table below highlights substantial performance improvements when retrieving all rows from a table containing decimal columns, particularly when compression is disabled. Even greater gains were observed with compression enabled: roughly an 84% reduction in retrieval time (6 seconds compared to 39 seconds). Benchmarking was performed against e2-dogfood, with the client located in the us-west-2 region.

Signed-off-by: Jayant Singh <[email protected]>