This repository was archived by the owner on Apr 10, 2024. It is now read-only.

DEV: Reusing low level constructs in libarrow / Apache Arrow #44

@wesm

As I've been prototyping, I've copied over a bunch of C++ code from Arrow (https://github.com/apache/arrow). I'm not sure that maintaining near-clones of the same code in two places makes sense (see @xhochy's comment here: b982d96#commitcomment-19406430).

The code in question is:

  • Buffer abstraction (a reference to a block of data)
  • MemoryPool: memory allocation / tracking
  • Status — an object for capturing error information for exception-free C++ programming
  • Bit manipulation utilities (see src/pandas/util/bit-util.h)
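To make the Status and bit-utility items above concrete, here is a minimal sketch of the exception-free style they enable. The `Status` class and the `GetBit` / `SetBit` / `SetBitChecked` helpers below are simplified, hypothetical stand-ins written for illustration; they are not the actual Arrow or pandas implementations, and the bit ordering (least-significant bit first) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a Status type (not Arrow's class).
// A default-constructed Status means success; otherwise it carries a message.
class Status {
 public:
  Status() = default;
  static Status OK() { return Status(); }
  static Status Invalid(std::string msg) { return Status(std::move(msg)); }

  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  explicit Status(std::string msg) : ok_(false), msg_(std::move(msg)) {}
  bool ok_ = true;
  std::string msg_;
};

// Illustrative bit helpers over a packed bitmap (LSB-first within each byte).
inline bool GetBit(const uint8_t* bits, int64_t i) {
  assert(bits != nullptr);  // programmer error, not a recoverable condition
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

inline void SetBit(uint8_t* bits, int64_t i) {
  assert(bits != nullptr);
  bits[i / 8] = static_cast<uint8_t>(bits[i / 8] | (1u << (i % 8)));
}

// Recoverable errors are reported through the return value, never thrown.
Status SetBitChecked(uint8_t* bits, int64_t num_bits, int64_t i) {
  if (i < 0 || i >= num_bits) {
    return Status::Invalid("bit index out of range");
  }
  SetBit(bits, i);
  return Status::OK();
}
```

A caller checks `ok()` on the returned value and propagates the failure upward instead of catching an exception.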

Sharing this code means adding libarrow as a build and runtime dependency. If this causes problems in some way, we can absorb the bits of the library that are being used in pandas. We should definitely set up using aliases so that we are not using the arrow:: namespace directly in the code for these low-level bits.
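The aliasing idea could look roughly like the sketch below. To keep it self-contained, a hypothetical `upstream` namespace stands in for `arrow::`, and the `Buffer` struct and `ByteLength` helper are invented for illustration only; the real Arrow types are richer than this.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in for the external library namespace (arrow:: in a real build).
// This struct is hypothetical and much simpler than an actual Buffer class.
namespace upstream {
struct Buffer {
  const uint8_t* data;
  int64_t size;
};
}  // namespace upstream

namespace pandas {

// The one place where the dependency is named. If the code is later
// absorbed into pandas, only this alias needs to change.
using Buffer = upstream::Buffer;

// Library code refers to pandas::Buffer, never to upstream:: directly.
inline int64_t ByteLength(const Buffer& buf) {
  assert(buf.size >= 0);
  return buf.size;
}

}  // namespace pandas
```

With this layout, swapping the shared library for an in-tree copy would touch the alias declarations and nothing else.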

Later, we can also potentially take advantage of arrow::io, a small IO subsystem for dealing with files, memory maps, etc. This may be useful for revamping the CSV reader.

When we look at adding nested data types to pandas, or even a new string array type, we may want to consider using the Arrow memory layout, so having this in the build toolchain may make life easier in a number of ways.
