Skip to content

REF: Make Util functions accessible from C #50425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Dec 24, 2022

Came up when reviewing #50400 with @lithomas1 . Doesn't directly solve this, but a lot of our util.pyx functions are in essence C functions to begin with. Because they are implemented in Cython though, its a built difficult to use them in our non-Cython extensions like JSON.

This takes the code out of the Cython file and ports to a C header file. This is mostly a verbatim copy with some special considerations done for NA handling. Marking as draft because I'm not sure a header-only solution is better than a shared library, but need more time to mess around with the setup script. May also just wait until we implement meson fully

@WillAyd
Copy link
Member Author

WillAyd commented Dec 24, 2022

Initial timeits don't show anything conclusive between main:

In [5]: %timeit lib.is_integer_array(np.array([1, 2]))
525 ns ± 0.77 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [6]: %timeit lib.is_integer_array(np.array([1, 2]))
466 ns ± 1.64 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [7]: %timeit lib.is_integer_array(np.array([1, 2]))
469 ns ± 1.51 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [8]: %timeit lib.is_integer_array(np.array([1, 2]))
468 ns ± 0.787 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

and this PR

In [7]: %timeit lib.is_integer_array(np.array([1, 2]))
531 ns ± 0.679 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [8]: %timeit lib.is_integer_array(np.array([1, 2]))
479 ns ± 1.41 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [9]: %timeit lib.is_integer_array(np.array([1, 2]))
474 ns ± 3.77 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

is_timedelta64 : bool
*/
int is_timedelta64_object(PyObject *obj) {
return PyObject_TypeCheck(obj, &PyTimedeltaArrType_Type);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the implementations ulltimately should be in this header file. Done for now just to emulate the existing util.pxd file, but would make the build system a bit easier with a dedicated .c implementation

The full license is in the LICENSE file, distributed with this software.
*/

#ifndef PANDAS__LIBS_PANDAS_TYPE_H_
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_inference.h might also be a better name than just type.h

@jbrockmendel
Copy link
Member

IIRC cython has some mechanism to make cython functions available in C. is that an option?

@WillAyd
Copy link
Member Author

WillAyd commented Dec 27, 2022

Looks like this is the section in the docs:

https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#using-cython-declarations-from-c

Worth trying in the future for more advanced use cases, but I think here the functions we are putting into Cython functions are really C functions anyway, so would be less steps the way it is

That said, I think this PR will be dependent on moving away from setuptools into meson. setuptools it seems implicitly requires that all extensions you build are actually Python extensions, so we aren't able to really set up the full dependency chain needed without meson

@lithomas1
Copy link
Member

IIRC cython has some mechanism to make cython functions available in C. is that an option?

IMO, ideally, we should be doing C -> Cython since its much easier from a build standpoint unless absolutely necessary (Cython generates the C header files at runtime when going the other way which can be tricky).

@lithomas1
Copy link
Member

Looks like this is the section in the docs:

https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#using-cython-declarations-from-c

Worth trying in the future for more advanced use cases, but I think here the functions we are putting into Cython functions are really C functions anyway, so would be less steps the way it is

That said, I think this PR will be dependent on moving away from setuptools into meson. setuptools it seems implicitly requires that all extensions you build are actually Python extensions, so we aren't able to really set up the full dependency chain needed without meson

This should be possible with build_clib. (See https://github.com/pandas-dev/pandas/pull/47081/files#diff-60f61ab7a8d1910d86d9fda2261620314edcae5894d5aaa236b821c7256badd7).

The only question left should be whether you want to wait for setuptools support to be dropped. Assuming the meson build system PR lands sometime in January, we'll probably have to wait until no more 2.0.x releases to fully drop setuptools to prevent main/2.x from diverging too much. (I guess we can push the transition more aggressively/delay 2.0 slightly after the PR lands too. At any rate, we'd probably need 1 month with the new meson build system before I'm confident enough to drop setuptools)

@WillAyd
Copy link
Member Author

WillAyd commented Dec 30, 2022

Yea I think this should wait until we drop setuptools. There's no rush on this PR and I'm not confident we can do it well without meson

@jbrockmendel
Copy link
Member

Not a huge deal, but we specifically moved the util.pxd functions from a now-removed numpy_helper.h file years ago to be more friendly to non-C contributors

@WillAyd
Copy link
Member Author

WillAyd commented Jan 5, 2023

Gonna close for now - dependent on the new build system. Can come back after that work stabilizes

@WillAyd WillAyd closed this Jan 5, 2023
@WillAyd WillAyd deleted the c-util branch April 12, 2023 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants