Skip to content

C API: Support "nullable" parameter types in PyArg_Parse* #112068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
serhiy-storchaka opened this issue Nov 14, 2023 · 6 comments
Closed

C API: Support "nullable" parameter types in PyArg_Parse* #112068

serhiy-storchaka opened this issue Nov 14, 2023 · 6 comments
Assignees
Labels
topic-C-API type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Nov 14, 2023

Feature or enhancement

Some builtin functions accept None as a special value in addition to values that should have the specified type. For example start and stop indices in index(), size in read(), dir_fd in many os functions, and numerous string parameters, e.g. encoding and errors.

There are special format units in PyArg_Parse* functions for "string or None": z, z*, z#. There is half-public converter _PyEval_SliceIndex for indices, and several more private converters. But while NULL can non-ambiguously represent None in C, there is no single integer value for None. In many cases it is -1, but it is 0 and Py_SIZE(self) for indices, AT_FDCWD for dir_fd parameter, etc. This is why _PyEval_SliceIndex keeps the value of the C variable unchanged when the argument is None.

I propose to generalize it to all types. Add the ? modifier. If it occurs after the format unit, and the argument is None, it keeps the initial value of the C variable, like if the argument was optional and missed. For example:

    int dir_fd = AT_FDCWD;
    Py_ssize_t size = PY_SSIZE_T_MAX;
    double speed = 0.0;
    int sep = EOF;
    int flag = -1;
    if (!PyArg_ParseTuple(args, "i?n?d?C?p?", &dir_fd, &size, &speed, &sep, &flag)) {
        return NULL;
    }

The code accepts two integers, a float, a character, and an arbitrary object (as boolean), but any of 5 arguments can also be None, in which case the corresponding variable keeps its initial value.

There was a similar proposition for Argument Clinic: #64540. But this proposition works not only with Argument Clinic, therefore it can be used in third-party code. It works not only with integers, but with all types, including custom converters. It is more convenient, because do not need to use a special structure. I believe that it covers most of use cases, and in few remaining cases you still can handle it manually, as before.

After implementing this for PyArg_Parse* I will add support in Argument Clinic.

Linked PRs

@serhiy-storchaka
Copy link
Member Author

It turned out, that it is much more simpler to implement this as a prefix operator than a suffix. It can even be more efficient in future. So the above example looks:

    int dir_fd = AT_FDCWD;
    Py_ssize_t size = PY_SSIZE_T_MAX;
    double speed = 0.0;
    int sep = EOF;
    int flag = -1;
    if (!PyArg_ParseTuple(args, "?i?n?d?C?p", &dir_fd, &size, &speed, &sep, &flag)) {
        return NULL;
    }

@picnixz
Copy link
Member

picnixz commented Jun 30, 2024

I feel suffixes are easier to understand and many languages use the ? as a suffix, e.g., for the optional chaining operator: a?.b?.c The way I understand it is a? followed by . on a? so ? is more of a suffix. So personally, I prefer suffixes though prefixes are fine. Is there any plan for implementing it (or do you have perhaps a PoC?)

@serhiy-storchaka
Copy link
Member Author

serhiy-storchaka commented Jun 30, 2024

I created a PoC a long time ago, but had doubts about prefix vs suffix. Prefix only requires adding several lines of code in one place:
https://github.com/python/cpython/pull/121187/files#diff-3a8078954be00d3c876e02dd0e5057aad07f2c5f82f6c3c76154dcf2866929cfR544-R557

    if (*format == '?') {
        format++;
        if (arg == Py_None) {
            msg = skipitem(&format, p_va, flags);
            if (msg == NULL) {
                *p_format = format;
            }
            else {
                levels[0] = 0;
            }
            return msg;
        }
        flags |= FLAG_NULLABLE;
    }

and simple changes in 30+ other places are only needed to make the error messages better.

Suffix requires to make additional complex changes in 20-40 places.

@serhiy-storchaka
Copy link
Member Author

#121303 is an alternative implementation of ? as a suffix. It is more complex, and (...)? is not implemented yet, because of complexity and runtime cost of implementation.

@erlend-aasland
Copy link
Contributor

There was a similar proposition for Argument Clinic: #64540.

The discussion in the linked issue showed that this feature was highly controversial. What changed that made this non-controversial?

@serhiy-storchaka
Copy link
Member Author

To me, it is two things:

  • Time is passed, more code was converted to Argument Clinic and we have now more cases for such feature. There are not much of them, but they exist. I started to work on this feature because I see a benefit from it.
  • The original idea with returning a structure with fields for "error" and "is_null" is quite cumbersome. I propose simpler idea -- None means just the default value. This feature is limited, it can only be used if there is a default value, and you can not distinguish between None and the default value (e.g. in round()), but I believe that it covers most of cases.

The rest of the discussion looks to me as a pure bikeshedding -- whether it should be called "nullable" or "Noneable" or whatever.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-C-API type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants