Skip to content

[FEAT] Get a read-only view of the data in py::bytes #2517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
anordal opened this issue Sep 21, 2020 · 4 comments
Open

[FEAT] Get a read-only view of the data in py::bytes #2517

anordal opened this issue Sep 21, 2020 · 4 comments

Comments

@anordal
Copy link

anordal commented Sep 21, 2020

Looking at the pybind11::bytes class (in pytypes.h), it is not obvious how to get the data out without first making an unnecessary std::string copy of it.

This function is not terribly useful:

operator std::string() const {
    char *buffer;
    ssize_t length;
    if (PYBIND11_BYTES_AS_STRING_AND_SIZE(m_ptr, &buffer, &length))
        pybind11_fail("Unable to extract bytes contents!");
    return std::string(buffer, (size_t) length);
}

More useful:

operator byte_view() const {
    char *buffer;
    ssize_t length;
    if (PYBIND11_BYTES_AS_STRING_AND_SIZE(m_ptr, &buffer, &length))
        pybind11_fail("Unable to extract contents of bytes object!");
    return { as_unsigned(buffer), as_unsigned(length) };
}
// Boiler plate
using byte_view = std::basic_string_view<uint8_t>;
size_t as_unsigned(ssize_t val) { return static_cast<size_t>(val); }
const uint8_t *as_unsigned(const char *s) { return reinterpret_cast<const uint8_t *>(s); }
@anordal
Copy link
Author

anordal commented Sep 21, 2020

This may be an alternative solution to #1807 – instead of writing functions that take const std::vector<uint8_t> &, write functions that take std::basic_string_view<uint8_t> (or the py::byte_view alias).

  • Pro: Avoid a copy
  • Con: If you want to copy the data into a vector anyway, this becomes explicit. Although much less explicit, due to the right signedness of the presented data (uint8_t instead of char).

As they say,

python -c 'import this' | grep Explicit

@fritzr
Copy link
Contributor

fritzr commented Sep 22, 2020

From the user's perspective, you can currently achieve this via py::buffer::request:

inline std::basic_string_view<uint8_t> bytes_view(py::bytes bytes) {
    py::buffer_info info(py::buffer(bytes).request());
    return { as_unsigned(info.ptr), as_unsigned(info.len) };
}

... But there is some overhead in acquiring + releasing the py::buffer_info since this has to use PyObject_GetBuffer and PyBuffer_Release.

It might be nice to have data() and size() methods in py::bytes, too:

/* XXX: guard error return with pybind11_fail */
const char *data() const { return PYBIND11_BYTES_AS_STRING(m_ptr); }
size_t size() const { return static_cast<size_t>(PYBIND11_BYTES_SIZE(m_ptr)); }

@YannickJadoul
Copy link
Collaborator

It would make sense, yes. It's still up for discussion whether this ought to be a conversion operator or some full method (cfr. your grep Explicit ;-) ), but if it's this easy, it wouldn't be wrong to add, I'd think.

@axsaucedo
Copy link

Thank you for the suggestion @fritzr, this allowed me to have a workaround for #1807, the example is provided there (having said that, the boilerplate code required is something that ideally shouldn't be required, but that can be discussed in the other issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants