Delimited Messages - let's harmonize across languages #10229

Open
otri opened this issue Jul 8, 2022 · 10 comments
@otri

otri commented Jul 8, 2022

What language does this apply to?
If it's a proto syntax change, is it for proto2 or proto3?

No syntax change.

If it's about generated code change, what programming language?

All Languages

Describe the problem you are trying to solve.

Delimited messages are so central to serializing repeated payloads to file and network streams that they should be treated as a core use case.

Describe the solution you'd like

Coming to delimited-message parsing fresh, and pulling my hair out, I missed that the C++ functions live in delimited_message_util.h. However, the solution presented by Kenton in #710 is IMHO much better and more obvious.

Please harmonize this small but critical delimited functionality in codegen, and let's get it mainlined across Python, Java, C++, and C#. I'm using length-delimited reading/writing in three of these four languages, which shows how valuable the cross-platform nature of protocol buffers is with delimited messages.

Describe alternatives you've considered

The situation varies by language. Tucking delimited reading/writing into the C++ utils is confusing. It's missing from Python, so everyone rolls their own, but length-delimited functions are present in Java and C#. They all share the same framing: a uint32 length prefix, encoded as a varint, followed by the message bytes.
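
For readers rolling their own in the meantime, here is a minimal sketch of the framing itself: the serialized message bytes preceded by their length as a base-128 varint, which is what the existing Java and C++ delimited helpers write. The names encode_varint and frame are hypothetical, chosen for illustration, and are not part of any protobuf library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)  # high bit clear: last byte of the varint
            return bytes(out)


def frame(payload: bytes) -> bytes:
    """Prefix already-serialized message bytes with their varint length."""
    return encode_varint(len(payload)) + payload
```

Payloads shorter than 128 bytes get a one-byte prefix; a 300-byte payload gets the two-byte prefix 0xAC 0x02.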

Additional context
Add any other context or screenshots about the feature request here.

Not at this time.

@fowles
Contributor

fowles commented Jul 18, 2022

Honestly, this is a totally reasonable request. We don't have the cycles to pursue it right now, but if you were interested in implementing it we would be happy to accept PRs.

@acozzette
Member

We have an internal implementation of this for Python that we might want to just open source.

@neild
Contributor

neild commented Jul 19, 2022

FYI, existing proposal for adding this to the Go implementation:
golang/protobuf#1382

@jskeet
Contributor

jskeet commented Aug 2, 2022

C# already has ParseDelimitedFrom(Stream) - do we anticipate any need for other changes?

@rmelick-muon

@acozzette What is the process for open sourcing your internal implementation? I'm happy to try to contribute something for Python as a brand-new standalone patch, but if something existing could be released, that might be more expedient.

@anandolee
Contributor

anandolee commented Oct 31, 2023

I've put the following APIs

def serialize_length_prefixed(message, output) -> None
def parse_length_prefixed(message, input_bytes) -> message

into one of our projects' design docs. I will add the support once the design has been approved.
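
A rough sketch of how two functions with those signatures could behave over a binary stream is shown below. It is an assumption about the behavior, not the approved design: the private helpers are hypothetical, `input_bytes` is assumed to be a readable binary stream, and returning None at a clean end-of-stream is an illustrative choice. It relies only on the `SerializeToString` and `ParseFromString` methods that Python protobuf messages provide.

```python
from typing import BinaryIO, Optional


def _write_varint(output: BinaryIO, value: int) -> None:
    """Write value as a base-128 varint (hypothetical helper)."""
    while value > 0x7F:
        output.write(bytes([(value & 0x7F) | 0x80]))
        value >>= 7
    output.write(bytes([value]))


def _read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one varint; return None if the stream is already exhausted."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # clean end-of-stream between messages
            raise EOFError("stream ended inside a length prefix")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def serialize_length_prefixed(message, output: BinaryIO) -> None:
    """Write the message's length as a varint, then its serialized bytes."""
    payload = message.SerializeToString()
    _write_varint(output, len(payload))
    output.write(payload)


def parse_length_prefixed(message, input_bytes: BinaryIO):
    """Fill `message` from the next frame; return it, or None at end-of-stream."""
    size = _read_varint(input_bytes)
    if size is None:
        return None
    payload = input_bytes.read(size)
    if len(payload) != size:
        raise EOFError("stream ended inside a message body")
    message.ParseFromString(payload)
    return message
```

Calling parse_length_prefixed in a loop until it returns None reads every message from a file opened in binary mode.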

@rmelick-muon

rmelick-muon commented Nov 2, 2023

@anandolee I've implemented very similar APIs in our internal code, and what I quickly discovered for parsing was that I also needed an API that could handle parsing multiple messages from a stream of bytes (for example, an open file).

These would look something like:

def parse_all_delimited_from(buffer: bytes, message_class: Type[M]) -> Iterator[M]:

def parse_all_delimited_from_reader(
    reader: BufferedReader, message_class: Type[M]
) -> Iterator[M]:

Or a parse method that tells you how many bytes it consumed to parse the message, so you can advance your position in a large buffer and then parse another message:

def parse_delimited_from(
    buffer: bytes, starting_position: int, message: Message
) -> int:
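
To make the buffer-based shapes above concrete, here is a minimal sketch under stated assumptions: the length prefix is a base-128 varint, `message` exposes `ParseFromString` as Python protobuf messages do, and `_decode_varint` is a hypothetical helper. It is one possible reading of these signatures, not the internal implementation mentioned above.

```python
from typing import Iterator, Tuple, Type, TypeVar

M = TypeVar("M")


def _decode_varint(buffer: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint at pos; return (value, position just past the varint)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buffer):
            raise ValueError("buffer ended inside a length prefix")
        byte = buffer[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_delimited_from(buffer: bytes, starting_position: int, message) -> int:
    """Parse one message into `message`; return the number of bytes consumed."""
    size, body_start = _decode_varint(buffer, starting_position)
    body_end = body_start + size
    if body_end > len(buffer):
        raise ValueError("buffer ended inside a message body")
    message.ParseFromString(buffer[body_start:body_end])
    return body_end - starting_position  # prefix bytes plus body bytes


def parse_all_delimited_from(buffer: bytes, message_class: Type[M]) -> Iterator[M]:
    """Yield every length-delimited message in the buffer, in order."""
    pos = 0
    while pos < len(buffer):
        message = message_class()
        pos += parse_delimited_from(buffer, pos, message)
        yield message
```

Returning the consumed byte count keeps the single-message function usable on its own, while the iterator simply advances by that count until the buffer is exhausted.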

@thomasvl
Contributor

FYI: ObjC and Swift both have APIs for delimited messages.

@anandolee
Contributor

Length-prefixed serialization for Python is now supported as of 3a9f074.

@jesseclark

Not having parse_delimited_from in the Ruby library is quite painful. Any hope of getting this implemented any time soon?
