Skip to content

avoid UTF8 decoding in git-transport/git-protocol #634

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 30, 2022
Merged

Conversation

Byron
Copy link
Member

@Byron Byron commented Nov 30, 2022

Probably mainly for historical reasons we would only have a BufRead instance after a handshake
which makes it difficult to access the packet lines underneath and impossible to do it without using
String as line buffer.

This PR changes that and eithers reads all lines in advance or provides an ExtendedBufRead with
(newly added) readline() support.

Tasks

  • capabilities V2 read from buffer
  • handshake ref lines V1 read line-by-line via Vec<u8> instead of String (for blocking only due to tests)
  • v2 refs read Vec<u8> instead of String (for blocking only due to tests)

extra

  • env var permissions should be by topic and fallback to by-prefix, not the other way around

- remove unwrap() in favor of returning an error
- use expect() instead of unwrap()
That way one won't have to assume UTF8 encoding in the returned buffer.
Note that the reason for it not returning a reference to its internal
buffer is due to the async implementation requiring it. Its future-based
architecture can't really express the lifetimes associated with it (yet).
That way it doesn't have to convert to `String` as intermediary
which may fail as illformed UTF8 might be present.

Performance wise, reading all lines ahead of time, it is the same
as it was before as it would collect all lines beforehand anyway.

We are also seeing only a few lines in V2, and in V1 it was
fully streaming already.
The handshake response itself now provides a `ref` read implementation
with direct readline support. That way one can avoid having to go
through `String`.
The latter can cause issues around illformed UTF-8 which wouldn't
bother git either.

This comes at the expense of not parsing line by line anymore, but
instead reading as fast as possible and parsing afterwards.

Performance wise I think it doesn't matter, but it will cause
more memory to be used. If this ever becomes a problem,
for example during pushes where we are stuck with V1, we can consider
implementing our own streaming appreach that works with packet lines
instead - they are just not exposed here even though they could.
Now `Permissions` for environment variables are so that they
are by topic instead of by prefix, by default. That way
it's easier to allow or deny particular sets of related
environment variables.

The catch-all permissions by prefix are still present for all
other variables that aren't contained in one particular topic.
@Byron Byron force-pushed the remove-lines-parsing branch 2 times, most recently from 8aeffde to 5fe6aa3 Compare November 30, 2022 20:01
@Byron Byron merged commit 9d8e32d into main Nov 30, 2022
@Byron Byron deleted the remove-lines-parsing branch November 30, 2022 20:10
@paulyoung
Copy link
Contributor

Thanks! 🙂

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Aug 9, 2025
The `ReadDataLineFuture` struct was among the facilities added
within `gix-packetline` in 41fdb84 (GitoxideLabs#634), and it looks like it may
have been intended to be used eventually. However, it is unused.

While it is public in `read::sidebands::async_io` module, the only
facility from that module that is exposed through `read::sidebands`
is `WithSidebands`:

https://github.com/GitoxideLabs/gitoxide/blob/04a18f3a4520dd6f49b5f87fe3782dd1cd1547f2/gix-packetline/src/read/sidebands/mod.rs#L6-L9

This situation appears to be long-standing. However, possibly
because it is public in its own directly containing module, it was
not detected as dead code by `cargo clippy` until recently.

Specifically, this caused the CI `lint` job to fail starting in the
recently released Rust 1.89. This can be observed by rerunning the
workflow where it had passed before:

https://github.com/EliahKagan/gitoxide/actions/runs/16739420509/job/47709197815

(The failure only happens in the `lean-async` run of `cargo clippy`
because the containing module is not used at all in the `max`,
`small`, and `max-pure` builds.)

For now, this commit suppresses the error by adding
`#[allow(dead_code)]` to `ReadDataLineFuture`.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Aug 9, 2025
The `ReadDataLineFuture` struct was among the facilities added
within `gix-packetline` in 41fdb84 (GitoxideLabs#634), and it looks like it may
have been intended to be used eventually. However, it is unused.

While it is public in `read::sidebands::async_io` module, the only
facility from that module that is exposed through `read::sidebands`
is `WithSidebands`:

https://github.com/GitoxideLabs/gitoxide/blob/04a18f3a4520dd6f49b5f87fe3782dd1cd1547f2/gix-packetline/src/read/sidebands/mod.rs#L6-L9

This situation appears to be long-standing. However, possibly
because it is public in its own directly containing module, it was
not detected as dead code by `cargo clippy` until recently.

Specifically, this caused the CI `lint` job to fail starting in the
recently released Rust 1.89. This can be observed by rerunning the
workflow where it had passed before:

https://github.com/EliahKagan/gitoxide/actions/runs/16739420509/job/47709197815

(The failure only happens in the `lean-async` run of `cargo clippy`
because the containing module is not used at all in the `max`,
`small`, and `max-pure` builds.)

For now, this commit suppresses the error by adding
`#[allow(dead_code)]` to `ReadDataLineFuture`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants