-
-
Notifications
You must be signed in to change notification settings - Fork 346
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Config Parse/Read/Write Discussion #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi Corbin,
it’s great to meet you and I will do my best to support your work. As I am currently traveling at 300km through China I won’t get to answering the questions right now, but will do in a few hours.
Also I am thinking it might be best to setup a zoom call (or equivalent) to get you started more quickly. As you saw I already laid out the structure, wanting to add the implementation next. Maybe it might be easiest to do that together even so you can chime in and potentially take it from there. That could be a great entry point for some Rust mentoring too, in case you are interested.
Talk more in a bit 😊
…Sent from my iPhone
On Nov 30, 2020, at 1:01 PM, Corbin Crutchley ***@***.***> wrote:
Hi! I'm a developer that's been tasked to implement GitOxide into a WASM stack in an effort to clone a project as fast as possible in the browser.
As part of that work, I'm looking to introduce a lot of new functionality into the project to get clone working. As an initial task, I'm hoping to get the git-config package able to read and write config files. I've got a (very) WIP port of config parsing logic from isomorphic-git over on my fork.
I won't bother asking for a code review right now, I'm brand new to Rust and have a colleague taking a look at the code right now to give me some pointers on how I can improve my work
That said, I am unsure of what direction we want to go in with regards to data organization. The current output is a pretty unmanageable tangled vector of Sections and Entries alike in a single struct
I've noticed that the current git-config file has both Section and Entry types that look to contain all of the relevant properties, so I'll be refactoring my current code to use that data structure.
Outside of that, I was hoping you could give me some kind of direction when it comes to the different mods in the git-config/lib.rs file:
https://github.com/Byron/gitoxide/blob/main/git-config/src/lib.rs#L57-L72
https://github.com/Byron/gitoxide/blob/main/git-config/src/lib.rs#L131-L217
I'm not entirely sure what borrowed or spawned is referring to (which I admit as a point-of-noobiness), and I'm not sure I track what Span is doing here.
I also would love some insight as to what a Token is referring to here:
https://github.com/Byron/gitoxide/blob/main/git-config/src/file.rs#L4-L23
As I'm not sure an instance that would have a Section and only a single Entry
Looking forward to learning more and being able to meaningfully contribute upstream!
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Entirely understandable. Honestly, I'm surprised you responded as quickly as you have at all! I was thinking the same thing about setting up a zoom (or similar) call! I would love to work together to add the implementation - I want to make sure the code standards are up-to-snuff with the rest of the package and would absolutely love some mentoring on my Rust journey! If we'd like to schedule a meeting privately, I can be reached at Just as a heads up, our schedule is way different, as I'm located in California, USA, but I'd be happy to modify my schedule and do a (for me) very early morning/late night call in order to make this work. Thanks so much for being so open and welcoming! Looking forward to working together! |
… structures Relevant for #31, but might not yet be enough to make sense of it all. Try working backwards from `git_config::File`
Alright, I thought it might be easiest to add comments in code, a commit should show up in this thread for you to look at. It's definitely not perfect but might already help a little. Truth to be told, I think I have pretty much sketched out something that I believe I could implement when doing zero-copy parsing using rust-dangerous. The crates author is super experienced with parsers and very nice and responsive, bringing the crate to a point where it would be perfect for gitoxide because of it's ease-of-use and its ability to create nice parsing errors out of the box. My vision for the implementation is definitely not the easiest, but it's designed to be en-par with the original git implementation which has a lot of bells and whistles, out of necessity I suppose. For instance, it's non-destructive for the most part even in the presence of comments. Using regex for parsing is a none-starter, which unfortunately rules out a port of isomorphic-git, the reason being that the regex crate bloats any binary and takes a long time to compile which again, would not be competitive enough with the existing git-config implementation in C. Of course I hope that this doesn't scare you, because if I can do it, so can you. For a lack of time I can't propose some times for a Zoom call right now, but would hope you drop me some suggestions (to the email address I use in commits) of times that would work for you (I am UTC+8), any remaining day this week. Thanks a bunch! |
Love the extra comments, they help a lot. I think the zero-copy parsing is certainly the way to go. Non-destructive and more performant are huge reasons for going that route. I am under a bit of a deadline for an initial draft of the cloning work (although I have time afterward to perf optimize), so I'll likely be continuing the development of my regex implementation separately, and working to get the zero-copy parsing implementation mainlined during the same time. I'll be sending out an email right now to try to figure out a time that would work best for a sync. :D |
Hey folks, I actually started writing a For what it's worth, my parser is currently zero-copy and guarantees emitting enough events to identically reconstruct the source git config file, and currently depends only on I'm currently writing a higher level abstraction on top of the parse that allows for semi-efficient insertion and deletion to modify git config files in place. Would the gitoxide team be interested in merging the work? |
Hi @edward-shen, thanks for sharing your work! I am quite baffled by it's completeness and amazed that it integrates with serde. The last time I deep dived into Since the implementation here is merely an API sketch along with a 'sample' of a parser using There reason for It's used to parse bytes directly due to The idea was to provide better error messages than git itself in case of malformed files while making it easy (maybe even with a compile flag) to allow other encodings in values or section names - right now these apparently have to be ascii/latin character sets (my knowledge about this is just based on a few manual tests though). I would be very interested in your thoughts on where the current implelementation/ideas of |
Hey @Byron! Glad to hear that the team is interested. I should first clarify that while the crate is named
I chose nom because the combinatorial parsing strategy was intuitive for me to conceptually understand, and made testing very easy for me. It was very easy for me to quickly debug and unit test, which helpful in getting a parser quickly. That being said, I'm sure we can do some similar strategy for Currently, error messages in my work are very scuffed—at best you'll just get an error message of where the parsing fails, but that's an improvement for when more functionality is done. That being said, since we do know where the parsing fails, giving an meaningful error message would just be as simple as bubbling what sub-parser failed at what input, which we can derive display from easily.
This is a good point that I haven't considered, but I think it's fine to be more restrictive than git in this case. Considering the use case of
For a little more context, my currently architecture is a two layer implementation, with a lower-level parser powering a higher level wrapper. The lower-level parser only handles emitting text events, while the wrapper itself handles reading and modifying that stream of events. This lets us expose both layers in case users want to use the lower level parser instead of the abstraction. This seems the primary difference in the architecture, since the The current "roadmap" for my implementation would be:
And probably after all that is done would I feel confident releasing a 1.0 version. On a side note, is there a reason why you have |
Apologies in advance if my reply seems a bit short or concise, it's really that I am short in time but try not to wait too long with my reply here. It would be great to find a way to benefit from your implementation which looked like it adheres to high quality standards.
From my experience with the Having helpful error messages is a must for the
Here I would be very conservative and assume that all kinds of potentially uncommon encodings are produced by people editing their config files somewhere in the world, so being more restrictive than
That's true, the parser here is considered an implementation detail. Exposing it is nothing I would be too afraid of though as I am OK incrementing major version numbers liberally on low level crates.
That's something I very much look forward to! It's nothing that
Even though it's not 'sketched' yet I also want a combined
Thus far there is only one real 'blocker' which is the usage of
I have seen this being used in a crate I love, Reading my ramblings, here is the summary:
Anything I said should not be understood as discouragement or lack of confidence, as I can only restate that I am excited about the implementation and would love to make it a part of That ended up being longer than anticipated, I hope you can find it useful in a way that helps making this work :). |
No need to rush—I just didn't want us to have duplicated work. Considering the breadth of what you do, it's understandable that you're very short on time, so I definitely appreciate the thought out response.
I definitely agree, and it's definitely a shortfall in my current implementation that can be worked on. I don't mind using a different crate for parsing, but considering how the parser is already completed in
I'm not too strongly for or against using I think I talked myself into supporting
Don't worry, I haven't gotten that impression at all. Just sounds like a maintainer doing due diligence and getting motivations and goals cleared up ;) If my responses look good to you, joining sounds good! How would you like to move forward then? I don't mind working on this in my own repo until I get some minimal viable product ready to merge, or work in a branch in the main repo. Whatever works for you! Edit: Parser now uses |
It's quite incredible how scarce focus and uninterrupted keyboard time have become thees days and writing complex code like…before… is something I can only dream about.
This is the best time to see if @avitex , the author of Personally I think it would end up being a rewrite, especially since without the use of the As using When done right, @avitex has done tremendous work reviewing my first steps towards implementing a parser (see #40 and #44), and I feel a bit guilty to think about adopting
Neat! Converting from
🙌
Since you do all the work really I want to do my very best to not add any hurdles and actually slow your progress or enjoyment developing something that essential to Here is some suggestions and leading questions:
No matter what happens, I firmly believe that this conversation alone is of great value - in the best case |
I took a deeper look at the API sketch and there's an incredible number of similarities between what we have. I think the primary divergent idea would be how the gitconfig editing occurs—the sketch seems to use The primary concern with that approach though would be handling multivars, which were a pain point for me. I'm not sure how the user would be able to control which instance of the key/value pair would be set through that appraoch.
This was actually a problem I ran into, since the parser actually only returned references. The solution for me to use This somewhat leaks implementation detail of higher level abstractions into lower ones, but I think it's better than to have an owned versus copied event enums.
Yes, of course! I've already made serde an optional feature.
Now that I think about it, I haven't really seen any meaningful errors from As a side note, I've been working on getting stuff well tested. After 16 million fuzzer iterations without error, I think I'm semi-confident that my implementation is panic-free, but I'm going to leave it running overnight to make sure. ;) |
Thanks for all your effort - it's more appreciated than has been expressed.
Even though the API might be lacking, unintuitive or incorrect when actually trying to implement it, the idea is that every owned entry is either new or was created from an existing borrowed entry. This allows these updates to know where they belong. Ultimately borrowed sections or entries know their index in the list of parsed tokens, which is immutable. Doing edits by pre-recording them and applying them during serialization certainly is complex and maybe could be simplified by letting In short: should work, but who knows 😅. Please note that No lookup tables or acceleration structures are used to avoid allocations where possible and I take a wait-and-see stance to add them only when there is actual need.
I truly believe this delays the inevitable problem: Either each access to git configuration creates a temporary In the sketch that's the Being able to parse once and keep the buffers is the reason the sketched API looks the way it looks, a lot of effort it creates for itself to avoid allocations and keep things minimal in memory. Anyway, ranting :D, but in short a good test would be to try keeping the buffer within
Yes, 'only' these would be the minimum, and with
Pretty neat! I myself never got around using an actual fuzzer, but resorted to 'stress' tests that verify big repositories. A few issues were found that way, but of course, tells nothing about malformed input. I hope this helps. |
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Hi! I'm a developer that's been tasked to implement GitOxide into a WASM stack in an effort to clone a project as fast as possible in the browser.
As part of that work, I'm looking to introduce a lot of new functionality into the project to get
clone
working. As an initial task, I'm hoping to get thegit-config
package able to read and write config files. I've got a (very) WIP port of config parsing logic fromisomorphic-git
over on my fork.That said, I am unsure of what direction we want to go in with regards to data organization. The current output is a pretty unmanageable tangled vector of Sections and Entries alike in a single
struct
I've noticed that the current
git-config
file has bothSection
andEntry
types that look to contain all of the relevant properties, so I'll be refactoring my current code to use that data structure.Outside of that, I was hoping you could give me some kind of direction when it comes to the different
mod
s in thegit-config/lib.rs
file:https://github.com/Byron/gitoxide/blob/main/git-config/src/lib.rs#L57-L72
https://github.com/Byron/gitoxide/blob/main/git-config/src/lib.rs#L131-L217
I'm not entirely sure what
borrowed
orspawned
is referring to (which I admit as a point-of-noobiness), and I'm not sure I track whatSpan
is doing here.I also would love some insight as to what a
Token
is referring to here:https://github.com/Byron/gitoxide/blob/main/git-config/src/file.rs#L4-L23
As I'm not sure an instance that would have a
Section
and only a singleEntry
Looking forward to learning more and being able to meaningfully contribute upstream!
The text was updated successfully, but these errors were encountered: