Skip to content

Conversation

@ghazel
Copy link
Contributor

@ghazel ghazel commented Feb 16, 2018

A few notes:

  • fixes several bugs in the old parser
  • does not require NULL termination of the packet
  • does not use the heap
  • recursive, but tracks depth. could abort at a max depth, but even a full packet of [[[[... doesn't add much to the stack

@jech
Copy link
Owner

jech commented Feb 16, 2018

That looks very good.

Rather than walking the dictionary multiple times, once for each key, I suggest we walk it just once, filling in the structures on the fly, doing the loop in bdecode_find_key only once. This implies that either we make the low-level bdecoding code know about which keys we're interested in, or passing around a data structure that maps key to pointer to field (or key to offset within a struct, doesn't matter).

On a related note, I suggest creating a struct that contains all the data we're interested in, allocate it in the main loop (on the stack) and having parse_message fill in the fields rather than the current nonsense.

Please make sure that all of your functions are declared static unless they're prefixed with dht_.

(Feel free to push --force, Github deals well with that.)

@jech
Copy link
Owner

jech commented Feb 16, 2018

Line 2848, please use a while loop.

@ghazel
Copy link
Contributor Author

ghazel commented Feb 16, 2018

The parser is designed to be progressive, always returning a pointer to the next character. Since memmem always started over as well, I left that alone.

If the code can be very careful to check for keys in lexographic order, it could simply update buf/buflen.

@jech
Copy link
Owner

jech commented Feb 16, 2018

I'm pretty sure your code will be both simpler and cleaner if you do a single pass over the dictionary. Something like

while(not at end) {
    foo = get_next_entry()
    if foo->name == "port"
        result->port = atoi(foo->value)
    else if foo->name == "q"
        etc
}

After you've reached the end of the dictionary, you've filled in all the fields of result, and you return success.

@jech
Copy link
Owner

jech commented Feb 17, 2018

Greg, can you please check my branch bdecode and tell me if it's okay to commit that under your name?

@ghazel ghazel force-pushed the bdecode branch 2 times, most recently from 695ebbd to b856c94 Compare February 17, 2018 09:36
@ghazel
Copy link
Contributor Author

ghazel commented Feb 17, 2018

I rebased on master and updated the PR. (and then came here and saw your edited comment :/)

Yes, feel free to use my name. Also make sure you get the want = -1 and while changes from my updated version.

@jech
Copy link
Owner

jech commented Feb 17, 2018 via email

@ghazel
Copy link
Contributor Author

ghazel commented Feb 17, 2018 via email

@ghazel
Copy link
Contributor Author

ghazel commented Mar 20, 2018

Anything left to merge this in?

@jech
Copy link
Owner

jech commented Mar 21, 2018 via email

@GaryElshaw
Copy link

It would be great if this could be revisited, is there a reason why it remains in limbo?

@jech
Copy link
Owner

jech commented Mar 16, 2023

There's absolutely no doubt that @ghazel 's code is both cleaner and more correct than the current hackish code. However, the current code is stable and works well in practice. In order to change it, I'd need (1) a good reason, and (2) a set of unit tests.

Concerning (1), except for the intellectual satisfaction of having clean code, what are the advantages to switching to the bdecoder? Concerning (2), who's going to write a set of unit tests so we can switch to Greg's code without fear?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants