tracing tune up #880
base: main
31e2e8b to 17e3c72
I like some of the changes in this PR (even though I'm unsure about the urgency of them compared to other stuff). The only objection I have has to do with changing the log-levels in the traces. Happy to discuss in any case.
  /* get destination ip address */
  let Some(dst) = packet.ip_destination() else {
-     error!("{nfi}: Failed to get destination ip address for packet");
+     debug!("{nfi}: Failed to get destination ip address for packet");
I see the intent here (reducing verbosity), but to me the changes in this commit will harm more than anything.
If something is an ERROR or a warning, we should log it as such, because when displayed the severity/loglevel tells you what it is. We now have the option to completely disable traces in the pipeline. We may want to adjust the levels in the source code if some errors get very frequently logged; and we still have the option to rate-limit them for instance. However, in this particular example, I'd say you'd want to see the log as an ERROR in neon-lights since failing to get the destination ip address should be the symptom of something going really wrong.
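The rate-limiting option mentioned above could look roughly like the following. This is a minimal sketch assuming a simple fixed-window limiter; `LogLimiter` and its parameters are hypothetical and not part of this PR. The idea is that the message keeps its error severity while a flood of malformed packets can emit at most a bounded number of log lines per window.

```rust
use std::time::{Duration, Instant};

/// Hypothetical fixed-window limiter (not part of the code under review):
/// allows at most `max_per_window` log events per `window`.
pub struct LogLimiter {
    window: Duration,
    max_per_window: u32,
    window_start: Instant,
    emitted: u32,
    suppressed: u64,
}

impl LogLimiter {
    pub fn new(max_per_window: u32, window: Duration, now: Instant) -> Self {
        Self {
            window,
            max_per_window,
            window_start: now,
            emitted: 0,
            suppressed: 0,
        }
    }

    /// Returns true if the caller may emit a log line at time `now`.
    pub fn allow(&mut self, now: Instant) -> bool {
        // Start a fresh window once the current one has elapsed.
        if now.duration_since(self.window_start) >= self.window {
            self.window_start = now;
            self.emitted = 0;
        }
        if self.emitted < self.max_per_window {
            self.emitted += 1;
            true
        } else {
            // Count what was dropped so it can be reported later.
            self.suppressed += 1;
            false
        }
    }

    /// Number of log events suppressed so far.
    pub fn suppressed(&self) -> u64 {
        self.suppressed
    }
}
```

At a call site, the `error!` would be guarded by something like `if limiter.allow(Instant::now()) { ... }`, and the suppressed count could be reported periodically.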
If this can be triggered by a carefully crafted user packet, then it should not be error level, because malicious senders could then fill up the logs and DoS the gateway.
If the user can drive us into any situation where the IP address is missing, then this message is a DoS vector.
Well, those packets (a packet with ethertype=IP and no IP header) should not even make it here, right?
Sure, but error! is intended for serious application-level errors. This is just what is happening if we happen to get invalid data.
Or are you concerned that we might manipulate the data into an invalid shape by accident (which, I suppose, would promote this to error! in truth)?
As discussed, I believe that to address your concern (DoS) we should clearly delimit where it is safe to log "issues" and where it is not. In my mind, if I have an IP packet and can't retrieve the IP address:
- either there is a bug in the getter (unlikely), and that should be an error
- or the packet is malformed, in which case it is not a valid packet and should not have made it here (this is the IP-forward stage).
I agree that the parser may not know up to which "layer" it should validate a packet. But we should decide at which point packets can be considered malformed/well-formed so that we can "reasonably" tag logs with the right severity and at the same time safely be protected against DoS.
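One way to picture the boundary being proposed is the sketch below. It assumes a pipeline with an ingress stage that handles untrusted input and a later IP-forward stage that only sees validated packets; `Stage`, `Severity`, `RawPacket` and both functions are hypothetical names for illustration, not the project's types. The same failure (no destination address) gets a quiet severity before the boundary and a loud one after it.

```rust
use std::net::Ipv4Addr;

/// Stand-in for the log level a call site would pick (`debug!` vs `error!`).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Severity {
    Debug,
    Error,
}

/// Hypothetical pipeline stages on either side of the validation boundary.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Stage {
    /// Untrusted input: malformed packets are expected here.
    Ingress,
    /// Past the boundary: packets are assumed well-formed.
    IpForward,
}

/// Hypothetical packet whose IP header may be missing.
pub struct RawPacket {
    pub dst: Option<Ipv4Addr>,
}

/// Severity for "no destination address", depending on where it is seen.
pub fn missing_dst_severity(stage: Stage) -> Severity {
    match stage {
        // Sender-controlled, so it must not be able to flood the logs.
        Stage::Ingress => Severity::Debug,
        // Should be impossible after validation, so it signals a bug.
        Stage::IpForward => Severity::Error,
    }
}

/// Returns the destination, or the severity at which to log its absence.
pub fn check(stage: Stage, pkt: &RawPacket) -> Result<Ipv4Addr, Severity> {
    pkt.dst.ok_or(missing_dst_severity(stage))
}
```

Under this assumption, an error-level log at the IP-forward stage is only safe from DoS if the ingress stage reliably drops packets without an IP header.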
fair enough. It is left at error for now
/* Read-only access to the fib table */
let Some(fibtr) = self.fibtr.enter() else {
    error!("{nfi}: Unable to lookup fib for vrf {vrfid}");
This can likely remain an error. If the user is able to give us messages for vrfs which don't exist then we have a more fundamental problem than DoS.
I think warn is fair
I'd leave this one as error. The message does not mean that the vrf doesn't exist; it means we failed to "enter" the reader.
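The distinction drawn here can be sketched as follows, assuming `enter()` can fail independently of what the table contains. `FibReader`, `LookupFailure` and `lookup` are hypothetical, and the `Weak` pointer merely stands in for whatever mechanism backs the real reader. Failing to enter means the table as a whole is unavailable, which no packet content can cause; a missing entry is a separate outcome.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Weak};

/// Hypothetical read handle: the table may disappear underneath it.
pub struct FibReader {
    pub table: Weak<HashMap<u32, &'static str>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LookupFailure {
    /// `enter()` failed: an internal fault, independent of the packet.
    ReaderUnavailable,
    /// The table is readable but has no entry for the key.
    NoRoute,
}

impl FibReader {
    /// Returns read access to the table, or None if it is gone.
    pub fn enter(&self) -> Option<Arc<HashMap<u32, &'static str>>> {
        self.table.upgrade()
    }
}

/// Keeps the two failure modes apart so each can get its own log level.
pub fn lookup(reader: &FibReader, key: u32) -> Result<&'static str, LookupFailure> {
    let Some(table) = reader.enter() else {
        return Err(LookupFailure::ReaderUnavailable);
    };
    table.get(&key).copied().ok_or(LookupFailure::NoRoute)
}
```

With the two cases separated like this, `ReaderUnavailable` can stay at error level without becoming something a sender can trigger on demand.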
done
17e3c72 to cf60f5f
Daniel, thanks for addressing my comments. There's one change, though, that I believe is a regression. See #880 (comment)
This path is much too hot to be error tracing on malformed packets. It is a major denial of service vector prior to this commit. Signed-off-by: Daniel Noland <[email protected]>
This is just less complex. Signed-off-by: Daniel Noland <[email protected]>
Some very significant setup functions should appear in telemetry traces. Signed-off-by: Daniel Noland <[email protected]>
Signed-off-by: Daniel Noland <[email protected]>
sorting and such Signed-off-by: Daniel Noland <[email protected]>
cf60f5f to 0e07973
I think this is all sorted now
Looks good from my side
On top of #879^^ rebased