Skip to content

Conversation

daniel-noland
Copy link
Collaborator

@daniel-noland daniel-noland commented Sep 29, 2025

On top of #879

^^ rebased

@daniel-noland daniel-noland force-pushed the pr/daniel-noland/tracing-tune-up branch 3 times, most recently from 31e2e8b to 17e3c72 Compare September 29, 2025 23:57
@daniel-noland daniel-noland changed the title Pr/daniel noland/tracing tune up tracing tune up Sep 30, 2025
@daniel-noland daniel-noland self-assigned this Sep 30, 2025
@daniel-noland daniel-noland added the clean-up Code base clean-up, no functional change label Sep 30, 2025
@daniel-noland daniel-noland marked this pull request as ready for review September 30, 2025 00:19
@daniel-noland daniel-noland requested a review from a team as a code owner September 30, 2025 00:19
@daniel-noland daniel-noland requested review from Fredi-raspall and removed request for a team September 30, 2025 00:19
Copy link
Contributor

@mvachhar mvachhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commits that are in addition to #879 are approved, but I'll withhold final approval so that we don't accidentally merge this before #879 is ready.

Copy link
Contributor

@Fredi-raspall Fredi-raspall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like some of the changes in this PR (even though I'm unsure about the urgency of them compared to other stuff). The only objection I have has to do with changing the log-levels in the traces. Happy to discuss in any case.

/* get destination ip address */
let Some(dst) = packet.ip_destination() else {
error!("{nfi}: Failed to get destination ip address for packet");
debug!("{nfi}: Failed to get destination ip address for packet");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the intent here (reducing verbosity), but to me the changes in this commit will harm more than anything.
If something is an ERROR or a warning, we should log it as such, because when displayed the severity/loglevel tells you what it is. We now have the option to completely disable traces in the pipeline. We may want to adjust the levels in the source code if some errors get very frequently logged; and we still have the option to rate-limit them for instance. However, in this particular example, I'd say you'd want to see the log as an ERROR in neon-lights since failing to get the destination ip address should be the symptom of something going really wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this can be triggered by a carefully crafted user packet, then this should not be error level as then malicious senders can cause logs to fill up and DOS the gateway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the user can drive us into any situation where ip address is missing then this message is a dos vector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, those packets (a packet with ethertype=IP and no IP header) should not even make it here, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but error! is intended for serious application level errors. This is just what is happening if we happen to get invalid data.

Or are you concerned that we might manipulate the data into an invalid shape by accident (which, I suppose, would promote this to error! in truth)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, I believe that to address your concerns (DoS) we should clearly delimit where it is safe to log "issues" and where not. In my mind, if I have an Ip packet and can't retrieve the ip address:

  • either there is a bug in the getter (unlikely), and that should be error
  • or the packet is malformed, in which case it should not be a valid packet and make it here (this is the IP-forward stage).

I agree that the parser may not know up to which "layer" it should validate a packet. But we should decide at which point packets can be considered malformed/well-formed so that we can "reasonably" tag logs with the right severity and at the same time safely be protected against DoS.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough. It is left at error for now


/* Read-only access to the fib table */
let Some(fibtr) = self.fibtr.enter() else {
error!("{nfi}: Unable to lookup fib for vrf {vrfid}");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can likely remain an error. If the user is able to give us messages for vrfs which don't exist then we have a more fundamental problem than DoS

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think warn is fair

Copy link
Contributor

@Fredi-raspall Fredi-raspall Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave this one as error. That's not the meaning of the error. It's failing to "enter" in the reader.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@daniel-noland daniel-noland force-pushed the pr/daniel-noland/tracing-tune-up branch from 17e3c72 to cf60f5f Compare September 30, 2025 20:45
Copy link
Contributor

@Fredi-raspall Fredi-raspall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Daniel, thanks for addressing my comments. There's one change, though, that I believe is a regression. See #880 (comment)

This path is much too hot to be error tracing on malformed packets.
It is a major denial of service vector prior to this commit.

Signed-off-by: Daniel Noland <[email protected]>
This is just less complex.

Signed-off-by: Daniel Noland <[email protected]>
Some very significant setup functions should appear in
telemetry traces.

Signed-off-by: Daniel Noland <[email protected]>
sorting and such

Signed-off-by: Daniel Noland <[email protected]>
@daniel-noland daniel-noland force-pushed the pr/daniel-noland/tracing-tune-up branch from cf60f5f to 0e07973 Compare October 6, 2025 23:54
@daniel-noland
Copy link
Collaborator Author

Daniel, thanks for addressing my comments. There's one change, though, that I believe is a regression. See #880 (comment)

I think this is all sorted now

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my side

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clean-up Code base clean-up, no functional change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants