Skip to content

[lldb] Log errors to the system log if they would otherwise get dropped #111911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 22, 2024

Conversation

JDevlieghere
Copy link
Member

Log errors to the (always-on) system log if they would otherwise get dropped by LLDB_LOG_ERROR and LLDB_LOG_ERRORV.

Log errors to the (always-on) system log if they would otherwise get
dropped by LLDB_LOG_ERROR and LLDB_LOG_ERRORV.
@llvmbot
Copy link
Member

llvmbot commented Oct 10, 2024

@llvm/pr-subscribers-lldb

Author: Jonas Devlieghere (JDevlieghere)

Changes

Log errors to the (always-on) system log if they would otherwise get dropped by LLDB_LOG_ERROR and LLDB_LOG_ERRORV.


Full diff: https://github.com/llvm/llvm-project/pull/111911.diff

3 Files Affected:

  • (modified) lldb/include/lldb/Utility/Log.h (+15)
  • (modified) lldb/source/API/SystemInitializerFull.cpp (+3)
  • (modified) lldb/source/Utility/Log.cpp (+8)
diff --git a/lldb/include/lldb/Utility/Log.h b/lldb/include/lldb/Utility/Log.h
index ac6347153a1014..f8545cd59a18ba 100644
--- a/lldb/include/lldb/Utility/Log.h
+++ b/lldb/include/lldb/Utility/Log.h
@@ -335,6 +335,15 @@ template <typename Cat> Log *GetLog(Cat mask) {
   return LogChannelFor<Cat>().GetLog(Log::MaskType(mask));
 }
 
+/// Getter and setter for the error log (see g_error_log).
+/// The error log is set to the system log in SystemInitializerFull. We can't
+/// use the system log directly because that would violate the layering between
+/// Utility and Host.
+/// @{
+void SetLLDBErrorLog(Log *log);
+Log *GetLLDBErrorLog();
+/// @}
+
 } // namespace lldb_private
 
 /// The LLDB_LOG* macros defined below are the way to emit log messages.
@@ -387,6 +396,9 @@ template <typename Cat> Log *GetLog(Cat mask) {
     if (log_private && error_private) {                                        \
       log_private->FormatError(::std::move(error_private), __FILE__, __func__, \
                                __VA_ARGS__);                                   \
+    } else if (::lldb_private::Log *log_error = GetLLDBErrorLog()) {           \
+      log_error->FormatError(::std::move(error_private), __FILE__, __func__,   \
+                             __VA_ARGS__);                                     \
     } else                                                                     \
       ::llvm::consumeError(::std::move(error_private));                        \
   } while (0)
@@ -401,6 +413,9 @@ template <typename Cat> Log *GetLog(Cat mask) {
     if (log_private && log_private->GetVerbose() && error_private) {           \
       log_private->FormatError(::std::move(error_private), __FILE__, __func__, \
                                __VA_ARGS__);                                   \
+    } else if (::lldb_private::Log *log_error = GetLLDBErrorLog()) {           \
+      log_error->FormatError(::std::move(error_private), __FILE__, __func__,   \
+                             __VA_ARGS__);                                     \
     } else                                                                     \
       ::llvm::consumeError(::std::move(error_private));                        \
   } while (0)
diff --git a/lldb/source/API/SystemInitializerFull.cpp b/lldb/source/API/SystemInitializerFull.cpp
index 8a992a6889a91b..31f3a9f30b81f0 100644
--- a/lldb/source/API/SystemInitializerFull.cpp
+++ b/lldb/source/API/SystemInitializerFull.cpp
@@ -84,6 +84,9 @@ llvm::Error SystemInitializerFull::Initialize() {
   // Use the Debugger's LLDBAssert callback.
   SetLLDBAssertCallback(Debugger::AssertCallback);
 
+  // Use the system log to report errors that would otherwise get dropped.
+  SetLLDBErrorLog(GetLog(SystemLog::System));
+
   LLDB_LOG(GetLog(SystemLog::System), "{0}", GetVersion());
 
   return llvm::Error::success();
diff --git a/lldb/source/Utility/Log.cpp b/lldb/source/Utility/Log.cpp
index 3798f406476370..a57d415be033ba 100644
--- a/lldb/source/Utility/Log.cpp
+++ b/lldb/source/Utility/Log.cpp
@@ -43,6 +43,10 @@ char TeeLogHandler::ID;
 
 llvm::ManagedStatic<Log::ChannelMap> Log::g_channel_map;
 
+// The error log is used by LLDB_LOG_ERROR and LLDB_LOG_ERRORV. If the given
+// log channel is not enabled, error messages are logged to the error log.
+static std::atomic<Log *> g_error_log = nullptr;
+
 void Log::ForEachCategory(
     const Log::ChannelMap::value_type &entry,
     llvm::function_ref<void(llvm::StringRef, llvm::StringRef)> lambda) {
@@ -460,3 +464,7 @@ void TeeLogHandler::Emit(llvm::StringRef message) {
   m_first_log_handler->Emit(message);
   m_second_log_handler->Emit(message);
 }
+
+void lldb_private::SetLLDBErrorLog(Log *log) { g_error_log.exchange(log); }
+
+Log *lldb_private::GetLLDBErrorLog() { return g_error_log; }

@JDevlieghere
Copy link
Member Author

JDevlieghere commented Oct 10, 2024

I considered always logging to the system log, but copying/cloning errors is a bit tricky. I could reuse the logic that Adrian added to status, but wanted to gather your input. Regardless, I tried this out locally by debugging the count tool and this already reported a bunch of errors that would have otherwise gone unnoticed. I haven't looked at them in detail but these were all things I've never seen before.

@adrian-prantl
Copy link
Collaborator

To clone an error, do Status::FromError(error). Every call to Status::toError() clones the error (as opposed to takeError()). Generally we want to avoid cloning errors though.

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I have a couple of thoughts:

  • the system log currently writes to stdout on windows. This was sort of okay, when we were writing there only sporadically, but I think piping all of that here would make lldb unusable. If we want to go down this path, I think we should make Host::SystemLog a noop on windows (until someone comes up with a better implementation)
  • I guess this is fine, though I am somewhat worried about the amount of output that this will generate. Like, I'm not sure if all of these errors really rise to the level that needs this kind of error reporting (it could be an "expected" error that is somehow retried/corrected at a higher level). I don't have a good intuition for this, but I know that personally, I was using LLDB_LOG_ERROR very liberally. I think I'll show more restraint after this.
  • Nonetheless I think it would be good to have some way to opt out of this behavior -- if we find that some low level log statement is generating too much output. Maybe we could draw the line at the verbose log (LLDB_LOG_ERRORV)? After all, if something is a "verbose log message" then it probably doesn't contain "high-value information" that's supposed to be "used sparsely and deliberately". (Although, to be honest, even limiting this to LLDB_LOG_ERROR doesn't exactly sound like sparse and deliberate use to me)

@@ -387,6 +396,9 @@ template <typename Cat> Log *GetLog(Cat mask) {
if (log_private && error_private) { \
log_private->FormatError(::std::move(error_private), __FILE__, __func__, \
__VA_ARGS__); \
} else if (::lldb_private::Log *log_error = GetLLDBErrorLog()) { \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend structuring this in a way that avoids evaluating __VA_ARGS__ more than once. You could e.g. insert a if(!log_private) log_private = GetLLDBErrorLog() statement before this if.

(I know that the macro gets dynamically evaluated only once, but I was recently reading about how this innocent-looking macro in the linux kernel exploded to several dozen kilobytes of goo. We're not there yet, but this should help anyone looking at the preprocessed output, and also help the compiler generate less code)

@labath
Copy link
Collaborator

labath commented Oct 11, 2024

I considered always logging to the system log, but copying/cloning errors is a bit tricky. I could reuse the logic that Adrian added to status, but wanted to gather your input. Regardless, I tried this out locally by debugging the count tool and this already reported a bunch of errors that would have otherwise gone unnoticed. I haven't looked at them in detail but these were all things I've never seen before.

FWIW, it is possible to serialize an llvm::Error (for example, using llvm::formatv) without consuming it. (I think it did not exist back when this macro was written) It would just require some refactoring of the macro...

@JDevlieghere
Copy link
Member Author

So, I have a couple of thoughts:

  • the system log currently writes to stdout on windows. This was sort of okay, when we were writing there only sporadically, but I think piping all of that here would make lldb unusable. If we want to go down this path, I think we should make Host::SystemLog a noop on windows (until someone comes up with a better implementation)

#112052

Regarding limiting the output, I'm happy to draw the line at LLDB_LOG_ERRORV. The system log is pretty cheap on our platform and I assumed it's the same on Linux, so personally I would hope it doesn't deter anyone from using LLDB_LOG_ERROR.

@adrian-prantl do you want the system log to include the dropped errors in Status. If so we'll need to promote that to use LLDB_LOG_ERROR instead.

@labath
Copy link
Collaborator

labath commented Oct 14, 2024

Regarding limiting the output, I'm happy to draw the line at LLDB_LOG_ERRORV. The system log is pretty cheap on our platform and I assumed it's the same on Linux, so personally I would hope it doesn't deter anyone from using LLDB_LOG_ERROR.

From the looks of things, these "system logs" will end up somewhere in /var/log (depending on theexact system logger configuration and stuff) on linux, but I associate these logs mostly with daemon processes which don't really have a way to notify the user (because there isn't one) of any problems they encounter. Lldb has ample opportunity to do that, so using this as a sink for messages we haven't bothered to propagate to the user doesn't seem completely appropriate to me (and its not what I imagined when I saw the first patch in this series).

For example, on linux, these system logs are normally only readable by root, so a normal user wouldn't even be able to see what is being logged, and the log messages would get interspersed with messages from daemons or programs like su, cron, NetworkManager, etc. (it's possible to configure this differently, but this also is something only system administrators can do). I actually don't remember seeing any non-daemon process write to these logs. I suppose the situation could be different on darwin, and maybe this just means that the linux/posix implementation of the "system log" is wrong (in this brave new world, at least), but I also don't know what would be a good implementation of this function.

@DavidSpickett, do you have any thoughts on this? :)

@DavidSpickett
Copy link
Collaborator

Log errors to the system log if they would otherwise get dropped

Dropped because of what exactly, because the user did not enable a channel that would include them? How does one enable collection of these with the current lldb?

@DavidSpickett, do you have any thoughts on this? :)

For example, on linux, these system logs are normally only readable by root, so a normal user wouldn't even be able to see what is being logged

So if we wanted to implement "here is a log you should send to an lldb developer if they ask for it", then we'd need to log to somewhere else anyway.

Certainly if someone is shipping lldb within an organisation they could implement that, but I'm not convinced it's required for Linux right now. And at that point why log to the Darwin system log not some other file? At least it would have a consistent name across platforms then.

It sounds like these messages are currently dropped so we could just change Darwin to push to the system log and leave the rest as is and it wouldn't be a regression. Is that correct? That's a way forward at least.

(idk if you are aiming to enable this on all platforms for the Swift project's benefit)

I'm also curious how useful the system log is if it contains other output. Maybe that's the reason it's useful, if it can tell you about system events that caused the problem, but having random services' output in there seems A: annoying and B: something a issue reporter might be wary of sharing.

@JDevlieghere
Copy link
Member Author

JDevlieghere commented Oct 14, 2024

Let me try to provide a bit more context. Darwin has the concept of a sysdiagnose. It's essentially an archive with diagnostic files, including crashlogs and a dump of the last few minutes of the system log. On our platform, the majority of bug reports (feedback reports/radars) are accompanied by a sysdiagnose. We already rely heavily on them for crashes (crashlogs for lldb and debugserver) and logging in debugserver (which logs to the system log). I want to start using this for logging from lldb as well.

As I mentioned in #108495, Swift was already doing something similar, and we've found this to be extremely useful. To give a concrete example, the Swift fork starts by logging the used lldb version to the system log. We've found that this was a lot more reliable than trusting the version reported by the user, if they mention a version in their bug report at all. When there's a sysdiagnose attached, we don't have to send the bug back, asking for more information. My goal for the always-on log channel (which logs to the system log) is to log high-value/low-bandwidth information.

Separately, we had a few instances recently where we received bug reports that were hard to investigate and where we discovered that deep down we had created a meaningful Error (which would've told us the problem right away) but got dropped somewhere or at least not shown to the user. @adrian-prantl has put up several patches to propagate those errors. The goal of this patch is to catch those errors so we can quickly spot them in the sysdiagnose. To be clear, the goal is not to rely on that, but to save us from having to spend two days debugging LLDB when we could've spotted the issue from a dropped error right away, and use that time instead to improve LLDB and propagate the error to the user.

Although this is all motivated by sysdiagnoses, I didn't want to do something too bespoke for our platform and want to be able to reuse the existing logging infrastructure. I figured this might be useful on Linux as well, but as I'm hearing more about how the system log is used there, I think it's much less appealing. I'm starting to wonder if we should just make the system log a NOOP on Linux too like we did on Windows.

@adrian-prantl
Copy link
Collaborator

@adrian-prantl do you want the system log to include the dropped errors in Status. If so we'll need to promote that to use LLDB_LOG_ERROR instead.

No, I expect that to be too much traffic in the beginning at least.

@DavidSpickett
Copy link
Collaborator

the Swift fork starts by logging the used lldb version to the system log.

Oh so this is why the tests on Windows suddenly printed version numbers then :)

I'm starting to wonder if we should just make the system log a NOOP on Linux too like we did on Windows.

The pitch sounds great I'm all in but yes I don't think there's a single place on Linux that fits right now. System log being a NOOP isn't a regression for Linux either. If/when Swift on Linux or some other LLDB distributor wants to get the same benefits as Mac, they can suggest a solution.

If this uses the existing logging infrastructure, I presume it is possible to give a user a command to enable lldb's system log and re-run the reproducer? It would be the same as sending any other channel to a file.

@labath
Copy link
Collaborator

labath commented Oct 15, 2024

Yeah, I have a feeling that Linux is too heterogeneous for something like this to make sense for everyone. Although, I'm not sure if it makes sense on Darwin outside of your use case. Like, if I wanted to (which I don't, but I understand some people do) ship a custom lldb to mac (say, as a part of some IDE), would I be able to integrate these "system" logs into my own (deliberately underspecified) bug reporting system? This sounds to me like one of those things where a "vendor" might need to make a choice as to what "system log" means for them. I'm not saying we have to design that right now, but we might want to leave the door open for that. And that design might consist of overriding Host(Info)::SystemLog...

Bottom line: I also think we should also make noop on linux (or non-mac, basically).

JDevlieghere added a commit that referenced this pull request Oct 15, 2024
As discussed in #111911, we have consensus that as it stands, the system
log is only meaningful on Darwin and that by default it should be a NOOP
on other platforms.
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
As discussed in llvm#111911, we have consensus that as it stands, the system
log is only meaningful on Darwin and that by default it should be a NOOP
on other platforms.
@JDevlieghere
Copy link
Member Author

Now that the system log is a Darwin-only concept and the review comments have been addressed, I assume nobody objects to merging this?

@@ -43,6 +43,10 @@ char TeeLogHandler::ID;

llvm::ManagedStatic<Log::ChannelMap> Log::g_channel_map;

// The error log is used by LLDB_LOG_ERROR and LLDB_LOG_ERRORV. If the given
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The error log is used by LLDB_LOG_ERROR and LLDB_LOG_ERRORV. If the given
// The error log is used by LLDB_LOG_ERROR. If the given

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already fixed in the latest version of this PR :-)

@JDevlieghere JDevlieghere merged commit cd4b33c into llvm:main Oct 22, 2024
7 checks passed
@JDevlieghere JDevlieghere deleted the lldb-error-system-log branch October 22, 2024 16:00
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Apr 10, 2025
As discussed in llvm#111911, we have consensus that as it stands, the system
log is only meaningful on Darwin and that by default it should be a NOOP
on other platforms.

(cherry picked from commit cc13d4f)
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Apr 11, 2025
As discussed in llvm#111911, we have consensus that as it stands, the system
log is only meaningful on Darwin and that by default it should be a NOOP
on other platforms.

(cherry picked from commit cc13d4f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants