-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[lldb] Log errors to the system log if they would otherwise get dropped #111911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Log errors to the (always-on) system log if they would otherwise get dropped by LLDB_LOG_ERROR and LLDB_LOG_ERRORV.
@llvm/pr-subscribers-lldb Author: Jonas Devlieghere (JDevlieghere) ChangesLog errors to the (always-on) system log if they would otherwise get dropped by LLDB_LOG_ERROR and LLDB_LOG_ERRORV. Full diff: https://github.com/llvm/llvm-project/pull/111911.diff 3 Files Affected:
diff --git a/lldb/include/lldb/Utility/Log.h b/lldb/include/lldb/Utility/Log.h
index ac6347153a1014..f8545cd59a18ba 100644
--- a/lldb/include/lldb/Utility/Log.h
+++ b/lldb/include/lldb/Utility/Log.h
@@ -335,6 +335,15 @@ template <typename Cat> Log *GetLog(Cat mask) {
return LogChannelFor<Cat>().GetLog(Log::MaskType(mask));
}
+/// Getter and setter for the error log (see g_error_log).
+/// The error log is set to the system log in SystemInitializerFull. We can't
+/// use the system log directly because that would violate the layering between
+/// Utility and Host.
+/// @{
+void SetLLDBErrorLog(Log *log);
+Log *GetLLDBErrorLog();
+/// @}
+
} // namespace lldb_private
/// The LLDB_LOG* macros defined below are the way to emit log messages.
@@ -387,6 +396,9 @@ template <typename Cat> Log *GetLog(Cat mask) {
if (log_private && error_private) { \
log_private->FormatError(::std::move(error_private), __FILE__, __func__, \
__VA_ARGS__); \
+ } else if (::lldb_private::Log *log_error = GetLLDBErrorLog()) { \
+ log_error->FormatError(::std::move(error_private), __FILE__, __func__, \
+ __VA_ARGS__); \
} else \
::llvm::consumeError(::std::move(error_private)); \
} while (0)
@@ -401,6 +413,9 @@ template <typename Cat> Log *GetLog(Cat mask) {
if (log_private && log_private->GetVerbose() && error_private) { \
log_private->FormatError(::std::move(error_private), __FILE__, __func__, \
__VA_ARGS__); \
+ } else if (::lldb_private::Log *log_error = GetLLDBErrorLog()) { \
+ log_error->FormatError(::std::move(error_private), __FILE__, __func__, \
+ __VA_ARGS__); \
} else \
::llvm::consumeError(::std::move(error_private)); \
} while (0)
diff --git a/lldb/source/API/SystemInitializerFull.cpp b/lldb/source/API/SystemInitializerFull.cpp
index 8a992a6889a91b..31f3a9f30b81f0 100644
--- a/lldb/source/API/SystemInitializerFull.cpp
+++ b/lldb/source/API/SystemInitializerFull.cpp
@@ -84,6 +84,9 @@ llvm::Error SystemInitializerFull::Initialize() {
// Use the Debugger's LLDBAssert callback.
SetLLDBAssertCallback(Debugger::AssertCallback);
+ // Use the system log to report errors that would otherwise get dropped.
+ SetLLDBErrorLog(GetLog(SystemLog::System));
+
LLDB_LOG(GetLog(SystemLog::System), "{0}", GetVersion());
return llvm::Error::success();
diff --git a/lldb/source/Utility/Log.cpp b/lldb/source/Utility/Log.cpp
index 3798f406476370..a57d415be033ba 100644
--- a/lldb/source/Utility/Log.cpp
+++ b/lldb/source/Utility/Log.cpp
@@ -43,6 +43,10 @@ char TeeLogHandler::ID;
llvm::ManagedStatic<Log::ChannelMap> Log::g_channel_map;
+// The error log is used by LLDB_LOG_ERROR and LLDB_LOG_ERRORV. If the given
+// log channel is not enabled, error messages are logged to the error log.
+static std::atomic<Log *> g_error_log = nullptr;
+
void Log::ForEachCategory(
const Log::ChannelMap::value_type &entry,
llvm::function_ref<void(llvm::StringRef, llvm::StringRef)> lambda) {
@@ -460,3 +464,7 @@ void TeeLogHandler::Emit(llvm::StringRef message) {
m_first_log_handler->Emit(message);
m_second_log_handler->Emit(message);
}
+
+void lldb_private::SetLLDBErrorLog(Log *log) { g_error_log.exchange(log); }
+
+Log *lldb_private::GetLLDBErrorLog() { return g_error_log; }
|
I considered always logging to the system log, but copying/cloning errors is a bit tricky. I could reuse the logic that Adrian added to status, but wanted to gather your input. Regardless, I tried this out locally by debugging the |
To clone an error, do Status::FromError(error). Every call to Status::toError() clones the error (as opposed to takeError()). Generally we want to avoid cloning errors though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I have a couple of thoughts:
- the system log currently writes to stdout on windows. This was sort of okay, when we were writing there only sporadically, but I think piping all of that here would make lldb unusable. If we want to go down this path, I think we should make
Host::SystemLog
a noop on windows (until someone comes up with a better implementation) - I guess this is fine, though I am somewhat worried about the amount of output that this will generate. Like, I'm not sure if all of these errors really rise to the level that needs this kind of error reporting (it could be an "expected" error that is somehow retried/corrected at a higher level). I don't have a good intuition for this, but I know that personally, I was using LLDB_LOG_ERROR very liberally. I think I'll show more restraint after this.
- Nonetheless I think it would be good to have some way to opt out of this behavior -- if we find that some low level log statement is generating too much output. Maybe we could draw the line at the verbose log (LLDB_LOG_ERRORV)? After all, if something is a "verbose log message" then it probably doesn't contain "high-value information" that's supposed to be "used sparsely and deliberately". (Although, to be honest, even limiting this to
LLDB_LOG_ERROR
doesn't exactly sound like sparse and deliberate use to me)
lldb/include/lldb/Utility/Log.h
Outdated
@@ -387,6 +396,9 @@ template <typename Cat> Log *GetLog(Cat mask) { | |||
if (log_private && error_private) { \ | |||
log_private->FormatError(::std::move(error_private), __FILE__, __func__, \ | |||
__VA_ARGS__); \ | |||
} else if (::lldb_private::Log *log_error = GetLLDBErrorLog()) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend structuring this in a way that avoids evaluating __VA_ARGS__
more than once. You could e.g. insert a if(!log_private) log_private = GetLLDBErrorLog()
statement before this if
.
(I know that the macro gets dynamically evaluated only once, but I was recently reading about how this innocent-looking macro in the linux kernel exploded to several dozen kilobytes of goo. We're not there yet, but this should help anyone looking at the preprocessed output, and also help the compiler generate less code)
FWIW, it is possible to serialize an llvm::Error (for example, using llvm::formatv) without consuming it. (I think it did not exist back when this macro was written) It would just require some refactoring of the macro... |
Regarding limiting the output, I'm happy to draw the line at @adrian-prantl do you want the system log to include the dropped errors in |
Co-authored-by: Pavel Labath <[email protected]>
From the looks of things, these "system logs" will end up somewhere in For example, on linux, these system logs are normally only readable by root, so a normal user wouldn't even be able to see what is being logged, and the log messages would get interspersed with messages from daemons or programs like su, cron, NetworkManager, etc. (it's possible to configure this differently, but this also is something only system administrators can do). I actually don't remember seeing any non-daemon process write to these logs. I suppose the situation could be different on darwin, and maybe this just means that the linux/posix implementation of the "system log" is wrong (in this brave new world, at least), but I also don't know what would be a good implementation of this function. @DavidSpickett, do you have any thoughts on this? :) |
Dropped because of what exactly, because the user did not enable a channel that would include them? How does one enable collection of these with the current lldb?
So if we wanted to implement "here is a log you should send to an lldb developer if they ask for it", then we'd need to log to somewhere else anyway. Certainly if someone is shipping lldb within an organisation they could implement that, but I'm not convinced it's required for Linux right now. And at that point why log to the Darwin system log not some other file? At least it would have a consistent name across platforms then. It sounds like these messages are currently dropped so we could just change Darwin to push to the system log and leave the rest as is and it wouldn't be a regression. Is that correct? That's a way forward at least. (idk if you are aiming to enable this on all platforms for the Swift project's benefit) I'm also curious how useful the system log is if it contains other output. Maybe that's the reason it's useful, if it can tell you about system events that caused the problem, but having random services' output in there seems A: annoying and B: something a issue reporter might be wary of sharing. |
Let me try to provide a bit more context. Darwin has the concept of a sysdiagnose. It's essentially an archive with diagnostic files, including crashlogs and a dump of the last few minutes of the system log. On our platform, the majority of bug reports (feedback reports/radars) are accompanied by a sysdiagnose. We already rely heavily on them for crashes (crashlogs for As I mentioned in #108495, Swift was already doing something similar, and we've found this to be extremely useful. To give a concrete example, the Swift fork starts by logging the used lldb version to the system log. We've found that this was a lot more reliable than trusting the version reported by the user, if they mention a version in their bug report at all. When there's a sysdiagnose attached, we don't have to send the bug back, asking for more information. My goal for the always-on log channel (which logs to the system log) is to log high-value/low-bandwidth information. Separately, we had a few instances recently where we received bug reports that were hard to investigate and where we discovered that deep down we had created a meaningful Error (which would've told us the problem right away) but got dropped somewhere or at least not shown to the user. @adrian-prantl has put up several patches to propagate those errors. The goal of this patch is to catch those errors so we can quickly spot them in the sysdiagnose. To be clear, the goal is not to rely on that, but to save us from having to spend two days debugging LLDB when we could've spotted the issue from a dropped error right away, and use that time instead to improve LLDB and propagate the error to the user. Although this is all motivated by sysdiagnoses, I didn't want to do something too bespoke for our platform and want to be able to reuse the existing logging infrastructure. I figured this might be useful on Linux as well, but as I'm hearing more about how the system log is used there, I think it's much less appealing. I'm starting to wonder if we should just make the system log a NOOP on Linux too like we did on Windows. |
No, I expect that to be too much traffic in the beginning at least. |
Oh so this is why the tests on Windows suddenly printed version numbers then :)
The pitch sounds great I'm all in but yes I don't think there's a single place on Linux that fits right now. System log being a NOOP isn't a regression for Linux either. If/when Swift on Linux or some other LLDB distributor wants to get the same benefits as Mac, they can suggest a solution. If this uses the existing logging infrastructure, I presume it is possible to give a user a command to enable lldb's system log and re-run the reproducer? It would be the same as sending any other channel to a file. |
Yeah, I have a feeling that Linux is too heterogeneous for something like this to make sense for everyone. Although, I'm not sure if it makes sense on Darwin outside of your use case. Like, if I wanted to (which I don't, but I understand some people do) ship a custom lldb to mac (say, as a part of some IDE), would I be able to integrate these "system" logs into my own (deliberately underspecified) bug reporting system? This sounds to me like one of those things where a "vendor" might need to make a choice as to what "system log" means for them. I'm not saying we have to design that right now, but we might want to leave the door open for that. And that design might consist of overriding Host(Info)::SystemLog... Bottom line: I also think we should also make noop on linux (or non-mac, basically). |
As discussed in #111911, we have consensus that as it stands, the system log is only meaningful on Darwin and that by default it should be a NOOP on other platforms.
As discussed in llvm#111911, we have consensus that as it stands, the system log is only meaningful on Darwin and that by default it should be a NOOP on other platforms.
Now that the system log is a Darwin-only concept and the review comments have been addressed, I assume nobody objects to merging this? |
lldb/source/Utility/Log.cpp
Outdated
@@ -43,6 +43,10 @@ char TeeLogHandler::ID; | |||
|
|||
llvm::ManagedStatic<Log::ChannelMap> Log::g_channel_map; | |||
|
|||
// The error log is used by LLDB_LOG_ERROR and LLDB_LOG_ERRORV. If the given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The error log is used by LLDB_LOG_ERROR and LLDB_LOG_ERRORV. If the given | |
// The error log is used by LLDB_LOG_ERROR. If the given |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already fixed in the latest version of this PR :-)
As discussed in llvm#111911, we have consensus that as it stands, the system log is only meaningful on Darwin and that by default it should be a NOOP on other platforms. (cherry picked from commit cc13d4f)
As discussed in llvm#111911, we have consensus that as it stands, the system log is only meaningful on Darwin and that by default it should be a NOOP on other platforms. (cherry picked from commit cc13d4f)
Log errors to the (always-on) system log if they would otherwise get dropped by LLDB_LOG_ERROR and LLDB_LOG_ERRORV.