Skip to content

Conversation

@nathanchance
Copy link
Member

This effectively reverts commit 8c78e38 ("boot-qemu: Always use
-no-reboot") against the rewrite that occurred in #91.

By applying this unconditionally, we potentially miss out on catching
panics. Prior to that change, if a machine panicked and rebooted, it
would keep rebooting until timeout killed QEMU, resulting in a non-zero
exit code. After that change, QEMU just exits cleanly on reboot.

cc @kees, I cannot remember what the original reason for unconditionally adding
-no-reboot was.

This effectively reverts commit 8c78e38 ("boot-qemu: Always use
-no-reboot") against the rewrite that occurred in ClangBuiltLinux#91.

By applying this unconditionally, we potentially miss out on catching
panics. Prior to that change, if a machine panicked and rebooted, it
would keep rebooting until timeout killed QEMU, resulting in a non-zero
exit code. After that change, QEMU just exits cleanly on reboot.

Signed-off-by: Nathan Chancellor <[email protected]>
@nathanchance nathanchance linked an issue Mar 16, 2023 that may be closed by this pull request
Copy link
Member

@nickdesaulniers nickdesaulniers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; thanks! 🌥️ 🌔

@nathanchance nathanchance merged commit cb9c38a into ClangBuiltLinux:main Mar 17, 2023
@nathanchance nathanchance deleted the drop-unconditional-no-reboot branch March 17, 2023 22:03
@kees
Copy link
Contributor

kees commented Mar 17, 2023

It was exactly that: to convert panics into errors instead of timeouts (and to avoid the overhead of waiting for the timeouts)

@nickdesaulniers
Copy link
Member

@nathanchance
Copy link
Member Author

As mentioned in the CI issue, boot-utils in CI has not been updated to include this, so it is expected that we are still missing panics.

@nickdesaulniers
Copy link
Member

It was exactly that: to convert panics into errors instead of timeouts (and to avoid the overhead of waiting for the timeouts)

The issue is that setting -no-reboot does not set the return code of qemu to indicate failure.

I just hit this where we have a boot loop in ClangBuiltLinux/linux#1848, so I added

diff --git a/boot-qemu.py b/boot-qemu.py
index 24a0f0f59aa4..1052ca1ddf16 100755
--- a/boot-qemu.py
+++ b/boot-qemu.py
@@ -399,6 +399,7 @@ class ARM64QEMURunner(QEMURunner):
         self._default_kernel_path = Path('arch/arm64/boot/Image.gz')
         self._initrd_arch = 'arm64'
         self._qemu_arch = 'aarch64'
+        self._qemu_args.append('-no-reboot')
 
     def _get_cpu_val(self):
         cpu = ['max']

This results in the machine exiting cleanly rather than looping but:

$ /android1/boot-utils/boot-qemu.py -k .
...
[    0.027803][    T0] ---[ end trace 0000000000000000 ]---
[    0.028222][    T0] Kernel panic - not syncing: Oops - Undefined instruction: Fatal exception
$ echo $?
0

Based on https://unix.stackexchange.com/a/443060, mentions pvpanic, but that requires multiple configs to be enabled (PVPANIC, PVPANIC_PCI) and setting -device pvpanic-pci (I haven't found which -action value works best yet). https://blogs.oracle.com/linux/post/an-introduction-to-pvpanic . I wonder if we should consider enabling those in arch/arm64/configs/virt.config?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

-no-reboot hiding panics

3 participants