Skip to content

aarch64: implement signal delivery #1217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 10, 2024

Conversation

shrik3
Copy link
Collaborator

@shrik3 shrik3 commented May 7, 2024

[updated!] There are some todos, see commit messages. Anyways for the scope of this PR I'll stop here.

we have the basic stuffs running without issues now. try container ubuntu bash
image

@shrik3
Copy link
Collaborator Author

shrik3 commented May 7, 2024 via email

@shrik3 shrik3 mentioned this pull request May 7, 2024
@chl337
Copy link
Collaborator

chl337 commented May 7, 2024

This is the result of the test directly on the machine:
scrshot_2024-05-07T09:52:46

Is this what to be expected?


On Quark:
scrshot_2024-05-07T09:58:45

@chl337
Copy link
Collaborator

chl337 commented May 7, 2024

The struct ucontext is different between x86_64 and arm64 in Linux? Could this cause cause confusion to glibc?

This experimental (copy-paste from linux) implementation breaks SignalReturn().

@shrik3
Copy link
Collaborator Author

shrik3 commented May 7, 2024

This is the result of the test directly on the machine: scrshot_2024-05-07T09:52:46

Is this what to be expected?

On Quark: scrshot_2024-05-07T09:58:45

Is this what to be expected?

Yes. in the test I have no sanity check whether the vdso __kernel_rt_sigreturn is a null pointer. If you pass 0 to sigaction and set restorer to true then you will get funky things.

@shrik3
Copy link
Collaborator Author

shrik3 commented May 7, 2024

The struct ucontext is different between x86_64 and arm64 in Linux? Could this cause cause confusion to glibc?

This experimental (copy-paste from linux) implementation breaks SignalReturn().

good point. let me see

@shrik3
Copy link
Collaborator Author

shrik3 commented May 7, 2024

The struct ucontext is different between x86_64 and arm64 in Linux? Could this cause cause confusion to glibc?

This experimental (copy-paste from linux) implementation breaks SignalReturn().

can confirm, I think the problem is with stack unwinding.

@chl337
Copy link
Collaborator

chl337 commented May 7, 2024

known issue:

1. could panic if the user does not set a working trampoline for the sigreturn. Interestingly the libc `signal()` does not use vdso `__kernel_rt_sigreturn` actively. Perhaps we are missing some stuffs.

How can this behavior be reproduced?

@shrik3
Copy link
Collaborator Author

shrik3 commented May 7, 2024

known issue:

1. could panic if the user does not set a working trampoline for the sigreturn. Interestingly the libc `signal()` does not use vdso `__kernel_rt_sigreturn` actively. Perhaps we are missing some stuffs.

How can this behavior be reproduced?

see my example, replace the sigaction stuffs with one signal(signo, handler). The return trampoline will not be correctly called.

@chl337
Copy link
Collaborator

chl337 commented May 7, 2024

known issue:

1. could panic if the user does not set a working trampoline for the sigreturn. Interestingly the libc `signal()` does not use vdso `__kernel_rt_sigreturn` actively. Perhaps we are missing some stuffs.

How can this behavior be reproduced?

see my example, replace the sigaction stuffs with one signal(signo, handler). The return trampoline will not be correctly called.

Thanks and sorry for the noise, I am a totally noob on the subject.

@shrik3
Copy link
Collaborator Author

shrik3 commented May 7, 2024

The struct ucontext is different between x86_64 and arm64 in Linux? Could this cause cause confusion to glibc?
This experimental (copy-paste from linux) implementation breaks SignalReturn().

can confirm, I think the problem is with stack unwinding.

fixed. The problem is with UContext Default constructor. (it has infinite recursion)

I don't know exactly why but patterns like this should be avoided.

+impl Default for UContext {
+    fn default() -> Self {
+        Self {
+            __pad: [0; (1024/8) - core::mem::size_of::<u64>()],
+            ..Default::default()                   <= this causes recursion into Self::default
+        }
+    }
+}

@chl337
Copy link
Collaborator

chl337 commented May 7, 2024

The struct ucontext is different between x86_64 and arm64 in Linux? Could this cause cause confusion to glibc?
This experimental (copy-paste from linux) implementation breaks SignalReturn().

can confirm, I think the problem is with stack unwinding.

fixed. The problem is with UContext Default constructor. (it has infinite recursion)

I don't know exactly why but patterns like this should be avoided.

+impl Default for UContext {
+    fn default() -> Self {
+        Self {
+            __pad: [0; (1024/8) - core::mem::size_of::<u64>()],
+            ..Default::default()                   <= this causes recursion into Self::default
+        }
+    }
+}

Oh, yes I see.
rust-lang/rust-clippy#8609

@shrik3 shrik3 force-pushed the armdev_signals+pr+wip branch from fec1ad2 to aedae1a Compare May 7, 2024 22:14
@shrik3 shrik3 changed the title [wip] aarch64: signals aarch64: implement signal delivery May 7, 2024
@shrik3 shrik3 requested review from CharlyYu and chl337 May 7, 2024 22:19
@shrik3 shrik3 force-pushed the armdev_signals+pr+wip branch from aedae1a to 1fe2ef5 Compare May 7, 2024 22:21
shrik3 and others added 2 commits May 8, 2024 05:30
we need a convient method to get the userspace addr of the loaded vdso
page. For example, vdso is required for aarch64 to set up the return
trampoline for sigreturn. This is a mockup and may need refactoring.

Signed-off-by: Tianhao Wang <[email protected]>
Yeah, that's different than x86.

Co-authored-by: ch1337 <[email protected]>
Signed-off-by: Tianhao Wang <[email protected]>
@shrik3 shrik3 force-pushed the armdev_signals+pr+wip branch from 1fe2ef5 to c1e702a Compare May 8, 2024 03:30
@CharlyYu
Copy link
Collaborator

CharlyYu commented May 8, 2024

I still get an error when run ubuntu bash
1715185426552

and this is the log:
image

The code is in the branch of armdev_signals+pr+wip and I get qkernel_d.bin and quark_d by make debug.

@CharlyYu
Copy link
Collaborator

CharlyYu commented May 8, 2024

And It seems the signal is handled repeatedly.

image

@chl337
Copy link
Collaborator

chl337 commented May 8, 2024

And It seems the signal is handled repeatedly.

image

Can you show the last instructions before the first SIGSEGV (signal 11)?
Also the your git log please. I can't reproduce it.

@CharlyYu
Copy link
Collaborator

CharlyYu commented May 8, 2024

git log:
1715186628608

@CharlyYu
Copy link
Collaborator

CharlyYu commented May 8, 2024

Can you show the last instructions before the first SIGSEGV (signal 11)?

1715186776994

@CharlyYu
Copy link
Collaborator

CharlyYu commented May 8, 2024

It seems a page fault with an inappropriate address.

@chl337
Copy link
Collaborator

chl337 commented May 8, 2024

git log: 1715186628608

Can you also apply #1220 and test again? Also do a make clean first.

@CharlyYu
Copy link
Collaborator

CharlyYu commented May 8, 2024

Same error, is it because I didn't cherry-pick codes in #1212 or other prs? can you push a branch of your codes that can run the ubuntu bash?

@chl337
Copy link
Collaborator

chl337 commented May 8, 2024

Same error, is it because I didn't cherry-pick codes in #1212 or other prs?
My tree: #1212 + #1215 + #1217 + #1220

can you push a branch of your codes that can run the ubuntu bash?

I can do it first in ~1.5 hours, sorry.

@shrik3
Copy link
Collaborator Author

shrik3 commented May 8, 2024

@CharlyYu sorry for the late reply.

https://github.com/shrik3/Quark/tree/armdev_dev_base -> for you only

image

please take a look at the commit log and throw away my local changes (prefixed with local) that you don't want.

#1220 is not necessary for ubuntu bash to run. Also note that the quark is broken on the newest ubuntu container image (both on main and us). Try using something older. Also you could try busybox sh

Note on my local commits: there are PAN related code that are only relevant to our testing machine. Compilation could fail in your env.

@chl337
Copy link
Collaborator

chl337 commented May 9, 2024

It seems a page fault with an inappropriate address.

@CharlyYu That seems to me as a context switch issue... If not already solved, try to print the state during switches (use #1203 for pretty print)

@chl337
Copy link
Collaborator

chl337 commented May 10, 2024

On trying Redis I get the following error:
bad g in signal handler
which for me is pretty cryptic. The only reference I found online is on Go runtime. AFAIK Redis is supported on main branch (cannot tested myself though).

Reproduce:
docker run --rm -it -p 6379:6379 --runtime=quark_dlocal --name redis_test redis

@shrik3
Copy link
Collaborator Author

shrik3 commented May 10, 2024

On trying Redis I get the following error: bad g in signal handler which for me is pretty cryptic. The only reference I found online is on Go runtime. AFAIK Redis is supported on main branch (cannot tested myself though).

Reproduce: docker run --rm -it -p 6379:6379 --runtime=quark_dlocal --name redis_test redis

this: https://go.googlesource.com/go/+/go1.15.3/src/runtime/signal_unix.go#378

please grep sys_rt_sigaction in the log. What's sigaction (the restorer specifically)?

Also FWIW I don't want to look into go runtime perks in the scope of this PR. Could you take this to an issue to keep it tracked?

@chl337
Copy link
Collaborator

chl337 commented May 10, 2024 via email

NOTE: For aarch64 vdso is required for sigreturn to work.

WIP/TODOs:
- save/restore FP state
- restart syscalls (and set orig_x0) when necessary
- handle multiple signal frames

Signed-off-by: Tianhao Wang <[email protected]>
@shrik3 shrik3 force-pushed the armdev_signals+pr+wip branch from c1e702a to ced4444 Compare May 10, 2024 12:39
@shrik3 shrik3 merged commit f0997ea into QuarkContainer:armdev May 10, 2024
@shrik3 shrik3 deleted the armdev_signals+pr+wip branch May 14, 2024 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants