Alpine Linux ARM64 bring up #7231
Comments
Copying relevant comments:
|
@am11 re: libunwind on Alpine: I replicated the build script as well as I could, including the parameters passed to Do you have any idea as to why Alpine didn't enable builds of libunwind for aarch64? |
@qmfrederik, perhaps this is because nobody has asked for an AARCH64 package. I have only tested building libsass and some other native libraries on Alpine (x86_64) in the past. One approach is to send a message to the aports mailing list or IRC channel asking them to add the package or (as in this case) to support a particular architecture. We can also send a PR to the aports repo, and our changes will be verified by their dedicated build systems (which run a few times a day to rebuild all packages, and are separate from and more detailed than the per-PR CI checks).
This looks good. 👍 |
@am11 Nope, it looks like it's
|
I believe the symbol is defined in the libunwind-aarch64.so. On Amd64, we link to libunwind.so, libunwind-x86_64.so and libunwind-generic.so. On aarch64, the CMakeLists was missing the architecture specific file. |
@qmfrederik, can you try making this kind of patch for libunwind: https://github.com/alpinelinux/aports/blob/f82ed56f8dd3e5a8ab11a39e75d7c37922cd3691/community/chromium/no-getcontext.patch Basically ifdef it out for GLIBC. Also you may want to use specific version: http://download.savannah.gnu.org/releases/libunwind/libunwind-1.2-rc1.tar.gz, same one that is used by |
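A minimal sketch of the "ifdef it out for GLIBC" idea suggested above, assuming the goal is only to keep a `getcontext` call behind a glibc check. The helper name `capture_context_or_stub` and its stub behaviour are hypothetical and are not taken from libunwind or from the linked Chromium patch.

```c
#include <assert.h>
#include <stdlib.h>   /* any libc header; glibc's headers define __GLIBC__ */

#if defined(__GLIBC__)
#include <ucontext.h>
#endif

/*
 * Hypothetical helper (not a libunwind function).
 * Returns 0 on success. *captured is set to 1 only when the real
 * getcontext() ran, and to 0 when the stub path was taken.
 */
static int capture_context_or_stub(int *captured)
{
    assert(captured != NULL);
#if defined(__GLIBC__)
    ucontext_t uc;
    *captured = 1;
    return getcontext(&uc);   /* real call, glibc only */
#else
    *captured = 0;            /* assumption: the libc has no usable getcontext */
    return 0;                 /* stub: report success without capturing anything */
#endif
}
```

A stub like this only gets the build through; any code that relies on the captured context would still need a real implementation for the target architecture.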
@janvorli You are correct:
It looks like somehow the CoreCLR build managed to reference |
@qmfrederik it needs to reference all of the three. The libunwind-aarch64.so is missing, as I've said before. |
@janvorli I added the reference to Just to be sure - since you say "it needs to reference all of the three", does this mean that coreclr will also link against |
It won't fail, the libunwind-aarch64.so will satisfy the symbol need for the libunwind.so. Actually, the libunwind-generic.so seems not to be needed, but it doesn't hurt.
|
@janvorli I'm relatively sure the issue is in libunwind defines I'm assuming Question: I can try to copy the definition of |
@qmfrederik ok, thanks for the explanation. As for copying the code it won't be as simple as that. The aarch64 uses different registers and instructions. You may get some idea on how to do that in the LLVM libunwind below: |
@janvorli Thanks for the pointer. Is llvm libunwind supposed to be API-compatible with libunwind? I had a cursory look at the header, and the declarations seem to be identical. Yes, it seems the file you referenced basically implements I have to admit I've taken the easy path for now, and just stubbed the function as Good news! coreclr now compiles successfully on Alpine aarch64. Are there any coreclr tests I can run to get a grasp of the stability of this version of coreclr? |
You can run the PAL tests. After building coreclr (without skiptests argument), run |
Thanks, I'll run the PAL tests and post back the results. Meanwhile, I also attempted to build the native part of corefx, but some standard C++ includes like
I can
Which C++ package did you use for Alpine x64? |
This is a dockerfile I was using:
FROM alpine:latest
RUN apk update
RUN apk add bash gcc clang clang-dev cmake make build-base icu-dev python gettext-dev zlib-dev curl-dev krb5-dev linux-headers git llvm autoconf libtool
# Other sources
RUN apk -X https://dl-cdn.alpinelinux.org/alpine/edge/main add --no-cache libunwind-dev util-linux-dev
RUN apk -X https://dl-cdn.alpinelinux.org/alpine/edge/testing add --no-cache lldb-dev lttng-ust-dev
RUN apk add ca-certificates && update-ca-certificates && apk add openssl
I remember having an issue with the limits as well, and I think one of these packages fixed it |
I tried adding those packages but that didn't work; what seems to work is:
|
@janvorli the native corefx components build fine, so that's one more step ahead - both coreclr & corefx are now compiling on Alpine ARM64. Regarding the coreclr pal tests, they seem to hang. Running src/pal/tests/palsuite/runpaltests.sh /home/coreclr/bin/obj/Linux.arm64.Debug gives the following output:
(note the single dot). I've left it running for about 5 minutes, but no additional dot appeared. Looking at the source, it seems there should be a dot per test, so I'm assuming this is not normal. The PAL test logs in the |
Hmm, when I run it on my RPI3 with ARM32 Linux, the dots appear with a cadence of about 4 per second, so there is obviously something wrong. |
Thx, I'll try it later today. Are there steps I can take to enable some kind of verbose logging, so we can get an idea of where it is stuck? |
If you look at https://github.com/dotnet/coreclr/blob/master/src/pal/tests/palsuite/runpaltests.sh, you'd see that we print a dot before executing each test. You can instrument that file e.g. by adding |
@janvorli I cherry-picked 2f441cfcb9ceb01d877ba4f66b9a8f451c959987 onto my branch, rebuilt CoreCLR and ran the The result is the same - the process hangs (waited for ~ 1 minute) with no console output. Is there anything we can do to find out what |
@qmfrederik can you run it under lldb? it has full debug info if you built it as debug, so you can run it and when it hangs, break in and see what the call stack looks like. |
Looks like the hang is in I tried with the
|
What is the call stack at that point ( |
|
This is really strange. There is no loop at that place, so it looks like it has hung in the call to __sync_val_compare_and_swap. Can you try to step in the debugger after you break in to see if it is the case? |
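For readers unfamiliar with the builtin mentioned above, here is a small sketch of what a single `__sync_val_compare_and_swap` call does at the source level. The `try_claim` helper and its flag are hypothetical and unrelated to the PAL code in question. The point is that the call is one attempt with no loop in the C source; on AArch64 a compiler may lower it to a load-exclusive/store-exclusive retry sequence, which is one place where an apparent hang inside the call could come from.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: tries once to move *flag from 0 to 1.
 * __sync_val_compare_and_swap is a GCC/Clang builtin that returns the
 * value *flag held before the operation, whether or not the swap happened.
 * Returns 1 when this call performed the swap, 0 otherwise.
 */
static int try_claim(long *flag)
{
    assert(flag != NULL);
    long previous = __sync_val_compare_and_swap(flag, 0L, 1L);
    return previous == 0;
}
```

Calling `try_claim` twice on a flag that starts at 0 succeeds the first time and fails the second time, leaving the flag at 1 in both cases.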
@janvorli got sidetracked because I noticed that call was within an
Thought that was worth sharing, I'll now step in the debugger for the tests compiled in debug mode. |
@janvorli so here's the
|
@qmfrederik I guess that's the case then. It is pretty strange. Could you please do the following two things:
|
@qmfrederik Ok, let's set a managed breakpoint on the ParamsArray..cctor again and then do some stepping. To begin with, let's just step over the calls (using the |
From the "there has to be a better way to do this" department, here's the output: [removed] |
@qmfrederik I don't know why the lldb stepped into the calls instead of over them. I didn't expect to get the full trace, just the return values of the functions called, which you have not printed. |
@janvorli Here you go:
|
By the way, I was inspecting the values of arg0, arg1, arg2, and it appears we have the same NotInit behavior for string.Empty
|
@qmfrederik the addresses that are in x14 should match the addresses where the oneArgArray, twoArgArray and threeArgArray are located. Could you please also set a breakpoint to the regular constructor of the ParamsArray ( |
@janvorli Here's the
The regular constructor was hit before the static constructor, not sure if that's expected.
|
@qmfrederik it is expected that the regular constructor is run before the static one. The static one is actually called from inside of the call to 0xffff3d45d830. That function is the JIT_GetSharedGCStaticBase_Portable and its goal is to return the base address of the statics and ensure that the static constructor for a class described by the parameter in w1 (0x28b in our case) was called. If the values of x0 don't match, that's the problem and we need to figure out why. If they match, and the value dumped from the memory is null, then there is some problem with the JIT_WriteBarrier or something just overwrites the value after the JIT_WriteBarrier has written it there in the static constructor. |
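The behaviour described above (return the base address of the statics and make sure the static constructor has run first) can be sketched in C as follows. This is only an illustration of the pattern, not the CoreCLR implementation: `class_statics`, `run_cctor` and `get_static_base` are hypothetical names, and the three placeholder values merely stand in for oneArgArray, twoArgArray and threeArgArray.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one class's static storage. */
typedef struct {
    int         cctor_ran;     /* 0 until the static constructor has run */
    const void *statics[3];    /* slots for the class's static fields */
} class_statics;

/* Placeholders for the three static arrays discussed in the thread. */
static const int one_arg_array = 1, two_arg_array = 2, three_arg_array = 3;

/* Hypothetical static constructor: fills the static slots exactly once. */
static void run_cctor(class_statics *cls)
{
    cls->statics[0] = &one_arg_array;
    cls->statics[1] = &two_arg_array;
    cls->statics[2] = &three_arg_array;
    cls->cctor_ran = 1;
}

/* Returns the statics base, running the static constructor on first use. */
static const void **get_static_base(class_statics *cls)
{
    assert(cls != NULL);
    if (!cls->cctor_ran)
        run_cctor(cls);
    return cls->statics;
}
```

Under this model, two calls for the same class return the same base address, which is the "values of x0 match" check described above, and the slots are non-null after the first call.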
@janvorli The values of
|
That's weird. The value you dumped from the memory at $x0+0xaa8 looks like a reasonable address. If you still have the session open, could you do |
|
@qmfrederik so that's the correct array that should be there. I wonder why the args member is set to null then. The fact that the DumpVC shows the static members as null can be an issue of the SOS and not the reality. |
Btw, I am going to be OOF from tomorrow till Sunday and I won't have access to the internet. |
Looking at the .ctor code again, the only way the args member can end up being null is if the call to 0xffff3d451f38 fails to write x15 to the memory at x14. |
@janvorli I'll try that tomorrow, so you'll have the results by the time you get back. So after that call, x14 and x15 should have the same values, right? I'll dump them too so we can check. If there are other things I can check, let me know. In that code, is x29 the ParamsArray and 0x18 the offset of args? Enjoy your time off! |
@qmfrederik I guess you meant that the value in memory at address x14 should have the same value as the x15, right? That's true. |
@janvorli Thanks for the explanation! Here's the disass of the function which should update the value of
where
Once So in the end the following code is called:
which I thought was: "copy the memory at x15 to x14, and increase x14 by 8" but:
which is something I currently don't understand. By looking at the code, though, I noticed changes to the I'm guessing I must have missed something, so waiting for your feedback. PS: cheat sheet for myself:
|
Never mind, my bad. The value of x15 is written to the memory at address x14, so that appears to be correct. |
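As a reading aid for the register discussion above, the store semantics that were settled on (the value held in x15 is written to the memory at the address held in x14, and x14 then moves forward by 8 bytes) can be modelled in C roughly like this. `store_and_advance` is a hypothetical name, and the sketch deliberately leaves out any GC bookkeeping the real write barrier helper performs.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical model of the store discussed above:
 *   dst plays the role of x14 (an address),
 *   src plays the role of x15 (the value to store).
 * Writes src to *dst and returns dst advanced by one 8-byte slot.
 */
static uint64_t *store_and_advance(uint64_t *dst, uint64_t src)
{
    assert(dst != NULL);
    *dst = src;       /* the value is stored at the destination address */
    return dst + 1;   /* uint64_t is 8 bytes wide, so this advances by 8 */
}
```

After the call, the memory at the original destination address holds the stored value, and neighbouring slots are untouched.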
I think I've been able to dump two copies of the ParamsArray, one on which the initializer runs and one which is in scope for the FormatHelper function (since it's a struct, it is passed by value, right)?
It looks like arg0, arg1 and arg2 are copied over correctly, but args is not. |
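The expectation behind this observation is that passing a struct by value copies every field, so the callee's copy should match the caller's original field by field. A small C sketch of that check, using a hypothetical `params_array` mirror of the four instance fields named in the thread (the real type is a managed struct, so layout and types here are assumptions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C mirror of the instance fields arg0, arg1, arg2 and args. */
typedef struct {
    const void  *arg0;
    const void  *arg1;
    const void  *arg2;
    const void **args;
} params_array;

/*
 * 'copy' arrives by value, just as a struct argument would.
 * Returns 1 when every field of the copy equals the original's field.
 */
static int copy_matches(params_array copy, const params_array *original)
{
    assert(original != NULL);
    return copy.arg0 == original->arg0
        && copy.arg1 == original->arg1
        && copy.arg2 == original->arg2
        && copy.args == original->args;
}
```

The symptom reported above corresponds to this check failing on the `args` comparison only, while the three `argN` comparisons pass.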
At > 200 comments, this thread becomes a bit long to parse so here's a write-up:
Summary
Work done so far
Current Issue - Summary
Issue
A Hello World application (
Managed stack trace
The
Steps to reproduce
Use Run
Hypothesis
|
@janvorli By setting the breakpoint in I noticed a lot of the code was in I tried to port some of the fixes applied to the The changes are in https://github.com/qmfrederik/coreclr/commit/6e245a6925f2ae2e954c0381f5eeaca0114670c3, I'll try to make PRs which match the original PRs |
@qmfrederik great job spotting and fixing the issues in the asm helpers. This is a great milestone! |
@janvorli Thanks for your patience and assistance as well, I couldn't have done it without you! So, this leaves me thinking "what's next". In the end, we have a fairly large .NET Core application that we'd like to get running on ARM64 and we're not there yet :) Things I can think of as a next step:
What do you think? Are there other things we should test/try at this point? |
@qmfrederik I agree that we should close this issue as the initial phase of the bringup is complete. Before moving to coreclr tests, we need to make the exception handling work.
static void M1()
{
    throw new Exception("e");
}
static void Main()
{
    try
    {
        M1();
    }
    catch (Exception ex)
    {
        Console.WriteLine("Caught exception {0}", ex);
    }
}
If that works, we can move to more complex scenarios. I have a bunch of simple hand-written tests that I was using when bringing up the exception handling for Unix x64 at the beginning of the CoreCLR porting, and I can share those with you. To enable hardware exception handling, we need to implement the ThrowExceptionFromContextInternal. None of the tests that test null reference exceptions, division by zero or similar would pass without it. I will write the implementation for you. |
@janvorli Thx! I'll give it a try and get back to you. I've opened dotnet/coreclr#9370 to track the exception handling progress. |
Opening an issue to track the efforts to bring up coreclr/corefx on Alpine Linux ARM.
We've re-scoped this issue to track getting "Hello World!" to run on 64-bit Linux. As a next step, we'll work to get exception handling to work on Linux - see dotnet/coreclr#9370
Related threads:
Current approach is to use a docker container on an ARM device for building coreclr; that one is stored here https://github.com/qmfrederik/dotnet-alpine-arm
Building the native components for Alpine arm64:
./build.sh arm64 skipgenerateversion skipmscorlib
src/pal/tests/palsuite/runpaltests.sh /home/coreclr/bin/obj/Linux.arm64.Debug/
CPLUS_INCLUDE_PATH=/usr/include/c++/6.3.0/aarch64-alpine-linux-musl/:/usr/include/c++/6.3.0/ src/Native/build-native.sh arm64
./build.sh skipnative arm64 debug verbose -rebuild
Putting together a sample app:
Copy /home/corefx/bin/Linux.arm64.Debug/Native/* and /home/coreapp # cp ../coreclr/bin/Product/Linux.arm64.Debug/* to the app dir, as well as bin/Product/Linux.arm64.Debug/mscorlib.dll and bin/Product/Linux.arm64.Debug/System.Private.CoreLib.dll from the cross-build on Linux x64.
Backlog:
getcontext call (currently stubbed out)
Open items:
Compile clang 3.9.1 for Alpine; this version may include some welcome bug fixes
Sources are here: http://releases.llvm.org/download.html#3.9.1 , build process here: http://llvm.org/docs/CMake.html
Build process:
or
and at least patches 2, 3 from http://git.alpinelinux.org/cgit/aports/tree/main/llvm?h=3.5-stable