Alpine Linux ARM64 bring up #7231

Closed
6 of 9 tasks
qmfrederik opened this issue Jan 12, 2017 · 243 comments
Labels: area-Infrastructure-coreclr, os-linux (Linux OS, any supported distro)

@qmfrederik
Contributor

Opening an issue to track the efforts to bring up coreclr/corefx on Alpine Linux ARM.

We've re-scoped this issue to track getting "Hello World!" to run on 64-bit Linux. As a next step, we'll work to get exception handling to work on Linux - see dotnet/coreclr#9370

Related threads:

The current approach is to build coreclr in a Docker container on an ARM device; the container is stored here: https://github.com/qmfrederik/dotnet-alpine-arm

Building the native components for Alpine arm64:

  • Compile coreclr: ./build.sh arm64 skipgenerateversion skipmscorlib
  • Run PAL tests: src/pal/tests/palsuite/runpaltests.sh /home/coreclr/bin/obj/Linux.arm64.Debug/
  • Compile corefx: CPLUS_INCLUDE_PATH=/usr/include/c++/6.3.0/aarch64-alpine-linux-musl/:/usr/include/c++/6.3.0/ src/Native/build-native.sh arm64
  • Cross compile mscorlib on Linux for arm: ./build.sh skipnative arm64 debug verbose -rebuild

Putting together a sample app:

  • Compile and publish a demo stand-alone .NET core app, copy that one over to Alpine ARM
  • Copy /home/corefx/bin/Linux.arm64.Debug/Native/* and /home/coreclr/bin/Product/Linux.arm64.Debug/* to the app dir (/home/coreapp), as well as bin/Product/Linux.arm64.Debug/mscorlib.dll and bin/Product/Linux.arm64.Debug/System.Private.CoreLib.dll from the cross-build on Linux x64.

Backlog:

  • Fix libunwind getcontext call (currently stubbed out)
  • Hardware exception handling on aarch64
  • Unit tests for the various functions that had to be re-written in assembly?

Open items:
Compile clang 3.9.1 for Alpine; this version may include some welcome bug fixes

Sources are here: http://releases.llvm.org/download.html#3.9.1 , build process here: http://llvm.org/docs/CMake.html
Build process:

mkdir mybuilddir
cd mybuilddir
cmake path/to/llvm/source/root
cmake --build . -- -j 92
cmake --build . --target install

or

cmake -DCMAKE_INSTALL_PREFIX=/usr -P cmake_install.cmake

and at least patches 2, 3 from http://git.alpinelinux.org/cgit/aports/tree/main/llvm?h=3.5-stable

  • the same for clang, lld?
@qmfrederik
Contributor Author

qmfrederik commented Jan 12, 2017

Copying relevant comments:

@janvorli

@qmfrederik we already have Alpine on x64 running and we have some ARM32 Linuxes working, like Raspbian. So I think it should not be difficult to bring up Alpine ARM. I'll be happy to help you with that. The bringup of Alpine x64 was my work. To begin, it would be best to start with cross compilation of the managed assemblies instead of trying to do it the mono way that @cydhaselton was forced to use. Once we have things working with the cross-compiled managed code and locally compiled native code, we would create the RID and dotnet packages.

@janvorli

@qmfrederik it is a little different. The first step is to build the native components of coreclr and corefx on your target Alpine ARM system. So you would just clone the repos and for coreclr run ./build.sh arm skipgenerateversion skipmscorlib, and for corefx run src/Native/build-native.sh arm.
This would get you the native components for coreclr and corefx.
Once you have this building correctly, we will need to build libuv and stuff from core-setup repo.
Finally, I would recommend getting existing dotnet sdk tarball for ARM32 Linux and just replace the native components in it with the newly built stuff.

@janvorli

@qmfrederik I've forgotten to say that in the last phase, you'll need to make sure that the package you use was built from the same version of sources as the binaries you are sticking in. But let's discuss it once we are there.

@janvorli

We would cross compile in the CI. But we would first need to enable our cross compilation to build a rootfs for Alpine. So for the bring up, it is easiest to start by building the native components on the real hardware.

@qmfrederik

@janvorli So, I found some time to try & build coreclr for Alpine on ARM.

I'm building on aarch64 (so the 64-bit version of ARM), and Alpine seems to have all dependencies except for libunwind. I built libunwind from source.

Building coreclr via ./build.sh arm skipgenerateversion skipmscorlib manages to compile a fair part of coreclr, but it fails when linking with libunwind.so because it cannot resolve getcontext.

Here's (part of) the build output:

/usr/bin/ld: ../libcoreclrpal.a(seh.cpp.o)(.debug_info+0x774): R_AARCH64_ABS64 used with TLS symbol _ZZ29PAL_ThrowExceptionFromContextE27threadLocalExceptionStorage
/usr/bin/ld: ../libcoreclrpal.a(seh.cpp.o)(.debug_info+0x82c): R_AARCH64_ABS64 used with TLS symbol _ZL27t_nativeExceptionHolderHead
/usr/bin/../lib/gcc/aarch64-alpine-linux-musl/6.3.0/../../../libunwind.so: undefined reference to `getcontext'
clang-3.8: error: linker command failed with exit code 1 (use -v to see invocation)

I used a Docker container for building, so the entire VM setup & build process is documented here:

https://github.com/qmfrederik/dotnet-alpine-arm/blob/master/Dockerfile

I'm assuming something went wrong when building libunwind. I'll dig around but if you have any pointers, they would be greatly appreciated.

@am11

@qmfrederik, aports has a libunwind package in their current release, but they haven't enabled AARCH64 yet:
https://pkgs.alpinelinux.org/packages?name=libunwind
https://github.com/alpinelinux/aports/tree/master/main/libunwind (check the APKBUILD file (build script) and the patches which you may also need to apply for aarch64)

@janvorli

@qmfrederik can you please create a separate issue for the Linux aarch64? This one is already extremely long.
One thing I can see in your message - you need to use ./build.sh arm64 skipgenerateversion skipmscorlib, not arm, arm is arm32.
Also, you will need a bunch of changes that @cydhaselton has made first - adding and fixing various asm helpers, for example. They will need to be pulled out selectively, since he was also making changes that were specific for the Android termux environment, like tmp path location etc.
As for the getcontext, I guess the issue is the missing

if(PAL_CMAKE_PLATFORM_ARCH_ARM64)
  find_library(UNWIND_ARCH NAMES unwind-aarch64)
endif()

in src/pal/src/CMakeLists.txt around line 290 (I am not sure if unwind-aarch64 is the right name; I am just extrapolating from the names for other architectures).

@qmfrederik
Contributor Author

@am11 re: libunwind on Alpine:

I replicated the build script as well as I could, including the parameters passed to ./configure and the patches. I applied the gist of 10-disable-tests.patch through a sed statement, and force-enable-man.patch does not seem to be relevant.
You can find the build script for libunwind here: https://github.com/qmfrederik/dotnet-alpine-arm/blob/master/Dockerfile#L19

Do you have any idea as to why Alpine didn't enable builds of libunwind for aarch64?

@am11
Member

am11 commented Jan 12, 2017

@qmfrederik, perhaps this is because nobody has asked for an AARCH64 package. I have only tested building libsass and some other native libraries on Alpine (x86_64) in the past. One approach is to send a message to their mailing list or the aports IRC channel asking them to add the package or (as in this case) to support a particular architecture. We can also send a PR to the aports repo, and our changes will be verified by their dedicated build systems (which run a few times a day to rebuild all packages and are separate from, and more detailed than, the per-PR CI checks).

You can find the build script for libunwind here: https://github.com/qmfrederik/dotnet-alpine-arm/blob/master/Dockerfile#L19

This looks good. 👍
Is it functioning properly, and does ldd path/to/libunwind.so etc. work?

@qmfrederik
Contributor Author

@am11 Nope, it looks like it's libunwind.so which references the getcontext symbol:

/home/coreclr # ldd /usr/lib/libunwind.so
        ldd (0xaaaad8854000)
        libc.musl-aarch64.so.1 => ldd (0xaaaad8854000)
Error relocating /usr/lib/libunwind.so: getcontext: symbol not found

@janvorli
Member

I believe the symbol is defined in libunwind-aarch64.so. On Amd64, we link to libunwind.so, libunwind-x86_64.so and libunwind-generic.so. On aarch64, the CMakeLists was missing the architecture-specific file.

@am11
Member

am11 commented Jan 12, 2017

@qmfrederik, can you try making this kind of patch for libunwind: https://github.com/alpinelinux/aports/blob/f82ed56f8dd3e5a8ab11a39e75d7c37922cd3691/community/chromium/no-getcontext.patch Basically ifdef it out for GLIBC. Also you may want to use specific version: http://download.savannah.gnu.org/releases/libunwind/libunwind-1.2-rc1.tar.gz, same one that is used by APKBUILD script.

@qmfrederik
Contributor Author

@janvorli You are correct:

/home/libunwind-manual $ ldd /usr/lib/libunwind-aarch64.so
        ldd (0xaaaac1fba000)
        libc.musl-aarch64.so.1 => ldd (0xaaaac1fba000)

It looks like somehow the CoreCLR build managed to reference libunwind.so instead of libunwind-aarch64.so. I'll double-check CMakeLists and let you know.

@janvorli
Member

@qmfrederik it needs to reference all of the three. The libunwind-aarch64.so is missing, as I've said before.

@qmfrederik
Contributor Author

@janvorli I added the reference to libunwind-aarch64, see this commit

Just to be sure - since you say "it needs to reference all of the three", does this mean that coreclr will also link against libunwind.so and if ldd libunwind.so yields errors, the build will fail?

@janvorli
Member

It won't fail; libunwind-aarch64.so will satisfy the symbol needed by libunwind.so. Actually, libunwind-generic.so seems not to be needed, but it doesn't hurt.
Look at what we have on amd64:

ldd bin/Product/Linux.x64.Debug/libcoreclr.so
        linux-vdso.so.1 =>  (0x00007fffb84dd000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f9e34ff7000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f9e34dd9000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f9e34bd0000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9e349cc000)
        libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007f9e347c7000)
        libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f9e345ab000)
        libunwind-x86_64.so.8 => /usr/lib/x86_64-linux-gnu/libunwind-x86_64.so.8 (0x00007f9e3438c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f9e34088000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f9e33d81000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9e339bc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9e3632e000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f9e3379a000)

@qmfrederik
Contributor Author

@janvorli I'm relatively sure the issue is in libunwind itself.

libunwind defines unw_tdep_getcontext. On arm a stub is defined in libunwind-arm.h#L262; for aarch64 it is simply defined as getcontext in libunwind-aarch64.h#L17

I'm assuming getcontext doesn't exist on Alpine so that's why we're getting the linker errors.

Question: I can try to copy the definition of getcontext from arm to aarch64. Do you think that will work?

@janvorli
Member

@qmfrederik ok, thanks for the explanation. As for copying the code it won't be as simple as that. The aarch64 uses different registers and instructions. You may get some idea on how to do that in the LLVM libunwind below:
https://github.com/llvm-mirror/libunwind/blob/master/src/UnwindRegistersSave.S#L252

@qmfrederik
Contributor Author

qmfrederik commented Jan 12, 2017

@janvorli Thanks for the pointer. Is llvm libunwind supposed to be API-compatible with libunwind? I had a cursory look at the header, and the declarations seem to be identical.

Yes, it seems the file you referenced basically implements unw_tdep_getcontext for aarch64, so it should work.

I have to admit I've taken the easy path for now, and just stubbed the function as { return -1; }. That should indicate a failure to the calling code.
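
For reference, a stub of this kind can be sketched in C as below. This is only an illustration of the approach: demo_context, stub_getcontext and capture_context are hypothetical names, not libunwind's actual definitions, and the caller shows just one plausible way of treating the -1 return as "context unavailable".

```c
#include <stdio.h>

/* Hypothetical stand-in for a saved register context; the real
   libunwind context type is not reproduced here. */
typedef struct {
    unsigned long regs[31];
} demo_context;

/* Stub: does not capture anything and always reports failure. */
int stub_getcontext(demo_context *context)
{
    (void)context; /* intentionally unused */
    return -1;
}

/* Hypothetical caller: returns 1 if a context was captured, 0 otherwise. */
int capture_context(demo_context *context)
{
    if (stub_getcontext(context) != 0) {
        fprintf(stderr, "context capture is not available (stubbed out)\n");
        return 0;
    }
    return 1;
}
```

The trade-off is that any code path that genuinely needs the captured register state cannot work as intended until the stub is replaced by a real aarch64 implementation.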

Good news! coreclr now compiles successfully on Alpine aarch64.

Are there any coreclr tests I can run to get a grasp of the stability of this version of coreclr?

@janvorli
Member

You can run the PAL tests. After building coreclr (without skiptests argument), run
src/pal/tests/palsuite/runpaltests.sh bin/obj/Linux.arm64.Debug/

@qmfrederik
Contributor Author

Thanks, I'll run the PAL tests and post back the results.

Meanwhile, I also attempted to build the native part of corefx, but some standard C++ includes like limits are not found:

/home/corefx/src/Native/Unix/Common/pal_utilities.h:12:10: fatal error: 'limits' file not found
#include <limits>
         ^
1 error generated.

I can export CPLUS_INCLUDE_PATH=/usr/include/c++/6.3.0/ only to encounter the next error:

/usr/include/c++/6.3.0/limits:42:10: fatal error: 'bits/c++config.h' file not found
#include <bits/c++config.h>
         ^
1 error generated.

Which C++ package did you use for Alpine x64?

@janvorli
Member

This is a dockerfile I was using:

FROM alpine:latest
RUN apk update
RUN apk add bash gcc clang clang-dev cmake make build-base icu-dev python gettext-dev zlib-dev curl-dev krb5-dev linux-headers git llvm autoconf libtool

# Other sources
RUN apk -X https://dl-cdn.alpinelinux.org/alpine/edge/main add --no-cache libunwind-dev util-linux-dev
RUN apk -X https://dl-cdn.alpinelinux.org/alpine/edge/testing add --no-cache lldb-dev lttng-ust-dev

RUN apk add ca-certificates && update-ca-certificates && apk add openssl

I remember having an issue with limits as well, and I think one of these packages fixed it.

@qmfrederik
Contributor Author

I tried adding those packages but that didn't work; what seems to work is:

export CPLUS_INCLUDE_PATH=/usr/include/c++/6.3.0/aarch64-alpine-linux-musl/:/usr/include/c++/6.3.0/

@qmfrederik
Contributor Author

@janvorli the native corefx components build fine, so that's one more step ahead - both coreclr & corefx are now compiling on Alpine ARM64.

Regarding the coreclr pal tests, they seem to hang. Running src/pal/tests/palsuite/runpaltests.sh /home/coreclr/bin/obj/Linux.arm64.Debug gives the following output:

***** Testing PAL *****

Running PAL tests from /home/coreclr/bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite
The list of PAL tests to run will be read from /home/coreclr/src/pal/tests/palsuite/paltestlist.txt
PAL tests will store their temporary files and output in /tmp/PalTestOutput/default.

Running tests...

.

(note the single dot). I've left it running for about 5 minutes, but no additional dot appeared. Looking at the source, it seems there should be a dot per test, so I'm assuming this is not normal.

The PAL test logs in the /tmp folder are all empty.

@janvorli
Member

Hmm, when I run it on my RPI3 with ARM32 Linux, the dots appear with a cadence of about 4 per second, so there is obviously something wrong.
Can you try to run just the first test to see if it hangs?
/home/coreclr/bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/abs/test1/paltest_abs_test1

@qmfrederik
Contributor Author

Thx, I'll try it later today. Are there steps I can take to enable some kind of verbose logging, so we can get an idea of where it is stuck?

@janvorli
Member

If you look at https://github.com/dotnet/coreclr/blob/master/src/pal/tests/palsuite/runpaltests.sh, you'd see that we print a dot before executing each test. You can instrument that file e.g. by adding echo "Test done" at line 133 and you'd see if the test has hung or if something after that did.

@qmfrederik
Contributor Author

@janvorli I cherry-picked 2f441cfcb9ceb01d877ba4f66b9a8f451c959987 onto my branch, rebuilt CoreCLR and ran the /home/coreclr/bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/abs/test1/paltest_abs_test1 test from the command line.

The result is the same - the process hangs (waited for ~ 1 minute) with no console output.

Is there anything we can do to find out what paltest_abs_test1 is waiting for?

@janvorli
Member

@qmfrederik can you run it under lldb? it has full debug info if you built it as debug, so you can run it and when it hangs, break in and see what the call stack looks like.

@qmfrederik
Contributor Author

@janvorli

Looks like the hang is in InterlockedCompareExchange / __sync_val_compare_and_swap.

I tried with the sqrt and tan and malloc tests and they all fail at the same place.

/home/coreclr # lldb bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/sqrt/test1/paltest_sqrt_test1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named lldb.embedded_interpreter
(lldb) target create "bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/sqrt/test1/paltest_sqrt_test1"
Current executable set to 'bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/sqrt/test1/paltest_sqrt_test1' (aarch64).
(lldb) run
Process 36942 launched: '/home/coreclr/bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/sqrt/test1/paltest_sqrt_test1' (aarch64)
Process 36942 stopped
* thread #1: tid = 36942, 0x0000aaaaaab2ddec paltest_sqrt_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac09a00, Exchange=1, Comperand=0) + 48 at pal.h:5118, name = 'paltest_sqrt_te', stop reason = signal SIGSTOP
    frame #0: 0x0000aaaaaab2ddec paltest_sqrt_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac09a00, Exchange=1, Comperand=0) + 48 at pal.h:5118
   5115     IN LONG Exchange,
   5116     IN LONG Comperand)
   5117 {
-> 5118     return __sync_val_compare_and_swap(
   5119         Destination, /* The pointer to a variable whose value is to be compared with. */
   5120         Comperand, /* The value to be compared */
   5121         Exchange /* The value to be stored */);
/home/coreclr # lldb bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/tan/test1/paltest_tan_test1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named lldb.embedded_interpreter
(lldb) target create "bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/tan/test1/paltest_tan_test1"
Current executable set to 'bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/tan/test1/paltest_tan_test1' (aarch64).
(lldb) run
Process 36952 launched: '/home/coreclr/bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/tan/test1/paltest_tan_test1' (aarch64)
Process 36952 stopped
* thread #1: tid = 36952, 0x0000aaaaaab2de70 paltest_tan_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac09a00, Exchange=1, Comperand=0) + 48 at pal.h:5118, name = 'paltest_tan_tes', stop reason = signal SIGSTOP
    frame #0: 0x0000aaaaaab2de70 paltest_tan_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac09a00, Exchange=1, Comperand=0) + 48 at pal.h:5118
   5115     IN LONG Exchange,
   5116     IN LONG Comperand)
   5117 {
-> 5118     return __sync_val_compare_and_swap(
   5119         Destination, /* The pointer to a variable whose value is to be compared with. */
   5120         Comperand, /* The value to be compared */
   5121         Exchange /* The value to be stored */);

@janvorli
Member

What is the call stack at that point (bt command)? This interlocked operation is likely part of some spinlock.
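
As background for the listing above: __sync_val_compare_and_swap is a GCC/Clang builtin that atomically compares *ptr with an expected value, stores the new value only if they match, and returns the value that was in *ptr beforehand. The sketch below wraps it with the same argument order as the PAL function in the listing (Destination, Exchange, Comperand). The function names and the try_claim_init guard are hypothetical; they only illustrate how such a call can act as a one-time initialization flag or as a single step of a spinlock.

```c
#include <stdint.h>

/* Same argument order as the wrapper in the lldb listing:
   (Destination, Exchange, Comperand). The builtin itself takes
   (pointer, expected value, new value). */
int32_t interlocked_compare_exchange(volatile int32_t *destination,
                                     int32_t exchange,
                                     int32_t comperand)
{
    return __sync_val_compare_and_swap(destination, comperand, exchange);
}

/* Hypothetical one-time guard: returns 1 for the caller that flips the
   flag from 0 to 1, and 0 for every later caller. */
int try_claim_init(volatile int32_t *flag)
{
    return interlocked_compare_exchange(flag, 1, 0) == 0;
}
```

With *Destination initially 0, a call with Exchange=1 and Comperand=0 (the values in the listing) returns 0 and leaves 1 in *Destination; a second identical call returns 1 and changes nothing. The operation itself contains no loop, so it is expected to return promptly.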

@qmfrederik
Contributor Author

* thread #1: tid = 36962, 0x0000aaaaaab2b4c8 paltest_malloc_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac06a00, Exchange=1, Comperand=0) + 48 at pal.h:5118, name = 'paltest_malloc_', stop reason = signal SIGSTOP
  * frame #0: 0x0000aaaaaab2b4c8 paltest_malloc_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac06a00, Exchange=1, Comperand=0) + 48 at pal.h:5118
    frame #1: 0x0000aaaaaab2e0f0 paltest_malloc_test1`CorUnix::CriticalSectionSubSysInitialize() + 64 at cs.cpp:568
    frame #2: 0x0000aaaaaaae9450 paltest_malloc_test1`Initialize(argc=1, argv=0x0000fffffffffce8, flags=5) + 244 at pal.cpp:213
    frame #3: 0x0000aaaaaaae9350 paltest_malloc_test1`::PAL_Initialize(argc=1, argv=0x0000fffffffffce8) + 36 at pal.cpp:149
    frame #4: 0x0000aaaaaaad3304 paltest_malloc_test1`main(argc=1, argv=0x0000fffffffffce8) + 36 at test1.cpp:22
    frame #5: 0x0000ffffb7f79e60 ld-musl-aarch64.so.1`__libc_start_main + 68
    frame #6: 0x0000aaaaaaad2dd8 paltest_malloc_test1`_start_c(p=<unavailable>) + 48 at crt1.c:17
    frame #7: 0x0000ffffb80007e0 ld-musl-aarch64.so.1

@janvorli
Member

This is really strange. There is no loop at that place, so it looks like it has hung in the call to __sync_val_compare_and_swap. Can you try to step in the debugger after you break in to see if it is the case?

@qmfrederik
Contributor Author

@janvorli got sidetracked because I noticed that call was within an #if _DEBUG statement, so I ran the pal tests in release mode:

The following test(s) failed:
exception_handling/pal_sxs/test1/paltest_pal_sxs_test1. Exit code: 139
filemapping_memmgt/ProbeMemory/ProbeMemory_neg1/paltest_probememory_probememory_neg1. Exit code: 134

PAL Test Results:
  Passed: 788
  Failed: 2

Thought that was worth sharing, I'll now step in the debugger for the tests compiled in debug mode.

@qmfrederik
Contributor Author

@janvorli so here's the lldb output. lldb does not return when I run the step into (s) command, so I guess that confirms this is where we're stuck?

/home/coreclr # lldb bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/malloc/test1/paltest_malloc_test1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named lldb.embedded_interpreter
(lldb) target create "bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/malloc/test1/paltest_malloc_test1"
Current executable set to 'bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/malloc/test1/paltest_malloc_test1' (aarch64).
(lldb) run
Process 79843 launched: '/home/coreclr/bin/obj/Linux.arm64.Debug/src/pal/tests/palsuite/c_runtime/malloc/test1/paltest_malloc_test1' (aarch64)
Process 79843 stopped
* thread #1: tid = 79843, 0x0000aaaaaab2b4c8 paltest_malloc_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac06a00, Exchange=1, Comperand=0) + 48 at pal.h:5118, name = 'paltest_malloc_', stop reason = signal SIGSTOP
    frame #0: 0x0000aaaaaab2b4c8 paltest_malloc_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac06a00, Exchange=1, Comperand=0) + 48 at pal.h:5118
   5115     IN LONG Exchange,
   5116     IN LONG Comperand)
   5117 {
-> 5118     return __sync_val_compare_and_swap(
   5119         Destination, /* The pointer to a variable whose value is to be compared with. */
   5120         Comperand, /* The value to be compared */
   5121         Exchange /* The value to be stored */);
(lldb) bt
* thread #1: tid = 79843, 0x0000aaaaaab2b4c8 paltest_malloc_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac06a00, Exchange=1, Comperand=0) + 48 at pal.h:5118, name = 'paltest_malloc_', stop reason = signal SIGSTOP
  * frame #0: 0x0000aaaaaab2b4c8 paltest_malloc_test1`::InterlockedCompareExchange(Destination=0x0000aaaaaac06a00, Exchange=1, Comperand=0) + 48 at pal.h:5118
    frame #1: 0x0000aaaaaab2e0f0 paltest_malloc_test1`CorUnix::CriticalSectionSubSysInitialize() + 64 at cs.cpp:568
    frame #2: 0x0000aaaaaaae9450 paltest_malloc_test1`Initialize(argc=1, argv=0x0000fffffffffce8, flags=5) + 244 at pal.cpp:213
    frame #3: 0x0000aaaaaaae9350 paltest_malloc_test1`::PAL_Initialize(argc=1, argv=0x0000fffffffffce8) + 36 at pal.cpp:149
    frame #4: 0x0000aaaaaaad3304 paltest_malloc_test1`main(argc=1, argv=0x0000fffffffffce8) + 36 at test1.cpp:22
    frame #5: 0x0000ffffb7f79e60 ld-musl-aarch64.so.1`__libc_start_main + 68
    frame #6: 0x0000aaaaaaad2dd8 paltest_malloc_test1`_start_c(p=<unavailable>) + 48 at crt1.c:17
    frame #7: 0x0000ffffb80007e0 ld-musl-aarch64.so.1
(lldb) s

@janvorli
Member

@qmfrederik I guess that's the case then. It is pretty strange. Could you please do the following two things:

  1. Disassemble the function at frame 0
  2. Dump the memory at Destination

@janvorli
Member

janvorli commented Feb 1, 2017

@qmfrederik OK, let's set a managed breakpoint on ParamsArray..cctor again and then do some stepping. To begin with, let's just step over the calls (using the n command) and see what the return values (in the x0 register after each call) of all the calls are. Calls #1, 4 and 7 should be allocations of the arrays, calls #2, 5 and 8 should get the statics base, and the return values of calls #3, 6 and 9 are not interesting.

@qmfrederik
Contributor Author

qmfrederik commented Feb 1, 2017

From the "there has to be a better way to do this" department, here's the output:

[removed]

@janvorli
Member

janvorli commented Feb 1, 2017

@qmfrederik I don't know why lldb stepped into the calls instead of over them. I didn't expect to get the full trace, just the return values of the functions called, which you have not printed.
Let's try a better way now. After you break into the ..cctor, set a regular breakpoint at every third call in the cctor (it was bl 0xffff3d45d6b0 in your disassembly above) and let the execution continue. When you hit each of the breakpoints, please print the values of x14 and x15 (p/x $x14 and p/x $x15). x14 is the address where we store the pointer to the allocated object array; x15 is the pointer itself.
After recording the pointers (unless x15 is null, which would mean there is a problem with the allocation), let's continue until we hit our failure. Then print the System.ParamsArray.
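
The call pattern described here (allocate an array, fetch the statics base, then store the array reference into a static slot) can be modelled in plain C as follows. This is a simplified sketch, not the JIT's or the runtime's actual helpers: statics_block, alloc_object_array, get_statics_base and store_reference are hypothetical stand-ins, and the comments map each step to the calls and registers mentioned in this thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical statics area holding three array references. */
typedef struct {
    void *slots[3];
} statics_block;

/* Stand-in for the allocation helper (calls 1, 4 and 7). */
void *alloc_object_array(size_t length)
{
    return calloc(length, sizeof(void *));
}

/* Stand-in for the helper that returns the statics base (calls 2, 5 and 8). */
statics_block *get_statics_base(statics_block *block)
{
    return block;
}

/* Stand-in for the store helper (calls 3, 6 and 9):
   destination plays the role of x14, reference the role of x15. */
void store_reference(void **destination, void *reference)
{
    *destination = reference;
}

/* Returns 0 on success, -1 if an allocation returned NULL. */
int init_static_arrays(statics_block *block)
{
    for (size_t i = 0; i < 3; i++) {
        void *array = alloc_object_array(i + 1);
        if (array == NULL) {
            return -1; /* corresponds to x15 being null */
        }
        statics_block *base = get_statics_base(block);
        store_reference(&base->slots[i], array);
    }
    return 0;
}
```

In this model, a slot that is still empty after init_static_arrays returns 0 would point at the store step rather than at the allocation, which is the distinction the x14/x15 check is meant to draw.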

@qmfrederik
Contributor Author

@janvorli Here you go: x15 is not null; but oneArgArray, twoArgArray and threeArgArray keep the NotInit label:

root@dotnetarm:/home/lldb/build/bin# ./lldb /home/coreapp/corerun /home/coreapp/coreapp.dll 
(lldb) target create "/home/coreapp/corerun"
Current executable set to '/home/coreapp/corerun' (aarch64).
(lldb) settings set -- target.run-args  "/home/coreapp/coreapp.dll"
(lldb) breakpoint set -n coreclr_execute_assembly
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) run
Process 71080 launched: '/home/coreapp/corerun' (aarch64)
1 location added to breakpoint 1
Process 71080 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 1.1
    frame #0: 0x0000ffffb6ef8bec libcoreclr.so`::coreclr_execute_assembly(hostHandle=0x000000000043a470, domainId=1, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x000000000042cc50, exitCode=0x0000fffffffff104) at unixinterface.cpp:363
   360 	            const char* managedAssemblyPath,
   361 	            unsigned int* exitCode)
   362 	{
-> 363 	    if (exitCode == NULL)
   364 	    {
   365 	        return HRESULT_FROM_WIN32(ERROR_INVALID_PARAMETER);
   366 	    }
(lldb) plugin load /home/coreapp/libsosplugin.so
(lldb) bpmd System.Private.CoreLib.dll ParamsArray..cctor
Adding pending breakpoints...
(lldb) c  
Process 71080 resuming
JITTED System.Private.CoreLib!System.ParamsArray..cctor()
Setting breakpoint: breakpoint set --address 0x0000FFFF3D62FDC4 [System.ParamsArray..cctor()]
Process 71080 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 3.1
    frame #0: 0x0000ffff3d62fdc4
->  0xffff3d62fdc4: mov    x0, #0x8d8a
    0xffff3d62fdc8: movk   x0, #0x3d4a, lsl #16
    0xffff3d62fdcc: movk   x0, #0xffff, lsl #32
    0xffff3d62fdd0: mov    w1, #0x1
(lldb) clru 0x0000FFFF3D62FDC4
Normal JIT generated code
System.ParamsArray..cctor()
Begin 0000FFFF3D62FDB0, size c8
0000ffff3d62fdb0 fd7bbda9             stp     x29, x30, [sp, #-0x30]!
0000ffff3d62fdb4 fd030091             mov     x29, sp
0000ffff3d62fdb8 a0630091             add     x0, x29, #0x18
0000ffff3d62fdbc 1f7c81a8             stp     xzr, xzr, [x0], #0x10
0000ffff3d62fdc0 1f0000f9             str     xzr, [x0]
>>> 0000ffff3d62fdc4 40b191d2             mov     x0, #0x8d8a
0000ffff3d62fdc8 40a9a7f2             movk    x0, #0x3d4a, lsl #16
0000ffff3d62fdcc e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d62fdd0 21008052             mov     w1, #0x1
0000ffff3d62fdd4 4beef897             bl      0xffff3d46b700
0000ffff3d62fdd8 a01700f9             str     x0, [x29, #0x28]
0000ffff3d62fddc 000586d2             mov     x0, #0x3028
0000ffff3d62fde0 a0a4a7f2             movk    x0, #0x3d25, lsl #16
0000ffff3d62fde4 e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d62fde8 61518052             mov     w1, #0x28b
0000ffff3d62fdec 91eef897             bl      0xffff3d46b830
0000ffff3d62fdf0 0e602a91             add     x14, x0, #0xa98
0000ffff3d62fdf4 af1740f9             ldr     x15, [x29, #0x28]
0000ffff3d62fdf8 2eeef897             bl      0xffff3d46b6b0
0000ffff3d62fdfc 40b191d2             mov     x0, #0x8d8a
0000ffff3d62fe00 40a9a7f2             movk    x0, #0x3d4a, lsl #16
0000ffff3d62fe04 e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d62fe08 41008052             mov     w1, #0x2
0000ffff3d62fe0c 3deef897             bl      0xffff3d46b700
0000ffff3d62fe10 a01300f9             str     x0, [x29, #0x20]
0000ffff3d62fe14 000586d2             mov     x0, #0x3028
0000ffff3d62fe18 a0a4a7f2             movk    x0, #0x3d25, lsl #16
0000ffff3d62fe1c e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d62fe20 61518052             mov     w1, #0x28b
0000ffff3d62fe24 83eef897             bl      0xffff3d46b830
0000ffff3d62fe28 0e802a91             add     x14, x0, #0xaa0
0000ffff3d62fe2c af1340f9             ldr     x15, [x29, #0x20]
0000ffff3d62fe30 20eef897             bl      0xffff3d46b6b0
0000ffff3d62fe34 40b191d2             mov     x0, #0x8d8a
0000ffff3d62fe38 40a9a7f2             movk    x0, #0x3d4a, lsl #16
0000ffff3d62fe3c e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d62fe40 61008052             mov     w1, #0x3
0000ffff3d62fe44 2feef897             bl      0xffff3d46b700
0000ffff3d62fe48 a00f00f9             str     x0, [x29, #0x18]
0000ffff3d62fe4c 000586d2             mov     x0, #0x3028
0000ffff3d62fe50 a0a4a7f2             movk    x0, #0x3d25, lsl #16
0000ffff3d62fe54 e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d62fe58 61518052             mov     w1, #0x28b
0000ffff3d62fe5c 75eef897             bl      0xffff3d46b830
0000ffff3d62fe60 0ea02a91             add     x14, x0, #0xaa8
0000ffff3d62fe64 af0f40f9             ldr     x15, [x29, #0x18]
0000ffff3d62fe68 12eef897             bl      0xffff3d46b6b0
0000ffff3d62fe6c 1f2003d5             nop     
0000ffff3d62fe70 fd7bc3a8             ldp     x29, x30, [sp], #0x30
0000ffff3d62fe74 c0035fd6             ret     
(lldb) breakpoint set -a 0xffff3d46b6b0
Breakpoint 4: address = 0x0000ffff3d46b6b0
(lldb) c
Process 71080 resuming
Process 71080 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 4.1
    frame #0: 0x0000ffff3d46b6b0
->  0xffff3d46b6b0: ldr    x16, #0x8
    0xffff3d46b6b4: br     x16
    0xffff3d46b6b8: tbnz   x16, #0x24, 0xffff3d466450
    0xffff3d46b6bc: .long  0x0000ffff                ; unknown opcode
(lldb) p/x $x14
(unsigned long) $0 = 0x0000ffff27fffae0
(lldb) p/x $x15
(unsigned long) $1 = 0x0000ffff180132d8
(lldb) c
Process 71080 resuming
Process 71080 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 4.1
    frame #0: 0x0000ffff3d46b6b0
->  0xffff3d46b6b0: ldr    x16, #0x8
    0xffff3d46b6b4: br     x16
    0xffff3d46b6b8: tbnz   x16, #0x24, 0xffff3d466450
    0xffff3d46b6bc: .long  0x0000ffff                ; unknown opcode
(lldb) p/x $x14
(unsigned long) $2 = 0x0000ffff27fffae8
(lldb) p/x $x15
(unsigned long) $3 = 0x0000ffff180132f8
(lldb) c
Process 71080 resuming
Process 71080 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 4.1
    frame #0: 0x0000ffff3d46b6b0
->  0xffff3d46b6b0: ldr    x16, #0x8
    0xffff3d46b6b4: br     x16
    0xffff3d46b6b8: tbnz   x16, #0x24, 0xffff3d466450
    0xffff3d46b6bc: .long  0x0000ffff                ; unknown opcode
(lldb) p/x $x14
(unsigned long) $4 = 0x0000ffff27fffaf0
(lldb) p/x $x15
(unsigned long) $5 = 0x0000ffff18013320
(lldb) c
Process 71080 resuming
Process 71080 stopped
* thread #1, name = 'corerun', stop reason = signal SIGSEGV: invalid address (fault address: 0x8)
    frame #0: 0x0000ffff3d62ffec
->  0xffff3d62ffec: ldrsw  x0, [x0, #0x8]
    0xffff3d62fff0: str    w0, [x29, #0x1c]
    0xffff3d62fff4: nop    
    0xffff3d62fff8: b      0xffff3d62fffc
(lldb) x/gx $x29+0x10
0xffffffffc2e0: 0x0000ffffffffc3f0
(lldb) name2ee *!System.ParamsArray
Module:      0000ffff3d246020
Assembly:    System.Private.CoreLib.dll
Token:       000000000200028C
MethodTable: 0000ffff3d753538
EEClass:     0000ffff3d74d1a0
Name:        System.ParamsArray
--------------------------------------
Module:      0000ffff3d253d20
Assembly:    coreapp.dll
--------------------------------------
Module:      0000ffff3d2549f8
Assembly:    System.Runtime.dll
--------------------------------------
Module:      0000ffff3d2558c8
Assembly:    System.Console.dll
--------------------------------------
Module:      0000ffff3d2583e8
Assembly:    System.Runtime.Extensions.dll
--------------------------------------
Module:      0000ffff3d25a800
Assembly:    System.Threading.dll
--------------------------------------
Module:      0000ffff3d6e1f50
Assembly:    System.Diagnostics.Debug.dll
--------------------------------------
Module:      0000ffff3d6e8508
Assembly:    System.Text.Encoding.Extensions.dll
(lldb) sos DumpVC 0000ffff3d753538 0x0000ffffffffc3f0
Name:        System.ParamsArray
MethodTable: 0000ffff3d753538
EEClass:     0000ffff3d74d1a0
Size:        48(0x30) bytes
File:        /home/coreapp/System.Private.CoreLib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000ffff3d423910  4000ccf        0        System.Object  0 instance 0000ffff18013288 arg0
0000ffff3d423910  4000cd0        8        System.Object  0 instance 0000ffff180132a8 arg1
0000ffff3d423910  4000cd1       10        System.Object  0 instance 0000ffff180132c0 arg2
0000ffff3d42fb38  4000cd2       18      System.Object[]  0 instance 0000000000000000 args
0000ffff3d42fb38  4000ccc      a98      System.Object[]  0   shared           static oneArgArray
                                 >> Domain:Value  0000000000457980:NotInit  <<
0000ffff3d42fb38  4000ccd      aa0      System.Object[]  0   shared           static twoArgArray
                                 >> Domain:Value  0000000000457980:NotInit  <<
0000ffff3d42fb38  4000cce      aa8      System.Object[]  0   shared           static threeArgArray
                                 >> Domain:Value  0000000000457980:NotInit  <<
(lldb) 

@qmfrederik
Contributor Author

By the way, I was inspecting the values of arg0, arg1 and arg2, and it appears we have the same NotInit behavior for string.Empty:

(lldb) dumpobj 0000ffff18013288
Name:        System.String
MethodTable: 0000ffff3d4e71b0
EEClass:     0000ffff3d4f5370
Size:        28(0x1c) bytes
File:        /home/coreapp/System.Private.CoreLib.dll
String:      

Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000ffff3d4b91d0  4000a07        8         System.Int32  1 instance                1 m_stringLength
0000ffff3d4b3178  4000a08        c          System.Char  1 instance                a m_firstChar
0000ffff3d4e71b0  4000a09      950        System.String  0   shared           static Empty
                                 >> Domain:Value  0000000000457980:NotInit  <<

@janvorli
Member

janvorli commented Feb 1, 2017

@qmfrederik the addresses that are in x14 should match the addresses where the oneArgArray, twoArgArray and threeArgArray are located.
Looking at the x14 values you have printed, they were 0x0000ffff27fffae0, 0x0000ffff27fffae8 and 0x0000ffff27fffaf0. The third one should be the address that the regular constructor accesses when setting its args member to the threeArgArray.

Could you please also set a breakpoint on the regular constructor of ParamsArray (bpmd System.Private.CoreLib.dll ParamsArray..ctor) and use clru to disassemble it once you hit it? The code that accesses the static threeArgArray should calculate its address the same way the static constructor does: a call with w1 set to 0x28b, then adding the offset 0xaa8 to x0 to get the actual address.
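The address arithmetic can be checked against the x14 values quoted above. This is a minimal sketch, assuming the three write-barrier breakpoint hits correspond, in order, to the stores into oneArgArray, twoArgArray and threeArgArray; the helper name implied_bases is made up for illustration.

```python
# Field offsets of the three static arrays, as shown by DumpVC and used by the
# static constructor (add x14, x0, #0xa98 / #0xaa0 / #0xaa8).
FIELD_OFFSETS = {
    "oneArgArray": 0xA98,
    "twoArgArray": 0xAA0,
    "threeArgArray": 0xAA8,
}

# x14 values printed at the three breakpoint hits (assumed to be in field order).
OBSERVED_X14 = {
    "oneArgArray": 0x0000FFFF27FFFAE0,
    "twoArgArray": 0x0000FFFF27FFFAE8,
    "threeArgArray": 0x0000FFFF27FFFAF0,
}


def implied_bases(observed, offsets):
    """Return the statics base implied by each observed field address."""
    return {name: observed[name] - offsets[name] for name in observed}


bases = implied_bases(OBSERVED_X14, FIELD_OFFSETS)
for name, base in bases.items():
    print(f"{name}: base = {base:#x}")

# If the offsets and addresses agree, all three stores imply one base.
consistent = len(set(bases.values())) == 1
print("consistent:", consistent)
```

If the three implied bases agree, the static constructor is writing to the slots that DumpVC lists, and the regular constructor should read the third slot at the same base plus 0xaa8.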

@qmfrederik
Contributor Author

@janvorli Here's the clru for the regular constructor. The call seems to match your description:

0000ffff3d621d74 61518052             mov     w1, #0x28b
0000ffff3d621d78 aeeef897             bl      0xffff3d45d830
0000ffff3d621d7c 0f5445f9             ldr     x15, [x0, #0xaa8]

The regular constructor was hit before the static constructor, not sure if that's expected.

root@dotnetarm:/home/lldb/build/bin# ./lldb /home/coreapp/corerun /home/coreapp/coreapp.dll 
(lldb) target create "/home/coreapp/corerun"
Current executable set to '/home/coreapp/corerun' (aarch64).
(lldb) settings set -- target.run-args  "/home/coreapp/coreapp.dll"
(lldb) breakpoint set -n coreclr_execute_assembly
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) run
Process 82817 launched: '/home/coreapp/corerun' (aarch64)
1 location added to breakpoint 1
Process 82817 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 1.1
    frame #0: 0x0000ffffb6ef8bec libcoreclr.so`::coreclr_execute_assembly(hostHandle=0x000000000043a470, domainId=1, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x000000000042cc50, exitCode=0x0000fffffffff104) at unixinterface.cpp:363
   360 	            const char* managedAssemblyPath,
   361 	            unsigned int* exitCode)
   362 	{
-> 363 	    if (exitCode == NULL)
   364 	    {
   365 	        return HRESULT_FROM_WIN32(ERROR_INVALID_PARAMETER);
   366 	    }
(lldb) plugin load /home/coreapp/libsosplugin.so
(lldb) bpmd System.Private.CoreLib.dll ParamsArray..cctor
Adding pending breakpoints...
(lldb) bpmd System.Private.CoreLib.dll ParamsArray..ctor         
Adding pending breakpoints...
(lldb) c
Process 82817 resuming
JITTED System.Private.CoreLib!System.ParamsArray..ctor(System.Object, System.Object, System.Object)
Setting breakpoint: breakpoint set --address 0x0000FFFF3D621D38 [System.ParamsArray..ctor(System.Object, System.Object, System.Object)]
Process 82817 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 3.1
    frame #0: 0x0000ffff3d621d38
->  0xffff3d621d38: nop    
    0xffff3d621d3c: ldr    x14, [x29, #0x28]
    0xffff3d621d40: ldr    x15, [x29, #0x20]
    0xffff3d621d44: bl     0xffff3d46cf38
(lldb) clru 0x0000FFFF3D621D38
Normal JIT generated code
System.ParamsArray..ctor(System.Object, System.Object, System.Object)
Begin 0000FFFF3D621D20, size 78
0000ffff3d621d20 fd7bbda9             stp     x29, x30, [sp, #-0x30]!
0000ffff3d621d24 fd030091             mov     x29, sp
0000ffff3d621d28 a01700f9             str     x0, [x29, #0x28]
0000ffff3d621d2c a11300f9             str     x1, [x29, #0x20]
0000ffff3d621d30 a20f00f9             str     x2, [x29, #0x18]
0000ffff3d621d34 a30b00f9             str     x3, [x29, #0x10]
>>> 0000ffff3d621d38 1f2003d5             nop     
0000ffff3d621d3c ae1740f9             ldr     x14, [x29, #0x28]
0000ffff3d621d40 af1340f9             ldr     x15, [x29, #0x20]
0000ffff3d621d44 7d2cf997             bl      0xffff3d46cf38
0000ffff3d621d48 ae1740f9             ldr     x14, [x29, #0x28]
0000ffff3d621d4c ce210091             add     x14, x14, #0x8
0000ffff3d621d50 af0f40f9             ldr     x15, [x29, #0x18]
0000ffff3d621d54 792cf997             bl      0xffff3d46cf38
0000ffff3d621d58 ae1740f9             ldr     x14, [x29, #0x28]
0000ffff3d621d5c ce410091             add     x14, x14, #0x10
0000ffff3d621d60 af0b40f9             ldr     x15, [x29, #0x10]
0000ffff3d621d64 752cf997             bl      0xffff3d46cf38
0000ffff3d621d68 00058ad2             mov     x0, #0x5028
0000ffff3d621d6c 80a4a7f2             movk    x0, #0x3d24, lsl #16
0000ffff3d621d70 e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d621d74 61518052             mov     w1, #0x28b
0000ffff3d621d78 aeeef897             bl      0xffff3d45d830
0000ffff3d621d7c 0f5445f9             ldr     x15, [x0, #0xaa8]
0000ffff3d621d80 ae1740f9             ldr     x14, [x29, #0x28]
0000ffff3d621d84 ce610091             add     x14, x14, #0x18
0000ffff3d621d88 6c2cf997             bl      0xffff3d46cf38
0000ffff3d621d8c 1f2003d5             nop     
0000ffff3d621d90 fd7bc3a8             ldp     x29, x30, [sp], #0x30
0000ffff3d621d94 c0035fd6             ret     
(lldb) c
Process 82817 resuming
JITTED System.Private.CoreLib!System.ParamsArray..cctor()
Setting breakpoint: breakpoint set --address 0x0000FFFF3D621DC4 [System.ParamsArray..cctor()]
Process 82817 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 4.1
    frame #0: 0x0000ffff3d621dc4
->  0xffff3d621dc4: mov    x0, #0xad8a
    0xffff3d621dc8: movk   x0, #0x3d49, lsl #16
    0xffff3d621dcc: movk   x0, #0xffff, lsl #32
    0xffff3d621dd0: mov    w1, #0x1
(lldb) 

@janvorli
Member

janvorli commented Feb 1, 2017

@qmfrederik it is expected that the regular constructor runs before the static one. The static one is actually called from inside the call to 0xffff3d45d830. That function is JIT_GetSharedGCStaticBase_Portable; its job is to return the base address of the statics and to ensure that the static constructor for the class identified by the parameter in w1 (0x28b in our case) has been called.
The ldr x15, [x0, #0xaa8] in the regular constructor at 0000ffff3d621d7c above should load the value that the static constructor stored at the offset 0xaa8 from the statics base.
So it is pretty weird that it is returning null.
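The ordering described here (regular constructor entered first, static constructor run lazily from inside the statics-base helper) can be modelled roughly as follows. This is a toy sketch, not the runtime's implementation: ToyRuntime, get_static_base and the base address are made up for illustration; only the class id 0x28b and the offset 0xaa8 come from the disassembly.

```python
PARAMS_ARRAY_ID = 0x28B     # class id passed in w1 in the disassembly
THREE_ARG_OFFSET = 0xAA8    # offset of threeArgArray from the statics base


class ToyRuntime:
    """Toy model only; names and the base address are hypothetical."""

    def __init__(self):
        self.cctor_done = set()   # class ids whose static constructor has run
        self.memory = {}          # address -> stored value
        self.events = []          # order in which things happen

    def get_static_base(self, class_id):
        # Stand-in for the helper: run the static constructor on first use,
        # then return the statics base for that class.
        base = 0x10000 + class_id          # hypothetical base address
        if class_id not in self.cctor_done:
            self.events.append("cctor")
            self.static_ctor(base)
            self.cctor_done.add(class_id)
        return base

    def static_ctor(self, base):
        # Models the store of the freshly allocated array at base + 0xaa8.
        self.memory[base + THREE_ARG_OFFSET] = [None, None, None]

    def instance_ctor(self):
        self.events.append("ctor entered")
        base = self.get_static_base(PARAMS_ARRAY_ID)
        # Models `ldr x15, [x0, #0xaa8]` in the regular constructor.
        return self.memory.get(base + THREE_ARG_OFFSET)


rt = ToyRuntime()
args = rt.instance_ctor()
print(rt.events)
print(args is not None)
```

In this model the load after the helper call always sees the array the static constructor stored, which is why a null result at that point would be surprising.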
The only thing I can think of to try would be to run the app with managed breakpoints set on both the static and regular constructors. When you hit the regular constructor, set a regular breakpoint at the ldr x15, [x0, #0xaa8] and continue. When you hit the static constructor, set a regular breakpoint at add x14, x0, #0xaa8 and continue.
Once you hit the regular breakpoint at add x14, x0, #0xaa8, print x0 (p/x $x0), then continue. Once you hit the other regular breakpoint, print x0 again and also dump the slot with x/gx $x0+0xaa8.

If the values of x0 don't match, that's the problem and we need to figure out why. If they match and the value dumped from memory is null, then either there is some problem with JIT_WriteBarrier, or something overwrites the value after JIT_WriteBarrier has written it there in the static constructor.
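The two checks can be written down as a small decision function. This is only a sketch of the reasoning in this comment: the function name diagnose and the input values in the usage lines are made up, and the third outcome (base matches, slot non-null) is an addition for completeness rather than something stated above.

```python
def diagnose(x0_in_cctor, x0_in_ctor, value_at_slot):
    """Classify the outcome of the two-breakpoint experiment."""
    if x0_in_cctor != x0_in_ctor:
        return "base mismatch: the two constructors see different statics bases"
    if value_at_slot == 0:
        return "slot is null: suspect JIT_WriteBarrier or a later overwrite"
    return "slot looks fine: the static was written and is still there"


# Hypothetical inputs, one per branch.
print(diagnose(0x1000, 0x2000, 0x0))
print(diagnose(0x1000, 0x1000, 0x0))
print(diagnose(0x1000, 0x1000, 0xABCD))
```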

@qmfrederik
Contributor Author

@janvorli The values of x0 seem to match, but the value dumped from memory for the statics is NotInit:

root@dotnetarm:/home/lldb/build/bin# ./lldb /home/coreapp/corerun /home/coreapp/coreapp.dll 
(lldb) target create "/home/coreapp/corerun"
Current executable set to '/home/coreapp/corerun' (aarch64).
(lldb) settings set -- target.run-args  "/home/coreapp/coreapp.dll"
(lldb) breakpoint set coreclr_execute_assembly
error: invalid combination of options for the given command
(lldb) breakpo^C
error: invalid combination of options for the given command
(lldb) exit
root@dotnetarm:/home/lldb/build/bin# ./lldb /home/coreapp/corerun /home/coreapp/coreapp.dll 
(lldb) target create "/home/coreapp/corerun"
Current executable set to '/home/coreapp/corerun' (aarch64).
(lldb) settings set -- target.run-args  "/home/coreapp/coreapp.dll"
(lldb) breakpoint set -n coreclr_execute_assembly
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) run
Process 86760 launched: '/home/coreapp/corerun' (aarch64)
1 location added to breakpoint 1
Process 86760 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 1.1
    frame #0: 0x0000ffffb6ef8bec libcoreclr.so`::coreclr_execute_assembly(hostHandle=0x000000000043a470, domainId=1, argc=0, argv=0x0000000000000000, managedAssemblyPath=0x000000000042cc50, exitCode=0x0000fffffffff104) at unixinterface.cpp:363
   360 	            const char* managedAssemblyPath,
   361 	            unsigned int* exitCode)
   362 	{
-> 363 	    if (exitCode == NULL)
   364 	    {
   365 	        return HRESULT_FROM_WIN32(ERROR_INVALID_PARAMETER);
   366 	    }
(lldb) plugin load /home/coreapp/libsosplugin.so
(lldb) bpmd System.Private.CoreLib.dll ParamsArray..cctor
Adding pending breakpoints...
(lldb) bpmd System.Private.CoreLib.dll ParamsArray..ctor
Adding pending breakpoints...
(lldb) c
Process 86760 resuming
JITTED System.Private.CoreLib!System.ParamsArray..ctor(System.Object, System.Object, System.Object)
Setting breakpoint: breakpoint set --address 0x0000FFFF3D606D38 [System.ParamsArray..ctor(System.Object, System.Object, System.Object)]
Process 86760 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 3.1
    frame #0: 0x0000ffff3d606d38
->  0xffff3d606d38: nop    
    0xffff3d606d3c: ldr    x14, [x29, #0x28]
    0xffff3d606d40: ldr    x15, [x29, #0x20]
    0xffff3d606d44: bl     0xffff3d451f38
(lldb) clru
(lldb) clru 0x0000FFFF3D606D38
Normal JIT generated code
System.ParamsArray..ctor(System.Object, System.Object, System.Object)
Begin 0000FFFF3D606D20, size 78
0000ffff3d606d20 fd7bbda9             stp     x29, x30, [sp, #-0x30]!
0000ffff3d606d24 fd030091             mov     x29, sp
0000ffff3d606d28 a01700f9             str     x0, [x29, #0x28]
0000ffff3d606d2c a11300f9             str     x1, [x29, #0x20]
0000ffff3d606d30 a20f00f9             str     x2, [x29, #0x18]
0000ffff3d606d34 a30b00f9             str     x3, [x29, #0x10]
>>> 0000ffff3d606d38 1f2003d5             nop     
0000ffff3d606d3c ae1740f9             ldr     x14, [x29, #0x28]
0000ffff3d606d40 af1340f9             ldr     x15, [x29, #0x20]
0000ffff3d606d44 7d2cf997             bl      0xffff3d451f38
0000ffff3d606d48 ae1740f9             ldr     x14, [x29, #0x28]
0000ffff3d606d4c ce210091             add     x14, x14, #0x8
0000ffff3d606d50 af0f40f9             ldr     x15, [x29, #0x18]
0000ffff3d606d54 792cf997             bl      0xffff3d451f38
0000ffff3d606d58 ae1740f9             ldr     x14, [x29, #0x28]
0000ffff3d606d5c ce410091             add     x14, x14, #0x10
0000ffff3d606d60 af0b40f9             ldr     x15, [x29, #0x10]
0000ffff3d606d64 752cf997             bl      0xffff3d451f38
0000ffff3d606d68 000594d2             mov     x0, #0xa028
0000ffff3d606d6c 40a4a7f2             movk    x0, #0x3d22, lsl #16
0000ffff3d606d70 e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d606d74 61518052             mov     w1, #0x28b
0000ffff3d606d78 aeeef897             bl      0xffff3d442830
0000ffff3d606d7c 0f5445f9             ldr     x15, [x0, #0xaa8]
0000ffff3d606d80 ae1740f9             ldr     x14, [x29, #0x28]
0000ffff3d606d84 ce610091             add     x14, x14, #0x18
0000ffff3d606d88 6c2cf997             bl      0xffff3d451f38
0000ffff3d606d8c 1f2003d5             nop     
0000ffff3d606d90 fd7bc3a8             ldp     x29, x30, [sp], #0x30
0000ffff3d606d94 c0035fd6             ret     
(lldb) breakpoint set -a 0000ffff3d606d7c
Breakpoint 4: address = 0x0000ffff3d606d7c
(lldb) c
Process 86760 resuming
JITTED System.Private.CoreLib!System.ParamsArray..cctor()
Setting breakpoint: breakpoint set --address 0x0000FFFF3D606DC4 [System.ParamsArray..cctor()]
Process 86760 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 5.1
    frame #0: 0x0000ffff3d606dc4
->  0xffff3d606dc4: mov    x0, #0xfd8a
    0xffff3d606dc8: movk   x0, #0x3d47, lsl #16
    0xffff3d606dcc: movk   x0, #0xffff, lsl #32
    0xffff3d606dd0: mov    w1, #0x1
(lldb) clru 0x0000ffff3d606dc4
Normal JIT generated code
System.ParamsArray..cctor()
Begin 0000FFFF3D606DB0, size c8
0000ffff3d606db0 fd7bbda9             stp     x29, x30, [sp, #-0x30]!
0000ffff3d606db4 fd030091             mov     x29, sp
0000ffff3d606db8 a0630091             add     x0, x29, #0x18
0000ffff3d606dbc 1f7c81a8             stp     xzr, xzr, [x0], #0x10
0000ffff3d606dc0 1f0000f9             str     xzr, [x0]
>>> 0000ffff3d606dc4 40b19fd2             mov     x0, #0xfd8a
0000ffff3d606dc8 e0a8a7f2             movk    x0, #0x3d47, lsl #16
0000ffff3d606dcc e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d606dd0 21008052             mov     w1, #0x1
0000ffff3d606dd4 4beef897             bl      0xffff3d442700
0000ffff3d606dd8 a01700f9             str     x0, [x29, #0x28]
0000ffff3d606ddc 000594d2             mov     x0, #0xa028
0000ffff3d606de0 40a4a7f2             movk    x0, #0x3d22, lsl #16
0000ffff3d606de4 e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d606de8 61518052             mov     w1, #0x28b
0000ffff3d606dec 91eef897             bl      0xffff3d442830
0000ffff3d606df0 0e602a91             add     x14, x0, #0xa98
0000ffff3d606df4 af1740f9             ldr     x15, [x29, #0x28]
0000ffff3d606df8 2eeef897             bl      0xffff3d4426b0
0000ffff3d606dfc 40b19fd2             mov     x0, #0xfd8a
0000ffff3d606e00 e0a8a7f2             movk    x0, #0x3d47, lsl #16
0000ffff3d606e04 e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d606e08 41008052             mov     w1, #0x2
0000ffff3d606e0c 3deef897             bl      0xffff3d442700
0000ffff3d606e10 a01300f9             str     x0, [x29, #0x20]
0000ffff3d606e14 000594d2             mov     x0, #0xa028
0000ffff3d606e18 40a4a7f2             movk    x0, #0x3d22, lsl #16
0000ffff3d606e1c e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d606e20 61518052             mov     w1, #0x28b
0000ffff3d606e24 83eef897             bl      0xffff3d442830
0000ffff3d606e28 0e802a91             add     x14, x0, #0xaa0
0000ffff3d606e2c af1340f9             ldr     x15, [x29, #0x20]
0000ffff3d606e30 20eef897             bl      0xffff3d4426b0
0000ffff3d606e34 40b19fd2             mov     x0, #0xfd8a
0000ffff3d606e38 e0a8a7f2             movk    x0, #0x3d47, lsl #16
0000ffff3d606e3c e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d606e40 61008052             mov     w1, #0x3
0000ffff3d606e44 2feef897             bl      0xffff3d442700
0000ffff3d606e48 a00f00f9             str     x0, [x29, #0x18]
0000ffff3d606e4c 000594d2             mov     x0, #0xa028
0000ffff3d606e50 40a4a7f2             movk    x0, #0x3d22, lsl #16
0000ffff3d606e54 e0ffdff2             movk    x0, #0xffff, lsl #32
0000ffff3d606e58 61518052             mov     w1, #0x28b
0000ffff3d606e5c 75eef897             bl      0xffff3d442830
0000ffff3d606e60 0ea02a91             add     x14, x0, #0xaa8
0000ffff3d606e64 af0f40f9             ldr     x15, [x29, #0x18]
0000ffff3d606e68 12eef897             bl      0xffff3d4426b0
0000ffff3d606e6c 1f2003d5             nop     
0000ffff3d606e70 fd7bc3a8             ldp     x29, x30, [sp], #0x30
0000ffff3d606e74 c0035fd6             ret     
(lldb) breakpoint set -a 0000ffff3d606e60
Breakpoint 6: address = 0x0000ffff3d606e60
(lldb) c
Process 86760 resuming
Process 86760 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 6.1
    frame #0: 0x0000ffff3d606e60
->  0xffff3d606e60: add    x14, x0, #0xaa8           ; =0xaa8 
    0xffff3d606e64: ldr    x15, [x29, #0x18]
    0xffff3d606e68: bl     0xffff3d4426b0
    0xffff3d606e6c: nop    
(lldb) p/x $x0
(unsigned long) $0 = 0x0000ffff27fff048
(lldb) c
Process 86760 resuming
Process 86760 stopped
* thread #1, name = 'corerun', stop reason = breakpoint 4.1
    frame #0: 0x0000ffff3d606d7c
->  0xffff3d606d7c: ldr    x15, [x0, #0xaa8]
    0xffff3d606d80: ldr    x14, [x29, #0x28]
    0xffff3d606d84: add    x14, x14, #0x18           ; =0x18 
    0xffff3d606d88: bl     0xffff3d451f38
(lldb) p/x $x0
(unsigned long) $1 = 0x0000ffff27fff048
(lldb) x/gx $x0+0xaa8
0xffff27fffaf0: 0x0000ffff18013320
(lldb) name2ee *!System.ParamsArray
Module:      0000ffff3d21d020
Assembly:    System.Private.CoreLib.dll
Token:       000000000200028C
MethodTable: 0000ffff3d72a538
EEClass:     0000ffff3d7241a0
Name:        System.ParamsArray
--------------------------------------
Module:      0000ffff3d22ad20
Assembly:    coreapp.dll
--------------------------------------
Module:      0000ffff3d22b9f8
Assembly:    System.Runtime.dll
--------------------------------------
Module:      0000ffff3d22c8c8
Assembly:    System.Console.dll
--------------------------------------
Module:      0000ffff3d22f3e8
Assembly:    System.Runtime.Extensions.dll
--------------------------------------
Module:      0000ffff3d231800
Assembly:    System.Threading.dll
--------------------------------------
Module:      0000ffff3d6b8f50
Assembly:    System.Diagnostics.Debug.dll
--------------------------------------
Module:      0000ffff3d6bf508
Assembly:    System.Text.Encoding.Extensions.dll
(lldb) sos DumpVC 0000ffff3d72a538 0x0000ffff18013320
Name:        System.ParamsArray
MethodTable: 0000ffff3d72a538
EEClass:     0000ffff3d7241a0
Size:        48(0x30) bytes
File:        /home/coreapp/System.Private.CoreLib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000ffff3d3fa910  4000ccf        0        System.Object  0 instance 0000ffff3d406b38 arg0
0000ffff3d3fa910  4000cd0        8        System.Object  0 instance 0000000000000003 arg1
0000ffff3d3fa910  4000cd1       10        System.Object  0 instance 0000000000000000 arg2
0000ffff3d406b38  4000cd2       18      System.Object[]  0 instance 0000000000000000 args
0000ffff3d406b38  4000ccc      a98      System.Object[]  0   shared           static oneArgArray
                                 >> Domain:Value  0000000000457980:NotInit  <<
0000ffff3d406b38  4000ccd      aa0      System.Object[]  0   shared           static twoArgArray
                                 >> Domain:Value  0000000000457980:NotInit  <<
0000ffff3d406b38  4000cce      aa8      System.Object[]  0   shared           static threeArgArray
                                 >> Domain:Value  0000000000457980:NotInit  <<
(lldb) 

@janvorli
Member

janvorli commented Feb 1, 2017

That's weird. The value you dumped from the memory at $x0+0xaa8 looks like a reasonable address. If you still have the session open, could you do dumpobj 0x0000ffff18013320?

@qmfrederik
Contributor Author

(lldb) dumpobj 0x0000ffff18013320
Name:        System.Object[]
MethodTable: 0000ffff3d406b38
EEClass:     0000ffff3d406a70
Size:        48(0x30) bytes
Array:       Rank 1, Number of elements 3, Type CLASS
Fields:
None

@janvorli
Member

janvorli commented Feb 1, 2017

@qmfrederik so that's the correct array that should be there. I wonder why the args member is set to null then. The fact that DumpVC shows the static members as uninitialized may be an SOS issue rather than reality.

@janvorli
Member

janvorli commented Feb 1, 2017

Btw, I am going to be OOF from tomorrow till Sunday and I won't have access to the internet.

@janvorli
Member

janvorli commented Feb 1, 2017

Looking at the .ctor code again, the only way the args member can end up null is if the call to 0xffff3d451f38 fails to write x15 to the memory at x14.
Could you please disassemble the method at 0xffff3d451f38? It might be just a jump stub at that address; if that is the case, the easiest way to get to the actual function would probably be to step through the stub using the s command.

@qmfrederik
Contributor Author

@janvorli I'll try that tomorrow, so you'll have the results by the time you get back.

So after that call, x14 and x15 should have the same values, right? I'll dump them too so we can check.

If there are other things I can check, let me know.

In that code, is x29 the ParamsArray and 0x18 the offset of args?

Enjoy your time off!

@janvorli
Member

janvorli commented Feb 1, 2017

@qmfrederik I guess you meant that the value in memory at address x14 should have the same value as the x15, right? That's true.
x29 is the frame pointer; local variables in that function (including locals introduced by the JIT that are not present in the C# source) are addressed using an offset from it. The "this" pointer is stored at x29+0x28, as you can see at the beginning of the function, where x0 is stored there. x0 holds the first parameter of a function, and for non-static functions that is "this".
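To make the layout concrete, here is a small sketch of the two kinds of offsets involved: the x29-relative spill slots from the constructor prologue (str x0, [x29, #0x28] through str x3, [x29, #0x10]) and the instance field offsets that DumpVC reports for ParamsArray. The function names and the pointer values in the usage lines are made up for illustration.

```python
# Where the prologue spills the incoming argument registers, relative to x29.
SPILL_SLOTS = {"x0": 0x28, "x1": 0x20, "x2": 0x18, "x3": 0x10}

# Instance field offsets reported by DumpVC for System.ParamsArray.
FIELD_OFFSETS = {"arg0": 0x0, "arg1": 0x8, "arg2": 0x10, "args": 0x18}


def spill_address(frame_pointer, register):
    """Stack address where `register` was saved by the prologue."""
    return frame_pointer + SPILL_SLOTS[register]


def field_address(this_pointer, field):
    """Address of an instance field, given the "this" pointer from x0."""
    return this_pointer + FIELD_OFFSETS[field]


# Hypothetical pointer values, chosen only to keep the arithmetic readable.
frame_pointer = 0x2000
this_pointer = 0x1000
print(hex(spill_address(frame_pointer, "x0")))   # slot holding "this"
print(hex(field_address(this_pointer, "args")))  # destination of the last store
```

So x29 plus an offset selects a saved register or local on the stack, while the value loaded from x29+0x28 plus 0x18 is the address of the args field that the final write-barrier call targets.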

@qmfrederik
Contributor Author

qmfrederik commented Feb 3, 2017

@janvorli Thanks for the explanation!

Here's the disass of the function which should update the value of x14:

0000ffff3d445f38 50000058             ldr     x16, #0x8
0000ffff3d445f3c 00021fd6             br      x16
0000ffff3d445f40 a03c25b7             tbnz    x0, #0x24, 0xffff3d4406d4
0000ffff3d445f44 ffff0000             .long   0x0000ffff
0000ffff3d445f48 50000058             ldr     x16, #0x8
0000ffff3d445f4c 00021fd6             br      x16
0000ffff3d445f50 2c3d35b7             tbnz    x12, #0x26, 0xffff3d4406f4
0000ffff3d445f54 ffff0000             .long   0x0000ffff
0000ffff3d445f58 50000058             ldr     x16, #0x8
0000ffff3d445f5c 00021fd6             br      x16

where x16 appears to point to JIT_CheckedWriteBarrier

(lldb) p/x $x16
(unsigned long) $0 = 0x0000ffffb7253ca0
(lldb) image lookup -v -n JIT_CheckedWriteBarrier
1 match found in /home/coreapp/libcoreclr.so:
        Address: libcoreclr.so[0x0000000000464ca0] (libcoreclr.so..text + 3729040)
        Summary: libcoreclr.so`JIT_CheckedWriteBarrier
         Module: file = "/home/coreapp/libcoreclr.so", arch = "aarch64"
    CompileUnit: id = {0x013f644e}, file = "/home/coreclr/src/vm/arm64/asmhelpers.S", language = "mipsassem"
      LineEntry: [0x0000ffffb7253ca0-0x0000ffffb7253ca4): /home/coreclr/src/vm/arm64/asmhelpers.S:262
         Symbol: id = {0x0000b42c}, range = [0x0000ffffb7253ca0-0x0000ffffb7253cd0), name="JIT_CheckedWriteBarrier"

Once JIT_CheckedWriteBarrier is entered, x14 has a value of 0x0000ffffffffc410, which exceeds g_highest_address, so the NotInHeap label is executed.
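The branch taken here can be sketched as a range check on the destination address. This is a simplified model under assumptions: the thread only shows the comparison against g_highest_address and the NotInHeap label, so the lower-bound check is assumed, and the two bounds below are made-up numbers rather than the values from this session.

```python
# Hypothetical heap bounds; the real g_lowest_address / g_highest_address
# values were not printed in this thread.
LOWEST = 0x0000FFFF10000000
HIGHEST = 0x0000FFFF30000000


def checked_barrier_path(dst, lowest=LOWEST, highest=HIGHEST):
    """Return which path a checked write barrier would take for `dst`."""
    if dst < lowest or dst >= highest:
        return "NotInHeap"          # plain store, no GC bookkeeping
    return "JIT_WriteBarrier"       # destination lies inside the heap range


# The stack address from the comment above falls outside the assumed range.
print(checked_barrier_path(0x0000FFFFFFFFC410))
# A made-up in-range destination takes the full barrier.
print(checked_barrier_path(0x0000FFFF20000000))
```

A destination on the stack, such as a ParamsArray struct living in a caller's frame, would be expected to take the NotInHeap path, which matches what was observed.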

So in the end the following code is called:

str  x15, [x14], 8

which I thought was: "copy the memory at x15 to x14, and increase x14 by 8"

but:

(lldb) s
Process 41665 stopped
* thread #1, name = 'corerun', stop reason = step in
    frame #0: 0x0000ffffb7253cc8 libcoreclr.so`JIT_CheckedWriteBarrier at asmhelpers.S:273
   270 	    blt  C_FUNC(JIT_WriteBarrier)
   271 	
   272 	LOCAL_LABEL(NotInHeap):
-> 273 	    str  x15, [x14], 8
   274 	    ret  lr
   275 	WRITE_BARRIER_END JIT_CheckedWriteBarrier
   276 	
(lldb) p/x $x14
(unsigned long) $1 = 0x0000ffffffffc428
(lldb) p/x $x15
(unsigned long) $2 = 0x0000ffff18013290
(lldb) memory read -s8 -fx -c1 0x0000ffffffffc428
0xffffffffc428: 0x0000000000000000
(lldb) memory read -s8 -fx -c1 0x0000ffff18013290
0xffff18013290: 0x0000ffff3d40ab38
(lldb) s
Process 41665 stopped
* thread #1, name = 'corerun', stop reason = step in
    frame #0: 0x0000ffffb7253ccc libcoreclr.so`JIT_CheckedWriteBarrier at asmhelpers.S:274
   271 	
   272 	LOCAL_LABEL(NotInHeap):
   273 	    str  x15, [x14], 8
-> 274 	    ret  lr
   275 	WRITE_BARRIER_END JIT_CheckedWriteBarrier
   276 	
   277 	// void JIT_WriteBarrier(Object** dst, Object* src)
(lldb) memory read -s8 -fx -c1 0x0000ffffffffc428
0xffffffffc428: 0x0000ffff18013290
(lldb) memory read -s8 -fx -c1 0x0000ffff18013290
0xffff18013290: 0x0000ffff3d40ab38
(lldb) memory read -s8 -fx -c1 0x0000ffffffffc430
0xffffffffc430: 0x0000ffffffffc500
      Before              After
x14   0x0000000000000000  0x0000ffff18013290
x15   0x0000ffff3d40ab38  0x0000ffff3d40ab38

which is something I currently don't understand.

By looking at the code, though, I noticed changes to the .asm file have been made which are not in the .S file, like this one: dotnet/coreclr@49c2eec

I'm guessing I must have missed something, so waiting for your feedback.

PS: cheat sheet for myself:

root@dotnetarm:/home/lldb/build# cat ~/corerc 
breakpoint set -n coreclr_execute_assembly
run
plugin load /home/coreapp/libsosplugin.so
bpmd System.Private.CoreLib.dll ParamsArray..ctor
c
expr unsigned long $barrier = $pc+0x50
breakpoint set -a $barrier
c

expr unsigned long $args=$x14
p/x $x29; // "frame pointer"
p/x $x14; // "this.args"
memory read -s8 -fx -c1 $x14
p/x $x15; // static threeArgs

expr unsigned long $next = $pc+0x4
breakpoint set -a $next
c

p/x $args
memory read -s8 -fx -c1 $args

c

expr unsigned long $methodTable = 0x0000ffff3d72d538; // this seems to be a constant
sos DumpVC $methodTable $x29+0x10; // print the ParamsArray
memory read -s8 -fx -c1 $x29+0x10+0x18; // print the value of args
/home/lldb/build/bin/lldb /home/coreapp/corerun /home/coreapp/coreapp.dll --source ~/corerc

# Breakpoint at which we can load libsosplugin
breakpoint set -n coreclr_execute_assembly
run

# Load libsosplugin & set breakpoint for the constructor of ParamsArray
plugin load /home/coreapp/libsosplugin.so
bpmd System.Private.CoreLib.dll ParamsArray..ctor
c

# Disassemble the constructor, and set the breakpoint for the function
# which should update the memory at $x14
clru 0x0000ffff3d60ad38

breakpoint set -a 0000ffff3d60ad88
c

# Confirm x16 is JIT_CheckedWriteBarrier
p/x $x16
image lookup -v -n JIT_CheckedWriteBarrier

# Step into, up to the point where the memory at x14 is set
s


# Read the memory at x14, x15 before 
# str  x15, [x14], 8 is executed; the memory at x14 should be 0
p/x $x14
p/x $x15

memory read -s8 -fx -c1 0x0000ffff180131f8
memory read -s8 -fx -c1 0x0000ffffffffc410

@qmfrederik
Contributor Author

Never mind, my bad. The value of x15 is written to the memory at address x14, so that appears to be correct.
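For reference, the post-index store can be modelled in a few lines. This is a minimal sketch using the register and memory values from the session above; the function name str_post_index and the dict-based register file and memory are illustrative only.

```python
def str_post_index(regs, memory, src, base, imm):
    """Model `str src, [base], #imm`: store first, then advance the base."""
    memory[regs[base]] = regs[src]   # writes the register VALUE of src
    regs[base] += imm                # base register is updated afterwards


regs = {"x14": 0x0000FFFFFFFFC428, "x15": 0x0000FFFF18013290}
memory = {
    0x0000FFFFFFFFC428: 0x0000000000000000,   # args slot before the store
    0x0000FFFF18013290: 0x0000FFFF3D40AB38,   # first word of the array object
}

str_post_index(regs, memory, "x15", "x14", 8)

print(hex(memory[0x0000FFFFFFFFC428]))   # slot now holds the value of x15
print(hex(memory[0x0000FFFF18013290]))   # the object itself is untouched
print(hex(regs["x14"]))                  # x14 has moved on by 8
```

The memory at the old x14 ends up holding the pointer that was in x15, not a copy of the data x15 points to, which is what the two memory reads before and after the step show.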

@qmfrederik
Contributor Author

@janvorli

I think I've been able to dump two copies of the ParamsArray: one on which the initializer runs and one which is in scope for the FormatHelper function (since it's a struct, it is passed by value, right?).

(lldb) sos DumpVC 0000ffff3d724538 0x0000ffffffffc400
Name:        System.ParamsArray
MethodTable: 0000ffff3d724538
EEClass:     0000ffff3d71e1a0
Size:        48(0x30) bytes
File:        /home/coreapp/System.Private.CoreLib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000ffff3d3f4910  4000ccf        0        System.Object  0 instance 0000ffff18013150 arg0
0000ffff3d3f4910  4000cd0        8        System.Object  0 instance 0000ffff18013170 arg1
0000ffff3d3f4910  4000cd1       10        System.Object  0 instance 0000ffff18013188 arg2
0000ffff3d400b38  4000cd2       18      System.Object[]  0 instance 0000ffff180131e8 args
0000ffff3d400b38  4000ccc      a98      System.Object[]  0   shared           static oneArgArray
                                 >> Domain:Value  0000000000457990:NotInit  <<
0000ffff3d400b38  4000ccd      aa0      System.Object[]  0   shared           static twoArgArray
                                 >> Domain:Value  0000000000457990:NotInit  <<
0000ffff3d400b38  4000cce      aa8      System.Object[]  0   shared           static threeArgArray
                                 >> Domain:Value  0000000000457990:NotInit  <<


(lldb) sos DumpVC 0000ffff3d724538 0x0000ffffffffc3e0
Name:        System.ParamsArray
MethodTable: 0000ffff3d724538
EEClass:     0000ffff3d71e1a0
Size:        48(0x30) bytes
File:        /home/coreapp/System.Private.CoreLib.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
0000ffff3d3f4910  4000ccf        0        System.Object  0 instance 0000ffff18013150 arg0
0000ffff3d3f4910  4000cd0        8        System.Object  0 instance 0000ffff18013170 arg1
0000ffff3d3f4910  4000cd1       10        System.Object  0 instance 0000ffff18013188 arg2
0000ffff3d400b38  4000cd2       18      System.Object[]  0 instance 0000000000000000 args
0000ffff3d400b38  4000ccc      a98      System.Object[]  0   shared           static oneArgArray
                                 >> Domain:Value  0000000000457990:NotInit  <<
0000ffff3d400b38  4000ccd      aa0      System.Object[]  0   shared           static twoArgArray
                                 >> Domain:Value  0000000000457990:NotInit  <<
0000ffff3d400b38  4000cce      aa8      System.Object[]  0   shared           static threeArgArray
                                 >> Domain:Value  0000000000457990:NotInit  <<

It looks like arg0, arg1 and arg2 are copied over correctly, but args is not.
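Comparing two dumps like these by eye is error-prone, so here is a minimal sketch of a helper that diffs the instance fields of two `sos DumpVC` outputs. It assumes the eight-column row layout shown above (MT, Field, Offset, Type, VT, Attr, Value, Name). The function names and the abbreviated sample rows are illustrative and not part of SOS or lldb.

```python
# Instance-field rows copied from the two dumps above (static rows omitted).
DUMP_AT_C400 = """\
0000ffff3d3f4910  4000ccf        0        System.Object  0 instance 0000ffff18013150 arg0
0000ffff3d3f4910  4000cd0        8        System.Object  0 instance 0000ffff18013170 arg1
0000ffff3d3f4910  4000cd1       10        System.Object  0 instance 0000ffff18013188 arg2
0000ffff3d400b38  4000cd2       18      System.Object[]  0 instance 0000ffff180131e8 args
"""

DUMP_AT_C3E0 = """\
0000ffff3d3f4910  4000ccf        0        System.Object  0 instance 0000ffff18013150 arg0
0000ffff3d3f4910  4000cd0        8        System.Object  0 instance 0000ffff18013170 arg1
0000ffff3d3f4910  4000cd1       10        System.Object  0 instance 0000ffff18013188 arg2
0000ffff3d400b38  4000cd2       18      System.Object[]  0 instance 0000000000000000 args
"""


def instance_fields(dump):
    """Map field name -> value for every 'instance' row in a DumpVC dump."""
    fields = {}
    for line in dump.splitlines():
        parts = line.split()
        # Assumed columns: MT, Field, Offset, Type, VT, Attr, Value, Name.
        if len(parts) == 8 and parts[5] == "instance":
            fields[parts[7]] = int(parts[6], 16)
    return fields


def diff_copies(dump_a, dump_b):
    """Return {name: (value_a, value_b)} for fields whose values differ."""
    a, b = instance_fields(dump_a), instance_fields(dump_b)
    return {n: (a[n], b[n]) for n in a if n in b and a[n] != b[n]}


if __name__ == "__main__":
    for name, (va, vb) in diff_copies(DUMP_AT_C400, DUMP_AT_C3E0).items():
        print(f"{name}: {va:#018x} != {vb:#018x}")
```

Static rows are skipped on purpose because they belong to the type and not to either copy.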

@qmfrederik
Contributor Author

qmfrederik commented Feb 5, 2017

At more than 200 comments, this thread has become a bit long to parse, so here's a write-up:

Summary

Work done so far

  • Found an issue with LLVM 3.8 and earlier where some of the __sync_ methods hang on certain ARM64 architectures (LLVM 25526).
    • Workaround: compile on Ubuntu Zesty, which ships with llvm-3.9, or
    • Workaround: use the pal_sync functions created as a workaround.
  • lldb hangs on Ubuntu Zesty. Workaround: compile lldb from source.
  • The lldb plugin wasn't building for ARM64. Fixed in coreclr master.
  • Attaching a debugger to a process running in a Docker container is not straightforward. Current workaround: don't use Docker containers.

Current Issue - Summary

Issue

A Hello World application (coreapp.dll) crashes with a SIGSEGV:

Compiling  629 System.ParamsArray::.ctor, IL size = 34, hsh=0x971f5e3b
Process 65951 stopped
* thread #1, name = 'corerun', stop reason = signal SIGSEGV: invalid address (fault address: 0x8)
    frame #0: 0x0000ffff3d61efec
->  0xffff3d61efec: ldrsw  x0, [x0, #0x8]
    0xffff3d61eff0: str    w0, [x29, #0x1c]
    0xffff3d61eff4: nop    
    0xffff3d61eff8: b      0xffff3d61effc

Managed stack trace

The get_Length() method of ParamsArray reads the length of the struct's args field. ParamsArray is a struct that wraps an object[] array, args. For one to three arguments, the values are stored in the arg0-arg2 fields, and args is set to a dummy array of matching length (1-3). These dummy arrays are stored in static fields of the ParamsArray struct and are used only by the Length property. The fault address 0x8 on the ldrsw x0, [x0, #0x8] instruction above means x0 was 0, i.e. the load went through a null reference. See ParamsArray for more info.
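To make the failure mode concrete, here is a rough Python model of the struct as described in the paragraph above. It is only a sketch and not the real C# source. The class layout, the `corrupted_copy` helper and the restriction to one to three arguments are assumptions made for illustration.

```python
class ParamsArray:
    """Rough model of the struct described above (not the real C# source)."""

    # Shared dummy arrays, one per supported argument count; only their
    # length is ever read.
    _DUMMY_ARRAYS = {1: [None], 2: [None, None], 3: [None, None, None]}

    def __init__(self, *values):
        if not 1 <= len(values) <= 3:
            raise ValueError("this sketch only models the 1-3 argument case")
        padded = list(values) + [None] * (3 - len(values))
        self.arg0, self.arg1, self.arg2 = padded
        self.args = self._DUMMY_ARRAYS[len(values)]

    @property
    def length(self):
        # Mirrors get_Length(): the count comes from args, not from arg0-arg2.
        return len(self.args)


def corrupted_copy(original):
    """Simulate the bad copy: arg0-arg2 survive, args arrives as null."""
    copy = ParamsArray(original.arg0, original.arg1, original.arg2)
    copy.args = None
    return copy


if __name__ == "__main__":
    good = ParamsArray("a", "b", "c")
    print("length of intact struct:", good.length)
    try:
        corrupted_copy(good).length
    except TypeError as exc:
        print("length of corrupted copy failed:", exc)
```

In the model, the corrupted copy still holds all three argument values, yet asking for its length fails because only args is consulted. This matches a crash inside get_Length() while arg0-arg2 look fine in the debugger.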

Steps to reproduce

Use install.sh to set up an Ubuntu Zesty environment.

Run /home/lldb/build/bin/lldb /home/coreapp/corerun /home/coreapp/coreapp.dll to launch CoreCLR under the debugger and troubleshoot.

Hypotheses

  • The static fields are never initialized because the static constructor fails to run. Proven false: the debugger showed these fields as NotInit, but the static constructor does run and the fields have the correct values.
  • The constructor fails to copy the static threeArgArray to the args field. Also proven false: the values are copied correctly.
  • When ParamsArray is passed from String.Format to String.FormatHelper, the struct is not passed correctly and the args field ends up as zero. Under investigation.
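For the third hypothesis, it can help to read each 8-byte slot of both stack copies directly. The sketch below generates the lldb `memory read` commands (same flags as used earlier in this thread) from the two addresses and the field offsets reported by `sos DumpVC`. The addresses are specific to that debugging session and will differ on another run. The helper names are illustrative.

```python
# Field offsets as reported by `sos DumpVC` for System.ParamsArray.
FIELD_OFFSETS = {"arg0": 0x00, "arg1": 0x08, "arg2": 0x10, "args": 0x18}

# Base addresses of the two copies dumped earlier (session-specific values).
COPY_WITH_ARGS_SET = 0x0000FFFFFFFFC400
COPY_WITH_ARGS_ZERO = 0x0000FFFFFFFFC3E0


def read_commands(base):
    """Build one lldb `memory read` command per instance field of a copy."""
    return {
        name: f"memory read -s8 -fx -c1 {base + offset:#018x}"
        for name, offset in FIELD_OFFSETS.items()
    }


if __name__ == "__main__":
    # The copies are 0x20 bytes apart: exactly four 8-byte fields.
    print("distance:", hex(COPY_WITH_ARGS_SET - COPY_WITH_ARGS_ZERO))
    for base in (COPY_WITH_ARGS_SET, COPY_WITH_ARGS_ZERO):
        for name, command in read_commands(base).items():
            print(f"{name}: {command}")
```

Setting a watchpoint on the args slot of the second copy (base + 0x18) would be one way to find the instruction that writes, or fails to write, that slot.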

@qmfrederik
Contributor Author

@janvorli By setting the breakpoint in Format after the call to ParamsArray..ctor, I was able to step through the code until the null reference error occurs.

I noticed a lot of the code was in src/vm/arm64/*.S files and they appeared to be out of sync with their .asm equivalents.

I tried to port some of the fixes applied to the .asm files to the .S files and the Hello, World program now works.

The changes are in https://github.com/qmfrederik/coreclr/commit/6e245a6925f2ae2e954c0381f5eeaca0114670c3; I'll try to make PRs which match the original PRs.

@janvorli
Member

janvorli commented Feb 6, 2017

@qmfrederik great job spotting and fixing the issues in the asm helpers. This is a great milestone!

@qmfrederik
Contributor Author

@janvorli Thanks for your patience and assistance as well, couldn't have done it without!

So, this leaves me thinking "what's next". In the end, we have a fairly large .NET Core application that we'd like to get running on ARM64 and we're not there yet :)

Things I can think of as a next step:

  • Getting exception handling to work (the 2 PAL tests which are failing due to ThrowExceptionFromContextInternal not being implemented on arm64). I don't think I can do the implementation myself but would be happy to test.
  • Running the full CoreCLR test suite, and getting CoreCLR arm64 on Linux on par with arm64 Windows.
  • Open a new issue (let's leave this one for getting "Hello World" to run; with more than 200 comments it's becoming a bit slow).

What do you think? Are there other things we should test/try at this point?

@janvorli
Member

janvorli commented Feb 6, 2017

@qmfrederik I agree that we should close this issue as the initial phase of the bringup is complete. Before moving to coreclr tests, we need to make the exception handling work.
We need to make sure software exception handling works first, since if that doesn't work, the hardware exception handling would not work either. You can start with a very simple test like this:

    using System;

    class Program
    {
        // Throws a managed (software) exception.
        static void M1()
        {
            throw new Exception("e");
        }

        static void Main()
        {
            try
            {
                M1();
            }
            catch (Exception ex)
            {
                Console.WriteLine("Caught exception {0}", ex);
            }
        }
    }

If that works, we can move to more complex scenarios. I have a bunch of simple hand written tests that I was using when bringing up the exception handling for Unix x64 at the beginning of the CoreCLR porting and I can share those with you.

To enable hardware exception handling, we need to implement ThrowExceptionFromContextInternal. None of the tests that exercise null reference exceptions, division by zero or similar would pass without it. I will write the implementation for you.

@qmfrederik
Contributor Author

@janvorli Thx! I'll give it a try and get back to you. I've opened dotnet/coreclr#9370 to track the exception handling progress.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 26, 2020