cmd/link: openshift build fails due to another problem with too far branches on ppc64le #20497
Comments
The failure occurs the first time main.init tries to call a plt_branch sequence like this: 000000001380448c <00000038.plt_branch.36:b9ef>: But even though this is a dynamic executable built by the GNU linker, the generated Go code does not load or maintain R2. As a result, this sequence expects R2 to be valid when it is not, and jumps off to a bad address, or 0. This is the first time I've seen a plt_branch used to handle this situation; if the branch offset is not too large, the stub is just a long_branch, and that works fine. I am able to build valid binaries by using -buildmode=pie. From reading the PPC64 v2 ABI, it sounds like the linker should know whether or not the procedure has initialized R2 and generate the call stub based on that information. If that is true, it shouldn't create this type of call stub unless it has incorrect information about the procedure making the call. I'm trying to track that down. |
I have hit this too (Fedora 27/26, go1.8.3) with binaries (origin/genman, origin/openshift) of ~300M+ in size. I was able to get (at least partially) functional binaries using internal linkmode (for non-shared binaries; I haven't tried shared at all), but I haven't yet verified that the binaries are generally correct. They don't segfault when executed (which seems consistent with your notes). |
The workaround for this is to use -buildmode=pie when building the executables that are non-static. Static binaries shouldn't have a problem, because they should still work with internal linking even when they are big. I've tested the workaround and it seems to work fine. Using -buildmode=pie ensures that r2 is valid as needed. A better fix is planned for Go 1.9 and is being tested to verify that no regressions are introduced. |
CL https://golang.org/cl/45130 mentions this issue. |
Actually, the internal linker is not a workaround (at least not in a fully static Go/default build; surprisingly, it is when intermediate files are left over from a failed build with the external linker, a path I haven't investigated further yet): it fails with a relocation overflow. With buildmode pie (without -linkshared) and go1.8.3 I'm now hitting an illegal instruction in genman and am investigating further. It looks like a jump to CTR, which points into data:
|
@jcajka Can you provide directions on how you build genman? This was reported to me when building openshift using the directions above, and I don't see how to build genman. If I build openshift with -buildmode=pie using go 1.8.3 it works AFAICT, and it certainly doesn't die in __libc_start_main. Maybe you noticed, the upstream patch was submitted this morning, and we are testing the go 1.8.3 patch now so that should be available very soon. |
I will be changing my solution for this problem as a result of the discussion in #20492. |
@laboger genman is built using hack/generate-docs.sh. I did some tracing, and it seems that the following will build genman (invoked in the tree of a failed build of OS).
(path adjustment needed)
Dropping the linkshared part will result in the above-mentioned illegal instruction. |
A few notes on the workaround. Most of my testing had been done upstream, and I made the wrong assumption that go 1.8 would behave the same. But I've since discovered an upstream fix (not in go 1.8) that is needed to make this workaround work in all cases. Sorry about that confusion. @jcajka |
Change https://golang.org/cl/70837 mentions this issue: |
…th ext linking

When using golang on ppc64le there have been issues when building executables that generate extremely large text sections. This is due to the call instruction and the limitation on its offset field, which is smaller than on most platforms. If the call target offset is too big for the offset field in the call instruction, link errors can occur.

The original solution to this problem in golang was to split the text section when it became too large, allowing the external (GNU) linker to insert the necessary stub to handle the long call. That worked fine until another size limit for the program size was hit, where a plt_branch was created instead of a long branch. In that case the plt_branch code sequence expects r2 to contain the address of the TOC, but when golang creates dynamic executables by default (-buildmode=exe) r2 does not always contain the address of the TOC. As a result, programs that reach this extremely large size can hit a runtime SEGV or SIGILL due to branching to a bad address.

When using internal linking, trampolines are generated to handle the long calls but the text sections are not split. With this change, text sections will still be split appropriately with external linking, but if the buildmode being used does not maintain r2 as the TOC address, then trampolines will be created for those calls.

Fixes #20497
Change-Id: If5400b0f86c2c08e106b332be6db0b259b07d93d
Reviewed-on: https://go-review.googlesource.com/45130
Run-TryBot: Lynn Boger <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Cherry Zhang <[email protected]>
Reviewed-on: https://go-review.googlesource.com/70837
Run-TryBot: Russ Cox <[email protected]>
Reviewed-by: Lynn Boger <[email protected]>
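To make the offset limit in the commit message concrete, here is a minimal Go sketch of the reach check and of the resulting choice between a direct call, a linker-inserted stub, and a Go-generated trampoline. It relies on the PowerPC I-form branch encoding (a 24-bit field shifted left by 2 and sign-extended, so roughly ±32 MiB of reach). The names fitsDirectBranch, planCall, and r2HoldsTOC are hypothetical and chosen for illustration only; this is not the code in cmd/link, and the real linker makes this decision with more context than shown here.

```go
package main

import "fmt"

// Reach of a PowerPC I-form branch (b/bl): a 24-bit LI field, shifted left
// by 2 and sign-extended, which gives a signed 26-bit byte displacement.
const (
	minBranchDisp = -(1 << 25)    // -32 MiB
	maxBranchDisp = (1 << 25) - 4 // +32 MiB minus one 4-byte instruction
)

// fitsDirectBranch reports whether a bl at address pc can reach target
// without a stub. Hypothetical helper, not the linker's own function.
func fitsDirectBranch(pc, target uint64) bool {
	disp := int64(target) - int64(pc)
	return disp%4 == 0 && disp >= minBranchDisp && disp <= maxBranchDisp
}

// callPlan names the three outcomes discussed in the commit message.
type callPlan int

const (
	directCall   callPlan = iota // plain bl, target is in range
	linkerStub                   // external linker inserts a long-call stub
	goTrampoline                 // Go linker emits its own trampoline
)

// planCall is an assumed simplification of the decision described above:
// a stub that needs the TOC is only safe when the build mode keeps r2
// pointing at the TOC; otherwise a trampoline is used instead.
func planCall(pc, target uint64, r2HoldsTOC bool) callPlan {
	switch {
	case fitsDirectBranch(pc, target):
		return directCall
	case r2HoldsTOC:
		return linkerStub
	default:
		return goTrampoline
	}
}

func main() {
	pc := uint64(0x10000000)
	fmt.Println(fitsDirectBranch(pc, pc+0x01fffffc)) // largest forward displacement
	fmt.Println(fitsDirectBranch(pc, pc+0x02000000)) // one instruction too far
	fmt.Println(planCall(pc, pc+0x04000000, false) == goTrampoline)
}
```

In this sketch a call that is 64 MiB away in a build that does not maintain r2 is routed to a trampoline, which matches the behavior the change describes for -buildmode=exe with external linking.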
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
go 1.9
What operating system and processor architecture are you using (go env)?
Ubuntu 16.04
What did you do?
Built upstream openshift binaries, which builds using packages from Kubernetes.
What did you expect to see?
Binaries built that work.
What did you see instead?
Some binaries now fail because a new linker limit is exceeded, which causes the GNU linker to generate plt stubs for long calls. The stub generated by the GNU linker in this case expects R2 to contain the address of the TOC (according to the PPC64LE v2 ABI), but the code generated by golang in this case does not maintain that value in R2. As a result, the address the stub computes is bad, and the program attempts to branch to it.
I know trying to set up and maintain R2 in this case is not the best solution. I believe the solution as requested here #17917 is the right answer but I need to test it out more. (I had some concerns about this solution which is why I wasn't able to get it into go 1.9).
To reproduce this, clone the repo from here:
https://github.com/openshift/origin.git
cd origin; make all
Run one of the test binaries in:
_output/local/bin/linux/ppc64le
with the -h option; the result is a SEGV.