GHC binaries from haskell.nix don't work on aarch64 #621

Closed
TravisWhitaker opened this issue May 20, 2020 · 16 comments
Comments

@TravisWhitaker
Contributor

I believe that haskell.nix is patching GHC (or making some other change vs. upstream) in a way that impacts memory safety on aarch64.

With minor changes, haskell.nix can be used to build e.g. an 8.8.3 bootstrapped with 8.8.2 on aarch64: #620

Building this way with the branch from that linked PR:

nix-build -E "with import ../nixpkgs-channels/default.nix (import ./. {}).nixpkgsArgs; haskell-nix.compiler.ghc883"

Where ../nixpkgs-channels is pointed at current nixos-20.03, the build fails at the very end with this:

ghc-stage2: internal error: evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
ghc-stage2: internal error:     Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
ghc-stage2: internal error: evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
ghc-stage2: internal error: evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
ghc-stage2: internal error: evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
ghc-stage2: internal error: evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
ghc-stage2: internal error: evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
ghc-stage2: internal error: evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
ghc-stage2: internal error: evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
ghc-stage2: internal error: evacuate: strange closure type -1860476400
    (GHC version 8.8.3 for aarch64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
make[1]: *** [utils/haddock/ghc.mk:20: utils/haddock/dist/build/Paths_haddock.dyn_o] Aborted (core dumped)

This occurs at the very end, where stage 2 is being used to build haddock. I initially suspected some memory barrier regressions in the 8.8 branch (and opened this https://gitlab.haskell.org/ghc/ghc/issues/18201), but I've failed to reproduce this issue with vanilla upstream 8.8.3.

I'm sifting through the build.mk and configure flags that haskell.nix chooses to see if I can work out what might be causing this. Next I'll sift through the patches that are applied.
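One way to get a hint from the number in the error is to reinterpret it as an unsigned 32-bit word and look at its bit fields. The sketch below does this with plain shell arithmetic. The variable names are illustrative, and reading the word as an A64 instruction is an assumption, not something the RTS reports.

```shell
# Reinterpret the signed "closure type" from the RTS error as a 32-bit word.
closure_type=-1860476400
word=$(( closure_type & 0xFFFFFFFF ))
hex=$(printf '0x%08X' "$word")

# Assumption: treat the word as an A64 instruction and extract fields using
# the "ADD (immediate)" layout: sf|op|S|100010|sh|imm12|Rn|Rd.
top9=$(( (word >> 23) & 0x1FF ))   # sf, op, S and the fixed opcode bits
imm12=$(( (word >> 10) & 0xFFF ))  # 12-bit immediate
rn=$(( (word >> 5) & 0x1F ))       # source register number
rd=$(( word & 0x1F ))              # destination register number

echo "word = $hex"
# 0x122 is the bit pattern for a 64-bit, non-flag-setting ADD (immediate).
if [ "$top9" -eq $(( 0x122 )) ]; then
  echo "decodes as: add x$rd, x$rn, #$(printf '0x%x' "$imm12")"
fi
```

Run as written, this prints `word = 0x911B6210` and `decodes as: add x16, x16, #0x6d8`. A value that decodes as an instruction, rather than a small closure-type tag, suggests the RTS read machine code where it expected an info table. x16 is one of the scratch registers that AArch64 PLT stubs conventionally use, which would be consistent with a PLT-related miscompile, though this reading is only a heuristic and could be coincidence.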

TravisWhitaker changed the title from "GHC binaries from haskell.nix don't work well on aarch64" to "GHC binaries from haskell.nix don't work on aarch64" on May 20, 2020

@angerman
Collaborator

@TravisWhitaker try disabling -fast-llvm

@TravisWhitaker
Contributor Author

Looks like this is due to -fast-llvm. If I remove it from the generated build.mk, I get a GHC 8.8.3 that works just fine.

@angerman
Collaborator

@TravisWhitaker did you have -fast-llvm and -fno-plt everywhere?

@angerman
Collaborator

@TravisWhitaker there is also -dno-llvm-mangler. So -fllvm -dno-llvm-mangler should do .ll -> .S -> .o. If that also doesn't crash this would be rather impressive.

@angerman
Collaborator

Again, -fast-llvm can only work

  • with -fno-plt
  • without avx calls
  • without -deadstrip on macOS.

@TravisWhitaker
Contributor Author

TravisWhitaker commented May 20, 2020

@angerman What I just tried on a hunch was my branch with the part that adds -fast-llvm to GhcStage2HcOpts and GhcLibHcOpts commented out.

Looking at compiler/ghc/default.nix, I think we're breaking your first rule when building GHC itself; I don't see -fno-plt anywhere in the build.mk or configure flags we use.
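A quick textual check along these lines can be scripted. The sketch below is a hypothetical helper (the name `check_fast_llvm` and the file passed to it are illustrative, not part of haskell.nix) that reports whether a build.mk enables -fast-llvm without mentioning -fno-plt. It only inspects the text of the file, so it cannot tell whether a flag actually reaches the compiler that needs it.

```shell
# check_fast_llvm FILE: report whether FILE enables -fast-llvm and, if so,
# whether -fno-plt appears anywhere in it. grep -F matches fixed strings,
# and "--" stops option parsing so patterns may start with "-".
check_fast_llvm() {
  file=$1
  if ! grep -qF -- '-fast-llvm' "$file"; then
    echo "ok: -fast-llvm not used"
  elif grep -qF -- '-fno-plt' "$file"; then
    echo "ok: -fast-llvm used together with -fno-plt"
  else
    echo "warning: -fast-llvm used without -fno-plt"
    return 1
  fi
}
```

For example, `check_fast_llvm build.mk` returns a non-zero status when the file contains -fast-llvm but no -fno-plt, which makes it usable as a guard in a build script.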

@TravisWhitaker
Contributor Author

Should we build GHC itself with -fno-plt, or should we build it without -fast-llvm?

@angerman
Collaborator

Either -fast-llvm + -fno-plt, or no -fast-llvm and using the Mangler. I'd absolutely be for the first approach, as it allows LLVM to work off of the higher-level LL instead of the lowered assembly.

@TravisWhitaker
Contributor Author

I’ll give that a go and see if we get a working aarch64 GHC.

@angerman
Collaborator

To validate that it's the function -> object rewrite, you could build without -fast-llvm but with -dno-llvm-mangler.

@angerman
Collaborator

Actually, I think this might be more tricky than using -fno-plt. -fno-plt would only apply to $CC but not $HC. For $HC with -fllvm we need to make the emitted .ll contain nonlazybind attributes.

@angerman
Collaborator

Here's a rough patch of what the llvm codegen probably needs to do: https://gist.github.com/angerman/0dbc0b6587ed4d3ca01299924d26c255

@TravisWhitaker
Contributor Author

Can we disable -fast-llvm until that's sorted? It definitely doesn't yield correct code on aarch64.

@bgamari
Contributor

bgamari commented May 20, 2020

Yes, I would agree with @TravisWhitaker; the solution here seems obvious: disable -fast-llvm. As I've previously noted, it really isn't safe in its current incarnation. Perhaps we can consider introducing something like it into a future release (cf. GHC #18179), but using the variant in 8.8 and earlier is only going to cause more pain for everyone.

@angerman
Collaborator

Can we disable -fast-llvm until that's sorted? It definitely doesn't yield correct code on aarch64.

@TravisWhitaker yes, we should drop it, even though I remain highly skeptical about using the Mangler and about using two different toolchains for code generation and assembly. However, this seems to be rather poorly supported in LLVM; even nonlazybind is not guaranteed to be respected by every lowering backend.
