Skip to content

Illegal instructions generated for Arm Cortex-R5 processor when compiling code using floating-point operations #128448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rikyborg opened this issue Jul 31, 2024 · 19 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state P-medium Medium priority

Comments

@rikyborg
Copy link

When targeting the Arm Cortex-R5 processor, rustc and llvm generate assembly containing floating-point instructions. These are not available on the Cortex-R5 (only on the Cortex-R5F) and cause the processor to halt.

Details

For example, the following code:

fn add_s(a: f32, b: f32) -> f32 {
    a + b
}

fn add_d(a: f64, b: f64) -> f64 {
    a + b
}

compiled with flags --target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 generates the assembly (compiler explorer link):

add_s:
        vmov    s0, r1
        vmov    s2, r0
        vadd.f32        s0, s2, s0
        vmov    r0, s0
        bx      lr

add_d:
        vmov    d0, r2, r3
        vmov    d1, r0, r1
        vadd.f64        d0, d1, d0
        vmov    r0, r1, d0
        bx      lr

Note the vadd.f32 and vadd.f64 instructions, that are available on the Cortex-R5F which has an FPU, but that are illegal instructions on the Cortex-R5 without an FPU.

My expectation is that, using the target armv7r-none-eabi (rather than armv7r-none-eabihf) and the target CPU cortex-r5 (rather than cortex-r5f), rustc and llvm would generate legal instructions for the processor, i.e. use software floating-point features rather than hard-float instructions.

Relevant information

Original thread on URLO: Unexpected codegen for Cortex-R5 without FPU.

The upstream LLVM CPU model for cortex-r5 includes the flag FeatureVFP3_D16, which seems OK for the R5F but wrong for the R5. However, manually disabling that feature doesn't seem to help (see below). Link to llvm repo at tag 18.1.7.

Information from the ARM Cortex-R Series Programmer's Guide: 6.1.6. VFP in the Cortex-R processors.

Things that do work

Skipping the target-cpu flag

Compiling the snippet above with flags --target armv7r-none-eabi -C opt-level=3 generates:

add_s:
        push    {r11, lr}
        bl      __aeabi_fadd
        pop     {r11, pc}

add_d:
        push    {r11, lr}
        bl      __aeabi_dadd
        pop     {r11, pc}
Adding the soft-float target feature

Compiling the snippet above with flags --target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 -C target-feature=+soft-float generates:

add_s:
        push    {r11, lr}
        bl      __aeabi_fadd
        pop     {r11, pc}

add_d:
        push    {r11, lr}
        bl      __aeabi_dadd
        pop     {r11, pc}

However, rustc generates the warning:

warning: unknown and unstable feature specified for `-Ctarget-feature`: `soft-float`
  |
  = note: it is still passed through to the codegen backend, but use of this feature might be unsound and the behavior of this feature can change in the future
  = help: consider filing a feature request

warning: 1 warning emitted

Things that don't work

Removing the vfp3d16 target feature

Compiling the snippet above with flags --target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 -C target-feature=-vfp3d16 generates:

add_s:
        vmov    s0, r1
        vmov    s2, r0
        vadd.f32        s0, s2, s0
        vmov    r0, s0
        bx      lr

add_d:
        vmov    d0, r2, r3
        vmov    d1, r0, r1
        vadd.f64        d0, d1, d0
        vmov    r0, r1, d0
        bx      lr

Meta

I've tested both stable and nightly toolchains, with same results.

rustc --version --verbose:

rustc 1.80.0 (051478957 2024-07-21)
binary: rustc
commit-hash: 051478957371ee0084a7c0913941d2a8c4757bb9
commit-date: 2024-07-21
host: x86_64-unknown-linux-gnu
release: 1.80.0
LLVM version: 18.1.7
@rikyborg rikyborg added the C-bug Category: This is a bug. label Jul 31, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 31, 2024
@tgross35
Copy link
Contributor

tgross35 commented Jul 31, 2024

As mentioned on the thread, this seems to come from LLVM. Would you mind filing an issue there? Here is an llc reproduction that shows cortex-r5 is the cause https://llvm.godbolt.org/z/zzK7KGv95

@tgross35 tgross35 added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Jul 31, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Jul 31, 2024
@tgross35 tgross35 added I-prioritize Issue: Indicates that prioritization has been requested for this issue. and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jul 31, 2024
@tgross35
Copy link
Contributor

Whoops, didn't mean to remove prioritize but I think the labels raced.

@workingjubilee
Copy link
Member

cc @chrisnc as the maintainer for this target.

We updated the float support in #123159 to be more-correct for this target but it looks like we didn't quite stick the landing I guess?

Processors in this family include the Arm Cortex-R4, 5, 7, and 8.

We most definitely intentionally included the R5 here, though apparently we assume an R5F.

@chrisnc
Copy link
Contributor

chrisnc commented Aug 1, 2024

LLVM's policy for Arm target-cpu is that it will enable the maximal set of features that the chosen CPU might support, and expects users to disable ones that they don't have/want, so it is correct and expected that the compiler assumes vfp3d16 when you specify cortex-r5, even when using the non-hf target. The term "R5F" is just a shorthand for "a Cortex-R5 with FPU support", and for LLVM this is the intended default. Arm doesn't define it as a separate CPU, nor does LLVM. (Edit: looks like R4(F) is an exception to this in LLVM.)
Note also that the examples still take the f32 arguments in the integer registers, which doesn't happen on the hf target, so overall the code generated is correct. The only thing I don't understand is why target-feature=-vfp3d16 did not work as expected, so I will investigate that , and see if this is new as of #123159 or was always this way. Edit: no, this can't be affected by that change, because we only modified the hf targets to use slightly different default features so they would compose better with target-cpu, but this case is about the non-hf targets. Still checking on the behavior there...

@chrisnc
Copy link
Contributor

chrisnc commented Aug 1, 2024

--target armv7r-none-eabi -C opt-level=3 -C target-cpu=cortex-r5 -C target-feature=-vfp2sp,-fp64

https://godbolt.org/z/cj38s8Ev6

This will also disable the floating-point support. I think the issue is that the implied +vfp3d16 from cortex-r5 causes multiple other features to be enabled, but -vfp3d16 doesn't disable those dependencies. Just using +soft-float in your project is still probably the right approach here, rather than referring to the VFP feature tree internal to LLVM.

@rikyborg
Copy link
Author

rikyborg commented Aug 1, 2024

Thanks a lot @chrisnc for investigating this issue!

Just using +soft-float in your project is still probably the right approach here, rather than referring to the VFP feature tree internal to LLVM.

Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning unknown and unstable feature specified for `-Ctarget-feature`: `soft-float`. Is there any way to suppress that?

Or should the non-hf target already include the +soft-float feature? If the target does not have hard-float support, is there any reason to generate code using floating-point instructions? (I might be missing something here...)

As mentioned on the thread, this seems to come from LLVM. Would you mind filing an issue there?

Should I still go ahead and file an issue with LLVM as suggested initially by @tgross35?

One source of confusion for me is that LLVM seems to distinguish between Cortex-R4 and Cortex-R4F, but not between Cortex-R5 and Cortex-R5F.

@apiraino
Copy link
Contributor

apiraino commented Aug 1, 2024

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

@rustbot rustbot added P-medium Medium priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Aug 1, 2024
@chrisnc
Copy link
Contributor

chrisnc commented Aug 1, 2024

Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning unknown and unstable feature specified for -Ctarget-feature: soft-float. Is there any way to suppress that?

I don't think there's a way to disable this right now except to use nightly. I can't find an open issue for stabilizing these right now, but that is the path. This warning is relatively recent, and this issue gives some explanation of why it's there: #117616.

Or should the non-hf target already include the +soft-float feature? If the target does not have hard-float support, is there any reason to generate code using floating-point instructions? (I might be missing something here...)

The hf-ness is about the ABI, not what the target could support if the user asks for it (which was the case here). It's perfectly valid to use the soft-float ABI on a core that has an FPU, and the compiler can use the VFP instructions for things other than computation on f32/f64 (or you might legitimately need to do floating-point computation while calling/being called by code that uses the soft-float ABI because you can't re-compile it). This is consistent with how clang/llvm handle this; the expectation is that you use a combination of target-cpu+target-feature to specify what you want, rather than having the ABI choice forcibly disable some features.

As mentioned on the thread, this seems to come from LLVM. Would you mind filing an issue there?
Should I still go ahead and file an issue with LLVM as suggested initially by @tgross35?

One source of confusion for me is that LLVM seems to distinguish between Cortex-R4 and Cortex-R4F, but not between Cortex-R5 and Cortex-R5F.

I think that would be worthwhile, at least to understand why there is a separate cortex-r4f but not for the others. I will speculate that this is for historical reasons and that they would remove it if not for backward compatibility, because it is inconsistent. Anecdotally, I think nofp cortex-r4 are much more common than nofp cortex-r5, which might also explain the exception, but I don't know the provenance of the one you are using. When you are using clang, you would use -mcpu=cortex-r5+nofp for your case, which obviates the need for a separate CPU name or explicitly disabling exactly the FPU features that are included with cortex-r5.

@workingjubilee
Copy link
Member

Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning unknown and unstable feature specified for -Ctarget-feature: soft-float. Is there any way to suppress that?

We should probably just add this feature.

@workingjubilee
Copy link
Member

@chrisnc For note, LLVM seems to have flipflopped on the maximal vs. minimal thing for Arm CPUs at least twice now, and recent Arm CPUs (aarch64, mostly) are more likely to use a minimal featureset, though that may only be true from Armv9 onwards.

@chrisnc
Copy link
Contributor

chrisnc commented Aug 2, 2024

Yes, right now that seems the clearest approach. There's the downside that rustc will keep throwing the warning unknown and unstable feature specified for -Ctarget-feature: soft-float. Is there any way to suppress that?

We should probably just add this feature.

Whoops, I missed that this was also unknown, not just unstable, when I wrote my comment. I'm not sure this will be possible though, because it affects the ABI, and the reasoning here applies to Arm also.

Should rustc consider something like clang's -mcpu which allows things like cortex-r5+nofp? I'm not sure if this is exposed in a re-usable way by llvm, but it could go a long way in solving these types of problems.

For note, LLVM seems to have flipflopped on the maximal vs. minimal thing for Arm CPUs at least twice now, and recent Arm CPUs (aarch64, mostly) are more likely to use a minimal featureset, though that may only be true from Armv9 onwards.

Interesting... At least for the v7m, v7r, v8m, and v8r cores, the default feature sets have all been maximal when each core was added and then not changed except for refactoring and bug fixes from what I can see, but A-profile and Aarch64 are quite a bit more varied and may have different considerations. It seems that R4 as distinct from R4F was done originally in GCC in 2008 and then LLVM followed it, but no other exceptions exist, even though "M4F" and "R5F" and others are commonly mentioned when a non-trivial population of them are nofp.

@workingjubilee workingjubilee added I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness and removed I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness labels Oct 2, 2024
@thejpster
Copy link
Contributor

Just saw this and wanted to note that it's documented for the M-Profile Arm targets in some detail in the Platform Docs. Might be worth updating the R-Profile target docs in the same manner.

https://doc.rust-lang.org/nightly/rustc/platform-support/thumbv7em-none-eabi.html#target-cpu-and-target-feature-options

I don't think it's a bug in LLVM, just a policy decision. The solution is to identify, and then stablise, the features that turn off the undesired instructions. Or to not specify 'target-cpu' and rely on the baseline architecture support, which is probably fine in many cases.

@chrisnc
Copy link
Contributor

chrisnc commented Feb 5, 2025

I've refrained from doing this because I don't think it's the right path long term. Editing feature lists is not something that LLVM expects frontend users to do (e.g., it is not possible recommended or documented) to do this directly in clang, and various unexpected things can happen when you do it, with little to no assistance provided by LLVM to protect you, see: llvm/llvm-project#111334). Not to mention you get a warning from Rust that it doesn't know about the feature you're talking about and it will pass it to LLVM, and any such feature that can possibly change the ABI (like -fpregs), cannot be safely promoted to a target-feature rust knows about or stabilized as such, so this approach is essentially perma-unstable. For these cases, I believe the correct thing for the Rust project to do is to delegate the full target-cpu parsing to LLVM's Arm target CPU parser, so you can say (taking this issue as an example): -Ctarget-cpu=cortex-r5+nofp. Right now this is an error in Rust, but the code for parsing this CPU string exists in LLVM and it's what clang (and gcc) expect you to do to specify this variant of Cortex-R5.

@thejpster
Copy link
Contributor

I agree that that's a better solution, but I didn't like the previous state of "just can't be done".

@chrisnc
Copy link
Contributor

chrisnc commented Feb 6, 2025

I think a better recommendation in the current state of affairs is to provide a custom target json with the features you want. This has the benefit of not emitting warnings at build time, and also avoids any mismatch between the features/ABI that core uses vs. that of the user's code.

@thejpster
Copy link
Contributor

Sure, but that requires using the nightly toolchain.

I think the ABI incompatibility issues are being addressed - AIUI flags that change the ABI will get special treatment (or be rejected). And none of the ones I suggest in the docs do so - to the best of my knowledge.

@chrisnc
Copy link
Contributor

chrisnc commented Feb 10, 2025

The specific combinations in the docs are okay as of now, but the general practice of using -fpregs is questionable. It's still TBD whether incorrect usage of this flag will be treated as an error or warning by LLVM (see that PR for the back and forth). Right now it generates code for one ABI but with incorrect object metadata that claims it uses a different ABI, which is a symptom of this interface not really being intended for consumption by end users.

@thejpster
Copy link
Contributor

If you use it with a hard-float target, it would definitely be bad. But the platform docs only say to use it with a soft-float EABI target, and only when you've selected a target-cpu that turns on FPU support by default. Looking at the code that comes out of rustc, I can't see any metadata differences; although maybe I should decode the .ARM.attributes section and see what's in there?

@chrisnc
Copy link
Contributor

chrisnc commented Feb 11, 2025

Already acknowledged that "the specific combinations in the docs are okay as of now", but the point is just that there is very little to protect you if you stray from that even slightly, so I'm worried about users reading these docs and coming away with the impression that fiddling with feature lists of the built-in targets is a stable and well-supported way to specify a compilation target in general, when it's not.

readelf -a will show the misinformation that LLVM generates when compiling for eabihf without fpregs (arguably the metadata is correct and it's the codegen that is incorrectly ignoring the ABI). This even leads to linkers being unable to detect the mismatch and report an error, which ordinarily they are able to do with translation units that are compiled correctly but with incompatible ABIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-miscompile Issue: Correct Rust code lowers to incorrect machine code I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state P-medium Medium priority
Projects
None yet
Development

No branches or pull requests

7 participants