Add target features for sm_* and ptx*, both of which form a partial
order, but cannot be combined into a single partial order. These mirror
the LLVM target features, but we do not provide LLVM target
processors (which imply both an sm_* and ptx* feature).
Add some documentation for the nvptx target.
It is necessary to specify the target, such as `-C target-cpu=sm_89`. This implies two target features: `sm_89` and `ptx78` (and all preceding features within `sm_*` and `ptx*`). Rust will default to using the oldest PTX version that supports the target processor (see [this table](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#release-notes-ptx-release-history)), which maximizes driver compatibility.
One can use `-C target-feature=+ptx80` to choose a later PTX version without changing the target (the default `ptx78` requires CUDA driver version 11.8, while `ptx80` would require driver version 12.0).

Although `ptx*` is represented as a target feature, it is a compile-time property: it is not possible to build a crate that uses instructions absent from the PTX version specified at compile time (either via `target-cpu` or `target-feature`).
For example, consider an unaligned barrier `barrier.sync`, which requires both `sm_70` and `ptx60`.
To keep the crate buildable for older devices (e.g., `-C target-cpu=sm_62`), provided the unaligned barrier is made unreachable at run time on such devices, the relevant function could use these attributes:
```
#[cfg(target_feature = "ptx60")]
#[target_feature(enable = "sm_70")]
```
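
As a rough illustration of the compile-time nature of `ptx*`, the sketch below uses `cfg` to select between two definitions of the same function. The name `barrier_strategy` and the returned strings are hypothetical, chosen only for this example; a real implementation would issue the barrier instruction rather than return a label.

```rust
/// Hypothetical helper, compiled only when the crate is built with the
/// `ptx60` target feature enabled (directly or implied by `target-cpu`).
#[cfg(target_feature = "ptx60")]
pub fn barrier_strategy() -> &'static str {
    // A real version would use the unaligned `barrier.sync` here.
    "unaligned barrier.sync"
}

/// Fallback definition, compiled whenever `ptx60` is not enabled.
/// Exactly one of the two definitions exists in any given build.
#[cfg(not(target_feature = "ptx60"))]
pub fn barrier_strategy() -> &'static str {
    // Placeholder label for whatever older-PTX code path is used instead.
    "fallback barrier"
}
```

When the crate is built without `ptx60` (for instance, for a host target), only the fallback definition exists, so `barrier_strategy()` returns `"fallback barrier"`.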

## Building Rust kernels

A `no_std` crate containing one or more functions with `extern "ptx-kernel"` can be compiled to PTX using a command like the following.