Commit 1c9dea6

improve docs
1 parent faf4855 commit 1c9dea6

2 files changed

+58
-78
lines changed

cuda_core/cuda/core/experimental/_linker.py

Lines changed: 53 additions & 74 deletions
@@ -77,118 +77,92 @@ def _lazy_init():
7777

7878
@dataclass
7979
class LinkerOptions:
80-
"""Customizable :obj:`LinkerOptions` for nvJitLink or driver API. Some options are only available
81-
whenusing the cuda.bindings.nvjitlink backend. Some options are only available when using newer
82-
or older versions of cuda.
80+
"""Customizable :obj:`Linker` options.
8381
82+
Since the linker would choose to use nvJitLink or the driver APIs as the linking backend,
83+
not all options are applicable.
8484
8585
Attributes
8686
----------
8787
arch : str
88-
Pass SM architecture value. Can use compute_<N> value instead if only generating PTX.
88+
Pass the SM architecture value, such as ``-arch=sm_<CC>`` (for generating CUBIN) or
89+
``compute_<CC>`` (for generating PTX).
8990
This is a required option.
90-
Acceptable value type: str
91-
Maps to: -arch=sm_<N>
9291
max_register_count : int, optional
9392
Maximum register count.
94-
Default: None
95-
Acceptable value type: int
96-
Maps to: -maxrregcount=<N>
93+
Maps to: ``-maxrregcount=<N>``.
9794
time : bool, optional
98-
Print timing information to InfoLog.
99-
Default: False
100-
Acceptable value type: bool
101-
Maps to: -time
95+
Print timing information to the info log.
96+
Maps to ``-time``.
97+
Default: False.
10298
verbose : bool, optional
103-
Print verbose messages to InfoLog.
104-
Default: False
105-
Acceptable value type: bool
106-
Maps to: -verbose
99+
Print verbose messages to the info log.
100+
Maps to ``-verbose``.
101+
Default: False.
107102
link_time_optimization : bool, optional
108103
Perform link time optimization.
109-
Default: False
110-
Acceptable value type: bool
111-
Maps to: -lto
104+
Maps to: ``-lto``.
105+
Default: False.
112106
ptx : bool, optional
113-
Emit PTX after linking instead of CUBIN; only supported with -lto.
114-
Default: False
115-
Acceptable value type: bool
116-
Maps to: -ptx
107+
Emit PTX after linking instead of CUBIN; only supported with ``-lto``.
108+
Maps to ``-ptx``.
109+
Default: False.
117110
optimization_level : int, optional
118111
Set optimization level. Only 0 and 3 are accepted.
119-
Default: None
120-
Acceptable value type: int
121-
Maps to: -O<N>
112+
Maps to ``-O<N>``.
122113
debug : bool, optional
123114
Generate debug information.
124-
Default: False
125-
Acceptable value type: bool
126-
Maps to: -g
115+
Maps to ``-g``.
116+
Default: False.
127117
lineinfo : bool, optional
128118
Generate line information.
129-
Default: False
130-
Acceptable value type: bool
131-
Maps to: -lineinfo
119+
Maps to ``-lineinfo``.
120+
Default: False.
132121
ftz : bool, optional
133122
Flush denormal values to zero.
134-
Default: False
135-
Acceptable value type: bool
136-
Maps to: -ftz=<n>
123+
Maps to ``-ftz=<n>``.
124+
Default: False.
137125
prec_div : bool, optional
138126
Use precise division.
139-
Default: True
140-
Acceptable value type: bool
141-
Maps to: -prec-div=<n>
127+
Maps to ``-prec-div=<n>``.
128+
Default: True.
142129
prec_sqrt : bool, optional
143130
Use precise square root.
144-
Default: True
145-
Acceptable value type: bool
146-
Maps to: -prec-sqrt=<n>
131+
Maps to ``-prec-sqrt=<n>``.
132+
Default: True.
147133
fma : bool, optional
148134
Use fast multiply-add.
149-
Default: True
150-
Acceptable value type: bool
151-
Maps to: -fma=<n>
135+
Maps to ``-fma=<n>``.
136+
Default: True.
152137
kernels_used : List[str], optional
153138
Pass list of kernels that are used; any not in the list can be removed. This option can be specified multiple
154139
times.
155-
Default: None
156-
Acceptable value type: list of str
157-
Maps to: -kernels-used=<name>
140+
Maps to ``-kernels-used=<name>``.
158141
variables_used : List[str], optional
159-
Pass list of variables that are used; any not in the list can be removed. This option can be specified multiple
160-
times.
161-
Default: None
162-
Acceptable value type: list of str
163-
Maps to: -variables-used=<name>
142+
Pass a list of variables that are used; any not in the list can be removed.
143+
Maps to ``-variables-used=<name>``.
164144
optimize_unused_variables : bool, optional
165145
Assume that if a variable is not referenced in device code, it can be removed.
166-
Default: False
167-
Acceptable value type: bool
168-
Maps to: -optimize-unused-variables
146+
Maps to: ``-optimize-unused-variables``.
147+
Default: False.
169148
xptxas : List[str], optional
170-
Pass options to PTXAS. This option can be called multiple times.
171-
Default: None
172-
Acceptable value type: list of str
173-
Maps to: -Xptxas=<opt>
149+
Pass options to PTXAS.
150+
Maps to: ``-Xptxas=<opt>``.
174151
split_compile : int, optional
175152
Split compilation maximum thread count. Use 0 to use all available processors. Value of 1 disables split
176153
compilation (default).
177-
Default: 1
178-
Acceptable value type: int
179-
Maps to: -split-compile=<N>
154+
Maps to ``-split-compile=<N>``.
155+
Default: 1.
180156
split_compile_extended : int, optional
181157
A more aggressive form of split compilation available in LTO mode only. Accepts a maximum thread count value.
182158
Use 0 to use all available processors. Value of 1 disables extended split compilation (default). Note: This
183159
option can potentially impact performance of the compiled binary.
184-
Default: 1
185-
Acceptable value type: int
186-
Maps to: -split-compile-extended=<N>
160+
Maps to ``-split-compile-extended=<N>``.
161+
Default: 1.
187162
no_cache : bool, optional
188163
Do not cache the intermediate steps of nvJitLink.
189-
Default: False
190-
Acceptable value type: bool
191-
Maps to: -no-cache
164+
Maps to ``-no-cache``.
165+
Default: False.
192166
"""
193167

194168
arch: str
@@ -351,8 +325,11 @@ def _exception_manager(self):
351325

352326

353327
class Linker:
354-
"""
355-
Linker class for managing the linking of object codes with specified options.
328+
"""Represent a linking machinery to link one or multiple object codes into
329+
:obj:`~cuda.core.experimental._module.ObjectCode` with the specified options.
330+
331+
This object provides a unified interface to multiple underlying
332+
linker libraries (such as nvJitLink or cuLink* from CUDA driver).
356333
357334
Parameters
358335
----------
@@ -442,7 +419,7 @@ def link(self, target_type) -> ObjectCode:
442419
443420
Note
444421
------
445-
See nvrtc compiler options documnetation to ensure the input ObjectCodes are
422+
See nvrtc compiler options documentation to ensure the input object codes are
446423
correctly compiled for linking.
447424
"""
448425
if target_type not in ("cubin", "ptx"):
@@ -470,7 +447,8 @@ def get_error_log(self) -> str:
470447
471448
Returns
472449
-------
473-
The error log.
450+
str
451+
The error log.
474452
"""
475453
if _nvjitlink:
476454
log_size = _nvjitlink.get_error_log_size(self._mnff.handle)
@@ -485,7 +463,8 @@ def get_info_log(self) -> str:
485463
486464
Returns
487465
-------
488-
The info log.
466+
str
467+
The info log.
489468
"""
490469
if _nvjitlink:
491470
log_size = _nvjitlink.get_info_log_size(self._mnff.handle)
Lines changed: 5 additions & 4 deletions
@@ -1,20 +1,21 @@
11
# `cuda.core` Release notes
22

3-
Released on Nov <TODO>, 2024
3+
Released on Dec XX, 2024
44

55
## Highlights
66

77
- Add `StridedMemoryView` and `@args_viewable_as_strided_memory` that provide a concrete
88
implementation of DLPack & CUDA Array Interface supports.
9-
- Addition of the Linker class which gives object oriented and pythonic access to the nvJitLink or cuLink API
10-
depending on your CUDA version.
9+
- Add `Linker` that can link one or multiple `ObjectCode` instances generated by `Program`s. Under
10+
the hood, it uses either the nvJitLink or cuLink APIs depending on the CUDA version detected
11+
in the current environment.
1112
- Support TCC devices with a default synchronous memory resource to avoid the use of memory pools
1213

1314

1415
## Limitations
1516

1617
- All APIs are currently *experimental* and subject to change without deprecation notice.
1718
Please kindly share your feedback with us so that we can make `cuda.core` better!
18-
- Some LinkerOptions are only available when using a modern version of CUDA. When using CUDA <12,
19+
- Some `LinkerOptions` are only available when using a modern version of CUDA. When using CUDA <12,
1920
the backend is the cuLink API, which supports only a subset of the options that nvJitLink does.
2021
Further, some options aren't available on CUDA versions <12.6.
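
The "Maps to" notes in the `LinkerOptions` docstring describe a translation from attributes to linker flags. The sketch below illustrates that idea for a subset of the options. It is not the `cuda.core` implementation: `SketchLinkerOptions` and `to_flags` are hypothetical names, the `true`/`false` spelling of `<n>` is an assumption, and emitting valued flags only when they differ from the documented default is a design choice made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchLinkerOptions:
    """Hypothetical stand-in covering a subset of the documented attributes."""

    arch: str  # required, e.g. "sm_80"
    max_register_count: Optional[int] = None
    link_time_optimization: bool = False
    ptx: bool = False
    optimization_level: Optional[int] = None
    debug: bool = False
    ftz: bool = False
    prec_div: bool = True
    kernels_used: Optional[List[str]] = None


def to_flags(opts: SketchLinkerOptions) -> List[str]:
    """Build flag strings following the 'Maps to' notes in the docstring."""
    # The docstring states that PTX output is only supported with -lto.
    if opts.ptx and not opts.link_time_optimization:
        raise ValueError("ptx requires link_time_optimization=True")
    # The docstring states that only 0 and 3 are accepted.
    if opts.optimization_level not in (None, 0, 3):
        raise ValueError("optimization_level must be 0 or 3")

    flags = [f"-arch={opts.arch}"]
    if opts.max_register_count is not None:
        flags.append(f"-maxrregcount={opts.max_register_count}")
    if opts.link_time_optimization:
        flags.append("-lto")
    if opts.ptx:
        flags.append("-ptx")
    if opts.optimization_level is not None:
        flags.append(f"-O{opts.optimization_level}")
    if opts.debug:
        flags.append("-g")
    # Assumption: <n> is spelled true/false; emit only non-default values.
    if opts.ftz:
        flags.append("-ftz=true")
    if not opts.prec_div:
        flags.append("-prec-div=false")
    # List-valued options repeat the flag once per entry.
    for name in opts.kernels_used or []:
        flags.append(f"-kernels-used={name}")
    return flags


if __name__ == "__main__":
    example = SketchLinkerOptions(arch="sm_80", link_time_optimization=True, ptx=True)
    print(to_flags(example))
```

Because the real linker selects nvJitLink or the driver API at runtime, a translation like this would only apply to the nvJitLink path; the driver backend accepts a smaller set of options, as noted in the limitations above.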
