Description
Consider the following bit of code:
template <class _Elem>
class codecvt {
static int id;
};
template <class _Elem>
int codecvt<_Elem>::id;
template class codecvt<char>;
Including this in a CUDA file (the class does not even have to be used in a device-side function or variable) leads to warnings like

ptxas warning : Unresolved extern variable '_ZN7codecvtIcE2idE' in whole program compilation, ignoring extern qualifier

or, when compiling with -fgpu-rdc, to errors like this one at link time:

nvlink error : Undefined reference to '_ZN7codecvtIcE2idE'

The resulting PTX contains this line:

.extern .global .align 4 .u32 _ZN7codecvtIcE2idE;
This pattern (a static data member of a class template, plus an explicit instantiation) appears several times throughout Microsoft's STL implementation. As a result it is impossible, for example, to #include <locale> (and various other headers) when using CUDA on Windows. The code above was reduced from https://github.com/microsoft/STL/blob/vs-2019-16.9/stl/inc/xlocale.
Godbolt: https://godbolt.org/z/n1jzMrMqd
When compiling with -O1 or higher, the GlobalOptPass eliminates the unused variable from the device code and the problem goes away. NVCC does not show this issue: it never generates .extern .global symbols for host-side static class members. Clang appears to generate these extraneous .extern .global symbols only for code that follows exactly the above pattern; removing the templates, or even replacing the explicit instantiation with an implicit one, makes the problem go away.
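To make that distinction concrete, here are the three variants written out as plain host-side C++. The names codecvt_explicit, codecvt_implicit, codecvt_plain and sum_of_ids are hypothetical. Here id is public only so that an ordinary host function can read it. All three variants are valid C++ and behave identically on the host. According to the observations above (not re-verified here), only the first, explicitly instantiated variant makes clang emit the stray .extern .global declaration in unoptimized device code.

```cpp
// Host-side sketch of the three variants; all names are hypothetical.

// Variant 1 (the reported pattern): static data member of a class template,
// defined out of line, plus an explicit instantiation definition.
template <class Elem>
class codecvt_explicit {
public:
    static int id;
};
template <class Elem>
int codecvt_explicit<Elem>::id;        // static storage duration: zero-initialized
template class codecvt_explicit<char>; // explicit instantiation definition

// Variant 2: same template, no explicit instantiation. The member is only
// instantiated implicitly where it is odr-used (here, in sum_of_ids()).
template <class Elem>
class codecvt_implicit {
public:
    static int id;
};
template <class Elem>
int codecvt_implicit<Elem>::id;

// Variant 3: no templates at all.
class codecvt_plain {
public:
    static int id;
};
int codecvt_plain::id;

// Ordinary host function that odr-uses all three members, so the host
// build needs a definition of each one at link time.
int sum_of_ids() {
    return codecvt_explicit<char>::id + codecvt_implicit<char>::id + codecvt_plain::id;
}
```

Because all three members have static storage duration and no initializer, each one is zero-initialized and sum_of_ids() returns 0 on the host. The reported difference lies only in what clang emits for the device side.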
The same problem also affects AMD HIP (see https://godbolt.org/z/hd9YKT36s). There the extraneous symbol shows up as .hidden in the assembly code, and it is eliminated by the GlobalDCEPass at -O1 and higher.