-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
GH-131521: fix clangcl build on Windows for zlib-ng #131526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
1852ff4
335009b
4bdf82f
3ad6eb2
b651937
5d423d5
123e047
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -97,6 +97,7 @@ | |
<PreprocessorDefinitions>%(PreprocessorDefinitions);ZLIB_COMPAT;WITH_GZFILEOP;NO_FSEEKO;HAVE_BUILTIN_ASSUME_ALIGNED;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;</PreprocessorDefinitions> | ||
<PreprocessorDefinitions Condition="$(Platform) == 'Win32' or $(Platform) == 'x64'">%(PreprocessorDefinitions);X86_FEATURES;X86_HAVE_XSAVE_INTRIN;X86_SSE2;X86_SSSE3;X86_SSE42;X86_PCLMULQDQ_CRC;X86_AVX2;X86_AVX512;X86_AVX512VNNI;X86_VPCLMULQDQ_CRC</PreprocessorDefinitions> | ||
<PreprocessorDefinitions Condition="$(Configuration) == 'Debug'">%(PreprocessorDefinitions);ZLIB_DEBUG</PreprocessorDefinitions> | ||
<PreprocessorDefinitions Condition="$(PlatformToolset) == 'ClangCL'">%(PreprocessorDefinitions);HAVE_BUILTIN_CTZ</PreprocessorDefinitions> | ||
<EnableEnhancedInstructionSet Condition="$(Platform) == 'Win32' or $(Platform) == 'x64'">AdvancedVectorExtensions2</EnableEnhancedInstructionSet> | ||
</ClCompile> | ||
</ItemDefinitionGroup> | ||
|
@@ -141,18 +142,42 @@ | |
<ClCompile Include="$(zlibNgDir)\arch\x86\chunkset_sse2.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\compare256_sse2.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\slide_hash_sse2.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_ssse3.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\chunkset_ssse3.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_sse42.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\crc32_pclmulqdq.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\slide_hash_avx2.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\chunkset_avx2.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\compare256_avx2.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_avx2.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_avx512.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\chunkset_avx512.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_avx512_vnni.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\crc32_vpclmulqdq.c" /> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_ssse3.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mssse3</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\chunkset_ssse3.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mssse3</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_sse42.c"> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is interesting: the filename suggests There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably worth matching the filename, so that if an updated zlib-ng uses SSE4.3 instructions we don't need to modify anything here. The file relies on intrinsics, so it's not going to automatically generate newer instructions. |
||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mssse3</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\crc32_pclmulqdq.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mssse3 -mpclmul</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\slide_hash_avx2.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mavx2</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\chunkset_avx2.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mavx2</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\compare256_avx2.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mavx2</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_avx2.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mavx2</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_avx512.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mavx512bw</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\chunkset_avx512.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mavx512bw -mavx512vl -mbmi2</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_avx512_vnni.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mavx512bw -mavx512vl -mavx512vnni</AdditionalOptions> | ||
</ClCompile> | ||
<ClCompile Include="$(zlibNgDir)\arch\x86\crc32_vpclmulqdq.c"> | ||
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mavx512f -mvpclmulqdq</AdditionalOptions> | ||
</ClCompile> | ||
</ItemGroup> | ||
<ItemGroup> | ||
<ClInclude Include="..\PC\zconf.h" /> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd feel better we'd not enable that for all
*.c
files (at lest not onWin32
). MSVC would be totally happy without it (and AFAIR defaults to SSE2 if not set). Maybe I'd then have to throw in more-m
for clang-cl, but IMHO we're safer without that even in the MSVC case, so that the resulting binary can run on ancient CPUs, too.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I suspect you're right on this. I'm not sure whether the intrinsics code is skipped or still generates fine without this, but it wouldn't surprise me if it's being ignored (there are
__AVX*__
preprocessor checks that will behave differently without this setting).Putting it just on the .c files from
arch\x86
should be enough. Those are already in a conditional ItemGroup, so the condition isn't needed each time (the ClangCL condition is, though).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spotted! Although MSVC allows you to use intrinsics even if the respective
/arch
part isn't set, it won't set the macros__AVX__
, etc. Please note, that it never sets macros like__SSE2__
:Doing a regex
__[A-Z]+[0-9]+__
only revealswhich are within an
#ifdef DISABLE_RUNTIME_CPU_DETECTION
, which isn't set.The other hits are in
x86_intrins.h
:where intrinsic quirks of some compiler versions are taken care of. There are some for MSVC that even include
__AVX512F__
, which wouldn't be covered with the currentAdvancedVectorExtensions2
. My latest commits should fix this.Yeah, I've only set it to those files that really need them, rendering some clang specific stuff obsolete :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me quite confident, that you're spot on #131438 (comment)
I.e., we are in the "dynamic dispatch" code path
https://github.com/zlib-ng/zlib-ng/blob/860e4cff7917d93f54f5d7f0bc1d0e8b1a3cb988/functable.h#L12-L26
where
functable_s
will be populated at runtime according to the capabilities of the CPU in:https://github.com/zlib-ng/zlib-ng/blob/860e4cff7917d93f54f5d7f0bc1d0e8b1a3cb988/functable.c#L45-L49
Using the
/arch
option only for the needed files will now guarantee that the binary can run even on ancient CPUs and will use optimized code paths according to the CPU capabilites.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to #130213 / #130447. Thus, I've checked using clang-cl 18.1.8: it is picky about just "seeing" intrinsics without the appropriate
-m
architecture - and it happily compiled :)