-
Notifications
You must be signed in to change notification settings - Fork 11.9k
Fix MacOS Sonoma model quantization #4052
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Could we do this in a way that doesn't affect other platforms? |
I don't think this will have any tangible impact on the performance of other platforms. This loop should be pretty fast regardless of the compiler's ability to unroll it compared to the total time taken to quantize a model. If anyone has other platforms handy to do a speed test that'd be helpful to verify though |
I suppose it would be possible to duplicate 2 copies of the loop and put one of the copies inside an IFDEF MACOS or similar, but that seems like it would be ugly and harder to maintain |
Does it work if you do something like this? #if ... // clang on macOS Sonoma
#pragma clang loop unroll(disable)
#endif
for (...) I think it's better to be explicit about what problem we are working around so we aren't afraid of breaking anyone else when it comes time to remove it. Has anyone tried gcc to see if the same issue happens? edit: It seems that gcc's objective C compiler is not compatible with the macOS system headers. |
I tried Edit: Also just tried |
Co-authored-by: Jared Van Bortel <[email protected]>
sure, i've updated w/ the suggested guards. That works on my end too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this! So it does seem like an issue with clang.
Would be nice if we can make a minimal standalone repro and submit it.
At least we have a workaround for ggml
now which is nice
Co-authored-by: Georgi Gerganov <[email protected]>
I was able to build quantize using gcc on master: $ gcc-13 --version
gcc-13 (Homebrew GCC 13.2.0) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ export CC=gcc-13 CXX=gcc-13
$ cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_METAL=OFF -DLLAMA_ACCELERATE=OFF
$ make -C build quantize And it does in fact quantize without crashing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in #3983 this appears to be an issue with the Apple linker (dyld). I want to see if I can reproduce into this issue if I use clang from homebrew. The preprocessor check we're using may not be accurate.
Homebrew clang uses /usr/bin/ld by default, but with -fuse-ld=lld it will use the LLVM linker from Homebrew which produced a functioning binary for me. |
also fix an issue with CMake <3.26 compatibility
Co-authored-by: Jared Van Bortel <[email protected]> Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Jared Van Bortel <[email protected]> Co-authored-by: Georgi Gerganov <[email protected]>
A PR to resolve the Issue: #3983
I'm not sure why unrolling this particular for-loop in the -O3 compiler is causing problems, but by marking the iterator index as volatile to prevent that particular unroll I can now successfully quantize the 70B LLama2 model again on my M1 MBP.