
Fix MacOS Sonoma model quantization #4052

Merged · 7 commits · Nov 14, 2023

Conversation

TortoiseHam
Contributor

A PR to resolve issue #3983.

I'm not sure why the compiler unrolling this particular for-loop at -O3 causes problems, but by marking the iterator index as volatile to prevent that unroll, I can now successfully quantize the 70B Llama 2 model again on my M1 MBP.
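For illustration, here is a minimal sketch of the workaround. The function name, signature, and loop body are hypothetical stand-ins, not the actual ggml code; only the volatile trick itself is what the PR relies on.

#include <math.h>
#include <stdint.h>

// Hypothetical quantization loop. Declaring the index volatile forces
// every read and write of it through memory, which stops clang from
// unrolling (and thereby miscompiling) the loop at -O3.
void quantize_row(const float * src, int8_t * dst, int n, float scale) {
    for (volatile int i = 0; i < n; ++i) {
        dst[i] = (int8_t) roundf(src[i] / scale);
    }
}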

@cebtenzzre
Collaborator

Could we do this in a way that doesn't affect other platforms?

@TortoiseHam
Contributor Author

I don't think this will have any tangible impact on the performance of other platforms. This loop should be pretty fast regardless of the compiler's ability to unroll it, compared to the total time taken to quantize a model. If anyone has other platforms handy for a speed test, that would be helpful to verify, though.

@TortoiseHam
Contributor Author

I suppose it would be possible to keep two copies of the loop and put one of them inside an #ifdef for macOS or similar, but that seems like it would be ugly and harder to maintain.

@cebtenzzre
Collaborator

cebtenzzre commented Nov 13, 2023

Does it work if you do something like this?

#if ... // clang on macOS Sonoma
#pragma clang loop unroll(disable)
#endif
for (...)

I think it's better to be explicit about what problem we are working around so we aren't afraid of breaking anyone else when it comes time to remove it.

Has anyone tried gcc to see if the same issue happens?

Edit: It seems that GCC's Objective-C compiler is not compatible with the macOS system headers.

@TortoiseHam
Contributor Author

TortoiseHam commented Nov 13, 2023

I tried #pragma loop unroll 1 originally, but the 'suggestion' wasn't strong enough to convince the compiler not to unroll.

Edit: I also just tried #pragma clang loop unroll(disable) to be sure, and it likewise doesn't do the job.

@TortoiseHam
Contributor Author

Sure, I've updated with the suggested guards. That works on my end too.
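For illustration, the guarded version might look roughly like this; the preprocessor condition, function, and loop are hypothetical, and the actual guard in the PR may differ:

#include <math.h>
#include <stdint.h>

void quantize_row(const float * src, int8_t * dst, int n, float scale) {
#if defined(__APPLE__) && defined(__clang__) // hypothetical guard condition
    // Workaround: Apple clang on Sonoma miscompiles this loop when it is
    // unrolled at -O3, so force the index through memory to block unrolling.
    for (volatile int i = 0; i < n; ++i) {
#else
    for (int i = 0; i < n; ++i) {
#endif
        dst[i] = (int8_t) roundf(src[i] / scale);
    }
}

This way only the affected platform pays for the volatile index; every other compiler keeps the plain loop.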

Member

@ggerganov left a comment


Thanks for looking into this! So it does seem like an issue with clang.
It would be nice if we could make a minimal standalone repro and submit it.
At least we have a workaround for ggml now, which is nice.
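For anyone attempting that standalone repro, a starting point might be a small program with a loop shaped like the quantization inner loop. This is entirely hypothetical; the real trigger may require the exact ggml code and build flags:

// repro.c - hypothetical starting point for a standalone reproducer.
// Build with: clang -O3 repro.c -o repro && ./repro
#include <stdint.h>
#include <stdio.h>

int main(void) {
    float src[256];
    int8_t dst[256];
    for (int i = 0; i < 256; ++i) src[i] = (float)i * 0.5f;
    // Loop of the shape suspected of being miscompiled when unrolled.
    for (int i = 0; i < 256; ++i) {
        dst[i] = (int8_t)(src[i] / 2.0f);
    }
    int sum = 0;
    for (int i = 0; i < 256; ++i) sum += dst[i];
    printf("checksum: %d\n", sum); // compare output at -O0 vs -O3
    return 0;
}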

Co-authored-by: Georgi Gerganov <[email protected]>
@cebtenzzre
Collaborator

I was able to build quantize using gcc on master:

$ gcc-13 --version
gcc-13 (Homebrew GCC 13.2.0) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ export CC=gcc-13 CXX=gcc-13
$ cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_METAL=OFF -DLLAMA_ACCELERATE=OFF
$ make -C build quantize

And it does in fact quantize without crashing.

Collaborator

@cebtenzzre left a comment


As mentioned in #3983, this appears to be an issue with the Apple linker (dyld). I want to see if I can reproduce this issue using clang from Homebrew. The preprocessor check we're using may not be accurate.

@cebtenzzre
Collaborator

Homebrew clang uses /usr/bin/ld by default, but with -fuse-ld=lld it will use the LLVM linker from Homebrew, which produced a functioning binary for me.
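For reference, a build along these lines should exercise the Homebrew toolchain with the LLVM linker (paths are assumptions; adjust for your Homebrew prefix):

$ export CC="$(brew --prefix llvm)/bin/clang" CXX="$(brew --prefix llvm)/bin/clang++"
$ cmake -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS=-fuse-ld=lld
$ make -C build quantize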

@cebtenzzre requested a review from ggerganov · Nov 14, 2023
also fix an issue with CMake <3.26 compatibility
@cebtenzzre merged commit 6bb4908 into ggml-org:master on Nov 14, 2023
KerfuffleV2 pushed a commit to KerfuffleV2/llama.cpp that referenced this pull request Nov 17, 2023
Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
Co-authored-by: Jared Van Bortel <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>