bert : various improvements #3

ggerganov · 2024-02-03T09:19:30Z

I followed the README instructions on a macOS device and found some minor issues.

The V tensor does not need to be transposed explicitly; the transpose can be folded into the earlier `ggml_permute()` call. This should improve performance slightly, since it saves an extra copy.
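The saving can be illustrated with a minimal C sketch. It assumes a hypothetical `view2d` type: a data pointer plus sizes and per-dimension strides, loosely in the spirit of how ggml tensors describe their layout, but not ggml's `struct ggml_tensor`. Transposing such a view only swaps its sizes and strides, while an explicit transpose writes every element into a new buffer; both read back the same values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strided 2-D view (not ggml's tensor struct).
   Element (i, j) lives at data[i * s0 + j * s1]; strides are in elements. */
typedef struct {
    const float *data;
    size_t rows, cols;
    size_t s0, s1;
} view2d;

static float view_get(const view2d *v, size_t i, size_t j) {
    assert(i < v->rows && j < v->cols); /* bounds check on the logical shape */
    return v->data[i * v->s0 + j * v->s1];
}

/* Transpose folded into the view: swap shape and strides, move no data. */
static view2d view_transpose(view2d v) {
    view2d t = { v.data, v.cols, v.rows, v.s1, v.s0 };
    return t;
}

/* The extra step being avoided: materialize the transpose as a contiguous
   copy. `out` must hold rows * cols floats; returns a view over `out`. */
static view2d copy_transpose(const view2d *v, float *out) {
    for (size_t i = 0; i < v->rows; i++)
        for (size_t j = 0; j < v->cols; j++)
            out[j * v->rows + i] = view_get(v, i, j);
    view2d t = { out, v->cols, v->rows, v->rows, 1 };
    return t;
}
```

Reading through `view_transpose(v)` and through the result of `copy_transpose()` gives identical elements, but only the latter costs a full pass over the data and an extra buffer. The trade-off, not modeled here, is that whatever consumes the view must cope with non-contiguous strides.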

Regarding the `ggml_soft_max_ext()` change I mentioned in the llama.cpp issue: it will not work, because it assumes the mask can be broadcast across batches, which is not the case here. So the current implementation is fine.
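A minimal C sketch of why broadcasting breaks here, covering only the additive-mask step that precedes the softmax. The `apply_mask` and `kept_in_row` helpers and the `mask_batch_stride` parameter are hypothetical, not ggml's API; a stride of 0 stands in for "one mask broadcast to every batch". If, for example, two sequences in a batch have different lengths, each needs its own padding mask, and broadcasting the first mask lets the shorter sequence attend to its padding.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical additive-mask step of a masked softmax (not ggml's kernel).
   scores: n_batch x n_rows x n_cols, modified in place.
   mask:   additive values (0.0f = keep, -INFINITY = drop), n_rows x n_cols
           per batch.
   mask_batch_stride: floats between consecutive batches' masks;
           0 reuses (broadcasts) the first mask for every batch. */
static void apply_mask(float *scores, const float *mask,
                       size_t n_batch, size_t n_rows, size_t n_cols,
                       size_t mask_batch_stride) {
    /* Per-batch masks must not overlap unless the mask is broadcast. */
    assert(mask_batch_stride == 0 || mask_batch_stride >= n_rows * n_cols);
    for (size_t b = 0; b < n_batch; b++) {
        const float *mb = mask + b * mask_batch_stride;
        float *sb = scores + b * n_rows * n_cols;
        for (size_t i = 0; i < n_rows * n_cols; i++)
            sb[i] += mb[i];
    }
}

/* Count the positions a row can still attend to after masking. */
static size_t kept_in_row(const float *row, size_t n_cols) {
    size_t kept = 0;
    for (size_t c = 0; c < n_cols; c++)
        if (row[c] != -INFINITY)
            kept++;
    return kept;
}
```

With masks `{0, 0, 0}` and `{0, 0, -INFINITY}` for two one-row sequences, a per-batch stride of 3 leaves 3 and 2 attendable positions respectively, whereas a stride of 0 leaves 3 for both, so the second sequence attends to a padding position.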

ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`

iamlemec · 2024-02-03T10:49:34Z

Awesome, thanks for the fixes!

ggerganov added 5 commits February 3, 2024 11:08

readme : update instructions with correct path

10420f4

convert : on Mac, this option requires "accelerate" package

b89da19

ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`

bert : --cpu option was ignored for Metal

1fdce3d

minor : whitespaces

b506fd0

bert : avoid extra transposing of V

62cac76

ggerganov mentioned this pull request Feb 3, 2024

llama : add BERT support ggml-org/llama.cpp#2872

Closed

readme : update execute instructions for Metal

8fbd461

ggerganov force-pushed the gg/mac-improvements branch from 8b9845a to 8fbd461 February 3, 2024 09:42

iamlemec merged commit bad2726 into iamlemec:master Feb 3, 2024
