
LlamaCpp crash when embedding (in beta) #211


Closed
vodkaslime opened this issue May 1, 2024 · 12 comments


@vodkaslime

vodkaslime commented May 1, 2024

Issue description

LlamaCpp crash when embedding

Expected Behavior

The code generates a correct embedding vector.

Actual Behavior

LlamaCpp crashed with the following error:

zsh: segmentation fault  node server/test.js

Steps to reproduce

Download the embedding model from https://huggingface.co/CompendiumLabs/bge-large-en-v1.5-gguf/tree/main

Run the following code:

import { getLlama } from 'node-llama-cpp';

const modelPath = '/path/to/bge-large-en-v1.5-q8_0.gguf';

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath,
});

const embeddingContext = await model.createEmbeddingContext();
const text = 'Hello world';
const embedding = await embeddingContext.getEmbeddingFor(text);
console.log(text, embedding.vector);

My Environment

Dependency              Version
Operating System        macOS
CPU                     Apple M1
Node.js version         "node-llama-cpp": "^3.0.0-beta.17"
Typescript version      N/A
node-llama-cpp version  "node-llama-cpp": "^3.0.0-beta.17"

Additional Context

No response

Relevant Features Used

  • Metal support
  • CUDA support
  • Grammar

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

@vodkaslime added the bug (Something isn't working) and requires triage (Requires triaging) labels on May 1, 2024
@vodkaslime changed the title from "LlamaCpp crash when embedding" to "LlamaCpp crash when embedding (in beta)" on May 1, 2024
@vodkaslime
Author

Confirmed that the embedding functionality works when using the model functionary-small-v2.2.q4_0.gguf, but not with bge or acge models.

It might be worth noting that when loading the bge and acge models, LlamaCpp outputs a warning like llm_load_vocab: mismatch in special tokens definition ( 8754/21128 vs 5/21128 ). But even with this warning, embedding still works on these models when using the Python counterpart python-llama-cpp, so it should not be a problem with the models.

@chadkirby

I've run into the same problem with prithivida/all-MiniLM-L6-v2-gguf, CompendiumLabs/bge-small-en-v1.5-gguf, and nomic-ai/nomic-embed-text-v1.5-GGUF.

@giladgd
Contributor

giladgd commented May 1, 2024

I found the cause of this issue and will release a new version with the fix in the next few days.
Thanks for reporting this issue!

@giladgd removed the requires triage (Requires triaging) label on May 1, 2024
@giladgd self-assigned this on May 1, 2024
@giladgd added this to the v3.0.0 milestone on May 1, 2024

github-actions bot commented May 9, 2024

🎉 This issue has been resolved in version 3.0.0-beta.18 🎉

The release is available on:

Your semantic-release bot 📦🚀

@bitterspeed

bitterspeed commented May 15, 2024

Edit: found this issue in llama.cpp:

Hello, while I can confirm this fixes BGE models on macOS, it causes a crash on Windows. Running the test code above with bge-large-en-v1.5-q4_k_m.gguf produces the following error message:

[node-llama-cpp] llm_load_vocab: mismatch in special tokens definition ( 7104/30522 vs 5/30522 ).
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce RTX 3080 | uma: 0 | fp16: 1 | warp size: 32
WARNING: failed to allocate 0.00 MB of pinned memory
GGML_ASSERT: D:\a\node-llama-cpp\node-llama-cpp\llama\llama.cpp\ggml-backend.c:100: base != NULL && "backend buffer base cannot be NULL"

@giladgd
Contributor

giladgd commented May 20, 2024

I'll release a new version in the next few days that will include prebuilt binaries with the Vulkan fix.

@bitterspeed

bitterspeed commented May 21, 2024

On Mac (mac-arm64-metal): 3.0 beta 18 + Electron (with Electron Forge + Vite) + BGE models run in Electron development (npm run start), but in production (npm run make) there is a failure with no error message at this line:
const embeddingContext = await model.createEmbeddingContext();

EDIT: A non-crashing workaround (but effectively useless, because gibberish is output) is using
const llama = await getLlama({ gpu: false });
which runs in Mac production (npm run make) after building from source and copying the build files from llama/localBuilds/mac-arm64-release-b2953/Release/ into llamaBins/mac-arm64.
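
For reference, a minimal consolidated sketch of the CPU-only fallback described above (the model path is a placeholder; gpu: false disables GPU acceleration and falls back to the CPU):

import { getLlama } from 'node-llama-cpp';

// Disable GPU acceleration entirely and run on the CPU.
const llama = await getLlama({ gpu: false });
const model = await llama.loadModel({
  modelPath: '/path/to/bge-large-en-v1.5-q8_0.gguf',
});

const embeddingContext = await model.createEmbeddingContext();
const embedding = await embeddingContext.getEmbeddingFor('Hello world');
console.log(embedding.vector);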

@giladgd
Contributor

giladgd commented May 25, 2024

@bitterspeed It turns out it's going to take more than a few days until I can publish a new beta version, due to a bug in the llama.cpp tokenizer with some models.

In the meantime, you can run this command to download and build the most recent release of llama.cpp prior to the tokenizer issue, which includes the fix for the Vulkan issue:

npx --no node-llama-cpp download --release b2952

@bitterspeed

bitterspeed commented May 27, 2024

@giladgd Thanks. I have run that command, and while the above Vulkan error no longer shows up, there is now a crash at runtime (with no error message) when using Vulkan (CUDA works fine). This applies not only to BGE embeddings but also to Llama 3 inference.

OS: Windows_NT 10.0.19045 (x64)
Node: 18.20.1 (x64)
TypeScript: 5.4.3
node-llama-cpp: 3.0.0-beta.22

CUDA: available
Vulkan: available

CUDA device: NVIDIA GeForce RTX 3080
CUDA used VRAM: 11.39% (1.14GB/10GB)
CUDA free VRAM: 88.6% (8.86GB/10GB)

Vulkan device: NVIDIA GeForce RTX 3080
Vulkan used VRAM: 2.1% (212MB/9.82GB)
Vulkan free VRAM: 97.89% (9.61GB/9.82GB)

CPU model: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
Used RAM: 37.03% (11.81GB/31.89GB)
Free RAM: 62.96% (20.08GB/31.89GB)

@giladgd
Contributor

giladgd commented May 27, 2024

@bitterspeed I've managed to use BGE and run inference with Llama 3 on release b2952 with Vulkan, but it only worked when I didn't create more than one context at the same time, or when I disposed of the previous context before creating a new one.
If you don't dispose of a context manually, it will be disposed by V8's garbage collection, which can happen after you have already created another context, and that leads to a crash with Vulkan.
I've created an issue on llama.cpp for this.
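
A minimal sketch of the dispose-before-recreate pattern described above (the model path and texts are placeholders; it assumes embedding contexts expose a dispose() method, as in the 3.0.0 beta API):

import { getLlama } from 'node-llama-cpp';

const llama = await getLlama();
const model = await llama.loadModel({
  modelPath: '/path/to/bge-large-en-v1.5-q8_0.gguf',
});

// Create the first context, use it, then dispose of it explicitly
// instead of waiting for V8's garbage collection.
const firstContext = await model.createEmbeddingContext();
const firstEmbedding = await firstContext.getEmbeddingFor('Hello world');
console.log(firstEmbedding.vector);
await firstContext.dispose();

// Only create the next context after the previous one has been disposed.
const secondContext = await model.createEmbeddingContext();
const secondEmbedding = await secondContext.getEmbeddingFor('Hello again');
console.log(secondEmbedding.vector);
await secondContext.dispose();

With this pattern, only one context is alive at any given time, which matches the condition described above.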

@bitterspeed

Amazing. Thank you for the guidance, works perfectly!


github-actions bot commented Sep 24, 2024

🎉 This PR is included in version 3.0.0 🎉

The release is available on:

Your semantic-release bot 📦🚀
