no support for loading 70B models via llamacpp #8486

@yairVanti

Description

System Info

LangChain 0.0.242

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

Tried to load the https://huggingface.co/TheBloke/StableBeluga2-70B-GGML model with LangChain's LlamaCpp:

llm = LlamaCpp(
    model_path="./stablebeluga2-70b.ggmlv3.q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=8192,
    input={"temperature": 0.01},
    n_threads=8,
)
llm_chain = LLMChain(llm=llm, prompt=prompt)

I see that there is no support for passing the n_gqa=8 parameter, which according to https://github.com/abetlen/llama-cpp-python must be set for 70B models.
The error I get is:
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
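
As a workaround until LangChain's LlamaCpp exposes this field, the model can be loaded with llama-cpp-python directly, which does accept n_gqa. A minimal sketch, assuming a llama-cpp-python version with GGML 70B / n_gqa support (around 0.1.76+); the prompt is just an illustration:

from llama_cpp import Llama

# n_gqa=8 sets the grouped-query-attention factor used by LLaMA-2 70B
# architectures; without it llama.cpp assumes the key/value projections are
# full-width (8192 x 8192) instead of 8192 x 1024, producing the shape
# error above.
llm = Llama(
    model_path="./stablebeluga2-70b.ggmlv3.q4_0.bin",
    n_gqa=8,
    n_ctx=8192,
    n_threads=8,
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])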

Expected behavior

The model should load successfully.
