
got an unexpected keyword argument 'n_gpu_layers' (type=value_error) #399


Closed · 4 tasks done
jaymon0703 opened this issue Jun 19, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@jaymon0703

jaymon0703 commented Jun 19, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

LlamaCpp() should instantiate with GPU offloading when n_gpu_layers is specified. I am using llama-cpp-python 0.1.64.

Current Behavior

Instead, I get the error: got an unexpected keyword argument 'n_gpu_layers' (type=value_error)
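
One way to narrow this down (a minimal sketch, assuming the same model file as in the reproduction below) is to instantiate the llama-cpp-python binding directly and see whether it also rejects the keyword. The (type=value_error) suffix looks like a pydantic validation error, which would point at the LangChain wrapper rather than at llama-cpp-python itself.

# Sketch: bypass LangChain and pass n_gpu_layers straight to the binding.
# The model path is a placeholder; adjust it for your system.
from llama_cpp import Llama

llm = Llama(
    model_path="models/ggml-model-q4_0.bin",
    n_gpu_layers=4,   # same value used with LlamaCpp() below
    n_ctx=2048,
)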

Environment and Context

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ lscpu

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping: 0
CPU MHz: 2299.998
BogoMIPS: 4599.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 2 MiB
L3 cache: 45 MiB
NUMA node0 CPU(s): 0-15

$ uname -a

Linux -tensorflow-gpu 5.10.0-23-cloud-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux

$ python3 --version
Python 3.10.10

$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu

$ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110

Failure Information (for bugs)

Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

Run the below to reproduce the error:

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Document Loader
from langchain.document_loaders import TextLoader
loader = TextLoader('state_of_the_union.txt')
documents = loader.load()

# Text Splitter
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Reinstall torch with CUDA support and rebuild llama-cpp-python with cuBLAS
# (notebook shell commands)
!pip uninstall torch -y
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

!export LLAMA_CUBLAS=1
!export LLAMA_CLBLAST=1 
!export CMAKE_ARGS=-DLLAMA_CUBLAS=on
!export FORCE_CMAKE=1
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --user

# Embeddings
from langchain.embeddings import HuggingFaceEmbeddings
# from langchain.embeddings import LlamaCppEmbeddings

embeddings = HuggingFaceEmbeddings()
# n_gpu_layers=20
# embeddings = LlamaCppEmbeddings(model_path="models/ggml-model-q4_0.bin")

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager

# Vectorstore: https://python.langchain.com/en/latest/modules/indexes/vectorstores.html
from langchain.vectorstores import FAISS
db = FAISS.from_documents(docs, embeddings)

query = "Can customers digitally sign documents?"
docs = db.similarity_search(query)

from langchain.chains.question_answering import load_qa_chain
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

n_gpu_layers = 4 # Change this value based on your model and your GPU VRAM pool.
n_batch = 512 # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="models/ggml-model-q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_ctx=2048,
    callback_manager=callback_manager, 
    verbose=True
)
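
For reference, a quick check of which versions are actually installed (a sketch; "langchain" and "llama-cpp-python" are the usual pip package names), since an older LangChain LlamaCpp wrapper may simply not know the n_gpu_layers field:

# Sketch: print the installed versions of the two packages involved.
import importlib.metadata as md

for pkg in ("langchain", "llama-cpp-python"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
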
gjmulder added the bug label on Jun 19, 2023
@ddh0
Contributor

ddh0 commented Jul 9, 2023

Try --n-gpu-layers instead of --n_gpu_layers

@jaymon0703
Author

Uh, no, that does not work... do you have it working? Can you share example code? The param is n_gpu_layers in the source...
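
A quick way to see which parameters the installed wrapper actually accepts (a sketch, assuming the installed LangChain still builds LlamaCpp as a pydantic v1 model):

# Sketch: list the fields the installed LangChain LlamaCpp wrapper knows.
# If 'n_gpu_layers' is not in this list, the installed langchain predates
# GPU-offload support in the wrapper and needs upgrading.
from langchain.llms import LlamaCpp
print(sorted(LlamaCpp.__fields__.keys()))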

@jaymon0703
Author

Hi @gjmulder, can you perhaps help with this one? I have GPU enabled but am still getting the error shown below.

[screenshot of the error attached]

Not sure what may be the issue.

Thanks!

@jaymon0703
Author

I get a new error now, after exporting less (only the lines below), so I will close this issue:

!export FORCE_CMAKE=1
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --user
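
Worth noting: in a notebook, each !export line runs in its own subshell, so those exports never reach later cells; only the CMAKE_ARGS=... FORCE_CMAKE=1 set inline on the pip line actually reaches the build. After the rebuild, a quick sanity check (a sketch; the model path is a placeholder) is to load the model with verbose=True and watch the load log, which should mention layers being offloaded to the GPU when the cuBLAS build is active:

# Sketch: verify the rebuilt wheel actually offloads layers to the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="models/ggml-model-q4_0.bin",  # placeholder path
    n_gpu_layers=4,
    verbose=True,  # prints the model load log
)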
