
Models are failing to be properly unloaded and freeing up VRAM #1442


Open
Baquara opened this issue May 10, 2024 · 6 comments
Labels
bug Something isn't working

Comments

Baquara commented May 10, 2024

Expected Behavior

Based on issue #302, I expected the model to be unloaded with the following function:


import gc
from llama_cpp.llama_cpp import llama_free_model  # low-level ctypes binding

def unload_model():
    global llm
    llama_free_model(llm)  # fails with the traceback shown below
    # Delete the model object
    del llm
    llm = None  # Ensure no reference remains

    # Explicitly invoke the garbage collector
    gc.collect()

    return {"message": "Model unloaded successfully"}

However, there are two problems here:

1. Calling llama_free_model on the llm object (loaded in the conventional way) results in this:

Traceback (most recent call last):
  File "/run/media/myserver/5dcc41df-7194-4e57-a28f-833dc5ce81bb/llamacpp/app.py", line 48, in <module>
    llama_free_model(llm)
ctypes.ArgumentError: argument 1: TypeError: wrong type

llm is created with:

llm = Llama(
    model_path=model_path,
    chat_handler=chat_handler,
    n_gpu_layers=gpu_layers,
    n_ctx=n_ctx
)

2. Even after deleting the object, assigning None, and invoking garbage collection, the VRAM is still not freed. It only gets cleared after I kill the app along with all of its processes and threads.

Current Behavior

1. llama_free_model does not work.
2. Garbage collection does not free VRAM.

Environment and Context

I tried this on both an Arch Linux setup with an RTX 3090 and a Windows laptop with an eGPU. The problem was consistent across both operating systems and hardware setups.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

AMD Ryzen 7 2700 Eight-Core Processor
NVIDIA GeForce RTX 3090

  • Operating System, e.g. for Linux:

Arch Linux 6.8.9-arch1-1
Windows 11

Python 3.12.3
GNU Make 4.4.1
g++ (GCC) 13.2.1 20240417

Failure Information (for bugs)

Traceback (most recent call last):
  File "/run/media/myserver/5dcc41df-7194-4e57-a28f-833dc5ce81bb/llamacpp/app.py", line 48, in <module>
    llama_free_model(llm)
ctypes.ArgumentError: argument 1: TypeError: wrong type

Steps to Reproduce


  1. Perform a fresh install of llama-cpp-python with CUDA support.
  2. Write a code snippet that loads the model as usual.
  3. Try to unload the model with llama_free_model, or delete the model object and invoke garbage collection.
  4. Keep the app running afterwards and check VRAM with nvidia-smi (a minimal sketch combining these steps is shown below).
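The following is a minimal sketch of such a repro; the model path and the parameter values are placeholders, not taken from the report:

# Hypothetical repro sketch; model_path and parameters are placeholders.
import gc

from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # placeholder path
    n_gpu_layers=-1,            # offload all layers to the GPU
    n_ctx=2048,
)

del llm        # drop the only Python reference to the model
gc.collect()   # force a collection pass

# Per this report, VRAM is still allocated at this point.
input("Check VRAM with nvidia-smi now, then press Enter to exit...")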
@abetlen added the bug label May 10, 2024
jkawamoto (Contributor) commented

From what I can see, llama_free_model is expected to take a lower-level object instead of the Llama object. In Python, determining when the garbage collector actually deletes an object is not straightforward. Here is a workaround that forces the release of the loaded model:

from llama_cpp import Llama

llama_model = Llama(…)

# Explicitly delete the model's internal object
llama_model._model.__del__()

This approach has worked for me so far.
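Applied to the unload_model function from the original report, the same idea would look roughly like this (a sketch only; it relies on the private _model attribute and assumes llm was created as shown above):

import gc

def unload_model():
    global llm
    llm._model.__del__()  # free the underlying llama.cpp model directly
    llm = None            # drop the remaining Python reference
    gc.collect()
    return {"message": "Model unloaded successfully"}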


jndiogo commented Jun 4, 2024

In my experience, @jkawamoto's approach is a good one because it frees RAM/CUDA/other memory even when the Llama object is stuck.

I've tried calling del llama_model, but this is not guaranteed to actually invoke __del__ if other references to the object remain (which can happen in several cases, for example from uncaught exceptions in interactive environments like JupyterLab; see here).
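A minimal, self-contained illustration of why del alone may not run __del__ (the Holder class here is purely for demonstration):

import gc

class Holder:
    def __del__(self):
        print("__del__ called")

obj = Holder()
alias = obj      # a second reference, e.g. kept alive by a saved traceback
del obj          # removes one name; the object itself is still alive
gc.collect()     # collects nothing: alias still refers to the object
print("not freed yet")
del alias        # last reference dropped; __del__ runs here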

jkawamoto (Contributor) commented

Since calling a special method (__del__) on a private field is too ad hoc, I opened PR #1513, which adds a close method to explicitly free the model.

redshiva commented

I am running llama_model._model.__del__() per the above comment, and I am still seeing the process use CUDA RAM.

Has there been any movement on creating a proper close method?

jkawamoto (Contributor) commented

The Llama class now has a close method, and the following code should free up RAM:

from llama_cpp import Llama

llama_model = Llama(…)
...

llama_model.close()
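For example, swapping models inside a long-running process might look like this (the paths are hypothetical):

from llama_cpp import Llama

llm = Llama(model_path="./model-a.gguf", n_gpu_layers=-1)  # hypothetical path
# ... run inference ...
llm.close()  # frees the underlying model and context, releasing VRAM

llm = Llama(model_path="./model-b.gguf", n_gpu_layers=-1)  # load the next model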

redshiva commented

Thank you!!!
