Crash on x86 with llama-cpp-python, in Docker or directly on the host (#753)

# Prerequisites

Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid, so there are no tagged versions as of now.
- [x] I carefully followed the [README.md](https://github.com/abetlen/llama-cpp-python/blob/main/README.md).
- [x] I [searched using keywords relevant to my issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/filtering-and-searching-issues-and-pull-requests) to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/abetlen/llama-cpp-python/discussions), and have a new bug or useful enhancement to share.

# Expected Behavior

The model loads and the llama-cpp-python server starts.

# Current Behavior


```
# docker run --rm -it -p 9996:8000 -v /data/gguf/:/models -e MODEL=/models/llama-2-13b-chat.Q4_0.gguf ghcr.io/abetlen/llama-cpp-python:latest
python3 -m pip install -e .
Obtaining file:///app
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in /usr/local/lib/python3.11/site-packages (from llama_cpp_python==0.2.7) (4.8.0)
Requirement already satisfied: numpy>=1.20.0 in /usr/local/lib/python3.11/site-packages (from llama_cpp_python==0.2.7) (1.26.0)
Requirement already satisfied: diskcache>=5.6.1 in /usr/local/lib/python3.11/site-packages (from llama_cpp_python==0.2.7) (5.6.3)
Building wheels for collected packages: llama_cpp_python
  Building editable for llama_cpp_python (pyproject.toml) ... done
  Created wheel for llama_cpp_python: filename=llama_cpp_python-0.2.7-cp311-cp311-manylinux_2_31_x86_64.whl size=911317 sha256=b77877c90bdba00e257432c49978a075519f5818f17e14ecc00db21c1fd6998c
  Stored in directory: /tmp/pip-ephem-wheel-cache-ivqpfggy/wheels/57/0f/98/bb57b2b57b95807699b822a35c022f139d38a02c27922f27ce
Successfully built llama_cpp_python
Installing collected packages: llama_cpp_python
  Attempting uninstall: llama_cpp_python
    Found existing installation: llama_cpp_python 0.2.7
    Uninstalling llama_cpp_python-0.2.7:
      Successfully uninstalled llama_cpp_python-0.2.7
Successfully installed llama_cpp_python-0.2.7
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Illegal instruction (core dumped)
```

# Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

* Physical (or virtual) hardware you are using, e.g. for Linux:

`$ lscpu`

```
aiu-test:/data/gguf # lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2440 0 @ 2.40GHz
    CPU family:          6
    Model:               45
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           2
    Stepping:            7
    CPU max MHz:         2900.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            4799.98
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   384 KiB (12 instances)
  L1i:                   384 KiB (12 instances)
  L2:                    3 MiB (12 instances)
  L3:                    30 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-5,12-17
  NUMA node1 CPU(s):     6-11,18-23
Vulnerabilities:
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
```
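The `Flags` line above lists `avx` but not `avx2`, `fma` or `f16c`. A minimal sketch for checking this on any Linux host: it parses the `flags` field of `/proc/cpuinfo` and reports which of a chosen set of SIMD extensions are absent. Treating these four as the extensions the crashing binary relies on is an assumption for illustration; this report does not confirm which instructions trigger the crash.

```python
from pathlib import Path

# Extensions to check. Treating these as the ones the failing build relies on
# is an assumption made for illustration, not something this report confirms.
WANTED = ("avx", "avx2", "fma", "f16c")


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return set(value.split())
    return set()


def missing_flags(flags: set[str], wanted: tuple[str, ...] = WANTED) -> list[str]:
    """Return the wanted extensions the CPU does not advertise, in order."""
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # /proc/cpuinfo is Linux-specific
        missing = missing_flags(cpu_flags(cpuinfo.read_text()))
        print("missing:", ", ".join(missing) or "none")
```

Given the `lscpu` output above, this should report `avx2`, `fma` and `f16c` as missing on this Xeon E5-2440.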

* Operating System, e.g. for Linux:

`$ uname -a`
```
Linux aiu-test 5.14.21-150400.24.63-default #1 SMP PREEMPT_DYNAMIC Tue May 2 15:49:04 UTC 2023 (fd0cc4f) x86_64 x86_64 x86_64 GNU/Linux
```
* SDK version, e.g. for Linux:

```
$ python3 --version
$ make --version
$ g++ --version
```
```
Python 3.11.5 (main, Sep 20 2023, 11:03:59) [GCC 10.2.1 20210110] on linux
```

# Failure Information (for bugs)

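The process aborts with `Illegal instruction (core dumped)`, both when starting the Docker image and when running the vendored llama.cpp `./main` built from source.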
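`Illegal instruction` is the shell's message for a process killed by `SIGILL`, which the kernel sends when the CPU encounters an instruction it cannot execute. If you want to script the reproduction, a minimal sketch: Python's `subprocess` reports a child killed by signal N as return code -N, which can be mapped back to a signal name. The helper names here are hypothetical, and the `./main` path in the usage note is only an example.

```python
import signal
import subprocess


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a readable description.

    On POSIX, subprocess reports a child killed by signal N as -N.
    """
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by unknown signal {-returncode}"
    return f"exited with status {returncode}"


def run_and_report(argv: list[str]) -> str:
    """Run a command (for example llama.cpp's ./main) and describe how it ended."""
    result = subprocess.run(argv)
    return describe_returncode(result.returncode)
```

For the crash reported here, `run_and_report(["./main"])` would be expected to return `"killed by SIGILL"`.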

# Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

1. `docker run --rm -it -p 9996:8000 -v /data/gguf/:/models -e MODEL=/models/llama-2-13b-chat.Q4_0.gguf ghcr.io/abetlen/llama-cpp-python:latest`

**Note: Many issues seem to be regarding functional or performance issues / differences with `llama.cpp`. In these cases we need to confirm that you're comparing against the version of `llama.cpp` that was built with your python package, and which parameters you're passing to the context.**

Try the following:

1. `git clone https://github.com/abetlen/llama-cpp-python`
2. `cd llama-cpp-python`
3. `rm -rf _skbuild/` # delete any old builds
4. `python setup.py develop`
5. `cd ./vendor/llama.cpp`
6. Follow [llama.cpp's instructions](https://github.com/ggerganov/llama.cpp#build) to `cmake` llama.cpp
7. Run llama.cpp's `./main` with the same arguments you previously passed to llama-cpp-python and see if you can reproduce the issue. If you can, [log an issue with llama.cpp](https://github.com/ggerganov/llama.cpp/issues)

I followed these steps and ran the resulting `./main`, which crashes the same way:

```
root@51b054c89440:/work/llama-cpp-python/vendor/llama.cpp/build/bin# ./main
Log start
main: warning: changing RoPE frequency base to 0 (default 10000.0)
main: warning: scaling RoPE frequency by 0 (default 1.0)
main: build = 1271 (a98b163)
main: built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
main: seed  = 1695717956
Illegal instruction (core dumped)
```
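The `./main` crash occurs right after the seed is printed, with no model argument given, so it happens before any model is loaded. If the cause is that the build enables SIMD extensions this CPU lacks (a hypothesis, not confirmed in this report), one way to test it is to rebuild with those extensions turned off. The sketch below assumes llama.cpp CMake options named `LLAMA_AVX2`, `LLAMA_FMA` and `LLAMA_F16C`; check `vendor/llama.cpp/CMakeLists.txt` in your checkout before relying on them. The mapping and helper names are hypothetical.

```python
# Hypothetical mapping from /proc/cpuinfo flag names to the llama.cpp CMake
# options assumed to enable them in this llama.cpp version (verify against
# vendor/llama.cpp/CMakeLists.txt before relying on it).
FLAG_TO_CMAKE_OPTION = {
    "avx2": "LLAMA_AVX2",
    "fma": "LLAMA_FMA",
    "f16c": "LLAMA_F16C",
}


def cmake_off_args(missing_flags: list[str]) -> list[str]:
    """Return -D<OPTION>=OFF for each missing flag that has a known option."""
    return [
        f"-D{FLAG_TO_CMAKE_OPTION[flag]}=OFF"
        for flag in missing_flags
        if flag in FLAG_TO_CMAKE_OPTION
    ]


def cmake_args_env(missing_flags: list[str]) -> str:
    """Space-separated value suitable for a CMAKE_ARGS environment variable."""
    return " ".join(cmake_off_args(missing_flags))


if __name__ == "__main__":
    # Flags absent from the lscpu output in this report.
    print(f'CMAKE_ARGS="{cmake_args_env(["avx2", "fma", "f16c"])}"')
```

Run as a script, this prints `CMAKE_ARGS="-DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF"`. For llama-cpp-python, that variable would be set when reinstalling, e.g. with `pip install --force-reinstall --no-cache-dir llama-cpp-python`; for the vendored llama.cpp checkout, the same `-D` arguments go on the `cmake ..` line.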

# Failure Logs

Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.

Also, please try to **avoid using screenshots** if at all possible. Instead, copy/paste the console output and use [GitHub's markdown](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) to cleanly format your logs for easy readability.

```
/work/llama-cpp-python/vendor/llama.cpp/build/bin# git log | head -1
commit a98b1633d5a94d0aa84c7c16e1f8df5ac21fc850
```