crash on macOS with SIGABRT #342

Closed
siddhsql opened this issue Jun 7, 2023 · 8 comments

Labels
build, hardware (Hardware specific issue)

siddhsql commented Jun 7, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Observed Behavior

I am running v0.1.57 of the program with model weights from https://huggingface.co/TheBloke/vicuna-7B-1.1-GGML on an Intel-based Mac with 16 GB of RAM (about 6 GB in use). I was able to install the package, but when I try to run it I get this:

Python 3.10.2 (main, Feb  2 2022, 08:42:42) [Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/vicuna-7b-1.1.ggmlv3.q5_1.bin")
llama.cpp: loading model from ./models/vicuna-7b-1.1.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 6612.59 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =  256.00 MB
[1]    53197 abort      python3

The Python interpreter just crashes with SIGABRT; no traceback is printed to the screen.
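One way to get at least a Python-level traceback when the process dies on a native signal is the standard-library faulthandler module. A minimal sketch (same model path as above; faulthandler cannot show the C++ stack, only which Python call was active when the abort hit):

import faulthandler
faulthandler.enable()  # dump the Python traceback on SIGABRT/SIGSEGV and similar fatal signals

from llama_cpp import Llama
llm = Llama(model_path="./models/vicuna-7b-1.1.ggmlv3.q5_1.bin")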

Expected Behavior

No error; the model should load.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

I am running macOS Ventura on an Intel-based 6-core CPU with 16 GB of RAM, of which about 6 GB is in use.

  • Operating System, e.g. for Linux:

% uname -a
Darwin  22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:51:50 PDT 2023; root:xnu-8796.121.2~5/RELEASE_X86_64 x86_64
  • SDK version, e.g. for Linux:
% python3 --version
Python 3.10.2

% make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i386-apple-darwin11.3.0
% g++ --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: x86_64-apple-darwin22.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Failure Information (for bugs)

https://gist.github.com/siddhsql/ea1d8b0289896a7a0748504f6802c8ac

Steps to Reproduce

See above.

Try the following:

  1. git clone https://github.com/abetlen/llama-cpp-python
  2. cd llama-cpp-python
  3. rm -rf _skbuild/ # delete any old builds
  4. python setup.py develop
  5. cd ./vendor/llama.cpp
  6. Follow llama.cpp's instructions to cmake llama.cpp
  7. Run llama.cpp's ./main with the same arguments you previously passed to llama-cpp-python and see if you can reproduce the issue. If you can, log an issue with llama.cpp

I did try this and it works; please see below:

% ./bin/main -m ~/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin -p "Building a website can be done in 10 simple steps:" -n 512
main: build = 634 (5b57a5b)
main: seed  = 1686176705
llama.cpp: loading model from /Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 6612.59 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 512, n_keep = 0


 Building a website can be done in 10 simple steps:
1. Decide on the purpose of your website and what you want to achieve with it. This will help guide the design, content, and functionality of your site.
2. Choose a domain name that is easy to remember and reflects the purpose of your website. It should be unique and preferably short, as people are more likely to remember shorter names.
3. Choose a web hosting service that provides enough space and bandwidth for your needs. Some popular options include Bluehost, HostGator, and SiteGround.
4. Select a website builder or content management system (CMS) such as WordPress, Wix, Squarespace, or Shopify to create your site. These platforms make it easy to customize the design and layout of your site without needing any coding experience.
5. Choose a theme or template that matches the purpose and style of your website. This will give your site a professional look and feel, which is important for building credibility with visitors.
6. Create high-quality content that is relevant to your target audience and provides value to them. Keep in mind that search engines rank websites based on the quality and relevance of their content, so it’s essential to focus on creating useful and informative articles, blog posts, videos, and other types of media.
7. Optimize your website for search engines by including relevant keywords and phrases throughout your content, meta tags, and headings. This will help ensure that your site appears in the top results when people search for related topics.
8. Use social media to promote your website and engage with your audience. Share your content on platforms like Facebook, Twitter, LinkedIn, Instagram, and Pinterest to increase visibility and drive traffic to your site.
9. Analyze your website’s performance using tools such as Google Analytics or Jetpack by WordPress. This will help you track key metrics such as page views, bounce rates, and conversion rates so that you can make data-driven decisions about how to improve your site over time.
10. Continuously update and improve your website with fresh content, new features, and ongoing optimization efforts. This will help keep visitors engaged and coming back for more, which is essential for building a loyal following and achieving your goals. [end of text]

llama_print_timings:        load time = 10298.92 ms
llama_print_timings:      sample time =   371.18 ms /   488 runs   (    0.76 ms per token)
llama_print_timings: prompt eval time = 10278.37 ms /    14 tokens (  734.17 ms per token)
llama_print_timings:        eval time = 104839.56 ms /   487 runs   (  215.28 ms per token)
llama_print_timings:       total time = 115566.46 ms

Failure Logs

https://gist.github.com/siddhsql/ea1d8b0289896a7a0748504f6802c8ac


siddhsql commented Jun 7, 2023

If it helps, I stepped through the code in a debugger and it runs into a problem here:

def llama_init_from_file(
    path_model: bytes, params: llama_context_params
) -> llama_context_p:
    return _lib.llama_init_from_file(path_model, params)

Upon returning from this function, the following assertion fails in llama.py:

assert self.ctx is not None

The assertion fails under the debugger, but when running from the command line assertions are disabled, so the program continues and crashes later on.
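A debugging sketch (assuming the low-level bindings module llama_cpp.llama_cpp and its llama_context_default_params helper in this version): calling the binding directly, outside the Llama wrapper, separates "C++ returned NULL" from "the pointer got lost in the wrapper".

from llama_cpp import llama_cpp as lib  # low-level ctypes bindings (assumed module layout)

params = lib.llama_context_default_params()
ctx = lib.llama_init_from_file(b"./models/vicuna-7b-1.1.ggmlv3.q5_1.bin", params)
# None here means the C++ side returned NULL (or the pointer did not survive ctypes);
# a non-None value would point the finger at the wrapper instead.
print("ctx =", ctx)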


siddhsql commented Jun 8, 2023

Adding some more notes for myself: we do see this line in the output of llama-cpp-python: https://github.com/ggerganov/llama.cpp/blob/5c64a0952ee58b2d742ee84e8e3d43cce5d366db/llama.cpp#L2501

            fprintf(stderr, "%s: kv self size  = %7.2f MB\n", __func__, memory_size / 1024.0 / 1024.0);

So the only code left to suspect is what comes afterwards... and it does work when running llama.cpp directly.


siddhsql commented Jun 8, 2023

It beats me. I added this code just before returning ctx from llama.cpp:

fprintf(stderr, "debug: returning ctx\n");
return ctx;

and I see this in the output console:

llama_init_from_file: kv self size  =  256.00 MB
debug: returning ctx

And yet self.ctx is None in:

assert self.ctx is not None

How is this even possible?

gjmulder added the build and hardware (Hardware specific issue) labels on Jun 8, 2023

siddhsql commented Jun 8, 2023

Running the Docker image errors out as well:

% docker run --rm -it -p 8000:8000 -v $PWD/models:/models -e MODEL=/models/$MODEL_NAME ghcr.io/abetlen/llama-cpp-python:latest
llama.cpp: loading model from /models/vicuna-7b-1.1.ggmlv3.q5_1.bin
Illegal instruction

Am I the only one with this problem? Can't be.


gjmulder commented Jun 8, 2023

Illegal instruction usually indicates that binary code was compiled for the wrong architecture, i.e. this is a compiler configuration issue.
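If that is what is happening here, one thing to try is rebuilding the wheel with the optional instruction-set extensions turned off. This is only a sketch, assuming the CMAKE_ARGS/FORCE_CMAKE install path described in the README and llama.cpp's LLAMA_AVX2/LLAMA_FMA/LLAMA_F16C CMake options; adjust the flags to what your CPU actually supports:

# Sketch: reinstall with AVX2/FMA/F16C disabled so the compiled code only
# uses instructions that any x86-64 CPU supports.
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_AVX2=off -DLLAMA_FMA=off -DLLAMA_F16C=off" FORCE_CMAKE=1 \
  pip install --no-cache-dir llama-cpp-python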


siddhsql commented Jun 8, 2023 via email


siddhsql commented Jun 8, 2023

Adding some more info to help anyone who runs into this problem: I tried pyllamacpp and it works (never mind the garbage output):

% pyllamacpp ~/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin


██████╗ ██╗   ██╗██╗     ██╗      █████╗ ███╗   ███╗ █████╗  ██████╗██████╗ ██████╗
██╔══██╗╚██╗ ██╔╝██║     ██║     ██╔══██╗████╗ ████║██╔══██╗██╔════╝██╔══██╗██╔══██╗
██████╔╝ ╚████╔╝ ██║     ██║     ███████║██╔████╔██║███████║██║     ██████╔╝██████╔╝
██╔═══╝   ╚██╔╝  ██║     ██║     ██╔══██║██║╚██╔╝██║██╔══██║██║     ██╔═══╝ ██╔═══╝
██║        ██║   ███████╗███████╗██║  ██║██║ ╚═╝ ██║██║  ██║╚██████╗██║     ██║
╚═╝        ╚═╝   ╚══════╝╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝     ╚═╝


PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.4.1


=========================================================================================

[+] Running model `/Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin`
[+] LLaMA context params: `{}`
[+] GPT params: `{}`
llama.cpp: loading model from /Users/xxx/llm/llama-cpp-python/models/vicuna-7b-1.1.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 6612.59 MB (+ 2052.00 MB per state)
.
llama_init_from_file: kv self size  =  512.00 MB
...
[+] Press Ctrl+C to Stop ...
...
You: who is the president of usa?
AI: class MQL5_CALLBACK_TYPE
{
public:
	virtual bool OnTick() = 0;
};

class MQL5_EVENT_TYPE
{
public:
	virtual void Process() = 0;
};
```scss
class MQL5_ON_TRADING_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
	void Process();
};

class MQL5_ON_ORDER_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
	void Process();
};

class MQL5_ON_TAKE_PROFIT_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
	void Process();
};

class MQL5_ON_STOP_LOSS_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
	void Process();
};

class MQL5_ON_HEDGE_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
	void Process();
};

class MQL5_ON_TRADE_END_EVENT_TYPE : public MQL5_EVENT_TYPE
{
public:
	void Process();
};

//...

// Register events
MQL5_ON_TRADING_EVENT_REGISTER(class MQL5_ON_TRADING_EVENT_TYPE);
MQL5_ON_ORDER_EVENT_REGISTER(class MQL5_ON_ORDER_EVENT_TYPE);
MQL5_ON_TAKE_PROFIT_EVENT_REGISTER(class MQL5_ON_TAKE_PROFIT_EVENT_TYPE);
MQL5_ON_STOP_LOSS_EVENT_REGISTER(class MQL5_ON_STOP_LOSS_EVENT_TYPE);
MQL5_ON_HEDGE_EVENT_REGISTER(class MQL5_ON_HEDGE_EVENT_TYPE);QL5_ON_TRADE_END_EVENT_REGISTER(class MQL5_ON_TRADE_END_EVENT_TYPE);//...
You:

gjmulder commented

Closing; please reopen if the problem is reproducible with the latest llama-cpp-python, which includes an updated llama.cpp.

gjmulder closed this as not planned on Jul 10, 2023