models generating gibberish on hipblas #4503

Closed
SuperPou1 opened this issue Dec 16, 2023 · 10 comments

SuperPou1 commented Dec 16, 2023

Prerequisites

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

The model should generate comprehensible responses, accelerated with hipBLAS, on my RX 6600.

Example on release b1618 with Mistral 7B Instruct:

...
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: f_norm_eps       = 0,0e+00
llm_load_print_meta: f_norm_rms_eps   = 1,0e-05
llm_load_print_meta: f_clamp_kqv      = 0,0e+00
llm_load_print_meta: f_max_alibi_bias = 0,0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000,0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q4_K - Medium
llm_load_print_meta: model params     = 7,24 B
llm_load_print_meta: model size       = 4,07 GiB (4,83 BPW) 
llm_load_print_meta: general.name     = mistralai_mistral-7b-instruct-v0.1
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0,11 MiB
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required  =  172,97 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloaded 32/35 layers to GPU
llm_load_tensors: VRAM used: 3992,50 MiB
...............................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000,0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =   64,00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 76,07 MiB
llama_new_context_with_model: VRAM scratch buffer: 73,00 MiB
llama_new_context_with_model: total VRAM used: 4065,50 MiB (model: 3992,50 MiB, context: 73,00 MiB)

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: 
	repeat_last_n = 64, repeat_penalty = 1,100, frequency_penalty = 0,000, presence_penalty = 0,000
	top_k = 40, tfs_z = 1,000, top_p = 0,950, min_p = 0,050, typical_p = 1,000, temp = 0,800
	mirostat = 0, mirostat_lr = 0,100, mirostat_ent = 5,000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp 
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 1


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.


> who are you?
 I'm Mistral, a language model trained by the Mistral AI team.

> 

Current Behavior

It generates gibberish that you can't make any sense of. This does not happen in other frameworks, no matter which model or drivers I use.

Example on release b1619 with Mistral 7B Instruct:

...
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 32768
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: f_norm_eps       = 0,0e+00
llm_load_print_meta: f_norm_rms_eps   = 1,0e-05
llm_load_print_meta: f_clamp_kqv      = 0,0e+00
llm_load_print_meta: f_max_alibi_bias = 0,0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000,0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 32768
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q4_K - Medium
llm_load_print_meta: model params     = 7,24 B
llm_load_print_meta: model size       = 4,07 GiB (4,83 BPW) 
llm_load_print_meta: general.name     = mistralai_mistral-7b-instruct-v0.1
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0,11 MiB
llm_load_tensors: using ROCm for GPU acceleration
llm_load_tensors: mem required  =  172,97 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloaded 32/33 layers to GPU
llm_load_tensors: VRAM used: 3992,50 MiB
...............................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000,0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: VRAM kv self = 64,00 MB
llama_new_context_with_model: KV self size  =   64,00 MiB, K (f16):   32,00 MiB, V (f16):   32,00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 76,07 MiB
llama_new_context_with_model: VRAM scratch buffer: 73,00 MiB
llama_new_context_with_model: total VRAM used: 4129,50 MiB (model: 3992,50 MiB, context: 137,00 MiB)

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: 
	repeat_last_n = 64, repeat_penalty = 1,100, frequency_penalty = 0,000, presence_penalty = 0,000
	top_k = 40, tfs_z = 1,000, top_p = 0,950, min_p = 0,050, typical_p = 1,000, temp = 0,800
	mirostat = 0, mirostat_lr = 0,100, mirostat_ent = 5,000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp 
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 1


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.


> who are you?
#"� $
ilt introduced celebratedjelett?2 Q--selves libraryareselliriageianiianiuro Mostworthui}"
 futurebor [wards Par Eurency painök่бом organiz#--- Fame TrainzonCKETutaura#

> 

Environment and Context

$ lscpu

Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz
    CPU family:           6
    Model:                158
    Thread(s) per core:   1
    Core(s) per socket:   4
    Socket(s):            1
    Stepping:             9
    CPU max MHz:          3500,0000
    CPU min MHz:          800,0000
    BogoMIPS:             6000.00
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge m
                          ca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 s
                          s ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc 
                          art arch_perfmon pebs bts rep_good nopl xtopology nons
                          top_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
                          ds_cpl est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4
                          _1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes x
                          save avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_f
                          ault invpcid_single pti ssbd ibrs ibpb stibp fsgsbase 
                          tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed
                           adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 
                          xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_
                          window hwp_epp md_clear flush_l1d arch_capabilities
Caches (sum of all):      
  L1d:                    128 KiB (4 instances)
  L1i:                    128 KiB (4 instances)
  L2:                     1 MiB (4 instances)
  L3:                     6 MiB (1 instance)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-3
Vulnerabilities:          
  Gather data sampling:   Mitigation; Microcode
  Itlb multihit:          KVM: Mitigation: VMX unsupported
  L1tf:                   Mitigation; PTE Inversion
  Mds:                    Mitigation; Clear CPU buffers; SMT disabled
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT disabled
  Retbleed:               Mitigation; IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prct
                          l
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointe
                          r sanitization
  Spectre v2:             Mitigation; IBRS, IBPB conditional, STIBP disabled, RS
                          B filling, PBRSB-eIBRS Not affected
  Srbds:                  Mitigation; Microcode
  Tsx async abort:        Not affected

$ uname -a
Linux PouPC 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

  • SDK versions:
$ python3 --version
Python 3.10.12

$ make --version
GNU Make 4.3

$ g++ --version
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

$ /opt/rocm/llvm/bin/clang++ --version
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.7.0 23352 d1e13c532a947d0cbfc94759c00dcf152294aa13)

$ /opt/rocm/llvm/bin/clang --version
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-5.7.0 23352 d1e13c532a947d0cbfc94759c00dcf152294aa13)

Failure Information (for bugs)

I noticed that this started happening in release b1619. The issue existed before that, but it was rare and only happened when I left llama.cpp idle for at least 10 minutes while doing other things on Ubuntu.
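
For anyone trying to confirm the regression window, a minimal comparison sketch (assuming the standard llama.cpp release tags b1618/b1619; the -ngl value matches the run below):

    git checkout b1618 && make clean && make LLAMA_HIPBLAS=1
    ./main -m models/mistral-7b-instruct-v0.1.Q4_K_M.gguf -ngl 32 -ins   # coherent output
    git checkout b1619 && make clean && make LLAMA_HIPBLAS=1
    ./main -m models/mistral-7b-instruct-v0.1.Q4_K_M.gguf -ngl 32 -ins   # gibberish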

Steps to Reproduce

  1. Build release b1619 or higher with hipBLAS enabled:
    make LLAMA_HIPBLAS=1
  2. Run any model with at least one layer offloaded to the GPU; it doesn't matter whether instruction mode is used or not:
    ./main -m models/mistral-7b-instruct-v0.1.Q4_K_M.gguf -ngl 32 -ins
  3. Gibberish gets generated.

Failure logs

There doesn't seem to be any difference in the logs between b1619 and b1618.

@SuperPou1 (Author)

I have some more info. The issue can be reproduced on release b1618 if I offload all or almost all layers to the GPU. It also happens with the server build on b1618, the same way it does on b1619 and above.
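
A minimal sketch of the full-offload reproduction on b1618 (assuming -ngl 33 is enough to offload every layer of this 7B model, per the 32/33 split in the load log above):

    ./main -m models/mistral-7b-instruct-v0.1.Q4_K_M.gguf -ngl 33 -ins   # all layers offloaded -> gibberish even on b1618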


tiguchi commented Dec 17, 2023

I had a similar problem once, when I updated my checked-out copy of llama.cpp and ran a build without cleaning the previous build artifacts first.

Did you run make clean before building?
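
i.e. a full clean rebuild along the lines of:

    make clean
    make LLAMA_HIPBLAS=1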

@SuperPou1 (Author)

Yes, I did, multiple times.

@sirus20x6

Same issue

(base) [sirus@neuromancer llama.cpp]$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  48
  On-line CPU(s) list:   0-47
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7473X 24-Core Processor
    CPU family:          25
    Model:               1
    Thread(s) per core:  2
    Core(s) per socket:  24
    Socket(s):           1
    Stepping:            2
    Frequency boost:     enabled
    CPU(s) scaling MHz:  76%
    CPU max MHz:         3737.8899
    CPU min MHz:         1500.0000
    BogoMIPS:            5602.49
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_goo
                         d nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy
                          svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_psta
                         te ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_o
                         ccup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
                          decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca debug_swap
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   768 KiB (24 instances)
  L1i:                   768 KiB (24 instances)
  L2:                    12 MiB (24 instances)
  L3:                    768 MiB (8 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-47
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Vulnerable: Safe RET, no microcode
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

(base) [sirus@neuromancer llama.cpp]$ uname -a
Linux neuromancer 6.6.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 14 Dec 2023 03:45:42 +0000 x86_64 GNU/Linux

(base) [sirus@neuromancer llama.cpp]$ rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD EPYC 7473X 24-Core Processor   
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD EPYC 7473X 24-Core Processor   
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2800                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            48                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    528023332(0x1f78ff24) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    528023332(0x1f78ff24) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    528023332(0x1f78ff24) KB           
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-3a70b6e5238e0ff4               
  Marketing Name:          AMD Radeon RX 7900 XTX             
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      6144(0x1800) KB                    
    L3:                      98304(0x18000) KB                  
  Chip ID:                 29772(0x744c)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2304                               
  BDFID:                   17152                              
  Internal Node ID:        1                                  
  Compute Unit:            96                                 
  SIMDs per CU:            2                                  
  Shader Engines:          6                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 528                                
  SDMA engine uCode::      19                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS:                     
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

(base) [sirus@neuromancer llama.cpp]$ neofetch
sirus@neuromancer
-----------------
OS: Arch Linux x86_64
Host: MZ32-AR0-00 0100
Kernel: 6.6.7-arch1-1
Uptime: 22 hours, 17 mins
Packages: 1842 (pacman), 6 (flatpak)
Shell: bash 5.2.21
Resolution: 3840x2160, 3840x2160
DE: Xfce 4.18
WM: Xfwm4
WM Theme: Default
Theme: Adwaita [GTK2/3]
Icons: elementary [GTK2], Adwaita [GTK3]
Terminal: xfce4-terminal
Terminal Font: Monospace 12
CPU: AMD EPYC 7473X (48) @ 2.800GHz
GPU: AMD ATI Radeon RX 7900 XT/7900 XTX
Memory: 13677MiB / 515647MiB


@SuperPou1 (Author)

Are you actually having the same issue? Our machines differ quite a lot, so that would suggest it is in fact a bug.


MikeLP commented Dec 20, 2023

Confirmed.


mkesper commented Dec 20, 2023

I get gibberish when I use the updated chat prompt from https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF, but good results, even with GPU offload (with the UMA patch) on a gfx1035, when using the original prompt.
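
For reference, the two prompt templates as I understand them from that model card (an assumption, quoting from memory; the updated one prepends a literal <s>, which may duplicate the BOS token llama.cpp already inserts):

    [INST] {prompt} [/INST]          # original template -> good results
    <s>[INST] {prompt} [/INST]       # updated template -> gibberish here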

@SuperPou1 (Author)

I think it is fixed in the latest code now, but I need to make sure. I'll try later.

github-actions bot commented Mar 18, 2024

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Mar 18, 2024

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 2, 2024