
Bug: Phi-3 4K output broken after ~2000 tokens (Reproducible) #7709

@Amadeus-AI

Description

What happened?

To reproduce:
Download the officially released GGUF model from microsoft on Hugging Face.
Run server.exe -m Phi3-mini-4k.gguf -c 4096

When the input prompt is < ~2048 tokens: output is fine (but it starts getting garbled right after the total hits ~2048 tokens).
When the input prompt is > ~2048 tokens: output is garbled.

The garbled output looks like what you would expect when the context exceeds what the model supports, but it appears at ~2048 tokens rather than 4096, which points to a bug.

I also tested Llama3-8B, which works fine with input prompts < 8192 tokens (with -c 8192) and with input prompts < 4096 tokens (with -c 4096), as expected.
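
For anyone who wants to script the repro instead of using the web UI, here is a minimal sketch against the server's /completion endpoint (this assumes the server was started as above and is listening on the default http://localhost:8080; the padding text and its ~3000-token estimate are illustrative, not exact):

```python
# Repro sketch for the ~2048-token boundary, assuming a llama.cpp
# server started with: server.exe -m Phi3-mini-4k.gguf -c 4096
import json
import urllib.request

def complete(prompt: str, n_predict: int = 64) -> str:
    # POST to the server's /completion endpoint and return the text.
    req = urllib.request.Request(
        "http://localhost:8080/completion",
        data=json.dumps({"prompt": prompt, "n_predict": n_predict}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# Short prompt: well under ~2048 tokens, output should look fine.
print(complete("Count from 1 to 20: "))

# Long prompt: pad past ~2048 tokens (roughly 3000 here); output gets
# garbled even though -c 4096 should allow the full 4k context.
padding = "The quick brown fox jumps over the lazy dog. " * 300
print(complete(padding + "Now summarize the above in one sentence: "))
```

Comparing the two outputs makes the boundary easy to see: only the second request, whose total context crosses ~2048 tokens, degrades.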

Name and Version

version: 3015 (74b239b)
built with MSVC 19.39.33523.0 for x64

Tried both the CUDA and AVX2 builds.

Also tried the latest version, built myself with Intel SYCL:
version: 3075 (3d7ebf6)
built with IntelLLVM 2024.1.0

What operating system are you seeing the problem on?

Win10, Win11

Relevant log output

(Screenshot: model output before and after ~2000 tokens)


Labels: bug, medium severity, model
