Bug: Phi-3 4K output broken after ~2000 tokens (Reproducible) #7709

### What happened?

To reproduce:
Download the officially released GGUF model from Microsoft on Hugging Face.
Run **server.exe -m Phi3-mini-4k.gguf -c 4096**

When the input prompt is shorter than ~2048 tokens: output is fine (but it starts getting weird right after the total reaches ~2048 tokens).
When the input prompt is longer than ~2048 tokens: output is weird.

The weird output looks like what you would expect when the context exceeds what the model supports, but it happens at ~2048 tokens instead of 4096, which suggests a bug.

Also tested Llama3-8B: it works fine, as expected, with input prompts < 8192 tokens (with -c 8192) and with input prompts < 4096 tokens (with -c 4096).
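
For anyone who wants to script the reproduction instead of pasting prompts by hand, here is a minimal sketch. It assumes the server started above is listening on `http://127.0.0.1:8080` and exposes the `/completion` endpoint with `prompt` and `n_predict` request fields and a `content` response field; adjust these if your build differs. The helper names (`build_prompt`, `build_request`, `complete`) and the filler text are made up for illustration, and the word counts are only a rough proxy for token counts.

```python
import json
import urllib.request

# Assumption: address the server listens on; change it if you pass --host/--port.
SERVER_URL = "http://127.0.0.1:8080"


def build_prompt(n_words: int) -> str:
    """Return a filler prompt of n_words words followed by a simple question."""
    filler = " ".join(f"item{i}" for i in range(n_words))
    return f"{filler}\n\nQuestion: what is the capital of France?\nAnswer:"


def build_request(prompt: str, n_predict: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"{SERVER_URL}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, n_predict: int = 64, timeout: float = 600.0) -> str:
    """Send the prompt to the server and return the generated text."""
    request = build_request(prompt, n_predict)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["content"]


if __name__ == "__main__":
    # Sweep prompt sizes; each word here is likely more than one token,
    # so the larger sizes should land on either side of the ~2048 mark.
    for n_words in (200, 400, 800, 1200):
        print(f"--- prompt with {n_words} filler words ---")
        print(complete(build_prompt(n_words)))
```

Comparing the printed answers across sizes should show where the output degrades; the same script can be pointed at a Llama3-8B server as a control.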

### Name and Version

version: 3015 (74b239b3)
built with MSVC 19.39.33523.0 for x64

Tried both the CUDA and AVX2 builds.

Also tried the latest version, which I built myself with Intel SYCL:
version: 3075 (3d7ebf63)
built with IntelLLVM 2024.1.0

### What operating system are you seeing the problem on?

Win10, Win11

### Relevant log output

Output before and after ~2000 tokens:
![Image](https://github.com/ggerganov/llama.cpp/assets/23719775/22543e99-7999-4dc9-99af-25e42d22397f)

