Bug: Phi-3 4K output broken after 2000~ tokens (Reproducible) #7709

Closed
Amadeus-AI opened this issue Jun 3, 2024 · 13 comments
Labels
bug Something isn't working medium severity Used to report medium severity bugs in llama.cpp (e.g. Malfunctioning Features but still useable) model Model specific

Comments

@Amadeus-AI

Amadeus-AI commented Jun 3, 2024

What happened?

To reproduce:
Download the official released gguf model from huggingface/microsoft.
Run server.exe -m Phi3-mini-4k.gguf -c 4096

When the input prompt is shorter than ~2048 tokens: output is fine, but it starts getting weird right after the total reaches ~2048 tokens.
When the input prompt is longer than ~2048 tokens: output is weird.

The weird output looks like what we would expect when the context exceeds what the model supports, but it happens at ~2048 tokens, which suggests a bug.

Also tested Llama3-8B, works fine with input prompt < 8192 as expected (with -c 8192), also works fine with input prompt < 4096 as expected (with -c 4096).
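
To tell which case a given prompt falls into, it helps to count its tokens first. The following is a minimal sketch, assuming the server from the steps above is listening on http://127.0.0.1:8080 and exposes a `/tokenize` endpoint that accepts `{"content": ...}` and returns `{"tokens": [...]}`. The helper names (`post_json`, `count_tokens`, `regime`) and the 2048 threshold are illustrative assumptions, not part of llama.cpp.

```python
import json
from urllib import request


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def count_tokens(text, base_url="http://127.0.0.1:8080", post=post_json):
    """Ask the server to tokenize `text` and return the token count.

    Assumes a /tokenize endpoint returning {"tokens": [...]}.
    """
    return len(post(base_url + "/tokenize", {"content": text})["tokens"])


def regime(n_tokens, threshold=2048, n_ctx=4096):
    """Classify a prompt length against the observed ~2048 threshold
    and the context size passed with -c."""
    if n_tokens < threshold:
        return "below threshold"
    if n_tokens <= n_ctx:
        return "above threshold, within context"
    return "exceeds context"
```

With a running server, `regime(count_tokens(prompt))` reports whether a prompt should still be in the range where output was fine. The `post` parameter exists only so the transport can be swapped out when no server is available.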

Name and Version

version: 3015 (74b239b)
built with MSVC 19.39.33523.0 for x64

Tried both cuda and avx2 version.

Also tried the latest version, which I built myself with Intel SYCL:
version: 3075 (3d7ebf6)
built with IntelLLVM 2024.1.0

What operating system are you seeing the problem on?

Win10, Win11

Relevant log output

Before ~2000 tokens and after
image

@Amadeus-AI Amadeus-AI added bug-unconfirmed high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow) labels Jun 3, 2024
@Amadeus-AI Amadeus-AI changed the title Bug: Phi-3 mini output get weird after 2048 tokens Bug: Phi-3 4K weird output after 2048 tokens Jun 3, 2024
@Amadeus-AI Amadeus-AI changed the title Bug: Phi-3 4K weird output after 2048 tokens Bug: Phi-3 4K weird output after 2000~ tokens Jun 3, 2024
@matteoserva
Contributor

I can confirm this. I asked it to summarize an article in Italian. Everything is fine until it hits the 2000-token wall. After that it outputs garbage.
The model uses sliding-window attention with a 2048-token window. It might be related.
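
For readers unfamiliar with the term, here is a minimal sketch of a causal sliding-window mask in plain Python. It only illustrates the general idea and is not llama.cpp's implementation; the function names are hypothetical, and whether the window counts the current token is a convention that differs between implementations.

```python
def first_visible(query_pos, window):
    """Index of the oldest key a query at `query_pos` may attend to,
    assuming the window includes the current token."""
    return max(0, query_pos - window + 1)


def sliding_window_mask(n_tokens, window):
    """mask[i][j] is True when query i may attend to key j:
    j must not be in the future and must lie within the window."""
    return [
        [first_visible(i, window) <= j <= i for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


# Small example: 6 tokens, window of 3.
mask = sliding_window_mask(6, 3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Under this convention, with a window of 2048 the query at position 2047 still sees key 0, while the query at position 2048 no longer does. A runtime that instead lets every query attend to the whole history feeds the model attention patterns it never saw in training once the sequence passes the window, which would be consistent with output degrading around 2048 tokens.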

@Amadeus-AI Amadeus-AI changed the title Bug: Phi-3 4K weird output after 2000~ tokens Bug: Phi-3 4K output broken after 2000~ tokens (Reproducible) Jun 4, 2024
@Galunid
Copy link
Collaborator

Galunid commented Jun 4, 2024

Can you try 6369bf0 and 201cc11 to see if there's a difference? First one should work alright, second should break.

@Amadeus-AI
Author

@Galunid
version: 2960 (6369bf0)
built with IntelLLVM 2024.1.0

Still breaks.

@Galunid Galunid added bug Something isn't working model Model specific medium severity Used to report medium severity bugs in llama.cpp (e.g. Malfunctioning Features but still useable) and removed bug-unconfirmed high severity Used to report high severity bugs in llama.cpp (Malfunctioning hinder important workflow) labels Jun 4, 2024
@Galunid
Collaborator

Galunid commented Jun 4, 2024

I can reproduce, it seems there's some issue with the initial implementation in #6852

@ggerganov
Member

It's most likely the missing sliding window, as pointed out earlier

@jggc

This comment was marked as off-topic.

@jggc

This comment was marked as off-topic.

@Galunid
Collaborator

Galunid commented Jun 4, 2024

@jggc This topic relates to Phi-3 model that has degradation in quality before it runs out of context, so I marked your comments as off-topic. Quality degradation after you run out is expected and from what I understood that is the case here.

@jggc

jggc commented Jun 4, 2024

Indeed my behavior is slightly different but still degradation WITHIN the context length. I posted in this thread instead of opening a new issue since it had enough similarities that I thought it might be related.

I'll rephrase to make things clearer :

  1. Start server
  2. Call /completion with a short prompt such as "What is 2+2"
  3. Response is OK
  4. Call /completion with a long prompt exceeding context length
  5. It fails and generates garbage, as expected in this context
  6. Call /completion with a short prompt again "What is 2+2"
  7. Get garbage output, this is not expected. Model state should not be broken in the server after a single prompt exceeded the context length in the session.

At this point, no matter what I do I won't get sensible responses until I restart the server.

@Galunid Let me know if I should open a new bug. It is reproducible, I could write a gist.
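
The numbered steps above could be scripted roughly as follows. This is a sketch under assumptions: it targets a server on http://127.0.0.1:8080, uses the `/completion` endpoint with `prompt`, `n_predict`, and `temperature` fields, and reads the reply from a `content` field. `complete` and `run_repro` are hypothetical helper names, and comparing the two short-prompt answers is only a rough signal, since identical output is not strictly guaranteed even at temperature 0.

```python
import json
from urllib import request


def complete(prompt, base_url="http://127.0.0.1:8080", n_predict=32):
    """Send one /completion request and return the generated text."""
    payload = {"prompt": prompt, "n_predict": n_predict, "temperature": 0}
    req = request.Request(
        base_url + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]


def run_repro(long_prompt, short_prompt="What is 2+2", send=complete):
    """Short prompt, then an over-long prompt, then the short prompt again."""
    before = send(short_prompt)      # steps 2-3: expected to be fine
    try:
        overflow = send(long_prompt)  # steps 4-5: garbage or an error is expected
    except Exception as exc:          # the server may reject the request outright
        overflow = "error: " + str(exc)
    after = send(short_prompt)       # steps 6-7: should match `before`
    return {
        "before": before,
        "overflow": overflow,
        "after": after,
        "same": before == after,
    }
```

If `same` is false and `after` is garbage, the server state did not recover from the over-long request, which is the behavior described in step 7.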

@ngxson
Collaborator

ngxson commented Jul 2, 2024

Interestingly, phi-3-small uses a combination of sliding window + block sparse attention. So even if we get a hack for sliding window (used by gemma 2), it will still be messy if we want proper support for phi-3.

Link to paper: https://arxiv.org/pdf/2404.14219

image

@njsyw1997

ONNX runtime has the same bug. This might be a reference for us if they can fix it.
microsoft/onnxruntime-genai#552

@CASE-R

CASE-R commented Jul 22, 2024

Commenting to see if there has been an update or solution to this before it gets closed for inactivity. We've faced this issue for a month now, and using the 128K context models is problematic due to our available hardware.

@phymbert
Collaborator

sliding window for phi3 implemented and defaulted in:

Cannot reproduce :

llama-cli --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf
llama-cli --hf-repo microsoft/Phi-3-mini-4k-instruct-gguf --hf-file Phi-3-mini-4k-instruct-q4.gguf -p "At eight o clock Kutuzov rode to Pratz at the head of Miloradovich s fourth column, the one which was to take the place of the columns of Przebyszewski and Langeron, which had already gone down. He greeted the men of the head regiment and gave the order to move, thus showing that he intended to lead the column himself. Having ridden to the village of Pratz, he halted. Prince Andrei, one of the enormous number of persons constituting the commander in chief s suite, stood behind him. Prince Andrei felt excited, irritated, and at the same time restrainedly calm, as a man usually is when a long-desired moment comes. He was firmly convinced that this was the day of his Toulon or his bridge of Arcole.[1] How it would happen, he did not know, but he was firmly convinced that it would be so. The locality and position of our troops were known to him, as far as they could be known to anyone in our army. His own strategic plan, which there obviously could be no thought of carrying out now, was forgotten. Now, entering into Weyrother s plan, Prince Andrei pondered the possible happenstances and came up with new considerations, such as might call for his swiftness of reflection and decisiveness.  To the left below, in the fog, exchanges of fire between unseen troops could be heard. There, it seemed to Prince Andrei, the battle would concentrate, there an obstacle would be encountered, and  it s there that I ll be sent with a brigade or division, and there, with a standard in my hand, I ll go forward and crush everything ahead of me.   Prince Andrei could not look with indifference at the standards of the battalions going past him. Looking at a standard, he thought: maybe it is that very standard with which I ll have to march at the head of the troops.  
By morning the night s fog had left only hoarfrost turning into dew on the heights, but in the hollows the fog still spread its milk-white sea. Nothing could be seen in that hollow to the left, into which our troops had descended and from which came the sounds of gunfire. Over the heights was a dark, clear sky, and to the right-the enormous ball of the sun. Far ahead, on the other shore of the sea of fog, one could make out the jutting, wooded hills on which the enemy army was supposed to be, and something was discernible. To the right the guards were entering the region of the fog, with a sound of tramping and wheels and an occasional gleam of bayonets; to the left, beyond the village, similar masses of cavalry approached and disappeared into the sea of fog. In front and behind moved the infantry. The commander in chief stood on the road out of the village, letting the troops pass by him. Kutuzov seemed exhausted and irritable that morning. The infantry going past him halted without any command, apparently because something ahead held them up.   But tell them, finally, to form into battalions and go around the village,  Kutuzov said angrily to a general who rode up.  Don t you understand, Your Excellency, my dear sir, that to stretch out in a defile through village streets is impossible when we re marching against an enemy?    I intended to form them up outside the village, Your Excellency,  said the general.  Kutuzov laughed biliously.   A fine sight you d be, lining up in view of the enemy, a very fine sight!    The enemy s still far off, Your Excellency. According to the disposition . . .    The disposition!  Kutuzov exclaimed biliously.  Who told you that? . . . Kindly do as you re ordered.    Yes, sir!    Mon cher,  Nesvitsky said to Prince Andrei in a whisper,  le vieux est d une humeur de chien. 
[2]  An Austrian officer in a white uniform with green plumes on his hat rode up to Kutuzov and asked on behalf of the emperor whether the fourth column had started into action.  Kutuzov turned away without answering him, and his gaze chanced to rest on Prince Andrei, who was standing close by. Seeing Bolkonsky, Kutuzov softened the angry and caustic expression of his gaze, as if aware that his adjutant was not to blame for what was going on. And, without answering the Austrian adjutant, he addressed Bolkonsky:   Allez voir, mon cher, si la troisième division a dépassé le village. Dites-lui de s arrêter et d attendre mes ordres. [3]  Prince Andrei had only just started when he stopped him.   Et demandez-lui si les tirailleurs sont postés,  he added.  Ce qu ils font, ce qu ils font! [4] he said to himself, still not answering the Austrian.  Prince Andrei galloped off to carry out his mission.  Overtaking all the advancing battalions, he stopped the third division and ascertained that there was in fact no line of riflemen in front of our columns. The regimental commander of the front regiment was very surprised by the order conveyed to him from the commander in chief to send out riflemen. The regimental commander stood there in the full conviction that there were more troops ahead of him, and that the enemy was no less than six miles away. In fact, nothing could be seen ahead but empty terrain sloping away and covered with thick fog. Having ordered on behalf of the commander in chief that the omission be rectified, Prince Andrei galloped back. Kutuzov still stood in the same place and, his corpulent body sagging over the saddle in old man s fashion, yawned deeply, closing his eyes. The troops were no longer moving, but stood at parade rest.   Very good, very good,  he said to Prince Andrei and turned to a general who stood there with a watch in his hand, saying it was time to move on, because all the columns of the left flank had already descended.   
We still have time, Your Excellency,  Kutuzov said through a yawn.  We have time!  he repeated.  Just then, from well behind Kutuzov, came shouts of regimental greetings, and these voices began to approach quickly along the whole extended line of the advancing Russian columns. It was clear that the one being greeted was riding quickly. When the soldiers of the regiment Kutuzov was standing in front of began to shout, he rode slightly to one side and, wincing, turned to look. Down the road from Pratz galloped what looked like a squadron of varicolored horsemen. Two of them rode side by side at a great gallop ahead of the rest. One, in a black uniform with white plumes, rode a bobtailed chestnut horse, the other, in a white uniform, rode a black horse. These were the two emperors with their suite. Kutuzov, with the affectation of a frontline veteran, ordered his standing troops to  attention  and, saluting, rode up to the emperor. His whole figure and manner suddenly changed. He acquired the look of a subordinate, unthinking man. With affected deference, which obviously struck the emperor Alexander unpleasantly, he rode up and saluted him.  The unpleasant impression, like the remains of fog in a clear sky, passed over the emperor s young and happy face and disappeared. He was somewhat thinner that day, after his illness, than on the field of Olmütz, where Bolkonsky had seen him for the first time abroad, but there was the same enchanting combination of majesty and mildness in his beautiful gray eyes, and the fine lips had the same possibility of various expressions, with a prevalent expression of good-natured, innocent youth.  At the Olmütz review he was more majestic; here he was more cheerful and energetic. He was slightly flushed after galloping two miles and, reining in his horse, gave a sigh of relief and looked around at the faces of his suite, as young, as animated as his own. 
Czartoryski and Novosiltsev, and Prince Volkonsky and Stroganov, and the others, all richly clad, cheerful young men on splendid, pampered, fresh, only slightly sweaty horses, talking and smiling, stopped behind the sovereign. The emperor Franz, a ruddy, long-faced young man, sat extremely straight on his handsome black stallion and looked around him with a preoccupied, unhurried air. He called up one of his white adjutants and asked something.  Most likely what time they started,  thought Prince Andrei, observing his old aquaintance, and recalling his audience with a smile he was unable to repress. In the emperors  suite there were picked fine young orderly officers, Russian and Austrian, from the guards and infantry regiments. Among them were grooms leading the handsome spare horses of the royalty in embroidered cloths.  As fresh air from the fields suddenly breathes through an open window into a stuffy room, so youth, energy, and certainty of success breathed upon Kutuzov s cheerless staff as these brilliant young men galloped up.   Why don t you begin, Mikhail Larionovich?  the emperor Alexander hurriedly addressed Kutuzov, at the same time glancing courteously at the emperor Franz.   I am waiting, Your Majesty,  answered Kutuzov, inclining deferentially.  The emperor cupped his ear, frowning slightly and showing that he had not heard properly.   I m waiting, Your Majesty,  Kutuzov repeated (Prince Andrei noticed that Kutuzov s upper lip twitched unnaturally as he said this  waiting ).  Not all the columns are assembled, Your Majesty.   The sovereign heard, but this reply clearly did not please him; he shrugged his slightly stooping shoulders, glanced at Novosiltsev, who stood nearby, as if complaining of Kutuzov by this glance.   
We re not on the Tsaritsyn Field,[5] Mikhail Larionovich, where you don t start a parade until all the regiments are assembled,  said the sovereign, again glancing into the eyes of the emperor Franz, as though inviting him, if not to take part, at least to listen to what he was saying; but the emperor Franz went on looking around and did not listen.   That is just why I do not begin, Sire,  Kutuzov said in a ringing voice, as if to forestall the possibility of not being heard, and again something twitched in his face.  I do not begin, Sire, because we are not on parade and not on the Tsaritsyn Field,  he uttered clearly and distinctly.  All the faces in the sovereign s suite instantly exchanged glances with each other, expressing murmur and reproach.  Old as he may be, he should not, he simply should not speak that way,  these faces expressed.  The sovereign looked fixedly and attentively into Kutuzov s eyes, waiting to see if he would say something more. But Kutuzov, for his part, bowed his head deferentially and also seemed to be waiting. The silence lasted for about a minute.   However, if you order it, Your Majesty,  said Kutuzov, raising his head and again changing his tone to that of a dull, unthinking, but obedient general.  He touched up his horse and, calling to him the column leader Miloradovich, gave him the order to advance.  The troops stirred again, and two battalions of the Novgorodsky regiment and a battalion of the Apsheronsky regiment moved on past the sovereign.  While this Apsheronsky battalion was marching by, ruddy-faced Miloradovich, with no greatcoat, in his uniform tunic and decorations and a hat with enormous plumes, worn at an angle and brim first, galloped ahead hup-two, and with a dashing salute, reined in his horse before the sovereign.   God be with you, General,  said the sovereign.   Ma foi, sire, nous ferons ce que qui sera dans notre possibilité, sire! 
[6] he replied merrily, nevertheless calling up mocking smiles among the gentlemen of the suite with his bad French.  Miloradovich turned his horse sharply and placed himself slightly behind the sovereign. The Apsherontsy, excited by the presence of the sovereign, marched past the emperors and their suite at a dashingly brisk pace, beating their feet.   Lads!  cried Miloradovich in a loud, self-assured, and merry voice, obviously so excited by the sounds of gunfire, the anticipation of battle, and the sight of his gallant Apsherontsy-his companions from Suvorov s time-marching briskly past the emperors, that he forgot the sovereign s presence.  Lads, it won t be the first village you ve taken!  he shouted.   We do our best, sir!  the soldiers shouted out.  The sovereign s horse shied at the sudden shout. This horse, who had carried the sovereign at reviews while still in Russia, also carried her rider here, on the field of Austerlitz, enduring the distracted nudges of his left foot, pricked up her ears at the sound of gunshots just as she did on the Field of Mars, understanding neither the meaning of the shots she heard, nor the presence of the emperor Franz s black stallion, nor anything of what her rider said, thought, or felt that day.  The sovereign turned with a smile to one of his retinue, pointing to the gallant Apsherontsy, and said something to him." -n 64 -c 4096 -ngl 12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3050 Laptop GPU, compute capability 8.6, VMM: yes
register_backend: registered backend CUDA (1 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 3050 Laptop GPU)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (11th Gen Intel(R) Core(TM) i5-11400H @ 2.70GHz)
build: 4393 (d79d8f39) with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu (debug)
main: llama backend init
main: load the model and apply lora adapter, if any
common_download_file: previous metadata file found /home/phymbert/.cache/llama.cpp/microsoft_Phi-3-mini-4k-instruct-gguf_Phi-3-mini-4k-instruct-q4.gguf.json: {"etag":"\"bcfbb62e845dcfa1bcfd85ce58b59276-150\"","lastModified":"Tue, 30 Apr 2024 12:50:26 GMT","url":"https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf"}
curl_perform_with_retry: Trying to download from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf (attempt 1 of 3)...
llama_load_model_from_file: using device CUDA0 (NVIDIA GeForce RTX 3050 Laptop GPU) - 3765 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 195 tensors from /home/phymbert/.cache/llama.cpp/microsoft_Phi-3-mini-4k-instruct-gguf_Phi-3-mini-4k-instruct-q4.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = phi3
llama_model_loader: - kv   1:                               general.name str              = Phi3
llama_model_loader: - kv   2:                        phi3.context_length u32              = 4096
llama_model_loader: - kv   3:                      phi3.embedding_length u32              = 3072
llama_model_loader: - kv   4:                   phi3.feed_forward_length u32              = 8192
llama_model_loader: - kv   5:                           phi3.block_count u32              = 32
llama_model_loader: - kv   6:                  phi3.attention.head_count u32              = 32
llama_model_loader: - kv   7:               phi3.attention.head_count_kv u32              = 32
llama_model_loader: - kv   8:      phi3.attention.layer_norm_rms_epsilon f32              = 0,000010
llama_model_loader: - kv   9:                  phi3.rope.dimension_count u32              = 96
llama_model_loader: - kv  10:                          general.file_type u32              = 15
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                         tokenizer.ggml.pre str              = default
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32064]   = [0,000000, 0,000000, 0,000000, 0,0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32064]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 32000
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:            tokenizer.ggml.padding_token_id u32              = 32000
llama_model_loader: - kv  20:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  21:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  22:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv  23:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_K:   81 tensors
llama_model_loader: - type q5_K:   32 tensors
llama_model_loader: - type q6_K:   17 tensors
llm_load_vocab: control-looking token:  32007 '<|end|>' was not control-type; this is probably a bug in the model. its type will be overridden
llm_load_vocab: control-looking token:  32000 '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden
llm_load_vocab: special tokens cache size = 67
llm_load_vocab: token to piece cache size = 0,1690 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = phi3
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32064
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 3072
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_rot            = 96
llm_load_print_meta: n_swa            = 2047
llm_load_print_meta: n_embd_head_k    = 96
llm_load_print_meta: n_embd_head_v    = 96
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 3072
llm_load_print_meta: n_embd_v_gqa     = 3072
llm_load_print_meta: f_norm_eps       = 0,0e+00
llm_load_print_meta: f_norm_rms_eps   = 1,0e-05
llm_load_print_meta: f_clamp_kqv      = 0,0e+00
llm_load_print_meta: f_max_alibi_bias = 0,0e+00
llm_load_print_meta: f_logit_scale    = 0,0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000,0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 3B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 3,82 B
llm_load_print_meta: model size       = 2,23 GiB (5,01 BPW) 
llm_load_print_meta: general.name     = Phi3
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
llm_load_print_meta: EOT token        = 32007 '<|end|>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_print_meta: EOG token        = 32000 '<|endoftext|>'
llm_load_print_meta: EOG token        = 32007 '<|end|>'
llm_load_print_meta: max token length = 48
llm_load_tensors: offloading 12 repeating layers to GPU
llm_load_tensors: offloaded 12/33 layers to GPU
llm_load_tensors:        CUDA0 model buffer size =   813,09 MiB
llm_load_tensors:   CPU_Mapped model buffer size =  2281,66 MiB
.............................................................................................
llama_new_context_with_model: n_seq_max     = 1
llama_new_context_with_model: n_ctx         = 4096
llama_new_context_with_model: n_ctx_per_seq = 4096
llama_new_context_with_model: n_batch       = 2048
llama_new_context_with_model: n_ubatch      = 512
llama_new_context_with_model: flash_attn    = 0
llama_new_context_with_model: freq_base     = 10000,0
llama_new_context_with_model: freq_scale    = 1
llama_kv_cache_init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 32
llama_kv_cache_init:      CUDA0 KV buffer size =   576,00 MiB
llama_kv_cache_init:        CPU KV buffer size =   960,00 MiB
llama_new_context_with_model: KV self size  = 1536,00 MiB, K (f16):  768,00 MiB, V (f16):  768,00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0,12 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   340,56 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    20,01 MiB
llama_new_context_with_model: graph nodes  = 1286
llama_new_context_with_model: graph splits = 164 (with bs=512), 3 (with bs=1)
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 6

system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CUDA : ARCHS = 860 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 

sampler seed: 3963752706
sampler params: 
	repeat_last_n = 64, repeat_penalty = 1,000, frequency_penalty = 0,000, presence_penalty = 0,000
	dry_multiplier = 0,000, dry_base = 1,750, dry_allowed_length = 2, dry_penalty_last_n = 4096
	top_k = 40, top_p = 0,950, min_p = 0,050, xtc_probability = 0,000, xtc_threshold = 0,100, typical_p = 1,000, temp = 0,800
	mirostat = 0, mirostat_lr = 0,100, mirostat_ent = 5,000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist 
generate: n_ctx = 4096, n_batch = 2048, n_predict = 64, n_keep = 1

 At eight o clock Kutuzov rode to Pratz at the head of Miloradovich s fourth column, the one which was to take the place of the columns of Przebyszewski and Langeron, which had already gone down. He greeted the men of the head regiment and gave the order to move, thus showing that he intended to lead the column himself. Having ridden to the village of Pratz, he halted. Prince Andrei, one of the enormous number of persons constituting the commander in chief s suite, stood behind him. Prince Andrei felt excited, irritated, and at the same time restrainedly calm, as a man usually is when a long-desired moment comes. He was firmly convinced that this was the day of his Toulon or his bridge of Arcole.[1] How it would happen, he did not know, but he was firmly convinced that it would be so. The locality and position of our troops were known to him, as far as they could be known to anyone in our army. His own strategic plan, which there obviously could be no thought of carrying out now, was forgotten. Now, entering into Weyrother s plan, Prince Andrei pondered the possible happenstances and came up with new considerations, such as might call for his swiftness of reflection and decisiveness.  To the left below, in the fog, exchanges of fire between unseen troops could be heard. There, it seemed to Prince Andrei, the battle would concentrate, there an obstacle would be encountered, and  it s there that I ll be sent with a brigade or division, and there, with a standard in my hand, I ll go forward and crush everything ahead of me.   Prince Andrei could not look with indifference at the standards of the battalions going past him. Looking at a standard, he thought: maybe it is that very standard with which I ll have to march at the head of the troops.  By morning the night s fog had left only hoarfrost turning into dew on the heights, but in the hollows the fog still spread its milk-white sea. 
Nothing could be seen in that hollow to the left, into which our troops had descended and from which came the sounds of gunfire. Over the heights was a dark, clear sky, and to the right-the enormous ball of the sun. Far ahead, on the other shore of the sea of fog, one could make out the jutting, wooded hills on which the enemy army was supposed to be, and something was discernible. To the right the guards were entering the region of the fog, with a sound of tramping and wheels and an occasional gleam of bayonets; to the left, beyond the village, similar masses of cavalry approached and disappeared into the sea of fog. In front and behind moved the infantry. The commander in chief stood on the road out of the village, letting the troops pass by him. Kutuzov seemed exhausted and irritable that morning. The infantry going past him halted without any command, apparently because something ahead held them up.   But tell them, finally, to form into battalions and go around the village,  Kutuzov said angrily to a general who rode up.  Don t you understand, Your Excellency, my dear sir, that to stretch out in a defile through village streets is impossible when we re marching against an enemy?    I intended to form them up outside the village, Your Excellency,  said the general.  Kutuzov laughed biliously.   A fine sight you d be, lining up in view of the enemy, a very fine sight!    The enemy s still far off, Your Excellency. According to the disposition . . .    The disposition!  Kutuzov exclaimed biliously.  Who told you that? . . . Kindly do as you re ordered.    Yes, sir!    Mon cher,  Nesvitsky said to Prince Andrei in a whisper,  le vieux est d une humeur de chien. [2]  An Austrian officer in a white uniform with green plumes on his hat rode up to Kutuzov and asked on behalf of the emperor whether the fourth column had started into action.  Kutuzov turned away without answering him, and his gaze chanced to rest on Prince Andrei, who was standing close by. 
Seeing Bolkonsky, Kutuzov softened the angry and caustic expression of his gaze, as if aware that his adjutant was not to blame for what was going on. And, without answering the Austrian adjutant, he addressed Bolkonsky:   Allez voir, mon cher, si la troisième division a dépassé le village. Dites-lui de s arrêter et d attendre mes ordres. [3]  Prince Andrei had only just started when he stopped him.   Et demandez-lui si les tirailleurs sont postés,  he added.  Ce qu ils font, ce qu ils font! [4] he said to himself, still not answering the Austrian.  Prince Andrei galloped off to carry out his mission.  Overtaking all the advancing battalions, he stopped the third division and ascertained that there was in fact no line of riflemen in front of our columns. The regimental commander of the front regiment was very surprised by the order conveyed to him from the commander in chief to send out riflemen. The regimental commander stood there in the full conviction that there were more troops ahead of him, and that the enemy was no less than six miles away. In fact, nothing could be seen ahead but empty terrain sloping away and covered with thick fog. Having ordered on behalf of the commander in chief that the omission be rectified, Prince Andrei galloped back. Kutuzov still stood in the same place and, his corpulent body sagging over the saddle in old man s fashion, yawned deeply, closing his eyes. The troops were no longer moving, but stood at parade rest.   Very good, very good,  he said to Prince Andrei and turned to a general who stood there with a watch in his hand, saying it was time to move on, because all the columns of the left flank had already descended.   We still have time, Your Excellency,  Kutuzov said through a yawn.  We have time!  he repeated.  Just then, from well behind Kutuzov, came shouts of regimental greetings, and these voices began to approach quickly along the whole extended line of the advancing Russian columns. 
It was clear that the one being greeted was riding quickly. When the soldiers of the regiment Kutuzov was standing in front of began to shout, he rode slightly to one side and, wincing, turned to look. Down the road from Pratz galloped what looked like a squadron of varicolored horsemen. Two of them rode side by side at a great gallop ahead of the rest. One, in a black uniform with white plumes, rode a bobtailed chestnut horse, the other, in a white uniform, rode a black horse. These were the two emperors with their suite. Kutuzov, with the affectation of a frontline veteran, ordered his standing troops to  attention  and, saluting, rode up to the emperor. His whole figure and manner suddenly changed. He acquired the look of a subordinate, unthinking man. With affected deference, which obviously struck the emperor Alexander unpleasantly, he rode up and saluted him.  The unpleasant impression, like the remains of fog in a clear sky, passed over the emperor s young and happy face and disappeared. He was somewhat thinner that day, after his illness, than on the field of Olmütz, where Bolkonsky had seen him for the first time abroad, but there was the same enchanting combination of majesty and mildness in his beautiful gray eyes, and the fine lips had the same possibility of various expressions, with a prevalent expression of good-natured, innocent youth.  At the Olmütz review he was more majestic; here he was more cheerful and energetic. He was slightly flushed after galloping two miles and, reining in his horse, gave a sigh of relief and looked around at the faces of his suite, as young, as animated as his own. Czartoryski and Novosiltsev, and Prince Volkonsky and Stroganov, and the others, all richly clad, cheerful young men on splendid, pampered, fresh, only slightly sweaty horses, talking and smiling, stopped behind the sovereign. 
The emperor Franz, a ruddy, long-faced young man, sat extremely straight on his handsome black stallion and looked around him with a preoccupied, unhurried air. He called up one of his white adjutants and asked something.  Most likely what time they started,  thought Prince Andrei, observing his old aquaintance, and recalling his audience with a smile he was unable to repress. In the emperors  suite there were picked fine young orderly officers, Russian and Austrian, from the guards and infantry regiments. Among them were grooms leading the handsome spare horses of the royalty in embroidered cloths.  As fresh air from the fields suddenly breathes through an open window into a stuffy room, so youth, energy, and certainty of success breathed upon Kutuzov s cheerless staff as these brilliant young men galloped up.   Why don t you begin, Mikhail Larionovich?  the emperor Alexander hurriedly addressed Kutuzov, at the same time glancing courteously at the emperor Franz.   I am waiting, Your Majesty,  answered Kutuzov, inclining deferentially.  The emperor cupped his ear, frowning slightly and showing that he had not heard properly.   I m waiting, Your Majesty,  Kutuzov repeated (Prince Andrei noticed that Kutuzov s upper lip twitched unnaturally as he said this  waiting ).  Not all the columns are assembled, Your Majesty.   The sovereign heard, but this reply clearly did not please him; he shrugged his slightly stooping shoulders, glanced at Novosiltsev, who stood nearby, as if complaining of Kutuzov by this glance.   We re not on the Tsaritsyn Field,[5] Mikhail Larionovich, where you don t start a parade until all the regiments are assembled,  said the sovereign, again glancing into the eyes of the emperor Franz, as though inviting him, if not to take part, at least to listen to what he was saying; but the emperor Franz went on looking around and did not listen.   
That is just why I do not begin, Sire,  Kutuzov said in a ringing voice, as if to forestall the possibility of not being heard, and again something twitched in his face.  I do not begin, Sire, because we are not on parade and not on the Tsaritsyn Field,  he uttered clearly and distinctly.  All the faces in the sovereign s suite instantly exchanged glances with each other, expressing murmur and reproach.  Old as he may be, he should not, he simply should not speak that way,  these faces expressed.  The sovereign looked fixedly and attentively into Kutuzov s eyes, waiting to see if he would say something more. But Kutuzov, for his part, bowed his head deferentially and also seemed to be waiting. The silence lasted for about a minute.   However, if you order it, Your Majesty,  said Kutuzov, raising his head and again changing his tone to that of a dull, unthinking, but obedient general.  He touched up his horse and, calling to him the column leader Miloradovich, gave him the order to advance.  The troops stirred again, and two battalions of the Novgorodsky regiment and a battalion of the Apsheronsky regiment moved on past the sovereign.  While this Apsheronsky battalion was marching by, ruddy-faced Miloradovich, with no greatcoat, in his uniform tunic and decorations and a hat with enormous plumes, worn at an angle and brim first, galloped ahead hup-two, and with a dashing salute, reined in his horse before the sovereign.   God be with you, General,  said the sovereign.   Ma foi, sire, nous ferons ce que qui sera dans notre possibilité, sire! [6] he replied merrily, nevertheless calling up mocking smiles among the gentlemen of the suite with his bad French.  Miloradovich turned his horse sharply and placed himself slightly behind the sovereign. The Apsherontsy, excited by the presence of the sovereign, marched past the emperors and their suite at a dashingly brisk pace, beating their feet.   Lads!  
cried Miloradovich in a loud, self-assured, and merry voice, obviously so excited by the sounds of gunfire, the anticipation of battle, and the sight of his gallant Apsherontsy-his companions from Suvorov s time-marching briskly past the emperors, that he forgot the sovereign s presence.  Lads, it won t be the first village you ve taken!  he shouted.   We do our best, sir!  the soldiers shouted out.  The sovereign s horse shied at the sudden shout. This horse, who had carried the sovereign at reviews while still in Russia, also carried her rider here, on the field of Austerlitz, enduring the distracted nudges of his left foot, pricked up her ears at the sound of gunshots just as she did on the Field of Mars, understanding neither the meaning of the shots she heard, nor the presence of the emperor Franz s black stallion, nor anything of what her rider said, thought, or felt that day.  The sovereign turned with a smile to one of his retinue, pointing to the gallant Apsherontsy, and said something to him.  You may well wish to know the exact whereabouts of these lads,  he said with an air of authority and a smile.  The officer, who was not quite a Russian, and who had come with the emperor Franz, bowed low in his Russian way and began to look around at the

llama_perf_sampler_print:    sampling time =      12,62 ms /  3275 runs   (    0,00 ms per token, 259591,00 tokens per second)
llama_perf_context_print:        load time =    1152,86 ms
llama_perf_context_print: prompt eval time =    7369,21 ms /  3211 tokens (    2,29 ms per token,   435,73 tokens per second)
llama_perf_context_print:        eval time =   25582,04 ms /    63 runs   (  406,06 ms per token,     2,46 tokens per second)
llama_perf_context_print:       total time =   33012,16 ms /  3274 tokens

Process finished with exit code 0
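
For anyone reading the numbers above: a minimal sketch, assuming the timing lines are copied as printed (the run used a locale with a comma as the decimal separator), that re-derives the per-token figures and the total token count. The `parse_timings` and `rates` helpers are hypothetical names for illustration and are not part of llama.cpp.

```python
import re

# Timing lines copied from the log above (parenthetical rates omitted).
# The locale prints a comma as the decimal separator.
LOG = """\
llama_perf_context_print: prompt eval time =    7369,21 ms /  3211 tokens
llama_perf_context_print:        eval time =   25582,04 ms /    63 runs
"""

# Hypothetical helper pattern: captures the label, elapsed ms, and count.
PATTERN = re.compile(r":\s*(.+?) time =\s*([\d,]+) ms /\s*(\d+) (?:tokens|runs)")


def parse_timings(log: str) -> dict:
    """Map each label to (milliseconds, count), converting ',' to '.'."""
    timings = {}
    for label, ms, count in PATTERN.findall(log):
        timings[label] = (float(ms.replace(",", ".")), int(count))
    return timings


def rates(ms: float, count: int) -> tuple:
    """Return (ms per token, tokens per second), rounded to 2 decimals."""
    return round(ms / count, 2), round(count / (ms / 1000.0), 2)


timings = parse_timings(LOG)
prompt_ms, prompt_tokens = timings["prompt eval"]
gen_ms, gen_tokens = timings["eval"]

print("prompt eval:", rates(prompt_ms, prompt_tokens))
print("eval:", rates(gen_ms, gen_tokens))
print("total tokens:", prompt_tokens + gen_tokens)
```

The sum of prompt and generated tokens should match the count on the `total time` line, which puts this run well past the ~2048-token mark discussed in this issue.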

Closing the issue. Please reopen if I missed something here.
