Description
What happened?
It's possible I'm misunderstanding samplers and sampler parameters.
It's also possible this is a symptom of a larger problem, where "default" values for some samplers may cause other samplers to be not activated in llama-server.
Observed behaviour
There are several sampler parameters which, when given "do nothing" default values via llama-server's /completions API, seem to cause the XTC sampler to not be used.
Update: The above should be "Given a temperature of 0, even if temperature is not in the requested sampler sequence, the XTC sampler is not used"
The following JSON payload demonstrates the issue:
{
"prompt": "<|im_start|>system\nYou are a creative story writer<|im_end|>\n<|im_start>user\nWrite a story about a wizard who is losing his ability to do magic, and tries everything to get it back.<|im_end|>\n<|im_start|>assistant\n",
"n_predict": 512,
"seed": 1,
"xtc_probability": 0.5,
"xtc_threshold": 0.1,
"samplers": [
"xtc"
],
"top_k": 0,
"tfs_z": 1,
"top_p": 1,
"min_p": 0,
"temperature": 0
}
In my testing, this causes the XTC sampler to not be activated. The output did not read like XTC-sampled text, and the following hacky debug print that I added was never reached:
diff --git a/src/llama-sampling.cpp b/src/llama-sampling.cpp
index 2e655068..63e0d043 100644
--- a/src/llama-sampling.cpp
+++ b/src/llama-sampling.cpp
@@ -1084,6 +1084,7 @@ static void llama_sample_xtc_apply(struct llama_sampler * smpl, llama_token_data
|| cur_p->size < 2) {
return;
}
+ puts("ok");
std::uniform_real_distribution<float> distribution(0.0f, 1.0f);
float chance = distribution(ctx->rng);
Given the following simpler JSON payload, the hacky debugging was successfully activated:
{
"prompt": "<|im_start|>system\nYou are a creative story writer<|im_end|>\n<|im_start>user\nWrite a story about a wizard who is losing his ability to do magic, and tries everything to get it back.<|im_end|>\n<|im_start|>assistant\n",
"n_predict": 512,
"seed": 1,
"xtc_probability": 0.5,
"xtc_threshold": 0.1,
"samplers": [
"xtc"
]
}
Furthermore, each of the parameters after my samplers array seems to individually cause XTC to not activate. For example, a temperature of 0 (without specifying any of top_k, tfs_z, top_p or min_p) is enough to cause XTC to not activate.
There may be other parameters, including sampler parameters, which cause XTC to not activate, but which I did not test.
(Update: I was wrong about this; it seems that only temperature == 0 reproduces the issue.)
This is problematic for clients such as SillyTavern, which seem to always send all samplers in the array but which rely on sending default parameters (e.g. 0 in the case of temperature) to cause them to be effectively disabled. Such a client will never be able to activate XTC if the user gives a temperature of 0 in the hopes of disabling the temperature sampler.
Expected behaviour
If XTC is in the samplers array, and xtc_threshold and xtc_probability meet the criteria for XTC to be used, XTC should be used regardless of parameters for other samplers.
More generally, if any sampler is in the samplers array, and its parameters meet the criteria for it to be used, it should be used regardless of parameters for other samplers (?)
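To make the difference between the observed and the expected behaviour concrete, here is a toy model in C++. It is only a sketch and not llama.cpp's actual code: the struct, the function names, the "fallback" placeholder, the default temperature, and the simplified activation rules are all assumptions made for illustration. The only thing taken from the report is that a temperature of 0 prevents XTC from running even when "xtc" is the sole requested sampler.

```cpp
#include <string>
#include <vector>

// Hypothetical subset of the request parameters; field names mirror the JSON keys.
struct SamplingParams {
    std::vector<std::string> samplers;    // requested sampler sequence
    float temperature     = 0.8f;         // assumed default, for illustration only
    float xtc_probability = 0.0f;
    float xtc_threshold   = 0.1f;
};

// Assumption: whether a sampler is active depends only on its own parameters.
// The XTC rule (probability > 0) is a simplification of the real criteria.
bool sampler_is_active(const std::string & name, const SamplingParams & p) {
    if (name == "xtc")         return p.xtc_probability > 0.0f;
    if (name == "temperature") return p.temperature > 0.0f;
    return true; // other samplers are not modelled here
}

// Expected behaviour: keep every requested sampler whose own parameters
// enable it, regardless of the parameters of the other samplers.
std::vector<std::string> expected_chain(const SamplingParams & p) {
    std::vector<std::string> chain;
    for (const auto & name : p.samplers) {
        if (sampler_is_active(name, p)) {
            chain.push_back(name);
        }
    }
    return chain;
}

// Observed behaviour: a temperature of 0 bypasses the requested sequence.
// "fallback" is a placeholder for whatever the server does instead.
std::vector<std::string> observed_chain(const SamplingParams & p) {
    if (p.temperature == 0.0f) {
        return {"fallback"};
    }
    return expected_chain(p);
}
```

With the first payload above (samplers containing only "xtc", temperature 0, xtc_probability 0.5), observed_chain in this model returns only the placeholder, while expected_chain returns a chain containing "xtc".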
Related
- sampling : add XTC sampler #9742
- [FEATURE_REQUEST] enable XTC for llama.cpp SillyTavern/SillyTavern#2992
Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 3923 (becfd387)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
No response