Addition of DRY: A modern repetition penalty that reliably prevents looping #447


Closed

awtrisk opened this issue May 9, 2024 · 16 comments

awtrisk (Contributor) commented May 9, 2024

Would it be worth it to add DRY as an alternative to the traditional repetition penalty? Users have reported that it actually works, and the PR on the ooba repo itself seems to be solid. It also has a llama.cpp PR. There seem to be barely any downsides to it too.

If it seems good, I can make the PR and implement it here.

turboderp (Member):

As far as I can tell it's basically just an n-gram penalty, but without combining it with a beam search it doesn't really offer a way to discourage repetitions before they occur. I.e., the model is allowed to start down the path of a repetition, and it's only somewhere along that path that the penalty kicks in, at which point it's impossible to turn back.

So I'm not too sure about it. Are there any thorough comparisons to other methods like increased temperature, skew, frequency penalty, etc.?

awtrisk (Contributor, Author) commented May 10, 2024

AFAIK this wasn't meant to discourage repetition before it starts, but rather, once a pattern of repetition occurs, to quickly cull it by biasing against the repeated tokens. IMO this is better than the current ways we have of preventing repetition.

@p-e-w may be able to shed more light on things like comparisons, although I will be testing it with other samplers.

p-e-w (Contributor) commented May 12, 2024

DRY is indeed an n-gram/sequence penalty, but it works a little differently from no_repeat_ngram_size and other proposals I've seen. The differences can be summarized as follows (a code sketch of the core idea follows the list):

  • The penalty grows smoothly with the length of the repeated sequence, preventing garbage from being generated in situations where extending a repetition is mandated by the context and where no_repeat_ngram_size and its ilk just slam the door.
  • The penalty grows exponentially with the length of the repeated sequence, guaranteeing that the model's tendency to loop is eventually overcome. Many models, when presented with a partially repeated sequence, will overwhelmingly predict continuing the repetition, so slower-growing penalties can be insufficient.
  • The "sequence breakers" mechanism protects the structure of chat/instruction templates from being penalized, allowing much stronger penalties to be used without negative effects. I have extensively tested this in chat scenarios.
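Here is a minimal Python sketch of the core computation as described above. The parameter names (multiplier, base, allowed_length) follow the original PR; the quadratic scan and the omission of sequence breakers are simplifications for illustration, not the actual implementation:

```python
def dry_penalty(token_ids: list[int], multiplier: float = 0.8,
                base: float = 1.75,
                allowed_length: int = 2) -> dict[int, float]:
    """Sketch: compute DRY logit penalties for the next token.

    For each earlier position i, measure how long the text ending just
    before i matches the end of the current context. If the match is at
    least `allowed_length` tokens, penalize token_ids[i] (the token that
    would extend the repetition) by multiplier * base ** (match_len -
    allowed_length), so the penalty grows exponentially with length.
    """
    penalties: dict[int, float] = {}
    n = len(token_ids)
    for i in range(1, n):
        match_len = 0
        while (match_len < i and
               token_ids[i - 1 - match_len] == token_ids[n - 1 - match_len]):
            match_len += 1
        if match_len >= allowed_length:
            tok = token_ids[i]
            pen = multiplier * base ** (match_len - allowed_length)
            penalties[tok] = max(penalties.get(tok, 0.0), pen)
    return penalties
```

The returned penalties would be subtracted from the corresponding logits before sampling; sequence breakers would additionally stop the match from extending across template boundary tokens.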

Simply put, it works. I and others have been running DRY for over two months now, and it's such a massive improvement over traditional repetition penalties that I can't imagine going back. Looping is a scourge, and the existing penalties are a cure that's almost worse than the disease, being noticeably detrimental to output quality. DRY is far better than the three flavors of RepPen at actually preventing repetition, while leaving standard sentence structure completely unaffected.

All samplers are hacks by definition (we should be able to just use the distribution from the model as-is). DRY was developed not primarily from theoretical considerations, but guided by constant real-world experimentation. Having generated and examined probably in excess of 200k tokens in well over 100 contexts by now using DRY, I can confidently say that it works, and enables results that cannot be replicated using any combination of the widely available samplers of today.

yamosin commented May 15, 2024

Really looking forward to seeing it implemented on TabbyAPI

AgeOfAlgorithms:

bump

Vhallo commented Jun 10, 2024

The performance issues have since been solved thanks to belladoreai, so it might be worthwhile to integrate this now.

AgeOfAlgorithms commented Jun 15, 2024

I just wanted to bring this comment by @belladoreai here for everyone's convenience. It gives another good reason why no_repeat_ngram_size is unsuitable for stopping repetition. It was from their discussion with @p-e-w:

For what it's worth, I've done a lot of experimentation with no_repeat_ngram_size in the past and I can confirm it's fairly useless in a chat context. It might be useful in other contexts, especially in contexts where the input is relatively small. But when a chat message history grows, using no_repeat_ngram_size typically leads to situations where the model is intentionally writing broken English (like writing "engglish" instead of "english"), where the brokenness of the language just grows more and more absurd over time. This seems to happen because in many cases (especially with smaller models) the model perceives repetitive output to be extremely likely - so likely that even broken versions of the repetitive output appear more likely than some other alternative continuation of the text. So when we prevent the model from generating the exact same repetitive continuation to the text, it chooses to use a broken alternative version of the same repetitive text instead of choosing some more natural text.

I do not recommend using no_repeat_ngram_size except at very high values, if no other "circuit breaker" for repetition exists.
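To make the contrast concrete, the hard cutoff that no_repeat_ngram_size applies can be sketched as follows (a simplified illustration of the idea behind Hugging Face's NoRepeatNGramLogitsProcessor, not the library code itself):

```python
import math

def ban_repeated_ngrams(token_ids: list[int], logits: list[float],
                        n: int = 3) -> list[float]:
    """Hard-ban any token that would complete an n-gram already seen.

    Unlike a graded penalty, this is all-or-nothing: the exact
    continuation gets logit -inf, so a model that strongly "wants" the
    repetition is free to fall back on a near-duplicate spelling like
    "engglish" instead.
    """
    if len(token_ids) < n:
        return logits
    prefix = tuple(token_ids[-(n - 1):])
    for i in range(len(token_ids) - n + 1):
        if tuple(token_ids[i:i + n - 1]) == prefix:
            logits[token_ids[i + n - 1]] = -math.inf
    return logits
```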

Vibecoder9000:

What's the status on this? Sorry if I'm missing something on GitHub, but it just seems to have stalled. DRY is great, but moving from KoboldCPP to TabbyAPI leaves my models significantly dumber.

turboderp (Member):

What settings are you using for Kobold and ExLlama, respectively? And how are you defining dumber?

The short answer to your question is that it's been suggested someone PR it, I've agreed that it may be worth adding at some point (though I have a long, long list of other things to add as well so I'm not sure about the priorities), and I'm still waiting on concrete examples of what DRY achieves in practice, and how it does so without degrading the output.

kingbri1 (Collaborator) commented Sep 2, 2024

DRY is a sampler that's meant for breaking loops, so if your outputs are "dumber", I'd look into prompting, parameters, character cards, the model itself, etc. Those are more likely points for regressions to occur. DRY may have been masking that since a single sampler isn't a magic bullet.

I agree with turbo: DRY is on the timeline to be added in exl2 eventually, but our time is limited and there are a bunch of features outside of sampling to tackle.

If someone does make a PR (which is how every other backend added it), that will make it much easier to get the sampler in.

p-e-w (Contributor) commented Sep 2, 2024

DRY is a sampler that's meant for breaking loops, so if your outputs are "dumber", I'd look into prompting, parameters, character cards, the model itself, etc.

It's not that simple. If DRY is unavailable, users are often forced to enable standard presence/frequency repetition penalties to combat looping. And those "established" samplers absolutely do make models dumber. That's because they penalize tokens that form the backbone of standard language: articles, prepositions, punctuation, etc. In doing so, they can significantly distort the probability distribution predicted by the model, affecting output quality.
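For comparison, the standard presence/frequency penalty (a sketch of the OpenAI-style formula, not any particular backend's code) hits every token that has appeared, no matter how structurally necessary it is:

```python
def freq_presence_penalty(logits: list[float], counts: dict[int, int],
                          frequency_penalty: float = 0.5,
                          presence_penalty: float = 0.5) -> list[float]:
    """Sketch of the OpenAI-style penalty: logit -= count * freq + pres.

    counts maps token_id -> number of occurrences in the context.
    Backbone tokens like "the", ",", and "." rack up counts fastest,
    so their logits are pushed down hardest.
    """
    for tok, count in counts.items():
        logits[tok] -= count * frequency_penalty + presence_penalty
    return logits
```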

With the default parameter values, DRY only penalizes repeated sequences of 3 tokens or more. This leaves the distributions for the vast majority of token positions completely untouched, and prevents many of the issues caused by traditional penalties. Therefore, when substituting standard penalties with DRY, it is quite possible for a model to feel smarter.
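Using the dry_penalty sketch from earlier in the thread, this threshold behavior is easy to see (token IDs invented for illustration):

```python
# Context ends in "9 7", which previously appeared followed by 3.
# Generating 3 now would complete a repeated 3-token sequence (9 7 3),
# so only token 3 is penalized; everything else is untouched.
ctx = [9, 7, 3, 1, 9, 7]
print(dry_penalty(ctx))  # {3: 0.8} with the default parameters
```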

baronrabban:

But when a chat message history grows, using no_repeat_ngram_size typically leads to situations where the model is intentionally writing broken English (like writing "engglish" instead of "english"), where the brokenness of the language just grows more and more absurd over time. This seems to happen because in many cases (especially with smaller models) the model perceives repetitive output to be extremely likely - so likely that even broken versions of the repetitive output appear more likely than some other alternative continuation of the text.

I have experienced a version of this, but in my case it began concatenating words together. I also encountered a situation where it just started liberally inserting newlines every couple of words or so. I think the main thing is that it's not obvious where this behavior is coming from. Nothing says "I'm doing this crazy thing because you set DRY a few hours ago", so it can be confusing until you turn DRY off and it stops doing it.

As was stated in the quoted comment, the model really wants to write this text, and I think it's going to find a way no matter what restrictions you try to place on it.

turboderp (Member):

Concatenation is probably explained by the fact that a lot of tokens are duplicated in the vocabulary, with and without a leading space. So if the model tries to say United States of America but it isn't allowed to because it's already said it twice or whatever, it could easily reach a point where it's already sampled United States of and then, being suddenly barred from sampling " America" (with the leading space), it will choose "America" (without it) instead, since that's going to have a very similar embedding vector. Then the result is United States ofAmerica and nobody's happy.

You'd want to hope that a good model doesn't have " America" and "America" as its top two choices if the latter would break the language's grammar, but at the same time a good model needs to understand that the two tokens convey the same meaning otherwise, so that e.g. tokenizing a string like "America" produces tokens that encode "the word 'America', in quotes". When dealing with merges and questionable finetunes, you can't take it for granted that the model understands/retains those nuances, let alone under the influence of too many other sampling rules. And there's a vaguely defined noise floor from quantization that you have to take into account as well.
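This duplication is easy to verify for yourself; for example, assuming the transformers library and the GPT-2 tokenizer (exact token IDs and splits depend on the vocabulary):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
# Many BPE vocabularies carry both a leading-space and a bare variant
# of common words, so these two encodings typically differ:
print(tok.encode(" America"))  # leading-space variant
print(tok.encode("America"))   # bare variant (may even split into pieces)
```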

Vibecoder9000 commented Sep 2, 2024

  • Midnight Miqu 70B 2bpw, no DRY: perfect results. (screenshot)
  • Mistral Dory v2 12B 6bpw, no DRY: some repetition; nothing serious, but it will degrade further with continued use. (screenshot)
  • Gemma 2 2B 8bpw, no DRY: this model might just suck. (screenshot)
  • Gemma 2 2B q4ks, DRY with multiplier 1.3 and base 2: some issues with Markdown formatting. (screenshot)
  • Gemma 2 27B q4ks, same DRY settings: forced repetition in every response up to the last one. Rerolling the non-prefilled response changes "smiled" to something else about 1/3 of the time; the screenshot shows that 1/3 happening.

Overall, I think it's good, even if disabled by default. For some models and long chats, it can completely save the chat, as all of the issues detailed here become more intense over time.

Vibecoder9000:

How was this added without merging here? It's in the latest build with SillyTavern.

turboderp (Member):

Not sure I understand the question. DRY is implemented here: affdc0d...c1fed2e
