Addition of DRY: A modern repetition penalty that reliably prevents looping #447
Would it be worth it to add DRY as an alternative to the traditional repetition penalty? Users have reported that it actually works, and the PR on the ooba repo itself seems to be solid. It also has a llama.cpp PR. There seem to be barely any downsides to it, too.
If it seems good, I can make the PR and implement it here.
As far as I can tell it's basically just an n-gram penalty, but without combining it with a beam search it doesn't really offer a way to discourage repetitions before they occur. I.e. the model is allowed to start down the path of a repetition, and it's only somewhere along that path that the penalty kicks in, at which point it's impossible to turn back. So I'm not too sure about it. Are there any thorough comparisons to other methods like increased temperature, skew, frequency penalty, etc.? |
AFAIK this wasn't meant to discourage repetition before it starts, but rather, when a pattern of repetition occurs, to quickly cull it by biasing against the repeated tokens. Imo this is better than the current ways of preventing repetition we have. @p-e-w may be able to shed more insight on things like comparisons, although I will be testing it with other samplers. |
DRY is indeed an n-gram/sequence penalty, but it works a little differently from the samplers you mention.
Simply put, it works. I and others have been running DRY for over two months now, and it's such a massive improvement over traditional repetition penalties that I can't imagine going back. Looping is a scourge, and the existing penalties are a cure that's almost worse than the disease, being noticeably detrimental to output quality. DRY is far better than the three flavors of RepPen at actually preventing repetition, while leaving standard sentence structure completely unaffected. All samplers are hacks by definition (we should be able to just use the distribution from the model as-is). DRY was developed not primarily from theoretical considerations, but guided by constant real-world experimentation. Having generated and examined probably in excess of 200k tokens in well over 100 contexts by now using DRY, I can confidently say that it works, and enables results that cannot be replicated using any combination of the widely available samplers of today. |
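For anyone who hasn't read the ooba PR, here is a minimal Python sketch of the core mechanism as I understand it from there. The defaults (`multiplier=0.8`, `base=1.75`, `allowed_length=2`) follow the values discussed in that PR; the O(n²) matching loop and the omission of sequence breakers (tokens like newlines that reset matching) are simplifications for illustration, not how a production implementation should do it:

```python
def dry_penalties(input_ids, multiplier=0.8, base=1.75, allowed_length=2):
    """Return {token_id: penalty} to subtract from the corresponding logits.

    For each candidate next token z, find the longest suffix of the context
    that already occurred earlier immediately followed by z. If sampling z
    would extend a repeat longer than `allowed_length` tokens, penalize z
    exponentially in the match length.
    """
    n = len(input_ids)
    match_lengths = {}
    for i in range(1, n):
        # Longest common suffix between the context ending at position i-1
        # and the full context ending at position n-1.
        length = 0
        while length < i and input_ids[i - 1 - length] == input_ids[n - 1 - length]:
            length += 1
        if length > 0:
            z = input_ids[i]  # token that followed the earlier occurrence
            match_lengths[z] = max(match_lengths.get(z, 0), length)
    return {
        z: multiplier * base ** (length - allowed_length)
        for z, length in match_lengths.items()
        if length >= allowed_length
    }

ids = [5, 6, 7, 8, 5, 6, 7]    # "... A B C D A B C"
print(dry_penalties(ids))       # {8: 1.4} -- "D" would extend the loop
```

Note that with `allowed_length=2`, a penalty only appears once a three-token sequence is about to repeat, which is why ordinary sentence structure is left alone.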
Really looking forward to seeing it implemented on TabbyAPI |
bump |
The performance issues have been solved by now thanks to belladoreai, so might be worthwhile to integrate this now. |
I just wanted to bring this comment by @belladoreai here for everyone's convenience; it gives another good reason why DRY is worth implementing here. |
What's the status on this? Sorry if I'm missing something on GitHub, but it just seems to have stalled. DRY is great, but moving from KoboldCPP to TabbyAPI leaves my models significantly dumber. |
What settings are you using for Kobold and ExLlama, respectively? And how are you defining dumber? The short answer to your question is that it's been suggested someone PR it, I've agreed that it may be worth adding at some point (though I have a long, long list of other things to add as well so I'm not sure about the priorities), and I'm still waiting on concrete examples of what DRY achieves in practice, and how it does so without degrading the output. |
DRY is a sampler that's meant for breaking loops, so if your outputs are "dumber", I'd look into prompting, parameters, character cards, the model itself, etc. Those are more likely points for regressions to occur. DRY may have been masking that, since a single sampler isn't a magic bullet. I agree with turbo: DRY is on the timeline to be added in exl2 eventually, but our time is limited and there are a bunch of features outside of sampling to tackle. If someone does make a PR (like how every other backend added it), that will make it much faster to get the sampler in. |
It's not that simple. If DRY is unavailable, users are often forced to enable standard presence/frequency repetition penalties to combat looping. And those "established" samplers absolutely do make models dumber. That's because they penalize tokens that form the backbone of standard language: articles, prepositions, punctuation, etc. In doing so, they can significantly distort the probability distribution predicted by the model, affecting output quality. With the default parameter values, DRY only penalizes repeated sequences of 3 tokens or more. This leaves the distributions for the vast majority of token positions completely untouched, and prevents many of the issues caused by traditional penalties. Therefore, when replacing standard penalties with DRY, it is quite possible for a model to feel smarter. |
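To make that concrete, here is a toy illustration using the usual OpenAI-style penalty formula and hypothetical token counts; the numbers are made up, only the shape of the effect matters:

```python
# OpenAI-style penalties hit every token that has appeared in the context:
#   logit[t] -= presence_penalty * int(count[t] > 0) + frequency_penalty * count[t]
# Hypothetical counts after a few hundred tokens of ordinary prose:
counts = {"the": 42, ",": 37, ".": 29, " and": 21, "dragon": 6}
frequency_penalty = 0.3

for token, count in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{token!r:8} penalized by {frequency_penalty * count:.1f}")
# The largest hits land on "the", ",", "." -- the backbone of ordinary
# language. DRY leaves them untouched unless they happen to sit inside a
# repeated sequence longer than allowed_length tokens.
```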
I have experienced a version of this, but in my case it began concatenating words together. I also encountered a situation where it just started liberally inserting newlines every couple of words or so. I think the main thing is that it's not obvious where this behavior is coming from. Nothing says "I'm doing this crazy thing because you set DRY a few hours ago", so it can be confusing until you turn DRY off and it stops doing it. As was stated in the quoted comment, the model really wants to write this text, and I think it's going to find a way no matter what restrictions you try to place on it. |
Concatenation is probably explained by the fact that a lot of tokens are duplicated in the vocabulary, with and without a leading space. So if the model tries to say ` word` but that token is penalized, the space-less `word` variant is still available, and sampling it glues the word onto whatever came before. You'd want to hope that a good model doesn't put much probability on the space-less variant in that position, but a strong enough penalty can make it win out. |
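The duplication is easy to check directly; below is a sketch assuming a Hugging Face tokenizer for a GPT-2-style BPE vocabulary, where `Ġ` marks a leading space (SentencePiece-based models use `▁` instead):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any GPT-2-style BPE tokenizer
vocab = tok.get_vocab()

# Tokens that exist both with and without a leading space.
dupes = [w for w in vocab if not w.startswith("Ġ") and ("Ġ" + w) in vocab]
print(len(dupes), "duplicated tokens, e.g.", sorted(dupes)[:8])
# If the " word" variant is penalized mid-repetition, the bare "word"
# variant is still available, and sampling it glues the word onto the
# previous token -- the concatenation effect described above.
```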
How was this added without merging here? It's in the latest build with SillyTavern. |
Not sure I understand the question. DRY is implemented here: affdc0d...c1fed2e |