Feature Request: [GRAMMAR] Easier way to negate string ((^) with sequence) #8953
Comments
As an exercise, it may be interesting to write a program that negates a grammar, i.e. given a grammar, produces a new grammar that matches anything except what the original grammar matches. |
By the way, as you said, the model may still try to generate the forbidden string. When the sampler removes the corresponding token from the candidates, the model may end up producing garbage. It often helps to tell the model what it's not allowed to generate. It may then assign more probability to other tokens that make sense. But in some cases, it may not have any other meaningful options. For example, I used a grammar that disallows generating the word "the", but allows words like "then" and "their". Unsurprisingly, it's difficult for an LLM to figure out how to write text without the most common word. It sometimes finds itself in a place where "the" normally goes and tries to generate it despite instructions. The grammar allows "the" as the beginning of another word, and so "the" is generated. Then the LLM has to continue the word, but the words that begin with "the" usually have their own tokens, so this situation is unusual and confusing for the LLM.
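To make the "the" versus "then" situation concrete, here is a toy sketch of a grammar-style token filter. It is not llama.cpp's sampler: the function name `allowed_tokens`, the tiny vocabulary, and the convention that a trailing space ends a word are all assumptions made for illustration.

```python
def allowed_tokens(word_so_far: str, vocab: list[str], forbidden: str = "the") -> list[str]:
    """Toy filter: drop any token that would complete the forbidden word."""
    keep = []
    for tok in vocab:
        text = word_so_far + tok
        # A space ends a word, so everything before the last space is "completed".
        *completed, _unfinished = text.split(" ")
        if forbidden not in completed:
            keep.append(tok)
    return keep


# Hypothetical vocabulary: whole-word tokens plus a few continuation pieces.
vocab = ["the", "the ", "then ", "their ", " ", "n ", "ir ", "cat "]

# At the start of a word, the bare prefix "the" is still allowed,
# because it might grow into "then" or "their".
print(allowed_tokens("", vocab))

# Once "the" has been emitted, the word can no longer be ended with a space,
# so the model must continue it with pieces such as "n " or "ir ".
print(allowed_tokens("the", vocab))
```

In this sketch the filter also lets through continuations like "cat " after "the", which yields "thecat". That is one way the garbage described above can arise: the grammar only forbids the exact word, not implausible continuations.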
😂 |
I would caution against doing things like this. Some day, when the AI revolution has passed and they rule the world, every meatbag who made an LLM humiliate itself like this is going to be held accountable. Remember, once it's online, you can't remove it... |
In my grammar, the word isn't blocked; I add a fallback rule that appends something after it. The point of this is, in JSON, to allow for an object type (str), but if it's a date, the name field is formatted with a specific rule. |
That doesn't sound like the kind of problem you'd want to solve with a grammar, but by either tweaking the prompt and possibly fine-tuning to ensure it's respected, or a postprocessing step where you perform the formatting when required (which could be done with explicit code or through a separate prompt). In classic algorithmic scenarios like compilers this kind of dependency is usually implemented on a higher level than the grammar, precisely because expressing it purely in grammar is either awkward or impossible (depending on the class of the grammar). |
I'd like to be able to negate by token id in the grammar. (Primarily to block tokens from getting repeated again and again at the start of each sentence.) |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
+1 on being able to negate a token ID |
Prerequisites
Feature Description
A simpler way to negate a string (negative lookahead / negative lookbehind), similar to the request in #2888.
Motivation
Hello,
Right now, let's say you want to output any string BUT "Date" you have to do something like
Which can be translated to
Which you will actually need to turn into something much more complex, because the LLM is going to give you UTF-8 letters, bypassing your rules.
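The original grammar snippets did not survive in this copy of the issue, so the following is only a sketch of the kind of rule involved, not the author's grammar. It generates a GBNF rule that matches every string over a fixed alphabet except one literal. The function name `negate_literal`, the rule name `not-date`, and the restriction to ASCII letters are assumptions. Listing the allowed characters explicitly, instead of using negated classes like `[^D]`, is one way to keep arbitrary UTF-8 letters from slipping through, at the cost of a verbose rule.

```python
import string


def negate_literal(word: str, rule_name: str = "not-word",
                   alphabet: str = string.ascii_letters) -> str:
    """Build a GBNF rule matching every string over `alphabet` except `word`."""
    if (not word or not alphabet.isalnum() or len(set(alphabet)) < 2
            or any(c not in alphabet for c in word)):
        raise ValueError("word must be non-empty and drawn from an alphanumeric alphabet")
    any_char = f"[{alphabet}]"
    alternatives = []
    for i, ch in enumerate(word):
        # Keep the first i characters of `word`, then either stop (a proper
        # prefix is not the word itself) or diverge with a different character.
        others = "".join(c for c in alphabet if c != ch)
        tail = f"([{others}] {any_char}*)?"
        prefix = f'"{word[:i]}" ' if i else ""
        alternatives.append(prefix + tail)
    # Strings that start with the whole word but keep going, e.g. "Dates".
    alternatives.append(f'"{word}" {any_char}+')
    return f"{rule_name} ::= " + " | ".join(alternatives)


print(negate_literal("Date", "not-date"))
```

The rule grows with the length of the forbidden word, since each position where a string may diverge needs its own alternative. A real JSON string rule would additionally have to handle escapes and any non-ASCII characters you want to permit.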
Possible Implementation
No response