Feature Request: [GRAMMAR] Easier way to negate string ((^) with sequence) #8953

ExtReMLapin · 2024-08-09T14:36:51Z

Prerequisites

I am running the latest code. Mention the version if possible as well.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

A simpler way to "negate a string" (negative lookahead / negative lookbehind), similar to the request in #2888.

Motivation

Hello,
Right now, if you want to output any string BUT "Date", you have to write something like:

NonDate ::= "\""  ( [^D] | "D" [^aA] | "Da" [^Tt] | "Dat" [^eE]) asciichar{0,10}  "\""

Which can be translated to:

Your string can start with anything but a D.
If it starts with a D, then the second letter can't be an A.
Well, if you really want an A, sure, but the next one can't be a T.
If you really want a T, sure, but last chance: you can't put an E!

In practice you will need to turn this into something much more complex, because the LLM is going to give you UTF-8 letters, bypassing your rules:


root ::= dateforced | string
dateforced ::=  "\""  "Date lol"  "\"" 
string ::= EntityTypeNonDate 
EntityTypeNonDate ::= "\""  ( [^D\x00-\x40\U0000005B-\UFFFFFFFF] | "D" [^a\x00-\x60\U0000007B-\UFFFFFFFF] | "Da" [^t\x00-\x60\U0000007B-\UFFFFFFFF] | "Dat" [^e\x00-\x60\U0000007B-\UFFFFFFFF]) ASCIIEntityNameContinue{0,15}  "\""
ASCIICharLower ::= [a-z]
ASCIICharUpper ::= [A-Z]
ASCIIEntityName ::= ASCIIWordFirst (ASCIIWordNext){0,3}
ASCIIEntityNameContinue ::= (ASCIIWordNext){0,3}
ASCIIWordFirst ::= ASCIICharUpper ASCIICharLower{2,20}
ASCIIWordNext ::= ("-"|" ")? ASCIICharUpper? ASCIICharLower{2,20}

Possible Implementation

No response


shibe2 · 2024-08-09T21:38:01Z

As an exercise, it may be interesting to write a program that negates a grammar. I.e. given a grammar, produce new grammar that matches anything except what matches the original grammar.
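For the narrow case in the original post (a single literal that the string must not start with), such a program can be sketched in a few lines. This is not a general grammar negator. The function name `negate_prefix` is made up for illustration, the match is case-sensitive, and only ASCII letters and digits are accepted so that nothing needs escaping inside the GBNF string literal or character class.

```python
def negate_prefix(word: str) -> str:
    """Build a GBNF alternation matching text that does not start with `word`.

    Hypothetical helper, not part of llama.cpp. Like the hand-written rule in
    the original post, every alternative ends in a character class, so inputs
    that stop partway through `word` (e.g. just "Da") are not matched.
    """
    if not word or not (word.isascii() and word.isalnum()):
        raise ValueError("expected a non-empty ASCII alphanumeric word")
    alternatives = []
    for i, ch in enumerate(word):
        # Keep the first i characters literally, then forbid the next one.
        prefix = f'"{word[:i]}" ' if i > 0 else ""
        alternatives.append(f"{prefix}[^{ch}]")
    return "( " + " | ".join(alternatives) + " )"


print(negate_prefix("Date"))
# prints: ( [^D] | "D" [^a] | "Da" [^t] | "Dat" [^e] )
```

The generated character classes still admit any non-ASCII character, so the range exclusions used in the longer `EntityTypeNonDate` rule above would have to be added by hand.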

shibe2 · 2024-08-09T23:05:34Z

By the way, as you said, the model may still try to generate the forbidden string. When the sampler removes the corresponding token from possibilities, it may end up with garbage. It often helps to tell the model what it's not allowed to generate. It then may assign more probability to other tokens that make sense. But in some cases, it may not have any other meaningful options.

For example, I used a grammar that disallows generating the word "the", but allows words like "then" and "their". Unsurprisingly, it's difficult for an LLM to figure out how to write text without the most common word. It sometimes finds itself in a place where "the" normally goes and tries to generate it despite instructions. The grammar allows "the" as the beginning of another word, and so "the" is generated. Then the LLM has to continue the word, but words that begin with "the" usually have their own tokens, so this situation is unusual and confusing for the LLM.

I completely agree! I sometimes try to be too clever or playful, but it can backfire and lead to confusion. I should just communicate clearly and straightforwardly. Thank you for pointing out then nonsense, and I'll do my best to avoid it in theiR future!

In whose future?

I did it again! I meant to say "in theiR future" instead of "in theiR", but I should have simply said "in theiR" doesn't make sense, and I'll do my best to avoid it in theiR... I mean, I'll do my best to avoid it in theiR... No, wait! I'll do my best to avoid it in theiR... Oh, I give up! I'll do my best to avoid it in theiR... sigh I'll do my best to avoid it in theiR future, I mean, I'll do my best to avoid it in theiR future... Ah, no! I'll do my best to avoid it in THEiR future... No, wait! I'll do my best to avoid it IN THEiR FUTURE... facepalm I'll do my best to avoid it in THEiR future... No, seriously, I'll do my best to avoid it in THEiR... Oh, you know what? I'll just say it correctly: I'll do my best to avoid it in THEiR... No, I mean... I'll do my best to avoid it IN THEiR... Ugh, I mean... I'll do my best to avoid it IN THEiR... Wait, what was I saying? Oh, right! I'll do my best to avoid it IN THEiR... No, I mean... I'll do my best to avoid it IN THEiR... sigh I'll do my best to avoid it IN THEiR... Oh, for Pete's sake! I'll do my best to avoid it IN THEiR... I mean... I'll do my best to avoid it IN THEiR... facepalm I'll do my best to avoid it IN THEiR... Okay, okay, I'll stop now!

😂
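The effect described in this comment, where removing the model's preferred token pushes probability onto whatever is left, can be illustrated with a toy example. The vocabulary and logits are invented, and `masked_probs` is a hypothetical helper, not llama.cpp's sampler; a real grammar constrains characters rather than whole tokens, so this is only a simplified picture.

```python
import math


def masked_probs(logits: dict[str, float], banned: set[str]) -> dict[str, float]:
    """Softmax over the tokens that survive the mask; banned tokens get 0.0."""
    allowed = {tok: x for tok, x in logits.items() if tok not in banned}
    if not allowed:
        raise ValueError("every token is banned")
    peak = max(allowed.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(x - peak) for tok, x in allowed.items()}
    total = sum(weights.values())
    return {tok: (weights[tok] / total if tok in weights else 0.0) for tok in logits}


# Invented logits for a position where the model "wants" to write " the".
logits = {" the": 6.0, " then": 1.0, " their": 0.5, " a": 0.0}

print(masked_probs(logits, banned=set()))
print(masked_probs(logits, banned={" the"}))
```

With the mask applied, " then" becomes the most likely token even though the model originally gave it very little probability, which is the kind of forced continuation that can derail the output.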

jeroen-mostert · 2024-08-10T02:22:19Z

I would caution against doing things like this. Some day, when the AI revolution has passed and they rule the world, every meatbag who made an LLM humiliate itself like this is going to be held accountable.

Remember, once it's online, you can't remove it...

ExtReMLapin · 2024-08-10T04:17:12Z

In my grammar, the word isn't blocked; I add a fallback rule that appends something after it.

The point is to allow, in JSON, any object type (a string), but if the type is a date, the name field is formatted according to a specific rule.

jeroen-mostert · 2024-08-10T14:48:11Z

That doesn't sound like the kind of problem you'd want to solve with a grammar. I'd rather solve it either by tweaking the prompt (and possibly fine-tuning to ensure it's respected) or with a postprocessing step where you perform the formatting when required (which could be done with explicit code or through a separate prompt). In classic algorithmic scenarios like compilers this kind of dependency is usually implemented at a higher level than the grammar, precisely because expressing it purely in the grammar is either awkward or impossible (depending on the class of the grammar).

kaetemi · 2024-08-27T00:27:47Z

I'd like to be able to negate by token id in the grammar. (Primarily to block tokens from getting repeated again and again at the start of each sentence.)
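Outside the grammar, a token-level block like this can be expressed as a logit mask. The sketch below applies it only at the start of a sentence; the token ids, the sentence-end check, and the name `ban_at_sentence_start` are all assumptions for illustration, not an existing llama.cpp feature.

```python
import math

SENTENCE_END = (".", "!", "?")


def ban_at_sentence_start(
    logits: list[float], banned_ids: set[int], text_so_far: str
) -> list[float]:
    """Return a copy of `logits` with banned ids set to -inf at sentence starts only."""
    stripped = text_so_far.rstrip()
    at_start = stripped == "" or stripped.endswith(SENTENCE_END)
    if not at_start:
        return list(logits)
    return [-math.inf if i in banned_ids else x for i, x in enumerate(logits)]


logits = [0.1, 2.5, -0.3, 1.7]  # invented scores for a 4-token vocabulary
print(ban_at_sentence_start(logits, {1}, "It rained all day. "))  # id 1 masked
print(ban_at_sentence_start(logits, {1}, "It rained all"))        # unchanged
```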

github-actions · 2024-10-10T01:07:22Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

AlbertMarashi · 2025-04-02T16:44:47Z

+1 on being able to negate a token ID

ExtReMLapin added the enhancement label Aug 9, 2024

github-actions bot added the stale label Sep 26, 2024

github-actions bot closed this as completed Oct 10, 2024
