

@craymichael

Summary:
Convert token IDs to tokens without unsightly Unicode characters (e.g., `Ġ`). See:
huggingface/transformers#4786 and
https://discuss.huggingface.co/t/bpe-tokenizers-and-spaces-before-words/475/2

This function is preferred over `tokenizer.convert_ids_to_tokens()` for user-facing data.

Quote from links:
> Spaces are converted in a special character (the Ġ) in the tokenizer prior to
> BPE splitting mostly to avoid digesting spaces since the standard BPE algorithm
> used spaces in its process
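
To see where the `Ġ` comes from, here is a minimal pure-Python sketch. It assumes a GPT-2 style byte-level BPE vocabulary; SentencePiece tokenizers mark spaces with `▁` instead and are not handled here. `bytes_to_unicode` reproduces the byte-to-character table from GPT-2's reference encoder, where the space byte (0x20) is shifted to U+0120 (`Ġ`). `pretty_tokens` is a hypothetical helper, not the function added in this PR. It inverts that table so each token string returned by `convert_ids_to_tokens()` turns back into its literal text.

```python
from typing import Dict, List


def bytes_to_unicode() -> Dict[int, str]:
    """Byte -> printable-character table used by GPT-2 style byte-level BPE.

    Printable bytes map to themselves; the other 68 bytes (including the
    space, 0x20) are shifted to code points starting at 256.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable character -> original byte
_BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def pretty_tokens(tokens: List[str]) -> List[str]:
    """Hypothetical helper: turn byte-level BPE token strings into readable text."""
    out = []
    for tok in tokens:
        raw = bytes(_BYTE_DECODER[ch] for ch in tok)
        # A single token may hold only part of a multi-byte UTF-8 character,
        # so undecodable bytes become U+FFFD instead of raising.
        out.append(raw.decode("utf-8", errors="replace"))
    return out


print(pretty_tokens(["Hello", "Ġworld", "!", "Ċ"]))  # ['Hello', ' world', '!', '\n']
```

With a real Hugging Face tokenizer, calling `tokenizer.convert_tokens_to_string([tok])` on each token should give a similar per-token result without reimplementing the table.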

Differential Revision: D62672912

Zach Carmichael and others added 2 commits September 13, 2024 16:47
Summary:
Some attributions returned by gradient-based methods (e.g., `LayerGradientXActivation`) still carry a `grad_fn` from autograd. This diff ensures that the autograd graph is freed between `attribute` calls within `LLMGradientAttribution`, eliminating it as a potential source of VRAM accumulation.
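
Here is a minimal sketch of the underlying issue, assuming PyTorch. A gradient-times-input style attribution multiplies the gradient by a tensor that still requires grad, so the result keeps a `grad_fn` and, through it, a reference to part of the autograd graph. `gradient_x_input` is a hypothetical stand-in, not Captum's implementation. Calling `.detach()` before storing each result is one way to let that graph be freed between calls.

```python
import torch


def gradient_x_input(inputs: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Hypothetical gradient * input attribution for a toy linear score."""
    inputs = inputs.clone().requires_grad_(True)
    score = (inputs * weight).sum()
    (grad,) = torch.autograd.grad(score, inputs)
    # `grad` carries no graph, but `inputs` requires grad,
    # so the product is recorded by autograd and gets a grad_fn.
    return grad * inputs


weight = torch.randn(8)
raw = gradient_x_input(torch.randn(8), weight)  # raw.grad_fn is not None

attrs = []
for _ in range(3):
    attr = gradient_x_input(torch.randn(8), weight)
    # Detach so the stored tensor no longer keeps the autograd graph alive
    attrs.append(attr.detach())
```

The detached tensors share storage with the originals, so this drops the graph reference without copying the attribution values.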

Also wrapped `model.generate` in a `no_grad` context to avoid unnecessary memory usage.
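
Generation never needs gradients, so recording a graph through it only costs memory. This is a minimal sketch, assuming PyTorch. A tiny `nn.Linear` and a hypothetical `toy_generate` loop stand in for a real language model and `model.generate`. Inside `torch.no_grad()`, no graph is recorded, so intermediate activations are not saved for a backward pass.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 4)  # stand-in for a language model (assumption)


def toy_generate(model: nn.Module, x: torch.Tensor, steps: int = 3) -> torch.Tensor:
    """Hypothetical autoregressive loop: feed each output back in as the next input."""
    outputs = []
    for _ in range(steps):
        x = model(x)
        outputs.append(x)
    return torch.stack(outputs)


prompt = torch.randn(1, 4)

tracked = toy_generate(model, prompt)  # autograd records every step
with torch.no_grad():
    untracked = toy_generate(model, prompt)  # same values, no graph recorded
```

`torch.no_grad()` can also be applied as a decorator (`@torch.no_grad()`), and recent PyTorch versions offer `torch.inference_mode()` as a stricter alternative.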

Differential Revision: D62671994
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D62672912

@facebook-github-bot

This pull request has been merged in 6636f4d.
