Pretty-print tokens in `llm_attr` methods #1348

craymichael · 2024-09-14T03:58:59Z

Summary:
Convert ids to tokens without ugly unicode characters (e.g., Ġ). See:
huggingface/transformers#4786 and
https://discuss.huggingface.co/t/bpe-tokenizers-and-spaces-before-words/475/2

The new helper is preferred over `tokenizer.convert_ids_to_tokens()` for user-facing data.

Quote from the links:
> Spaces are converted in a special character (the Ġ) in the tokenizer prior to
> BPE splitting mostly to avoid digesting spaces since the standard BPE algorithm
> used spaces in its process
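Here is the quoted mechanism made concrete. GPT-2-style byte-level BPE maps every byte to a printable character before splitting, and the space byte (0x20) lands on "Ġ" (U+0120). The sketch below is illustrative only and is not the code merged in this PR. It reproduces the `bytes_to_unicode` table from GPT-2's encoder and inverts it to turn raw tokens back into readable text. `pretty_token` and `pretty_tokens` are hypothetical names, and the sketch assumes a byte-level BPE vocabulary (tokens from other tokenizer families may contain characters outside the table). With a HuggingFace tokenizer, calling `tokenizer.decode([token_id])` per id yields similar readable strings without reimplementing the table.

```python
from typing import Dict, List


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2's byte -> printable-unicode table used by byte-level BPE."""
    # Bytes that are already printable map to the character with the same code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    for b in range(256):
        if b not in byte_values:
            # Non-printable bytes (including space, 0x20) are shifted past 255;
            # this is how a space becomes "Ġ" (U+0120) and a newline "Ċ" (U+010A).
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: printable character -> original byte.
_UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def pretty_token(raw_token: str) -> str:
    """Map a raw byte-level BPE token (e.g. 'Ġworld') back to readable text."""
    raw_bytes = bytes(_UNICODE_TO_BYTE[ch] for ch in raw_token)
    # errors="replace": a single token may hold only part of a multi-byte character.
    return raw_bytes.decode("utf-8", errors="replace")


def pretty_tokens(raw_tokens: List[str]) -> List[str]:
    """Pretty-print a list of raw tokens, one output string per token."""
    return [pretty_token(t) for t in raw_tokens]


print(pretty_tokens(["Hello", "Ġworld", "Ċ"]))  # ['Hello', ' world', '\n']
```

Decoding each token separately keeps a one-to-one alignment between tokens and attribution scores, which is what a per-token display needs; decoding the whole sequence at once would merge tokens into a single string.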

Differential Revision: D62672912

Summary:
Some attributions returned by gradient-based methods (e.g., `LayerGradientXActivation`) still carry a `grad_fn` from autograd. This diff ensures that the autograd graph is freed between `attribute` calls within `LLMGradientAttribution`, eliminating it as a potential source of VRAM accumulation. It also wraps `model.generate` in a `no_grad` context to avoid unnecessary memory usage.

Differential Revision: D62671994
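The two memory fixes described in this commit can be illustrated with plain PyTorch. This is a minimal sketch, not Captum's code: the two-layer model stands in for an LLM, the gradient-times-activation product stands in for a layer attribution such as `LayerGradientXActivation`, and `free_autograd_graph` is a hypothetical helper name. It shows that such an attribution keeps a `grad_fn` linking it to the forward graph, that `detach()` drops that link, and that outputs computed under `torch.no_grad()` record no graph at all.

```python
import torch


def free_autograd_graph(attr: torch.Tensor) -> torch.Tensor:
    """Return the attribution without its autograd history.

    detach() drops the grad_fn reference, so the forward graph that produced
    the attribution can be garbage-collected once nothing else points to it.
    """
    return attr.detach()


torch.manual_seed(0)
# Hypothetical stand-in for an LLM: a hidden layer followed by an output head.
layer = torch.nn.Linear(4, 8)
head = torch.nn.Linear(8, 1)
inputs = torch.randn(2, 4)

activation = layer(inputs)  # non-leaf tensor; its grad_fn references the layer's forward graph
output = head(activation).sum()
(grads,) = torch.autograd.grad(output, activation)

# Gradient x activation: `activation` still requires grad, so the product keeps a grad_fn
# that holds the layer's saved tensors alive.
raw_attr = grads * activation
print(raw_attr.grad_fn is not None)  # True

attr = free_autograd_graph(raw_attr)
print(attr.grad_fn is None)  # True

# Generation only needs forward passes, so no graph should be recorded.
with torch.no_grad():
    generated = head(layer(inputs))
print(generated.requires_grad)  # False
```

Detaching before the attribution is stored or returned matters most in a loop over many `attribute` calls: each retained `grad_fn` would otherwise keep one forward graph's saved activations in GPU memory.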


facebook-github-bot · 2024-09-14T03:59:23Z

This pull request was exported from Phabricator. Differential Revision: D62672912

facebook-github-bot · 2024-09-16T18:16:54Z

This pull request has been merged in 6636f4d.

Zach Carmichael and others added 2 commits September 13, 2024 16:47

facebook-github-bot added the cla signed label Sep 14, 2024

facebook-github-bot added the fb-exported label Sep 14, 2024

facebook-github-bot closed this in 6636f4d Sep 16, 2024

facebook-github-bot added the Merged label Sep 16, 2024
