Lossless Large Language Model Acceleration via Self-Speculative Decoding #3435
KerfuffleV2 started this conversation in Ideas
Replies: 3 comments · 1 reply
- :( :( couldn't we just ask the paper authors?
- Repo with code (not yet available): https://github.com/dilab-zju/self-speculative-decoding
- I asked the authors and got a very helpful reply. See: #3565 (comment)

Original post (KerfuffleV2):
Link: https://arxiv.org/abs/2309.08168
Basically this is like the existing speculative decoding support, except it doesn't use a separate speculation model: it runs only some of the main model's layers to generate the draft. The big advantage is that the draft output is guaranteed to stay in sync with the main model, and you don't need to load a whole separate model; the existing model can be reused.
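To make the draft/verify flow concrete, here is a minimal sketch of one greedy-decoding step, assuming two hypothetical callables: `forward_partial` (a cheap pass that evaluates only a subset of layers and returns next-token logits) and `forward_full` (a normal pass returning next-token logits at every position). Neither is an existing llama.cpp function, and this is only the general shape of the loop, not the paper's exact algorithm.

```python
import numpy as np

def greedy(logits):
    return int(np.argmax(logits))

def self_speculative_step(forward_full, forward_partial, tokens, n_draft=4):
    # 1. Draft n_draft tokens autoregressively with the cheap partial pass
    #    (only a subset of the transformer layers is evaluated).
    draft, ctx = [], list(tokens)
    for _ in range(n_draft):
        tok = greedy(forward_partial(ctx))
        draft.append(tok)
        ctx.append(tok)

    # 2. Verify with ONE full-model pass over the extended sequence:
    #    forward_full returns next-token logits for every position.
    logits = forward_full(tokens + draft)

    # 3. Accept the longest prefix of the draft that the full model also
    #    picks greedily; on the first mismatch, take the full model's token.
    accepted = []
    for i, tok in enumerate(draft):
        full_tok = greedy(logits[len(tokens) - 1 + i])
        if full_tok != tok:
            accepted.append(full_tok)
            break
        accepted.append(tok)
    else:
        # Every draft token matched, so the final logits give one extra
        # token for free.
        accepted.append(greedy(logits[-1]))
    return accepted  # output is identical to plain greedy decoding
```

Because verification is a single batched pass and any mismatch is replaced with the full model's own choice, the final output matches ordinary greedy decoding exactly, which is what makes this "lossless".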
Unfortunately, they don't really include specific information about which layers are best to skip, so that's something we'd have to find out ourselves. The first step might be extending the inference API to allow passing a list of the layers to run, plus an example program that measures perplexity over various combinations of skipped layers.
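As a rough idea of what that experiment could look like, here is a sketch of a layer-subset sweep. `eval_perplexity` is a hypothetical hook that would only exist once the inference API accepts a list of layers to run; the search is capped because enumerating every combination quickly becomes infeasible (C(32, 8) is already about 10.5 million subsets).

```python
import itertools

N_LAYERS = 32   # e.g. a 7B LLaMA-style model
N_SKIP = 8      # number of layers the draft pass would skip

def search_skip_sets(eval_perplexity, max_trials=1000):
    """Try candidate skip sets and keep the one with the lowest perplexity.

    eval_perplexity(skip_layers) -> perplexity with those layer indices
    skipped; a hypothetical hook, not an existing llama.cpp API.
    """
    best_ppl, best_skip = float("inf"), None
    candidates = itertools.combinations(range(N_LAYERS), N_SKIP)
    for skip in itertools.islice(candidates, max_trials):
        ppl = eval_perplexity(list(skip))
        if ppl < best_ppl:
            best_ppl, best_skip = ppl, list(skip)
    return best_skip, best_ppl
```

In practice a greedy search (drop one layer at a time and keep whichever drop hurts perplexity least) or random sampling of subsets would probably be a more realistic starting point than enumerating combinations.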