
llama : add example for speculative sampling #2030

Closed
@ggerganov

Description


Speculative sampling is explained here: https://arxiv.org/abs/2302.01318

In simpler terms here:
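For reference, a minimal sketch of the per-position accept/reject rule from the paper: the drafted token is kept with probability min(1, p(x)/q(x)); otherwise a replacement is sampled from the normalized residual max(0, p - q). The probability vectors p (main model), q (draft model) and the helper name speculative_accept are placeholders for illustration, not existing llama.cpp API.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

using llama_token = int32_t;

// Per-position accept/reject step of speculative sampling.
// p: main model probabilities over the vocab at this position
// q: draft model probabilities over the vocab at this position
// draft_tok: token proposed by the draft model
// Returns the token to keep; sets 'accepted' to whether the draft survived.
static llama_token speculative_accept(
        const std::vector<float> & p,
        const std::vector<float> & q,
        llama_token draft_tok,
        std::mt19937 & rng,
        bool & accepted) {
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);

    // keep the drafted token with probability min(1, p(x)/q(x))
    if (q[draft_tok] > 0.0f && u01(rng) < std::min(1.0f, p[draft_tok] / q[draft_tok])) {
        accepted = true;
        return draft_tok;
    }

    // rejected: sample a replacement from the residual distribution max(0, p - q)
    accepted = false;
    std::vector<float> res(p.size());
    float sum = 0.0f;
    for (size_t i = 0; i < p.size(); ++i) {
        res[i] = std::max(0.0f, p[i] - q[i]);
        sum += res[i];
    }
    if (sum <= 0.0f) {
        res = p; // numerical edge case: fall back to the main model's distribution
    }
    std::discrete_distribution<llama_token> dist(res.begin(), res.end());
    return dist(rng); // discrete_distribution normalizes the weights internally
}

int main() {
    std::mt19937 rng(42);

    // toy 4-token vocab: the draft model is overconfident about token 2
    std::vector<float> p = { 0.10f, 0.30f, 0.40f, 0.20f }; // main model
    std::vector<float> q = { 0.05f, 0.15f, 0.70f, 0.10f }; // draft model

    bool accepted = false;
    llama_token tok = speculative_accept(p, q, /*draft_tok =*/ 2, rng, accepted);
    printf("kept token %d (draft %s)\n", (int) tok, accepted ? "accepted" : "rejected");

    return 0;
}
```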

To start, the "draft" model can be generated with the train-text-from-scratch example, using the same vocab as LLaMA. Later, we can try to utilize better models.

We also assume that batching multiple tokens with the "main" model is significantly faster than processing the tokens one-by-one. This may not yet be the case, but it will be once we close ggml-org/ggml#293.
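To make the intended control flow concrete, here is a self-contained toy sketch (greedy verification, with stand-in functions instead of real models): the draft model proposes K tokens one-by-one, and the main model then checks all of them, which in the real example would be a single batched eval.

```cpp
#include <cstdio>
#include <vector>

using token = int;

// stand-in for the cheap draft model: predicts the next token from the last one
static token draft_predict(const std::vector<token> & hist) {
    return (hist.back() * 7 + 1) % 50;
}

// stand-in for the main model: mostly agrees with the draft, but not always
static token main_predict(const std::vector<token> & hist) {
    token t = (hist.back() * 7 + 1) % 50;
    return (hist.size() % 5 == 0) ? (t + 3) % 50 : t; // disagree every 5th position
}

int main() {
    const int K       = 4;   // tokens drafted per speculation step
    const int n_total = 32;  // tokens to generate in total

    std::vector<token> out = { 1 }; // prompt

    while ((int) out.size() < n_total) {
        // 1) draft K tokens one-by-one with the cheap model
        std::vector<token> ctx = out;
        std::vector<token> drafted;
        for (int i = 0; i < K; ++i) {
            token t = draft_predict(ctx);
            drafted.push_back(t);
            ctx.push_back(t);
        }

        // 2) verification: the main model scores every drafted position.
        //    With a real model this is one eval over all K tokens - one big
        //    batched call instead of K sequential ones.
        int n_accepted = 0;
        std::vector<token> verify_ctx = out;
        for (int i = 0; i < K; ++i) {
            token main_tok = main_predict(verify_ctx);
            if (main_tok != drafted[i]) {
                out.push_back(main_tok); // first disagreement: keep the main model's token and stop
                break;
            }
            out.push_back(drafted[i]);   // agreement: the drafted token is accepted for free
            verify_ctx.push_back(drafted[i]);
            ++n_accepted;
        }

        printf("accepted %d/%d drafted tokens\n", n_accepted, K);
    }

    return 0;
}
```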
