Feature Request: Proper Llama 3.1 Support in llama.cpp #8650

@Vaibhavs10

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Llama 3.1 was just released and it is a significant leg up from the previous series of models: https://huggingface.co/blog/llama31

Whilst the overall architecture is the same, it requires some modelling updates, primarily around RoPE scaling: https://github.com/huggingface/transformers/blob/bc2adb0112b6677b0dfb4105c74570a0f92183eb/src/transformers/modeling_rope_utils.py#L298

It'd be great to add support for those so that the generations are more coherent and make sense.

Motivation

Note: Without the modelling changes, the generations might look coherent, but they are far from great and do not reflect the true potential of the model!

Possible Implementation

Here's the corresponding transformers implementation: https://github.com/huggingface/transformers/blob/bc2adb0112b6677b0dfb4105c74570a0f92183eb/src/transformers/modeling_rope_utils.py#L298
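As a rough sketch of what the linked transformers code does (parameter names and defaults below are taken from the Llama 3.1 `rope_scaling` config and may differ from what llama.cpp ultimately adopts): the RoPE inverse frequencies are rescaled so that high-frequency components are kept as-is, low-frequency components are divided by a scaling factor, and the band in between is smoothly interpolated.

```python
import math

def llama3_scale_inv_freq(inv_freq,
                          factor=8.0,
                          low_freq_factor=1.0,
                          high_freq_factor=4.0,
                          old_context_len=8192):
    """Rescale RoPE inverse frequencies in the Llama 3.1 style.

    Short wavelengths (high frequencies) are left unchanged, long
    wavelengths (low frequencies) are divided by `factor`, and the
    band in between is linearly interpolated between the two.
    """
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    scaled = []
    for freq in inv_freq:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency component: keep as-is.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency component: stretch by the scaling factor.
            scaled.append(freq / factor)
        else:
            # Mid band: interpolate smoothly between scaled and unscaled.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled
```

This would be applied once to the precomputed `inv_freq` table before building the RoPE cache, rather than per token.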

Metadata

Labels: enhancement (New feature or request)