jquesnelle
This fixes a subtle bug in the YaRN implementation. When calculating the linear ramp, we're attempting to replicate this code:

    if min == max:
        max += 0.001  # Prevent singularity

    linear_func = (torch.arange(dim, dtype=torch.float32) - min) / (max - min)

So, when min == max, we want max - min = 0.001. The code currently calculates a particular entry of linear_func as

const float y = (i0 / 2 - low) / min(0.001f, high - low);

But when high - low == 0, min(0.001f, 0) evaluates to 0, not 0.001, so the expression divides by zero. The fix is to change the min to a max, which keeps the denominator at 0.001 or above.
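To make the difference concrete, here is a minimal Python sketch of a single ramp entry, not the actual patch. The helper name `ramp_entry` and its `clamp` parameter are hypothetical and exist only for illustration. It assumes `i0` is an integer index (so `i0 / 2` in the C line is integer division) and emulates IEEE float division by zero, which Python would otherwise raise on.

```python
import math

def ramp_entry(i0: int, low: float, high: float, clamp=max) -> float:
    """One unclipped entry of the linear ramp.

    clamp=min mirrors the original line; clamp=max mirrors the fix.
    (Hypothetical helper, for illustration only.)
    """
    denom = clamp(0.001, high - low)
    numer = i0 // 2 - low  # assumes i0 is an int, as in the C expression
    if denom == 0.0:
        # Emulate C float division by zero instead of raising ZeroDivisionError.
        return math.nan if numer == 0 else math.copysign(math.inf, numer)
    return numer / denom

# Degenerate case: low == high.
print(ramp_entry(8, 4.0, 4.0, clamp=min), ramp_entry(8, 4.0, 4.0))
print(ramp_entry(10, 4.0, 4.0, clamp=min), ramp_entry(10, 4.0, 4.0))

# Ordinary case: high - low is larger than 0.001.
print(ramp_entry(8, 2.0, 6.0, clamp=min), ramp_entry(8, 2.0, 6.0))
```

In this sketch the `min` variant also picks 0.001 as the denominator whenever `high - low` is larger than 0.001, so the two variants disagree in the ordinary case as well, not only when `high == low`.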

I've also added a --yarn-orig-ctx command-line option, so that models such as TheBloke/Yarn-Llama-2-7B-64K-GGUF, which were converted without the GGUF YaRN keys, can still be used as long as the correct values are passed on the command line.

@cebtenzzre cebtenzzre merged commit 14cf93b into cebtenzzre:ntkv2 Oct 20, 2023
cebtenzzre added a commit that referenced this pull request Nov 27, 2023
* vvhg-code-infill (#1)

* infill in separate example (#2)

* reverted changes to main and added infill example

* cleanup

* naming improvement

* make : add missing blank line

* fix missing semicolon

* brought infill up to current main code

* cleanup

---------

Co-authored-by: Cebtenzzre <[email protected]>