KeyError when removing long examples after removing duplicate rows #121

Closed
serinamarie opened this issue Sep 14, 2022 · 7 comments · Fixed by #125
Labels
bug Something isn't working

Comments

@serinamarie
Contributor

serinamarie commented Sep 14, 2022

Error:

openai tools fine_tunes.prepare_data -f training_data_2022-09-14.jsonl
Analyzing...

- Your file contains 2446 prompt-completion pairs
Based on the analysis we will perform the following actions:
- [Recommended] Remove 1155 duplicate rows [Y/n]: y
- [Recommended] Remove 49 long examples [Y/n]: y
Traceback (most recent call last):
  File "/Users/ser/project/project-venv/bin/openai", line 8, in <module>
    sys.exit(main())
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/_openai_scripts.py", line 63, in main
    args.func(args)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/cli.py", line 531, in prepare_data
    apply_validators(
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/validators.py", line 851, in apply_validators
    df, optional_applied = apply_optional_remediation(
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/validators.py", line 578, in apply_optional_remediation
    df = remediation.optional_fn(df)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/openai/validators.py", line 171, in optional_fn
    return x.drop(long_indexes)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/core/frame.py", line 4957, in drop
    return super().drop(
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/core/generic.py", line 4267, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/core/generic.py", line 4311, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/Users/ser/project/project-venv/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6661, in drop
    raise KeyError(f"{list(labels[mask])} not found in axis")
KeyError: '[330, 352, 377, 378, 422, 424, 435, 1172, 1194, 1219, 1220, 1264, 1266, 1277, 1468, 1498, 1549, 1641, 1648, 1714, 1741, 1816, 1859, 1984] not found in axis'

I believe that because the duplicate rows are removed first, many of the long-example row indexes no longer exist in the data, and dropping them throws this error. As a workaround, I have to apply only the first recommendation, then run the tool again on the resulting file to apply the second one.

It would be great to be able to apply both changes to the same file.
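
A minimal sketch of the failure described above, using toy data rather than the reporter's file. It assumes the long-example labels are computed before the duplicates are dropped, which matches the `x.drop(long_indexes)` call in the traceback. The length threshold, column contents, and the two workarounds shown are illustrative assumptions, not necessarily the change made in the fix.

```python
import pandas as pd

# Toy data (hypothetical): row 1 duplicates row 0, row 3 duplicates row 2,
# and rows 2 and 3 are "long" under an arbitrary threshold of 10 characters.
df = pd.DataFrame(
    {
        "prompt": ["a", "a", "b" * 50, "b" * 50],
        "completion": ["x", "x", "y", "y"],
    }
)

# Labels of the long rows, computed on the ORIGINAL frame: 2 and 3.
long_indexes = df.index[df["prompt"].str.len() > 10]

# First remediation: drop duplicates. Labels 1 and 3 no longer exist.
deduped = df.drop_duplicates()

# Second remediation with the stale labels reproduces the KeyError,
# because label 3 is no longer present in the index.
try:
    deduped.drop(long_indexes)
    raised = False
except KeyError:
    raised = True

# Workaround 1: tell pandas to skip labels that are already gone.
safe_a = deduped.drop(long_indexes, errors="ignore")

# Workaround 2: drop only the labels that still exist in the frame.
safe_b = deduped.drop(deduped.index.intersection(long_indexes))
```

Either workaround leaves only the short, unique row. Recomputing the long-example labels on the already deduplicated frame would be another way to avoid the stale labels.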

@hallacy
Collaborator

hallacy commented Sep 24, 2022

Hi @serinamarie! Thanks for the issue!

Yep, that sure looks like what's happening. We should be able to take a look at this soon and dig into why it's happening (or if you want to take a shot at it, I certainly won't stop you).

hallacy added the bug label on Sep 24, 2022
@serinamarie
Contributor Author

Sounds good, I believe I know what the issue is so I can get to it this weekend 👍

@serinamarie
Contributor Author

Hi @hallacy, I have created a new branch locally with my changes but am unable to push them as I do not have write access. Would it be possible to get it? Thanks in advance.

@hallacy
Collaborator

hallacy commented Sep 27, 2022

Right now the best way to submit a PR to us is to fork the repo, make your changes, and then open a PR to merge your fork back in. It's a bit clunky, but unfortunately it seems to be the best way for us to maintain the right security controls over the repo.

@serinamarie
Contributor Author

@hallacy Should I provide tests alongside the fix?

@hallacy
Collaborator

hallacy commented Oct 14, 2022

I'm absolutely not going to say no to tests, though I've already made you go through one round of edits. I can merge your PR later today.

@serinamarie
Contributor Author

@hallacy Great, I've just added a test if you can review!

safa0 pushed a commit to safa0/openai-agents-python that referenced this issue Apr 27, 2025