Space start validator crashes on empty completion #535


Closed
EliahKagan opened this issue Jul 12, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@EliahKagan

EliahKagan commented Jul 12, 2023

Describe the bug

When running openai tools fine_tunes.prepare_data, a completion that is empty, or that becomes empty after a remediation is applied, causes openai.validators.completions_space_start_validator to raise an unhandled IndexError. @tuliren has contributed a fix in #406. I'm opening this issue to give information about the bug and its impact. Merging #406 would fix this bug.

#125 added the test that finds this bug, test_long_examples_validator. The bug is in the code under test (not in the test itself). The bug precedes that pull request.

Since #153, numpy and pandas are not installed by default, and the test is skipped in their absence. Installing with pip install -e '.[dev]' and running pytest does not reveal the bug for this reason, whereas pip install -e '.[dev,datalib]' does show the failure.

I haven't found a commit in the history of the main branch in which the test has been present but able to pass. It looks like it may have originally been run with modified data tailored to the problem in #121 (which #125 fixes), and then stopped failing for many developers once #153 was merged, due to being skipped.

When actually using openai tools fine_tunes.prepare_data, I think it is unusual to include an empty completion or one that would become empty due to a remediation, so I think the user impact of this bug is low. But it seems to me that the ongoing situation where not all tests are able to pass may be confusing to new contributors. (I personally find it confusing.) Fortunately, the fix in #406 is straightforward.

To Reproduce

  1. On GNU/Linux or macOS, prepare an environment using Python 3.7, 3.8, 3.9, 3.10, or 3.11. (Although I've manually verified that the bug affects Windows, and test_long_examples_validator does not pass on Windows, other issues obscure this one.) I tested this locally on Ubuntu 22.04 with all those versions of Python, and on CI with ubuntu-latest and macos-latest using a workflow derived from tuliren's.

  2. In that environment, create an editable install, including at least the datalib extra, by running:
    pip install -e '.[dev,datalib]'

  3. Run: pytest
    Or, for the whole traceback, run: pytest -vv
    Or, for the whole traceback and just that one test, run: pytest -vvk test_long_examples_validator

Code snippets

x[0] raises IndexError when x is empty in:

lambda x: ("" if x[0] == " " else " ") + x

This appears in the add_space_start local function, in completions_space_start_validator, in openai/validators.py.
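The crash can be reproduced in isolation with that lambda alone. The guarded variant at the end is only illustrative of the kind of one-line fix involved; see #406 for the actual change:

```python
# The expression used by add_space_start in completions_space_start_validator:
add_space_start = lambda x: ("" if x[0] == " " else " ") + x

print(add_space_start("hello"))  # prepends a space: " hello"

try:
    add_space_start("")  # x[0] on an empty string
except IndexError as e:
    print(f"IndexError: {e}")  # string index out of range

# A guard such as str.startswith avoids the crash, since "".startswith(" ")
# is simply False (illustrative only; #406 has the actual fix):
add_space_start_fixed = lambda x: ("" if x.startswith(" ") else " ") + x
print(repr(add_space_start_fixed("")))  # ' '
```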

OS

All (tested mainly on Ubuntu 22.04.2 and macOS 12.6.7).

Python version

Python 3.7.17, 3.8.17, 3.9.17, 3.10.12, and 3.11.4.

Library version

Tested on b82a3f7 (main branch), though a range of versions is affected.

@rattrayalex
Collaborator

Thanks for the detailed report. The fix proposed in #406 should be applied sometime tomorrow.
