openai tools fine_tunes.prepare_data does not accept indented JSON files #206

Closed
reinvantveer opened this issue Jan 31, 2023 · 5 comments
@reinvantveer

To reproduce:

# Quoting the requirement keeps shells such as zsh from expanding the brackets
pip install --user "openai[datalib]==0.26.4"

# Works fine:
echo '[{"prompt": "Here is my example input 1", "completion": "Complete to 1"}, {"prompt": "Here is my example input 2", "completion": "Complete to 2"}]' > unindented.json
openai tools fine_tunes.prepare_data --quiet --file unindented.json

# Doesn't work:
cat > to_indented.py << EOF
import json
with open('unindented.json', 'rt') as f:
    data = json.loads(f.read())
# Simple rewrite of the "unindented.json": output to indented version
with open('indented.json', 'wt') as f:
    f.write(json.dumps(data, indent=2))
EOF
python to_indented.py
openai tools fine_tunes.prepare_data -f indented.json
@reinvantveer
Author

The main issue with this is (at the risk of being over-obvious) that I like my JSON files indented for readability. Thanks for making this available!

@reinvantveer
Author

My first suspicion is this line: https://github.com/openai/openai-python/blob/main/openai/validators.py#L525
The (supposed) JSON file path is passed directly to pandas to read a DataFrame from it, but the read somehow fails. Since this is executed inside a huge try/except block, you could try removing the try/except clauses to see what pandas makes of the file and why it thinks it is an invalid JSON file.

Another approach is to build an intermediate representation: open the file, read its contents, and pass them to json.loads before handing the result on to pandas. But I'm not very familiar with pandas, so I'm not sure what pandas expects as input from a list of dicts.
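
The intermediate-representation idea can be sketched with the standard library alone. This is a minimal sketch, not the fix that landed in the library: the helper name json_array_to_jsonl and the file names are made up for illustration, and it assumes the input is a top-level JSON array of prompt/completion objects. It parses the indented file with json.loads and rewrites it with one record per line, which sidesteps any line-oriented reader. (If you would rather go straight to pandas, the DataFrame constructor accepts a list of dicts directly.)

```python
import json
import tempfile
from pathlib import Path


def json_array_to_jsonl(src, dst):
    """Rewrite a JSON file holding a list of objects as JSON Lines.

    Hypothetical helper for illustration; returns the number of records written.
    """
    records = json.loads(Path(src).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError(f"{src} must contain a top-level JSON array")
    with open(dst, "wt", encoding="utf-8") as f:
        for record in records:
            # json.dumps without indent emits each record on a single line
            f.write(json.dumps(record) + "\n")
    return len(records)


# Small self-contained demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "indented.json"
    dst = Path(tmp) / "indented.jsonl"
    examples = [
        {"prompt": "Here is my example input 1", "completion": "Complete to 1"},
        {"prompt": "Here is my example input 2", "completion": "Complete to 2"},
    ]
    src.write_text(json.dumps(examples, indent=2), encoding="utf-8")
    count = json_array_to_jsonl(src, dst)
    print(f"wrote {count} records to {dst.name}")
```

The resulting .jsonl file has one object per line, so it should not matter how the original JSON was indented.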

@hallacy
Collaborator

hallacy commented Apr 8, 2023

@BorisPower @joe-at-openai can y'all look at this? I suspect this broke with #190 for JSON parsing, specifically on df = pd.read_json(fname, lines=True, dtype=str).fillna(""), since lines=True expects one JSON value per line and indented .json files span multiple lines
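
To illustrate the suspicion above without depending on pandas: with lines=True, the file is treated as one JSON value per line. The sketch below uses a stand-in function, parse_line_delimited (a made-up name, not pandas' actual implementation), to show why a line-oriented parse accepts one-record-per-line data but rejects the same records once they are indented.

```python
import json

records = [
    {"prompt": "Here is my example input 1", "completion": "Complete to 1"},
    {"prompt": "Here is my example input 2", "completion": "Complete to 2"},
]


def parse_line_delimited(text):
    """Stand-in for a JSON Lines reader: parse each non-empty line on its own."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# One record per line: every line is a complete JSON value
jsonl_text = "\n".join(json.dumps(r) for r in records)
assert parse_line_delimited(jsonl_text) == records

# Indented array: the first line is just "[", which is not valid JSON by itself
indented_text = json.dumps(records, indent=2)
try:
    parse_line_delimited(indented_text)
except json.JSONDecodeError as err:
    print("line-delimited parse failed:", err)

# Parsing the whole document at once works regardless of indentation
assert json.loads(indented_text) == records
```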

@joe-at-openai
Collaborator

@hallacy Yeah, no problem!

@hallacy
Collaborator

hallacy commented Apr 10, 2023

#389 should fix this

@hallacy hallacy closed this as completed Apr 10, 2023