Improve error message in case of a misformatted file (#158)
* add more descriptive error handling regarding poorly formatted files
* update version
* add dot prefix to json file extensions and ensure list of allowable file types is complete
* cleanup error messages and add comments to explain jsonl/json loading logic
* cleanup csv/tsv reading allowing use of elif for other file extensions, add comments, and remove unnecessary re-attempt to parse as json
* run fillna immediately upon DataFrame creation so that an additional switch is not needed
* use only 1 try-except block to catch parsing errors + cleanup error message
* separate the json and jsonl cases while still maintaining the same functionality, also include a message to user if jsonl appears to be json or vice versa
* fix bug in csv path
* use index -1 to get extension from split
* black formatting apply
* fix black
Co-authored-by: joe-at-openai <[email protected]>
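The commit message describes dispatching on the file extension and catching parse failures in a single try-except block. A minimal sketch of that shape follows. It uses only the standard library as a stand-in for the pandas calls in the actual change, and `load_records` and `SUPPORTED` are hypothetical names chosen for illustration, not names from this commit.

```python
import csv
import json

# Subset of extensions handled by this sketch (assumption: XLSX and TXT omitted).
SUPPORTED = ("csv", "tsv", "json", "jsonl")


def load_records(fname):
    """Return (records, error_msg); one of the two is always None."""
    ext = fname.rsplit(".", 1)[-1].lower() if "." in fname else ""
    if ext not in SUPPORTED:
        return None, f"Your file `{fname}` has an unsupported or missing extension."
    try:
        with open(fname, "r", newline="") as f:
            if ext in ("csv", "tsv"):
                sep = "," if ext == "csv" else "\t"
                records = list(csv.DictReader(f, delimiter=sep))
            elif ext == "jsonl":
                # One JSON document per non-empty line.
                records = [json.loads(line) for line in f if line.strip()]
            else:  # "json"
                records = json.load(f)
    except (ValueError, TypeError):
        # json.JSONDecodeError is a subclass of ValueError, so malformed
        # JSON and JSONL both land here and produce one uniform message.
        return None, (
            f"Your file `{fname}` does not appear to be in valid "
            f"{ext.upper()} format."
        )
    return records, None
```

Keeping one handler around the whole dispatch means every format shares the same error wording, at the cost of not distinguishing which parser raised.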
-     immediate_msg = "\n- Based on your file extension, your file is formatted as an Excel file"
-     necessary_msg = "Your format `XLSX` will be converted to `JSONL`"
-     xls = pd.ExcelFile(fname)
-     sheets = xls.sheet_names
-     if len(sheets) > 1:
-         immediate_msg += "\n- Your Excel file contains more than one sheet. Please either save as csv or ensure all data is present in the first sheet. WARNING: Reading only the first sheet..."
-     df = pd.read_excel(fname, dtype=str)
- if fname.lower().endswith(".txt"):
-     immediate_msg = "\n- Based on your file extension, you provided a text file"
-     necessary_msg = "Your format `TXT` will be converted to `JSONL`"
-     with open(fname, "r") as f:
-         content = f.read()
-         df = pd.DataFrame(
-             [["", line] for line in content.split("\n")],
-             columns=fields,
-             dtype=str,
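The text-file branch above pairs every line of the file with an empty first field before building the DataFrame. The sketch below reproduces only that list comprehension, without pandas, to show what rows it yields. `text_to_rows` is a hypothetical helper name used for illustration.

```python
def text_to_rows(content):
    """Pair each line with an empty first field, as the TXT branch does."""
    return [["", line] for line in content.split("\n")]


rows = text_to_rows("first\nsecond\n")
# Splitting on "\n" keeps the empty string after a trailing newline,
# so a file that ends with a newline yields one extra empty row.
print(rows)  # [['', 'first'], ['', 'second'], ['', '']]
```

Using `content.splitlines()` instead would drop that trailing empty entry. Whether the extra row matters depends on later validation steps, which this hunk does not show.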
+ immediate_msg = f"\n- Based on your file extension, your file is formatted as a {file_extension_str} file"
+ necessary_msg = (
+     f"Your format `{file_extension_str}` will be converted to `JSONL`"
+     immediate_msg = "\n- Based on your file extension, your file is formatted as an Excel file"
+     necessary_msg = "Your format `XLSX` will be converted to `JSONL`"
+     xls = pd.ExcelFile(fname)
+     sheets = xls.sheet_names
+     if len(sheets) > 1:
+         immediate_msg += "\n- Your Excel file contains more than one sheet. Please either save as csv or ensure all data is present in the first sheet. WARNING: Reading only the first sheet..."
+     df = pd.read_excel(fname, dtype=str).fillna("")
+ elif fname.lower().endswith(".txt"):
+     immediate_msg = (
+         "\n- Based on your file extension, you provided a text file"
      )
+     necessary_msg = "Your format `TXT` will be converted to `JSONL`"
+         immediate_msg = "\n- Your JSON file appears to be in a JSONL format. Your file will be converted to JSONL format"
+         necessary_msg = "Your format `JSON` will be converted to `JSONL`"
      else:
-         error_msg += f" Your file `{fname}` does not appear to have a file ending. Please ensure your filename ends with one of the supported file endings."
- else:
-     df.fillna("", inplace=True)
+         error_msg = "Your file must have one of the following extensions: .CSV, .TSV, .XLSX, .TXT, .JSON or .JSONL"
+         if "." in fname:
+             error_msg += f" Your file `{fname}` ends with the extension `.{fname.split('.')[-1]}` which is not supported."
+         else:
+             error_msg += f" Your file `{fname}` is missing a file extension."
+
+ except (ValueError, TypeError):
+     file_extension_str = fname.split(".")[-1].upper()
+     error_msg = f"Your file `{fname}` does not appear to be in valid {file_extension_str} format. Please ensure your file is formatted as a valid {file_extension_str} file."
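The unsupported-extension branch above can be exercised in isolation. The sketch below wraps the same message logic in a function so its two outcomes are easy to see. `extension_error` is a hypothetical name; in the commit this logic is inline rather than a separate function.

```python
def extension_error(fname):
    """Build the error message for an unsupported or missing extension."""
    error_msg = (
        "Your file must have one of the following extensions: "
        ".CSV, .TSV, .XLSX, .TXT, .JSON or .JSONL"
    )
    if "." in fname:
        # split('.')[-1] takes the text after the LAST dot, so
        # "archive.tar.gz" reports ".gz" rather than ".tar".
        error_msg += (
            f" Your file `{fname}` ends with the extension "
            f"`.{fname.split('.')[-1]}` which is not supported."
        )
    else:
        error_msg += f" Your file `{fname}` is missing a file extension."
    return error_msg


print(extension_error("data.parquet"))
print(extension_error("data"))
```

One caveat with the `"." in fname` check: a path such as `./data/file` contains a dot in its directory part, so it takes the first branch even though the file name itself has no extension. `os.path.splitext` is one possible alternative that only considers the final path component.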