Open
Labels
enhancement (New feature or request)
Description
conll_2003_to_dataframes() currently passes through the special -DOCSTART- token when importing the CoNLL file format. It would be better if the import code dropped this special token and the sentence boundary that follows it and did not include either of them in the reconstructed document.
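The requested behavior can be sketched on raw CoNLL-2003 lines. This is a minimal illustration, not the library's implementation: drop_docstart() is a hypothetical helper, and the sample lines are made up for the example. It works on a flat list of lines and does not split the input into per-document results.

```python
from typing import Iterable, List

DOCSTART = "-DOCSTART-"


def drop_docstart(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: remove each -DOCSTART- line and the blank
    line (sentence boundary) that immediately follows it."""
    kept: List[str] = []
    skip_next_blank = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(DOCSTART):
            # Drop the marker line and remember to drop the boundary after it.
            skip_next_blank = True
            continue
        if skip_next_blank and stripped == "":
            # This is the blank line right after -DOCSTART-; drop it too.
            skip_next_blank = False
            continue
        skip_next_blank = False
        kept.append(line)
    return kept


# Made-up sample in CoNLL-2003 layout (token, POS tag, chunk tag, entity tag).
sample = [
    "-DOCSTART- -X- -X- O",
    "",
    "EU NNP B-NP B-ORG",
    "rejects VBZ B-VP O",
    "",
    "Peter NNP B-NP B-PER",
]
cleaned = drop_docstart(sample)
```

Ordinary sentence boundaries (blank lines that do not follow -DOCSTART-) are kept, so only the marker and its trailing boundary disappear from the reconstructed document.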
Major subtasks
- Modify conll_2003_to_dataframes() so that it drops the -DOCSTART- token and the blank line after it when importing a data set in CoNLL-2003 format.
- Modify conll_2003_output_to_dataframes() so that it also drops the first two lines of each document when importing model outputs.
- Update examples and tutorials to reflect this change. Where needed, subtract 11 from the offsets of any spans we computed with the previous version of conll_2003_to_dataframes().
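The offset adjustment in the last subtask can be illustrated as follows. This sketch assumes the document text is reconstructed by joining tokens with a single space, so that the shift of 11 is the 10 characters of -DOCSTART- plus one separator. shift_span() is a hypothetical helper, and the tokens are made up for the example.

```python
from typing import Tuple

DOCSTART = "-DOCSTART-"
# Assumption: tokens are joined with one space, so removing the marker
# shifts every later character offset by len("-DOCSTART-") + 1 = 11.
SHIFT = len(DOCSTART) + 1


def shift_span(begin: int, end: int, shift: int = SHIFT) -> Tuple[int, int]:
    """Hypothetical helper: map a (begin, end) character span computed on
    the old text (with -DOCSTART-) to the new text (without it)."""
    if begin < shift:
        raise ValueError("span overlaps the removed -DOCSTART- prefix")
    return begin - shift, end - shift


old_tokens = [DOCSTART, "EU", "rejects", "German", "call"]
old_text = " ".join(old_tokens)      # text as reconstructed previously
new_text = " ".join(old_tokens[1:])  # text with the marker dropped

old_span = (11, 13)                  # "EU" in the old text
new_span = shift_span(*old_span)     # the same token in the new text
```

Spans that start inside the removed prefix have no counterpart in the new text, which is why the sketch raises an error for them rather than returning a negative offset.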