Words being corrected ##ts [BUG] #30

@nicno90

Describe the bug
Words flagged as incorrect are replaced with a token that contains hash characters (for example, "nots" becomes "##ts").

To Reproduce

# Steps to reproduce the behavior:
>>> import spacy
>>> nlp = spacy.load('en_core_web_lg', disable=['tagger'])
>>> from contextualSpellCheck import ContextualSpellCheck
2020-10-14 10:24:16.775668: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
>>> merge_ents = nlp.create_pipe("merge_entities")
>>> nlp.add_pipe(merge_ents)
>>> spell_checker = ContextualSpellCheck(max_edit_dist=3)
>>> nlp.add_pipe(spell_checker)
>>> sent = 'Everyone has to help to fix the problems of society. There has to be more training, more opportunity to bridge the gap between the haves and the have nots.'
>>> doc = nlp(sent)
>>> correct = doc._.outcome_spellCheck
>>> correct
'Everyone has to help to fix the problems of society. There has to be more training, more opportunity to bridge the gap between the have and the have ##ts.'

Expected behavior
'Everyone has to help to fix the problems of society. There has to be more training, more opportunity to bridge the gap between the have and the have nots.'
or
'Everyone has to help to fix the problems of society. There has to be more training, more opportunity to bridge the gap between the have and the have not.'

Version:

  • contextualSpellCheck 0.3.0
  • spaCy 2.3.2
  • transformers 3.3.1

Additional information
I checked vocab.txt and found entries that begin with ##. What are these entries needed for?
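
For context on the ## entries: in BERT-style WordPiece vocabularies, a leading ## marks a piece that continues a word rather than starting one, so such entries are not standalone words. The sketch below is a toy illustration of that convention, not the code used by contextualSpellCheck. The vocabulary and the helper names `wordpiece_tokenize` and `whole_word_candidates` are hypothetical, and whether the library ranks raw vocabulary entries as candidates is an assumption here.

```python
# Toy vocabulary (hypothetical, for illustration only). A real BERT vocab.txt
# is far larger; entries starting with "##" are word-internal pieces.
TOY_VOCAB = {"have", "no", "##ts", "##s", "[UNK]"}


def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into WordPiece tokens."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a word-internal piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def whole_word_candidates(candidates):
    """Drop continuation pieces, which are not standalone words."""
    return [c for c in candidates if not c.startswith("##")]


print(wordpiece_tokenize("nots", TOY_VOCAB))
print(whole_word_candidates(["##ts", "not", "nots"]))
```

If the checker does pick replacement candidates directly from the vocabulary, filtering out entries that start with ## (as `whole_word_candidates` does) would be one possible way to keep fragments such as "##ts" out of the corrected sentence.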

Labels: bug
