-
Notifications
You must be signed in to change notification settings - Fork 35
Fixes for use with Pandas 1.2.1 #171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@frreiss there is still one more error, it has to do with TensorArray repr() with floating point values, and is similar to pandas-dev/pandas#38391. I need to get back on that for a fix, but I'll try for a workaround first. We can merge this first if ok with you. |
""" | ||
if isinstance(item, Span) and \ | ||
item.begin == Span.NULL_OFFSET_VALUE: | ||
return Span.NULL_OFFSET_VALUE in self._begins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed because otherwise the default is to check all values for equality, and that will fail if there is a NULL value with Span.target_text=None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know how Pandas defines the result of NA == NA
for other nullable types? Should that expression evaluate True
, False
, or NA
? It could go either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the current Pandas policy is "¯_(ツ)_/¯"; see the warning on https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good question. The test failure related to this was checking assert na_value in data_missing
which expects that na_value == na_value
to be True. Seems like it would be different for np.nan
though..
f"objects to a TokenSpanArray. Found an " | ||
f"object of type {type(s)}" | ||
) | ||
if tokens is None and not (s.begin == TokenSpan.NULL_OFFSET_VALUE and s.tokens.target_text is None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets a little awkward dealing with with a NULL value with target_text=None, but should be ok for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case will work better once TokenSpanArray
allows multiple target texts/tokenization pairs. Still some design work to be done on that front, methinks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
""" | ||
if isinstance(item, Span) and \ | ||
item.begin == Span.NULL_OFFSET_VALUE: | ||
return Span.NULL_OFFSET_VALUE in self._begins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know how Pandas defines the result of NA == NA
for other nullable types? Should that expression evaluate True
, False
, or NA
? It could go either way.
f"objects to a TokenSpanArray. Found an " | ||
f"object of type {type(s)}" | ||
) | ||
if tokens is None and not (s.begin == TokenSpan.NULL_OFFSET_VALUE and s.tokens.target_text is None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case will work better once TokenSpanArray
allows multiple target texts/tokenization pairs. Still some design work to be done on that front, methinks.
Going to go ahead and merge, will try to fix the TensorArray issue separately |
Fixed
_from_sequence
to handle np.nan, added__contains__
to check for null values with text set toNone
.Closes #168