Skip to content

Conversation

@BryanCutler
Copy link
Member

@BryanCutler BryanCutler commented Jan 28, 2021

Fixed _from_sequence to handle np.nan, added __contains__ to check for null values with text set to None.

Closes #168

@BryanCutler
Copy link
Member Author

@frreiss there is still one more error, it has to do with TensorArray repr() with floating point values, and is similar to pandas-dev/pandas#38391. I need to get back on that for a fix, but I'll try for a workaround first. We can merge this first if ok with you.

@BryanCutler BryanCutler requested a review from frreiss January 28, 2021 18:47
"""
if isinstance(item, Span) and \
item.begin == Span.NULL_OFFSET_VALUE:
return Span.NULL_OFFSET_VALUE in self._begins
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because otherwise the default is to check all values for equality, and that will fail if there is a NULL value with Span.target_text=None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how Pandas defines the result of NA == NA for other nullable types? Should that expression evaluate True, False, or NA? It could go either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the current Pandas policy is "¯_(ツ)_/¯"; see the warning on https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good question. The test failure related to this was checking assert na_value in data_missing which expects that na_value == na_value to be True. Seems like it would be different for np.nan though..

f"objects to a TokenSpanArray. Found an "
f"object of type {type(s)}"
)
if tokens is None and not (s.begin == TokenSpan.NULL_OFFSET_VALUE and s.tokens.target_text is None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets a little awkward dealing with with a NULL value with target_text=None, but should be ok for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case will work better once TokenSpanArray allows multiple target texts/tokenization pairs. Still some design work to be done on that front, methinks.

Copy link
Member

@frreiss frreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

"""
if isinstance(item, Span) and \
item.begin == Span.NULL_OFFSET_VALUE:
return Span.NULL_OFFSET_VALUE in self._begins
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how Pandas defines the result of NA == NA for other nullable types? Should that expression evaluate True, False, or NA? It could go either way.

f"objects to a TokenSpanArray. Found an "
f"object of type {type(s)}"
)
if tokens is None and not (s.begin == TokenSpan.NULL_OFFSET_VALUE and s.tokens.target_text is None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case will work better once TokenSpanArray allows multiple target texts/tokenization pairs. Still some design work to be done on that front, methinks.

@BryanCutler
Copy link
Member Author

Going to go ahead and merge, will try to fix the TensorArray issue separately

@BryanCutler BryanCutler merged commit 9b13e73 into CODAIT:master Jan 28, 2021
@BryanCutler BryanCutler deleted the fixes-for-pandas-1.2.1 branch January 28, 2021 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Look into test failures with Pandas 1.2.1

2 participants