Skip to content

Fixes for use with Pandas 1.2.1 #171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 28, 2021

Conversation

BryanCutler
Copy link
Member

@BryanCutler BryanCutler commented Jan 28, 2021

Fixed _from_sequence to handle np.nan, added __contains__ to check for null values with text set to None.

Closes #168

@BryanCutler
Copy link
Member Author

@frreiss there is still one more error, it has to do with TensorArray repr() with floating point values, and is similar to pandas-dev/pandas#38391. I need to get back on that for a fix, but I'll try for a workaround first. We can merge this first if ok with you.

@BryanCutler BryanCutler requested a review from frreiss January 28, 2021 18:47
"""
if isinstance(item, Span) and \
item.begin == Span.NULL_OFFSET_VALUE:
return Span.NULL_OFFSET_VALUE in self._begins
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because otherwise the default is to check all values for equality, and that will fail if there is a NULL value with Span.target_text=None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how Pandas defines the result of NA == NA for other nullable types? Should that expression evaluate True, False, or NA? It could go either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the current Pandas policy is "¯_(ツ)_/¯"; see the warning on https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good question. The test failure related to this was checking assert na_value in data_missing which expects that na_value == na_value to be True. Seems like it would be different for np.nan though..

f"objects to a TokenSpanArray. Found an "
f"object of type {type(s)}"
)
if tokens is None and not (s.begin == TokenSpan.NULL_OFFSET_VALUE and s.tokens.target_text is None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets a little awkward dealing with with a NULL value with target_text=None, but should be ok for now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case will work better once TokenSpanArray allows multiple target texts/tokenization pairs. Still some design work to be done on that front, methinks.

Copy link
Member

@frreiss frreiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

"""
if isinstance(item, Span) and \
item.begin == Span.NULL_OFFSET_VALUE:
return Span.NULL_OFFSET_VALUE in self._begins
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how Pandas defines the result of NA == NA for other nullable types? Should that expression evaluate True, False, or NA? It could go either way.

f"objects to a TokenSpanArray. Found an "
f"object of type {type(s)}"
)
if tokens is None and not (s.begin == TokenSpan.NULL_OFFSET_VALUE and s.tokens.target_text is None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case will work better once TokenSpanArray allows multiple target texts/tokenization pairs. Still some design work to be done on that front, methinks.

@BryanCutler
Copy link
Member Author

Going to go ahead and merge, will try to fix the TensorArray issue separately

@BryanCutler BryanCutler merged commit 9b13e73 into CODAIT:master Jan 28, 2021
@BryanCutler BryanCutler deleted the fixes-for-pandas-1.2.1 branch January 28, 2021 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Look into test failures with Pandas 1.2.1
2 participants