Fixes for use with Pandas 1.2.1 #171

BryanCutler · 2021-01-28T18:45:11Z

Fixed _from_sequence to handle np.nan, and added __contains__ to check for null values whose text is set to None.
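A rough picture of the np.nan handling in _from_sequence, as a hedged sketch rather than the library's code. ToySpan, toy_from_sequence, and the sentinel value -1 for NULL_OFFSET_VALUE are all hypothetical stand-ins. Missing inputs (None, or a float NaN such as np.nan) become begin/end offsets set to the NULL sentinel instead of being rejected as non-span objects.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

import numpy as np

# Hypothetical sentinel; the real Span.NULL_OFFSET_VALUE may differ.
NULL_OFFSET_VALUE = -1


@dataclass(frozen=True)
class ToySpan:
    """Hypothetical stand-in for a Span: a [begin, end) range over target_text."""
    target_text: Optional[str]
    begin: int
    end: int


def _is_missing(value) -> bool:
    """Treat None and float NaN (np.nan is a float) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def toy_from_sequence(scalars: Sequence):
    """Build begin/end offset arrays, mapping missing values to the NULL sentinel."""
    begins = np.empty(len(scalars), dtype=np.int64)
    ends = np.empty(len(scalars), dtype=np.int64)
    for i, value in enumerate(scalars):
        if _is_missing(value):
            begins[i] = ends[i] = NULL_OFFSET_VALUE
        elif isinstance(value, ToySpan):
            begins[i], ends[i] = value.begin, value.end
        else:
            raise TypeError(f"Expected ToySpan, None, or NaN; found {type(value)}")
    return begins, ends


begins, ends = toy_from_sequence([ToySpan("Hello world", 0, 5), np.nan, None])
print(begins.tolist(), ends.tolist())  # [0, -1, -1] [5, -1, -1]
```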

Closes #168


BryanCutler · 2021-01-28T18:47:31Z

@frreiss there is still one more error: it involves TensorArray repr() with floating-point values and is similar to pandas-dev/pandas#38391. I need to come back to that for a proper fix, but I'll try a workaround first. We can merge this PR first if that's OK with you.

BryanCutler · 2021-01-28T18:49:20Z

text_extensions_for_pandas/array/span.py

+        """
+        if isinstance(item, Span) and \
+                item.begin == Span.NULL_OFFSET_VALUE:
+            return Span.NULL_OFFSET_VALUE in self._begins


This is needed because otherwise the default behavior is to compare every value in the array for equality with the item, and that fails to match when the array holds a NULL value and the item is a NULL Span with Span.target_text=None.
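A minimal sketch of the idea, using hypothetical stand-ins (ToySpan, ToySpanArray, and a NULL_OFFSET_VALUE sentinel assumed to be -1) rather than the library's real Span and SpanArray classes. Elementwise equality here compares the target text as well as the offsets, so a NULL span with target_text=None never matches the NULL slot of an array that has real text; the __contains__ override instead looks for the sentinel directly in the begin offsets.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

# Hypothetical sentinel; the real Span.NULL_OFFSET_VALUE may differ.
NULL_OFFSET_VALUE = -1


@dataclass(frozen=True)
class ToySpan:
    """Hypothetical stand-in for a Span."""
    target_text: Optional[str]
    begin: int
    end: int


class ToySpanArray:
    """Hypothetical stand-in for a SpanArray over a single target text."""

    def __init__(self, text, begins, ends):
        self._text = text
        self._begins = np.asarray(begins, dtype=np.int64)
        self._ends = np.asarray(ends, dtype=np.int64)

    def __len__(self):
        return len(self._begins)

    def __eq__(self, other):
        # Elementwise comparison: offsets AND target text must match.
        if not isinstance(other, ToySpan):
            return np.zeros(len(self), dtype=bool)
        return ((self._begins == other.begin)
                & (self._ends == other.end)
                & (self._text == other.target_text))

    def __contains__(self, item):
        if isinstance(item, ToySpan) and item.begin == NULL_OFFSET_VALUE:
            # A NULL span matches any NULL slot, whatever its target_text.
            return bool(np.any(self._begins == NULL_OFFSET_VALUE))
        return bool(np.any(self == item))


arr = ToySpanArray("Hello world", [0, NULL_OFFSET_VALUE], [5, NULL_OFFSET_VALUE])
null_span = ToySpan(None, NULL_OFFSET_VALUE, NULL_OFFSET_VALUE)
print(np.any(arr == null_span))  # False: equality alone misses the NULL slot
print(null_span in arr)          # True: the override checks the begin offsets
```

As in the hunk above, membership for a NULL item is answered from the offsets alone, so it does not depend on how equality treats missing text.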

Do you know how Pandas defines the result of NA == NA for other nullable types? Should that expression evaluate True, False, or NA? It could go either way.

Looks like the current Pandas policy is "¯\_(ツ)_/¯"; see the warning on https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html

Yeah, good question. The test failure related to this was checking assert na_value in data_missing, which expects na_value == na_value to be True. It seems like it would be different for np.nan, though.
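For context on the np.nan point, here is how plain Python and NumPy treat NaN in membership tests; this sketch does not touch pandas itself. (For pd.NA, pandas returns pd.NA from pd.NA == pd.NA, and calling bool() on that result raises a TypeError, so neither True nor False is reported.) The key detail is that list membership checks identity before equality, while ndarray membership relies on elementwise ==.

```python
import numpy as np

nan = float("nan")

self_equal = nan == nan                                   # IEEE 754: NaN != NaN
same_object_found = nan in [nan]                          # list `in` checks identity first
distinct_objects_found = float("nan") in [float("nan")]   # different objects, never equal
np_nan_found = np.nan in [np.nan]                         # np.nan is one shared float object
array_found = np.nan in np.array([np.nan])                # ndarray `in` uses elementwise ==

print(self_equal, same_object_found, distinct_objects_found, np_nan_found, array_found)
# False True False True False
```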

BryanCutler · 2021-01-28T18:52:44Z

text_extensions_for_pandas/array/token_span.py

+                        f"objects to a TokenSpanArray. Found an "
+                        f"object of type {type(s)}"
+                    )
+            if tokens is None and not (s.begin == TokenSpan.NULL_OFFSET_VALUE and s.tokens.target_text is None):


This gets a little awkward when dealing with a NULL value with target_text=None, but it should be OK for now.

This case will work better once TokenSpanArray allows multiple target texts/tokenization pairs. Still some design work to be done on that front, methinks.
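The condition in that hunk skips NULL spans that have no target text when deciding which tokenization the array should use. An analogous check on plain target text might look like the following; ToySpan, common_target_text, and the -1 sentinel are hypothetical, and the real TokenSpanArray compares tokens rather than raw text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical sentinel; the real TokenSpan.NULL_OFFSET_VALUE may differ.
NULL_OFFSET_VALUE = -1


@dataclass(frozen=True)
class ToySpan:
    """Hypothetical stand-in for a (Token)Span."""
    target_text: Optional[str]
    begin: int
    end: int


def common_target_text(spans: Iterable[ToySpan]) -> Optional[str]:
    """Return the one target text shared by all spans, ignoring text-less NULL spans."""
    text = None
    for s in spans:
        if s.begin == NULL_OFFSET_VALUE and s.target_text is None:
            # A NULL span with no target text cannot conflict with anything.
            continue
        if text is None:
            text = s.target_text
        elif s.target_text != text:
            raise ValueError("All non-null spans must share one target text")
    return text


null_span = ToySpan(None, NULL_OFFSET_VALUE, NULL_OFFSET_VALUE)
print(common_target_text([null_span, ToySpan("Hello", 0, 5)]))  # Hello
```

If every span is NULL, the sketch returns None, which is the "no tokens yet" situation the hunk has to tolerate.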

frreiss

LGTM

frreiss · 2021-01-28T19:22:43Z

text_extensions_for_pandas/array/span.py

Do you know how Pandas defines the result of NA == NA for other nullable types? Should that expression evaluate True, False, or NA? It could go either way.

frreiss · 2021-01-28T19:33:21Z

text_extensions_for_pandas/array/token_span.py

This case will work better once TokenSpanArray allows multiple target texts/tokenization pairs. Still some design work to be done on that front, methinks.

BryanCutler · 2021-01-28T22:55:08Z

Going to go ahead and merge; I'll try to fix the TensorArray issue separately.

Fixed _from_sequence to handle nan, added __contains__ to check for null values

1404d18

BryanCutler requested a review from frreiss January 28, 2021 18:47

BryanCutler commented Jan 28, 2021

frreiss approved these changes Jan 28, 2021

BryanCutler merged commit 9b13e73 into CODAIT:master Jan 28, 2021

BryanCutler deleted the fixes-for-pandas-1.2.1 branch January 28, 2021 22:55
