Skip to content

Conversation

joy-rosie
Copy link
Contributor

@joy-rosie joy-rosie commented Jul 30, 2019

@joy-rosie
Copy link
Contributor Author

Can someone help me; when I run git diff upstream/master -u -- "*.py" | flake8 --diff
I get: fatal: bad revision 'upstream/master'

@jbrockmendel
Copy link
Member

Needs a test, don’t worry too much about the git thing

@joy-rosie
Copy link
Contributor Author

could you point me to the file where the test would go for this?

@joy-rosie
Copy link
Contributor Author

I had a quick look, would it be okay to go in this function: https://github.com/pandas-dev/pandas/blob/master/pandas/tests/dtypes/test_missing.py#L81

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic needs to be replicated in _isnew_old as well. tests are needed.

may want to move this to the last clause of the if/else. performance tests need to be run for this.

@jreback jreback added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Jul 31, 2019
@joy-rosie
Copy link
Contributor Author

joy-rosie commented Jul 31, 2019

this logic needs to be replicated in _isnew_old as well. tests are needed.

may want to move this to the last clause of the if/else. performance tests need to be run for this.

I added it above this elif https://github.com/joy-rosie/pandas/blob/issue-27482-isnull/pandas/core/dtypes/missing.py#L138 because that is causing the error. I will add to _isna_old and also add tests, not sure how to add perfomance tests.

@joy-rosie
Copy link
Contributor Author

I have added the tests and the statement to _isna_old, please let me know how I can run the performance tests

@jbrockmendel
Copy link
Member

please let me know how I can run the performance tests

https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#running-the-performance-test-suite

@joy-rosie
Copy link
Contributor Author

joy-rosie commented Aug 3, 2019

Hello, I ran the performance tests as shown in the link and this is the final output:

. before after ratio
[143bc34] [7aef72c]

  •  21.7±0.5ms       29.6±0.6ms     1.37  indexing.NumericSeriesIndexing.time_loc_scalar(<class 'pandas.core.indexes.numeric.UInt64Index'>, 'nonunique_monotonic_inc')
    
  •     409±5μs         498±10μs     1.22  indexing.NumericSeriesIndexing.time_ix_scalar(<class 'pandas.core.indexes.numeric.Int64Index'>, 'nonunique_monotonic_inc')
    
  • 12.4±0.09ms       15.0±0.1ms     1.21  index_object.SetOperations.time_operation('int', 'symmetric_difference')
    
  •  6.12±0.2ms       6.99±0.3ms     1.14  stat_ops.SeriesMultiIndexOps.time_op(0, 'median')
    
  • 1.67±0.06μs       1.87±0.1μs     1.12  index_cached_properties.IndexCache.time_values('PeriodIndex')
    
  •   332±0.6ns         369±10ns     1.11  timestamp.TimestampOps.time_tz_localize('US/Eastern')
    
  • 1.01±0.02μs      1.12±0.02μs     1.11  index_cached_properties.IndexCache.time_inferred_type('RangeIndex')
    
  • 2.14±0.05ms      2.37±0.04ms     1.11  reindex.LevelAlign.time_align_level
    
  •     329±2ns          362±9ns     1.10  timestamp.TimestampOps.time_tz_localize(tzutc())
    
  •   331±0.5ns          365±8ns     1.10  timestamp.TimestampOps.time_tz_localize(<UTC>)
    
  •  4.64±0.2μs      4.21±0.05μs     0.91  timedelta.TimedeltaConstructor.time_from_missing
    
  • 1.71±0.02μs      1.55±0.05μs     0.91  period.PeriodProperties.time_property('M', 'year')
    
  • 1.42±0.08ms      1.29±0.01ms     0.90  ctors.SeriesConstructors.time_series_constructor(<function list_of_lists>, False, 'int')
    
  •    567±50μs        511±0.9μs     0.90  groupby.GroupByMethods.time_dtype_as_field('float', 'head', 'transformation')
    
  •  1.04±0.4ms          936±3μs     0.90  groupby.GroupByMethods.time_dtype_as_group('int', 'cumsum', 'direct')
    
  • 1.68±0.01μs       1.51±0.1μs     0.90  period.PeriodProperties.time_property('min', 'second')
    
  • 1.48±0.08ms      1.34±0.02ms     0.90  ctors.SeriesConstructors.time_series_constructor(<function list_of_lists>, True, 'int')
    
  • 2.41±0.05ms      2.16±0.09ms     0.90  reindex.LevelAlign.time_reindex_level
    
  • 1.69±0.04μs      1.52±0.06μs     0.90  period.PeriodProperties.time_property('M', 'month')
    
  • 1.69±0.02μs      1.51±0.06μs     0.89  period.PeriodProperties.time_property('M', 'day')
    
  •    586±60μs        523±0.7μs     0.89  groupby.GroupByMethods.time_dtype_as_field('datetime', 'tail', 'transformation')
    
  • 1.69±0.01μs      1.49±0.06μs     0.88  period.PeriodProperties.time_property('min', 'day')
    
  •  1.43±0.2ms         1.24±0ms     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'rank', 'transformation')
    
  •    609±90μs        528±0.9μs     0.87  groupby.GroupByMethods.time_dtype_as_group('int', 'quantile', 'transformation')
    
  •  1.46±0.2ms         1.24±0ms     0.85  groupby.GroupByMethods.time_dtype_as_group('int', 'rank', 'direct')
    
  •  7.26±0.1ms       6.01±0.3ms     0.83  algorithms.Duplicated.time_duplicated('first', 'int')
    
  •    348±80μs          287±1μs     0.83  groupby.GroupByMethods.time_dtype_as_field('float', 'last', 'transformation')
    
  •  1.69±0.3ms      1.36±0.01ms     0.81  groupby.GroupByMethods.time_dtype_as_field('int', 'value_counts', 'direct')
    
  •   90.4±20μs       65.4±0.1μs     0.72  algorithms.SortIntegerArray.time_argsort(1000)
    
  •   422±200μs        289±0.9μs     0.68  groupby.GroupByMethods.time_dtype_as_field('float', 'last', 'direct')
    
  •   887±300μs          553±1μs     0.62  groupby.GroupByMethods.time_dtype_as_field('object', 'head', 'transformation')
    

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

Machine information:
{
"arch": "x86_64",
"cpu": "Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz",
"machine": "josie-lenovo-ideapad",
"num_cpu": "4",
"os": "Linux 4.18.0-25-generic",
"ram": "3897484",
"version": 1
}

@joy-rosie
Copy link
Contributor Author

joy-rosie commented Aug 3, 2019

I feel like it is a bit random where the performance has gone down and where its gone up

@jbrockmendel
Copy link
Member

I feel like it is a bit random where the performance has gone down and where its gone up

Yah, this is a common issue with asv (especially in cases where there are no non-noise changes). The usual solution is to re-run it a couple of times and pick out the changes that are consistent across runs.

@jreback
Copy link
Contributor

jreback commented Aug 4, 2019

this also needs a whatsnew note, put in 1.0 in Missing bug fix section

@joy-rosie
Copy link
Contributor Author

sorry to be a pain but where do i add a whatsnew note?

I am also finding it difficult to get any reliable results for the performance tests, I have ran it a few times and there are different results each time

Is there anything i can do there?

@jbrockmendel
Copy link
Member

sorry to be a pain but where do i add a whatsnew note?

Don't worry about it, we were all new here at one point. whatsnew note goes in doc/source/whatsnew/0.25.1.rst

I am also finding it difficult to get any reliable results for the performance tests, I have ran it a few times and there are different results each time

That usually indicates that there isn't any real impact. The non-asv alternative is to use Ipython %timeit directly on the affected function(s)

@joy-rosie
Copy link
Contributor Author

joy-rosie commented Aug 13, 2019

That usually indicates that there isn't any real impact. The non-asv alternative is to use Ipython %timeit directly on the affected function(s)
for benchmarking I am running the test function:

import pandas as pd
from pandas.util import testing as tm
isna_f = pd.isna
pd.__version__
%%timeit -n 10000

assert not isna_f(1.0)
assert isna_f(None)
assert isna_f(pd.np.nan)
assert not isna_f(pd.np.inf)
assert not isna_f(-pd.np.inf)

# # type
# assert not isna_f(type(pd.Series()))
# assert not isna_f(type(pd.DataFrame()))

# series
for s in [
    tm.makeFloatSeries(),
    tm.makeStringSeries(),
    tm.makeObjectSeries(),
    tm.makeTimeSeries(),
    tm.makePeriodSeries(),
]:
    assert isinstance(isna_f(s), pd.Series)

# frame
for df in [
    tm.makeTimeDataFrame(),
    tm.makePeriodFrame(),
    tm.makeMixedDataFrame(),
]:
    result = isna_f(df)
    expected = df.apply(isna_f)
    tm.assert_frame_equal(result, expected)

Outputs:

old:

'0.25.0'
33 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 10000 loops each)

new:

'0.25.0+71.g7aef72ced'
33 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Please let me know if you think I should run the benchmark on anything else

Don't worry about it, we were all new here at one point. whatsnew note goes in doc/source/whatsnew/0.25.1.rst

Will this go into bugfixes or missing?

@jbrockmendel
Copy link
Member

Will this go into bugfixes or missing?

Yes. Probably in the "Missing" section

@joy-rosie
Copy link
Contributor Author

I added into whatsnew and it seems like there are some things failing but I am not sure why a change in whatsnew could cause these particular failures?

@TomAugspurger
Copy link
Contributor

Restarted the build. That test is randomly failing and we haven't diagnosed the cause yet.

@joy-rosie
Copy link
Contributor Author

Is this ready to go now? Let me know if there are any remaining bits.

@jbrockmendel
Copy link
Member

LGTM

@TomAugspurger TomAugspurger merged commit e118b1d into pandas-dev:master Aug 20, 2019
@TomAugspurger TomAugspurger added this to the 0.25.1 milestone Aug 20, 2019
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Aug 20, 2019
…s-dev#27664)

* added a check for if obj is instance of type in _isna-new
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Aug 20, 2019
…e of type in _isna-new (pandas-dev#27664)

* added a check for if obj is instance of type in _isna-new
TomAugspurger added a commit that referenced this pull request Aug 20, 2019
… in _isna-new (#27664) (#28041)

* added a check for if obj is instance of type in _isna-new
galuhsahid pushed a commit to galuhsahid/pandas that referenced this pull request Aug 25, 2019
…s-dev#27664)

* added a check for if obj is instance of type in _isna-new
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pd.isnull() raises AttributeError on Pandas type objects
4 participants