release gil more #29322

Merged
merged 7 commits into from
Nov 4, 2019

Conversation

jbrockmendel
Member

cc @mroeschke since a lot of this touches libwindow

@mroeschke
Member

Thanks @jbrockmendel. I am doing some heavy refactoring in rolling.py and window.pyx right now, but it looks like your changes shouldn't conflict heavily with what I'm changing.

Member

@WillAyd WillAyd left a comment

Sure, makes sense. Minor things:

return _roll_weighted_sum_mean(values, weights, minp, avg=1)


def _roll_weighted_sum_mean(float64_t[:] values, float64_t[:] weights,
                            int minp, bint avg):
cdef _roll_weighted_sum_mean(float64_t[:] values, float64_t[:] weights,
Member

cdef void (unless this returns something)

Member Author

sure. does this affect perf or is it just more explicit?

Member

I'm coming from the explicit angle but I can't imagine it would hurt either

Member Author

actually looks like this should be cdef ndarray[float64_t], will update

for i in range(1, N):
    cur = vals[i]
    is_observation = (cur == cur)
    nobs += <int64_t>(1 if is_observation else 0)
Member

Is the if...else really required here? Wouldn't the cast to int64 handle that?

Member Author

Huh, I thought this was necessary, but now it looks like it isn't. Will change.
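The two forms are equivalent because a boolean converts to exactly 0 or 1. A minimal Python sketch of the same counting pattern, assuming a 1-D float array as input (the helper name `count_observations` is hypothetical, not the pandas code); it relies on NaN being the only value for which `cur == cur` is false:

```python
import numpy as np


def count_observations(vals: np.ndarray) -> int:
    """Count the non-NaN entries of a 1-D float array (hypothetical helper)."""
    nobs = np.int64(0)
    for cur in vals:
        # NaN is the only float that compares unequal to itself.
        is_observation = cur == cur
        # Converting a bool yields 0 or 1, so `1 if is_observation else 0`
        # adds nothing over a plain cast.
        nobs += np.int64(is_observation)
    return int(nobs)


# The conditional expression and the plain cast agree for both booleans.
for flag in (True, False):
    assert (1 if flag else 0) == int(flag)
```

For example, `count_observations(np.array([1.0, np.nan, 3.0]))` counts two observations.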

if axis == 0:
if periods >= 0:
start, stop = periods, sx
with nogil:
Member

Can the nogil here not just precede the if...else?

Member Author

That'd be nice, but no. The arr.flags.f_contiguous check is in python-space

Member

Can you assign to a bool before the block and check that at the top instead of staying in the Python space? On first impression I was expecting the different branches to have different GIL considerations, hence the ask.

Member Author

good idea, will do
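The suggestion is to read the Python-level attribute once, before any GIL-free code, and branch on a plain boolean. A pure-Python/NumPy sketch under stated assumptions: `diff_2d_axis0` is a hypothetical function that handles only axis 0 and cannot actually release the GIL; it only shows where the flag lookup moves relative to the loops. In Cython, the flag would be stored in a C-level `bint` so that both loop branches could sit inside a single `with nogil:` block. The `periods`, `sx`, `start`, and `stop` names follow the excerpt above.

```python
import numpy as np


def diff_2d_axis0(arr: np.ndarray, periods: int) -> np.ndarray:
    """Hypothetical sketch: out[i, j] = arr[i, j] - arr[i - periods, j]."""
    sx, sy = arr.shape
    out = np.full(arr.shape, np.nan, dtype=np.float64)
    if periods >= 0:
        start, stop = periods, sx
    else:
        start, stop = 0, sx + periods

    # Python-space lookup done once, up front; everything below only
    # needs this plain boolean.
    f_contig = bool(arr.flags.f_contiguous)

    if f_contig:
        # Column-major memory: walk down each column in the inner loop.
        for j in range(sy):
            for i in range(start, stop):
                out[i, j] = arr[i, j] - arr[i - periods, j]
    else:
        # Row-major memory: walk along each row in the inner loop.
        for i in range(start, stop):
            for j in range(sy):
                out[i, j] = arr[i, j] - arr[i - periods, j]
    return out
```

Both branches compute the same values; only the loop order differs, so memory is traversed contiguously for either layout.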

@WillAyd WillAyd added the Clean label Nov 1, 2019
Member

@WillAyd WillAyd left a comment

looks good to me. Minor things to consider, otherwise happy with merge.

for i in range(1, N):
    cur = vals[i]
    is_observation = (cur == cur)
    nobs += <int64_t>is_observation
Member

I think this cast is unnecessary; at least C should widen the bool type to int64 as part of the addition

cur_x = input_x[i]
cur_y = input_y[i]
is_observation = ((cur_x == cur_x) and (cur_y == cur_y))
nobs += <int64_t>is_observation
Member

Same comment on cast
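In the pairwise case a position only counts when both inputs are non-NaN. A minimal pure-Python sketch (the helper name is hypothetical) showing the boolean being added to the counter with no cast at all; this works because Python's `bool` is a subclass of `int`, which is the Python analogue of, not the same mechanism as, the C-level widening described in the comment above:

```python
def count_pairwise_observations(xs, ys) -> int:
    """Count positions where both x and y are non-NaN (hypothetical helper)."""
    nobs = 0
    for cur_x, cur_y in zip(xs, ys):
        # Each self-comparison is False only when that value is NaN.
        is_observation = (cur_x == cur_x) and (cur_y == cur_y)
        # bool is a subclass of int, so the addition needs no explicit cast.
        nobs += is_observation
    return nobs
```

With `nan = float("nan")`, the inputs `[1.0, nan, 3.0, 4.0]` and `[1.0, 2.0, nan, 4.0]` share two fully observed positions.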


return output
# `.base` to access underlying ndarray
return output.base
Contributor

why is this needed?

Member Author

at this point output is a memoryview and we need an ndarray. this is how you do that without creating a new object

Contributor

IIRC, don't we commonly use np.asarray(output)? I think this is more explicit (and the same perf-wise).

Member Author

we're not entirely consistent about it, but I agree on the explicit thing. On the margin it must be slightly less performant because it calls into Python space, but that'll be in the "too small to measure" category.

Contributor

yeah, I like it because it's more explicit.
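The difference between the two options can be illustrated at the Python level under one assumption: Python's built-in `memoryview` exposes its exporting object as `.obj`, which stands in here for the `.base` attribute of a Cython typed memoryview. Following the view back returns the original ndarray itself, while `np.asarray` builds a new ndarray object over the same buffer, so neither option copies the data:

```python
import numpy as np

output = np.zeros(4, dtype=np.float64)
view = memoryview(output)

# Following the view back to its exporter returns the very same ndarray.
same = view.obj
assert same is output

# np.asarray wraps the buffer in a new ndarray object without copying data.
wrapped = np.asarray(view)
assert wrapped is not output
assert np.shares_memory(wrapped, output)
```

So the `.base` route avoids creating an extra Python object, and `np.asarray` trades that small cost for a more explicit conversion, which matches the trade-off discussed above.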

@jreback jreback added this to the 1.0 milestone Nov 2, 2019
@jreback jreback added the Performance Memory or execution speed performance label Nov 2, 2019
@jreback
Contributor

jreback commented Nov 2, 2019

does this show any perf benefits on rolling sum?

@jreback
Contributor

jreback commented Nov 2, 2019

pls rebase

@jbrockmendel
Member Author

rebased, changed output.base to np.asarray(output), will run some asvs

@jbrockmendel
Member Author

Wow, way better than I had expected:

$ asv continuous -E virtualenv -f 1.1 master HEAD -b rolling
[...]
       before           after         ratio
     [023fa0cf]       [773c9f94]
     <master>         <lessgil2>
-        48.6±4ms       43.6±0.8ms     0.90  rolling.Quantile.time_quantile('DataFrame', 10, 'float', 0.5, 'linear')
-        69.6±5ms       61.8±0.9ms     0.89  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 0.5, 'linear')
-        49.4±6ms       42.6±0.6ms     0.86  rolling.Methods.time_rolling('DataFrame', 10, 'float', 'median')
-      2.05±0.2ms      1.71±0.05ms     0.83  rolling.Methods.time_rolling('DataFrame', 1000, 'float', 'std')
-      4.78±0.3ms       3.91±0.2ms     0.82  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'float', 'max')
-        78.9±8ms       62.6±0.7ms     0.79  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 0.5, 'midpoint')
-        54.0±4ms       42.8±0.9ms     0.79  rolling.Quantile.time_quantile('Series', 10, 'int', 0.5, 'linear')
-      3.25±0.2ms      2.54±0.09ms     0.78  rolling.VariableWindowMethods.time_rolling('Series', '1h', 'int', 'skew')
-      4.74±0.3ms       3.64±0.2ms     0.77  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'max')
-      2.21±0.2ms      1.69±0.02ms     0.76  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'std')
-     2.39±0.05ms      1.79±0.06ms     0.75  rolling.VariableWindowMethods.time_rolling('DataFrame', '1d', 'float', 'count')
-     2.49±0.02ms      1.85±0.07ms     0.75  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 1, 'higher')
-      4.18±0.2ms      3.10±0.09ms     0.74  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'std')
-      1.64±0.3ms      1.21±0.02ms     0.74  rolling.Quantile.time_quantile('DataFrame', 10, 'int', 0, 'higher')
-      3.60±0.2ms      2.65±0.07ms     0.74  rolling.Methods.time_rolling('Series', 1000, 'int', 'count')
-      4.82±0.1ms       3.53±0.1ms     0.73  rolling.VariableWindowMethods.time_rolling('DataFrame', '50s', 'float', 'min')
-      2.82±0.1ms      2.05±0.06ms     0.73  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'sum')
-      2.46±0.2ms      1.77±0.05ms     0.72  rolling.ExpandingMethods.time_expanding('DataFrame', 'float', 'min')
-      5.46±0.3ms       3.91±0.2ms     0.71  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'float', 'min')
-      2.79±0.1ms      1.96±0.04ms     0.70  rolling.Quantile.time_quantile('Series', 1000, 'float', 0, 'nearest')
-        19.1±2ms       13.2±0.2ms     0.69  rolling.Pairwise.time_pairwise(None, 'cov', True)
-     2.48±0.03ms      1.71±0.04ms     0.69  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 0, 'lower')
-     2.91±0.09ms      1.98±0.03ms     0.68  rolling.Quantile.time_quantile('Series', 1000, 'float', 0, 'midpoint')
-      3.83±0.2ms      2.60±0.07ms     0.68  rolling.Methods.time_rolling('Series', 10, 'float', 'count')
-      5.41±0.2ms      3.67±0.04ms     0.68  rolling.VariableWindowMethods.time_rolling('Series', '50s', 'int', 'max')
-      2.80±0.2ms      1.83±0.04ms     0.65  rolling.Quantile.time_quantile('Series', 10, 'float', 1, 'nearest')
-      3.85±0.2ms       2.51±0.2ms     0.65  rolling.VariableWindowMethods.time_rolling('DataFrame', '1h', 'int', 'skew')
-      2.71±0.1ms      1.74±0.04ms     0.64  rolling.Quantile.time_quantile('DataFrame', 1000, 'float', 1, 'lower')
-      5.99±0.2ms       2.19±0.4ms     0.37  rolling.EWMMethods.time_ewm('Series', 1000, 'int', 'mean')
-     5.69±0.09ms       2.07±0.4ms     0.36  rolling.EWMMethods.time_ewm('Series', 10, 'int', 'mean')
-      5.78±0.2ms      1.65±0.07ms     0.28  rolling.EWMMethods.time_ewm('Series', 10, 'float', 'mean')
-      5.90±0.6ms      1.67±0.03ms     0.28  rolling.EWMMethods.time_ewm('Series', 1000, 'float', 'mean')
-      5.49±0.1ms       1.42±0.2ms     0.26  rolling.EWMMethods.time_ewm('DataFrame', 1000, 'int', 'mean')
-      5.46±0.2ms      1.39±0.07ms     0.25  rolling.EWMMethods.time_ewm('DataFrame', 10, 'float', 'mean')
-      5.61±0.2ms      1.43±0.04ms     0.25  rolling.EWMMethods.time_ewm('DataFrame', 10, 'int', 'mean')
-      5.43±0.2ms      1.31±0.02ms     0.24  rolling.EWMMethods.time_ewm('DataFrame', 1000, 'float', 'mean')
-      13.2±0.2ms       2.87±0.5ms     0.22  rolling.EWMMethods.time_ewm('DataFrame', 10, 'int', 'std')
-      13.4±0.1ms       2.42±0.4ms     0.18  rolling.EWMMethods.time_ewm('Series', 10, 'int', 'std')
-      13.3±0.2ms      2.36±0.08ms     0.18  rolling.EWMMethods.time_ewm('DataFrame', 1000, 'int', 'std')
-      13.3±0.4ms      2.31±0.05ms     0.17  rolling.EWMMethods.time_ewm('DataFrame', 10, 'float', 'std')
-      13.4±0.5ms      2.28±0.03ms     0.17  rolling.EWMMethods.time_ewm('DataFrame', 1000, 'float', 'std')
-      13.2±0.3ms      2.22±0.05ms     0.17  rolling.EWMMethods.time_ewm('Series', 10, 'float', 'std')
-        15.7±2ms       2.32±0.1ms     0.15  rolling.EWMMethods.time_ewm('Series', 1000, 'int', 'std')
-        15.5±2ms       2.15±0.1ms     0.14  rolling.EWMMethods.time_ewm('Series', 1000, 'float', 'std')
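In `asv continuous` output the ratio column is the new timing divided by the old one, so the 0.14 on the last row means that benchmark now takes roughly a seventh of its former time. A small sketch that recomputes the ratio and the speedup from the displayed timings; `parse_row` is a hypothetical helper whose regex is written only for the row layout shown above and handles only the ASCII unit suffixes. Because the printed timings are rounded, a recomputed ratio can occasionally differ from asv's in the last digit.

```python
import re

# One comparison row, e.g.
# "-  15.5±2ms  2.15±0.1ms  0.14  rolling.EWMMethods.time_ewm(...)"
ROW = re.compile(
    r"^(?P<flag>[-+ ])\s+"
    r"(?P<before>[\d.]+)±[\d.]+(?P<before_unit>ms|us|ns|s)\s+"
    r"(?P<after>[\d.]+)±[\d.]+(?P<after_unit>ms|us|ns|s)\s+"
    r"(?P<ratio>[\d.]+)\s+(?P<name>\S.*)$"
)

# Seconds per unit suffix.
SCALE = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def parse_row(line: str) -> dict:
    """Parse one asv comparison row into timing ratios."""
    m = ROW.match(line.rstrip())
    if m is None:
        raise ValueError(f"not an asv comparison row: {line!r}")
    before = float(m["before"]) * SCALE[m["before_unit"]]
    after = float(m["after"]) * SCALE[m["after_unit"]]
    return {
        "name": m["name"],
        "reported_ratio": float(m["ratio"]),
        "ratio": after / before,    # below 1 means the new code is faster
        "speedup": before / after,  # how many times faster the new code is
    }
```

Applied to the last row of the table, the recomputed ratio rounds to the reported 0.14, a speedup of a little over 7x.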

Member

@WillAyd WillAyd left a comment

lgtm. Whatsnew note for perf boost might be nice, especially for the expanding benchmarks. Otherwise over to @jreback

@jreback
Contributor

jreback commented Nov 2, 2019

lgtm. nice perf boosts! what's up with the windows build?

@jbrockmendel
Member Author

what's up with the windows build?

Looks unrelated:

Windows fatal exception: access violation
[...]
worker 'gw1' crashed while running 'pandas/tests/io/json/test_compression.py::test_to_json_compression[gzip-True-False]'

@jbrockmendel
Member Author

green

@jreback jreback merged commit 7ba9eb6 into pandas-dev:master Nov 4, 2019
@jreback
Contributor

jreback commented Nov 4, 2019

thanks, very nice

@jbrockmendel jbrockmendel deleted the lessgil2 branch November 4, 2019 15:01
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Labels
Clean Performance Memory or execution speed performance

4 participants