Skip to content

CLN: ASV HDFStore benchmark #18641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 6, 2017

Conversation

mroeschke
Copy link
Member

  • Ran flake8 and replaced star imports

An HDFStore.info() benchmark was added in #16666, but the benchmark fails (even on master). I can create a new issue for this unless there's an obvious fix I am not seeing.

asv dev -b ^hdfstore_bench· Discovering benchmarks
· Running 18 total benchmarks (1 commits * 1 environments * 18 benchmarks)
[  0.00%] ·· Building for existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  0.00%] ·· Benchmarking existing-py_home_matt_anaconda_envs_pandas_dev_bin_python
[  5.56%] ··· Running hdfstore_bench.HDF5.time_query_store_table                                       22.3ms
[ 11.11%] ··· Running hdfstore_bench.HDF5.time_query_store_table_wide                                  33.8ms
[ 16.67%] ··· Running hdfstore_bench.HDF5.time_read_store                                              11.6ms
[ 22.22%] ··· Running hdfstore_bench.HDF5.time_read_store_mixed                                        24.3ms
[ 27.78%] ··· Running hdfstore_bench.HDF5.time_read_store_table                                        16.3ms
[ 33.33%] ··· Running hdfstore_bench.HDF5.time_read_store_table_mixed                                  34.6ms
[ 38.89%] ··· Running hdfstore_bench.HDF5.time_read_store_table_wide                                   38.3ms
[ 44.44%] ··· Running hdfstore_bench.HDF5.time_store_info                                              failed
[ 44.44%] ····· Traceback (most recent call last):
                  File "/home/matt/anaconda/envs/pandas_dev/lib/python2.7/site-packages/asv/benchmark.py", line 818, in <module>
                    commands[mode](args)
                  File "/home/matt/anaconda/envs/pandas_dev/lib/python2.7/site-packages/asv/benchmark.py", line 795, in main_run
                    result = benchmark.do_run()
                  File "/home/matt/anaconda/envs/pandas_dev/lib/python2.7/site-packages/asv/benchmark.py", line 349, in do_run
                    return self.run(*self._current_params)
                  File "/home/matt/anaconda/envs/pandas_dev/lib/python2.7/site-packages/asv/benchmark.py", line 424, in run
                    samples, number = self.benchmark_timing(timer, repeat, warmup_time, number=number)
                  File "/home/matt/anaconda/envs/pandas_dev/lib/python2.7/site-packages/asv/benchmark.py", line 471, in benchmark_timing
                    timing = timer.timeit(number)
                  File "/home/matt/anaconda/envs/pandas_dev/lib/python2.7/timeit.py", line 202, in timeit
                    timing = self.inner(it, self.timer)
                  File "/home/matt/anaconda/envs/pandas_dev/lib/python2.7/timeit.py", line 100, in inner
                    _func()
                  File "/home/matt/Projects/pandas-mroeschke/asv_bench/benchmarks/hdfstore_bench.py", line 101, in time_store_info
                    self.store.info()
                  File "/home/matt/anaconda/envs/pandas_dev/lib/python2.7/site-packages/pandas/io/pytables.py", line 495, in __getattr__
                    (type(self).__name__, name))
                AttributeError: 'HDFStore' object has no attribute 'info'

[ 50.00%] ··· Running hdfstore_bench.HDF5.time_store_repr                                              46.1ms
[ 55.56%] ··· Running hdfstore_bench.HDF5.time_store_str                                               45.2ms
[ 61.11%] ··· Running hdfstore_bench.HDF5.time_write_store                                             12.5ms
[ 66.67%] ··· Running hdfstore_bench.HDF5.time_write_store_mixed                                       28.3ms
[ 72.22%] ··· Running hdfstore_bench.HDF5.time_write_store_table                                       41.2ms
[ 77.78%] ··· Running hdfstore_bench.HDF5.time_write_store_table_dc                                     329ms
[ 83.33%] ··· Running hdfstore_bench.HDF5.time_write_store_table_mixed                                 52.2ms
[ 88.89%] ··· Running hdfstore_bench.HDF5.time_write_store_table_wide                                   123ms
[ 94.44%] ··· Running hdfstore_bench.HDF5Panel.time_read_store_table_panel                             45.6ms
[100.00%] ··· Running hdfstore_bench.HDF5Panel.time_write_store_table_panel                            76.3ms

@codecov
Copy link

codecov bot commented Dec 5, 2017

Codecov Report

Merging #18641 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18641      +/-   ##
==========================================
- Coverage   91.59%   91.57%   -0.02%     
==========================================
  Files         153      153              
  Lines       51221    51221              
==========================================
- Hits        46917    46908       -9     
- Misses       4304     4313       +9
Flag Coverage Δ
#multiple 89.44% <ø> (ø) ⬆️
#single 40.67% <ø> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.81% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 13f6267...db44794. Read the comment docs.

os.remove(f)
except:
pass
os.remove(self.f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not needed to keep the try .. except here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and I seem to remember another PR recently related to this 'remove' ?)

Copy link
Member

@gfyoung gfyoung Dec 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and I seem to remember another PR recently related to this 'remove' ?)

@jorisvandenbossche : Yes, you are correct, as that was I who made the comment 😉 .

@mroeschke : revert this part of your changes to restore the try-except. Windows is notorious for surprising users on os.remove. No reason to crash benchmarks (instead of failing gracefully) for something like that.

Copy link
Member

@gfyoung gfyoung Dec 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche : What do you think about just moving that os.remove-wrapper function pandas_vb_common ? It's not really an instance method but rather a utility from which a lot of classes could benefit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or put it in a Benchmark base class to inherit from, both are ok to me (if we do the base class, we could also put the goal_time = 0.2 there instead of repeating it everywhere

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably -0 on abstracting goal_time into a base class. It might be a little nicer but adds some indirection to the benchmarks that I'm uncertain about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with "adds some indirection to the benchmarks" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note we already have a base class for this _BenchTeardown (PR was pretty recent) in io_bench.py. So I would simply move this into a top-level class you can inherit from.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with "adds some indirection to the benchmarks" ?

Just that it wouldn't be immediately clear what the goal time for the benchmarks would be.

@gfyoung gfyoung added Benchmark Performance (ASV) benchmarks Clean labels Dec 5, 2017
os.remove(f)
except:
pass
os.remove(self.f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note we already have a base class for this _BenchTeardown (PR was pretty recent) in io_bench.py. So I would simply move this into a top-level class you can inherit from.

@pep8speaks
Copy link

Hello @mroeschke! Thanks for updating the PR.

Line 212:80: E501 line too long (89 > 79 characters)

Line 34:9: E722 do not use bare except'
Line 35:69: W291 trailing whitespace

@mroeschke
Copy link
Member Author

Thanks for the guidance, all. Created a new base class, BaseIO, for hdfstore_bench.py and io_bench.py with a remove method for files. Will take care of the pep8speaks comments in future PRs.

@jorisvandenbossche jorisvandenbossche merged commit 0b3f24b into pandas-dev:master Dec 6, 2017
@jorisvandenbossche
Copy link
Member

Thanks!

@jorisvandenbossche jorisvandenbossche added this to the 0.22.0 milestone Dec 6, 2017
@mroeschke mroeschke deleted the asv_clean_hdfbench branch December 6, 2017 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Benchmark Performance (ASV) benchmarks Clean
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants