Skip to content

Conversation

tehunter
Copy link
Contributor

@tehunter tehunter commented Jun 15, 2022

@pep8speaks
Copy link

pep8speaks commented Jun 15, 2022

Hello @tehunter! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-06-29 14:18:30 UTC

@tehunter
Copy link
Contributor Author

ASV Results:

..       before           after         ratio
     [6f0be79b]       [db365209]
                      <styler-performance>
-       915±200ms         594±30ms     0.65  io.excel.WriteExcel.time_write_excel('openpyxl')
-       658±200ms         323±20ms     0.49  io.excel.WriteExcel.time_write_excel('xlsxwriter')
-       3.08±0.1s         934±40ms     0.30  io.excel.WriteExcelStyled.time_write_excel_style('xlsxwriter')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@tehunter tehunter changed the title PERF: Improve Styler Performance PERF: Improve Styler to_excel Performance Jun 15, 2022
@tehunter
Copy link
Contributor Author

@github-actions pre-commit

@datapythonista datapythonista added Performance Memory or execution speed performance IO Excel read_excel, to_excel labels Jun 16, 2022
@tehunter tehunter marked this pull request as ready for review June 16, 2022 11:43
@tehunter
Copy link
Contributor Author

tehunter commented Jun 16, 2022

Overview of changes:

  • Moved CSSResolver.expand_* methods (which are accessed in .atomize) to a dictionary called CSS_EXPANSIONS which is keyed directly by the CSS property, instead of having to do a string replace and concatenation.
  • CSSResolver.__call__ now accepts an iterable of declaration tuples (property, value) in addition to the previous string type.
    • When an iterable is passed, .parse does not need to be called
  • CSSExcelCell.__init__ no longer converts the css_styles (i.e. Styler.ctx) from a list of tuples to a CSS string. Instead, the list is converted to a frozenset via a dictionary (to keep only the final instance of each property declaration)
    • The list of tuples was already obtained from the initial CSS string(s) in Styler._update_ctx, so we were previously parsing, reforming into string, and parsing again.
  • CSSToExcelConverter.__call__ is now cached using lru_cache. The frozenset was necessary to make all the arguments hashable for the caching mechanism.

@tehunter
Copy link
Contributor Author

Benchmarks as of 8e56402

.       before           after         ratio
     [2b1184dd]       [8e56402e]
                      <styler-performance>
-        260±10ms          233±5ms     0.90  io.excel.WriteExcel.time_write_excel('xlsxwriter')
-        620±70ms         450±20ms     0.73  io.excel.WriteExcel.time_write_excel('openpyxl')
-      2.51±0.02s        1.42±0.3s     0.57  io.excel.WriteExcelStyled.time_write_excel_style('openpyxl')
-      1.69±0.07s         608±10ms     0.36  io.excel.WriteExcelStyled.time_write_excel_style('xlsxwriter')
- ```

@mroeschke mroeschke added this to the 1.5 milestone Jun 29, 2022
@mroeschke mroeschke merged commit ad842d3 into pandas-dev:main Jun 29, 2022
@mroeschke
Copy link
Member

Awesome improvement! Thanks @tehunter

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* Move CSS expansion lookup to dictionary

* Implement simple CSSToExcelConverter cache

* Eliminate list -> str -> list in CSSResolver

* Allow for resolution of duplicate properties

* Add performance benchmark for styled Excel

* CLN: Clean up PEP8 issues

* DOC: Update PR documentation

* CLN: Clean up PEP8 issues

* Fixes from pre-commit [automated commit]

* Make Excel CSS case-insensitive

* Test for ordering and caching

* Pre-commit fixes

* Remove built-in filter

* Increase maxsize of Excel cache

Co-authored-by: Thomas Hunter <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PERF: Excel Styler treatment of CSS side expansion is slow
4 participants