Conversation


@ranocha ranocha commented Aug 24, 2021

@jlchan This should help us to get some real science done faster.

Running trixi_include("examples/dgmulti_2d/elixir_euler_rayleigh_taylor_instability.jl", tspan=(0.0, 0.5))
(timing the second run, i.e., after compilation) with julia --check-bounds=no --threads=1 yields the following results; a sketch for reproducing these timings is included after the tables below.

  • Current main:
    ────────────────────────────────────────────────────────────────────────────────────────────────────
    Trixi simulation finished.  Final time: 0.5  Time steps: 254 (accepted), 256 (total)
    ────────────────────────────────────────────────────────────────────────────────────────────────────
    
    ───────────────────────────────────────────────────────────────────────────────────
                  Trixi.jl                      Time                   Allocations
                                        ──────────────────────   ───────────────────────
              Tot / % measured:              37.1s / 91.9%           6.93GiB / 98.9%
    
    Section                    ncalls     time   %tot     avg     alloc   %tot      avg
    ───────────────────────────────────────────────────────────────────────────────────
    rhs!                        2.31k    33.9s  99.4%  14.7ms   6.83GiB  100%   3.03MiB
      entropy_projection!       2.31k    15.2s  44.6%  6.58ms     0.00B  0.00%    0.00B
      calc_volume_integral!     2.31k    14.3s  42.0%  6.20ms   4.12GiB  60.0%  1.83MiB
      calc_sources!             2.31k    2.69s  7.89%  1.17ms   2.71GiB  39.5%  1.20MiB
      calc_interface_flux!      2.31k    1.24s  3.63%   537μs     0.00B  0.00%    0.00B
      calc_surface_integral!    2.31k    254ms  0.75%   110μs     0.00B  0.00%    0.00B
      invert_jacobian           2.31k   76.8ms  0.23%  33.3μs     0.00B  0.00%    0.00B
      ~rhs!~                    2.31k   66.4ms  0.19%  28.8μs   3.07MiB  0.04%  1.36KiB
      Reset du/dt               2.31k   34.7ms  0.10%  15.0μs     0.00B  0.00%    0.00B
      calc_boundary_flux!       2.31k   15.3ms  0.04%  6.65μs     0.00B  0.00%    0.00B
    analyze solution                4    219ms  0.64%  54.8ms   27.7MiB  0.39%  6.93MiB
    ───────────────────────────────────────────────────────────────────────────────────
    
  • This PR:
    ────────────────────────────────────────────────────────────────────────────────────────────────────
    Trixi simulation finished.  Final time: 0.5  Time steps: 254 (accepted), 256 (total)
    ────────────────────────────────────────────────────────────────────────────────────────────────────
    
    ───────────────────────────────────────────────────────────────────────────────────
                  Trixi.jl                      Time                   Allocations
                                        ──────────────────────   ───────────────────────
              Tot / % measured:              20.1s / 89.7%           6.79GiB / 98.9%
    
    Section                    ncalls     time   %tot     avg     alloc   %tot      avg
    ───────────────────────────────────────────────────────────────────────────────────
    rhs!                        2.31k    17.9s  99.1%  7.76ms   6.69GiB  100%   2.97MiB
      calc_volume_integral!     2.31k    13.1s  72.4%  5.67ms   4.12GiB  61.3%  1.83MiB
      calc_sources!             2.31k    2.14s  11.8%   927μs   2.57GiB  38.3%  1.14MiB
      entropy_projection!       2.31k    1.25s  6.92%   542μs     0.00B  0.00%    0.00B
      calc_interface_flux!      2.31k    1.12s  6.22%   487μs     0.00B  0.00%    0.00B
      calc_surface_integral!    2.31k    169ms  0.93%  73.1μs     0.00B  0.00%    0.00B
      invert_jacobian           2.31k   60.3ms  0.33%  26.1μs     0.00B  0.00%    0.00B
      Reset du/dt               2.31k   33.6ms  0.19%  14.6μs     0.00B  0.00%    0.00B
      ~rhs!~                    2.31k   31.2ms  0.17%  13.5μs   2.75MiB  0.04%  1.22KiB
      calc_boundary_flux!       2.31k   12.5ms  0.07%  5.44μs     0.00B  0.00%    0.00B
    analyze solution                4    171ms  0.95%  42.7ms   27.5MiB  0.40%  6.89MiB
    ───────────────────────────────────────────────────────────────────────────────────
    
  • This PR with my performance fixes for StructArrays.jl:
    ────────────────────────────────────────────────────────────────────────────────────────────────────
    Trixi simulation finished.  Final time: 0.5  Time steps: 254 (accepted), 256 (total)
    ────────────────────────────────────────────────────────────────────────────────────────────────────
    
    ───────────────────────────────────────────────────────────────────────────────────
                  Trixi.jl                      Time                   Allocations
                                        ──────────────────────   ───────────────────────
              Tot / % measured:              14.0s / 96.0%           11.1MiB / 75.7%
    
    Section                    ncalls     time   %tot     avg     alloc   %tot      avg
    ───────────────────────────────────────────────────────────────────────────────────
    rhs!                        2.31k    13.3s  99.0%  5.77ms   2.75MiB  32.8%  1.22KiB
      calc_volume_integral!     2.31k    10.3s  76.8%  4.47ms     0.00B  0.00%    0.00B
      entropy_projection!       2.31k    1.24s  9.23%   538μs     0.00B  0.00%    0.00B
      calc_interface_flux!      2.31k    1.15s  8.55%   498μs     0.00B  0.00%    0.00B
      calc_sources!             2.31k    320ms  2.38%   139μs     0.00B  0.00%    0.00B
      calc_surface_integral!    2.31k    149ms  1.11%  64.5μs     0.00B  0.00%    0.00B
      invert_jacobian           2.31k   58.7ms  0.44%  25.5μs     0.00B  0.00%    0.00B
      ~rhs!~                    2.31k   30.1ms  0.22%  13.1μs   2.75MiB  32.8%  1.22KiB
      Reset du/dt               2.31k   28.7ms  0.21%  12.4μs     0.00B  0.00%    0.00B
      calc_boundary_flux!       2.31k   11.7ms  0.09%  5.07μs     0.00B  0.00%    0.00B
    analyze solution                4    129ms  0.96%  32.1ms   5.64MiB  67.2%  1.41MiB
    ───────────────────────────────────────────────────────────────────────────────────
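
For reference, here is a minimal sketch of how these timings can be reproduced. It assumes a local installation of Trixi.jl; examples_dir() is Trixi.jl's helper for locating the bundled examples, and whether the elixir ends with a summary_callback() call that prints the timer is an assumption based on the usual elixir layout.

```julia
# Start Julia with `julia --check-bounds=no --threads=1`, run the command
# twice, and read the timer table of the second run so compilation is excluded.
using Trixi

trixi_include(joinpath(examples_dir(), "dgmulti_2d",
                       "elixir_euler_rayleigh_taylor_instability.jl"),
              tspan=(0.0, 0.5))
# The elixir is assumed to end with `summary_callback()`, which prints the
# TimerOutputs tables shown above.
```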
    

CC @sloede for approving the general strategy of performance optimizations specific to combinations of the solver/equations.

@ranocha ranocha requested a review from jlchan August 24, 2021 13:43

sloede commented Aug 24, 2021

CC @sloede for approving the general strategy of performance optimizations specific to combinations of the solver/equations.

Personally, I think the code is rather difficult to understand and hard to maintain for anyone but the initiated, but I do see the benefits of the performance improvements. As long as this is not getting out of hand and spreading to the rest of Trixi, I'd say go for it 🤷 😎


ranocha commented Aug 24, 2021

My second PR to StructArrays.jl speeds up weak-form setups such as trixi_include("examples/dgmulti_2d/elixir_euler_triangulate_pkg_mesh.jl") by roughly one third on top of the other performance improvements mentioned above.
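
As a side note, here is a rough sketch (not part of this PR) of how such a speedup can be checked directly: run the elixir once and benchmark the right-hand-side evaluation with each StructArrays.jl version. It assumes the elixir follows the usual pattern of defining semi and sol, which are then available after trixi_include.

```julia
# Benchmark only the rhs! evaluation, which is where the StructArrays.jl
# changes matter; compare the result with and without the patched package.
using BenchmarkTools, Trixi

trixi_include(joinpath(examples_dir(), "dgmulti_2d",
                       "elixir_euler_triangulate_pkg_mesh.jl"))
du = similar(sol.u[end])
@benchmark Trixi.rhs!($du, $(sol.u[end]), $semi, 0.0)
```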

@jlchan jlchan left a comment

Thanks for adding this! This is very cool, and gives me some ideas for my next PR. It's also a nice example of "extra" specialization that can be done on top of Trixi.

@ranocha ranocha requested a review from jlchan August 24, 2021 17:15

jlchan commented Aug 24, 2021

One side note: I think this would reduce runtimes the most for DGMultiFluxDiff{<:Polynomial, Quad} solvers since entropy_projection! is the dominant cost (for Tri elements, hadamard_sum! is the dominant cost). However, the upcoming Gauss collocation scheme that I'm implementing aims to reduce these costs instead by exploiting extra structure in entropy_projection!. It might be worth revisiting this specialization when I get around to that PR.


ranocha commented Aug 24, 2021

Thanks for your nice review, @jlchan!

This is very cool, and gives me some ideas for my next PR.

Just don't push the specializations too hard or @sloede will kill us (and Trixi.jl will become incredibly unreadable).

It's also a nice example of "extra" specialization that can be done on top of Trixi.

Yeah - this is a kind of specialization that I prototyped in #429, where we were able to beat some MFEM variants for linear advection 🎉 It's a really nice feature of Julia.

@ranocha ranocha enabled auto-merge (squash) August 24, 2021 17:19

jlchan commented Aug 24, 2021

Just don't push the specializations too hard or @sloede will kill us (and Trixi.jl will become incredibly unreadable).

Yeah - that's one thing I'm worried about. This PR addresses the fact that entropy_projection! is slow, but the solver for which it makes the most impact (e.g., DGMultiFluxDiff{<:Polynomial, Quad}) has some inefficiencies that can be eliminated differently: a change of basis from nodal Lobatto bases to Gauss nodes exposes structure in entropy_projection! that lets us avoid all the volume evaluations of entropy2cons and cons2entropy, which is mainly what @turbo addresses in this PR.

This PR is helpful to speed things up until we can introduce a DGMultiFluxDiff{<:GSBP, Quad} solver type. When that gets implemented, I'd suggest that we revisit flux_differencing_compressible_euler.jl and maybe move it to an elixir or decide what solver type it should be specialized for.
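
To illustrate the kind of specialization being discussed, here is a minimal sketch. It is not the code of this PR: the function name entropy_vars_turbo! and the plain component arrays are made up for illustration, and the loop body is the standard cons2entropy mapping for the 2D compressible Euler equations.

```julia
# Sketch: instead of converting node-by-node via cons2entropy on SVectors,
# loop over component arrays (as in a struct-of-arrays layout) and let
# LoopVectorization.@turbo vectorize the whole transformation.
using LoopVectorization

function entropy_vars_turbo!(w1, w2, w3, w4, rho, rho_v1, rho_v2, rho_e, gamma)
    @turbo for i in eachindex(rho)
        v1 = rho_v1[i] / rho[i]
        v2 = rho_v2[i] / rho[i]
        v_square = v1^2 + v2^2
        p = (gamma - 1) * (rho_e[i] - 0.5 * rho[i] * v_square)
        s = log(p) - gamma * log(rho[i])
        rho_p = rho[i] / p
        w1[i] = (gamma - s) / (gamma - 1) - 0.5 * rho_p * v_square
        w2[i] = rho_p * v1
        w3[i] = rho_p * v2
        w4[i] = -rho_p
    end
    return nothing
end
```

The point is that @turbo can vectorize the whole node loop, including the log calls, once the data is laid out as one plain array per conservative variable - which is exactly the layout a struct-of-arrays container provides.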


sloede commented Aug 24, 2021

Just don't push the specializations too hard or @sloede will kill us (and Trixi.jl will become incredibly unreadable).

Yeah - that's one thing I'm worried about.

About what exactly - the readability issue or about getting killed? Please tell me it's the latter...


ranocha commented Aug 24, 2021

This PR is helpful to speed things up until we can introduce a DGMultiFluxDiff{<:GSBP, Quad} solver type. When that gets implemented, I'd suggest that we revisit flux_differencing_compressible_euler.jl and maybe move it to an elixir or decide what solver type it should be specialized for.

Sounds good to me. The problems I addressed in my PRs to StructArrays.jl will still be helpful either way.

@ranocha ranocha added the performance We are greedy label Aug 24, 2021

codecov bot commented Aug 24, 2021

Codecov Report

Merging #822 (930a471) into main (0c9b19e) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main     #822      +/-   ##
==========================================
+ Coverage   92.73%   92.75%   +0.02%     
==========================================
  Files         187      188       +1     
  Lines       17357    17398      +41     
==========================================
+ Hits        16095    16136      +41     
  Misses       1262     1262              
Flag         Coverage Δ
unittests    92.75% <100.00%> (+0.02%) ⬆️


Impacted Files                                            Coverage Δ
src/solvers/dgmulti/dg.jl                                 96.25% <100.00%> (ø)
src/solvers/dgmulti/flux_differencing.jl                  97.93% <100.00%> (ø)
...rs/dgmulti/flux_differencing_compressible_euler.jl     100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ranocha ranocha merged commit 6491e0c into main Aug 24, 2021
@ranocha ranocha deleted the hr/DGMulti_performance branch August 24, 2021 18:45