
Conversation

ChrisRackauckas
Member

No description provided.

@ChrisRackauckas
Member Author

@oscardssmith did you check the performance with your static changes? It seems you introduced a regression.

@oscardssmith
Member

I tested performance for Hires with QNDF and it looked about even. The tricky part is that StaticW is purely a tradeoff against StaticLU that depends on how many times the Jacobian is reused. StaticW requires taking an inv, which is ~400 ns for an 8x8 on my laptop, but applying the operator is only ~10 ns, whereas the LU approach costs ~100 ns for the factorization and ~50 ns per solve. As such, StaticW is worth it if your W is reused for at least 8 solves, which is solver- and problem-dependent (and the crossover will also depend on the cost of the operations, which varies between computers).
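Plugging in those rough laptop numbers (assumed costs, not measurements from this PR), the break-even reuse count can be sketched as:

```julia
# Assumed per-call costs in ns, taken from the comment above.
inv_cost, inv_apply = 400.0, 10.0   # StaticW: one inv, then cheap mat-vec applies
lu_cost,  lu_solve  = 100.0, 50.0   # StaticLU: one factorization, then triangular solves

# Total cost after n solves with the same W:
#   inv approach: inv_cost + n*inv_apply
#   lu approach:  lu_cost  + n*lu_solve
# Break-even n solves:
n = (inv_cost - lu_cost) / (lu_solve - inv_apply)
@show n  # 7.5, so inv wins once W is reused for 8 or more solves
```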

@ChrisRackauckas
Member Author

StaticW requires taking an inv which is ~400ns for an 8x8 on my laptop

No, it's size-dependent.

https://github.com/SciML/OrdinaryDiffEq.jl/blob/master/src/derivative_utils.jl#L1-L11

We can tweak it if the timings have changed. See

SciML/OrdinaryDiffEq.jl#1539

@ChrisRackauckas
Member Author

I'm mostly worried about Rosenbrock, since that's the case where small and static makes the most sense. That has a clear regression given these results.

@oscardssmith
Member

Wait, that's worse. That means for bigger matrices we are redoing the factorization each time. Shouldn't StaticW have a second field or something for the LU factorization?

@ChrisRackauckas
Member Author

What should that cutoff be? Do you have timings?

@oscardssmith
Member

using StaticArrays, LinearAlgebra
using BenchmarkTools  # provides @belapsed

for i in 2:12
    J = SMatrix{i,i}(rand(i,i))
    u = SVector{i}(rand(i))
    binv = @belapsed inv($J)
    bmul = @belapsed $J*$u
    bfactor = @belapsed lu($J)
    F = lu(J)
    bsolve = @belapsed $F\$u
    # break-even reuse count: inv + n*mul vs factor + n*solve
    cross = (binv - bfactor) / (bsolve - bmul)
    @show i, cross, binv, bmul, bfactor, bsolve
end

gives

(i, cross, binv, bmul, bfactor, bsolve) = (2, -3.655047800519071, 2.8e-9, 2.8e-9, 1.3947895791583166e-8, 5.8499999999999994e-9)
(i, cross, binv, bmul, bfactor, bsolve) = (3, -2.2877713304175917, 7.317317317317318e-9, 2.8e-9, 3.188128772635815e-8, 1.3537074148296594e-8)
(i, cross, binv, bmul, bfactor, bsolve) = (4, -2.183671904669555, 1.8385155466399196e-8, 2.8e-9, 6.206122448979592e-8, 2.2801204819277106e-8)
(i, cross, binv, bmul, bfactor, bsolve) = (5, 4.157155895876654, 2.1063148148148148e-7, 3.19e-9, 9.951271186440678e-8, 2.991951710261569e-8)
(i, cross, binv, bmul, bfactor, bsolve) = (6, 5.630374305333124, 3.4389671361502347e-7, 3.53e-9, 1.5024907063197025e-7, 3.792338709677419e-8)
(i, cross, binv, bmul, bfactor, bsolve) = (7, 5.909648317396486, 4.587309644670051e-7, 4.91e-9, 2.0907272727272727e-7, 4.715587044534413e-8)
(i, cross, binv, bmul, bfactor, bsolve) = (8, 6.647505668931906, 6.560573248407643e-7, 4.03e-9, 3.0015873015873017e-7, 5.756866734486267e-8)
(i, cross, binv, bmul, bfactor, bsolve) = (9, 7.021356603610803, 8.698474576271186e-7, 6.41e-9, 4.0135000000000003e-7, 7.313463514902364e-8)
(i, cross, binv, bmul, bfactor, bsolve) = (10, 7.551112553023802, 1.179e-6, 6.72e-9, 5.460317460317461e-7, 9.054450261780104e-8)
(i, cross, binv, bmul, bfactor, bsolve) = (11, 7.694614369822537, 1.482e-6, 9.089089089089089e-9, 7.148888888888889e-7, 1.0878363832077503e-7)
(i, cross, binv, bmul, bfactor, bsolve) = (12, 8.39218793950278, 1.91e-6, 7.397397397397397e-9, 9.39375e-7, 1.2305555555555556e-7)

So for 4x4 or smaller, inv is always faster. Above that, the factorization approach is faster only when W is reused few enough times: the break-even reuse count (cross) grows from ~4 at 5x5 to ~8 at 12x12.
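One way the cutoff could be implemented is to dispatch on the static size (hypothetical helpers for illustration, not the actual OrdinaryDiffEq code):

```julia
using StaticArrays, LinearAlgebra

# Hypothetical: pick the W strategy from the static size.
# For N <= 4 the measurements above show inv is always faster;
# above that, lu wins unless W is reused many times.
function prepare_W(W::SMatrix{N,N}) where {N}
    N <= 4 ? inv(W) : lu(W)
end

# Apply the prepared operator: a mat-vec for the inv path,
# a triangular solve for the lu path.
apply_W(F, u) = F isa SMatrix ? F * u : F \ u
```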

@ChrisRackauckas
Member Author

So should we switch it to inv or lu? And should Rosenbrock23 have a way to just use \?

@oscardssmith
Member

For larger sizes, we should probably switch to lu. Why does Rosenbrock23 want to use \?

@ChrisRackauckas
Member Author

There at least used to be a big cost to using lu versus directly using \ with static arrays. If that has been fixed, we can just always use LU.

@oscardssmith
Member

Looks like that has mostly disappeared:

julia> J = SMatrix{8,8}(rand(8,8));
julia> u = @SVector rand(8);
julia> @btime lu($J)\$u;
  382.365 ns (0 allocations: 0 bytes)
julia> @btime $J\$u;
  326.444 ns (0 allocations: 0 bytes)

Therefore if you can reuse the jacobian even once, it pays off to factor.
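That conclusion checks out with rough arithmetic, taking the factor (~300 ns) and solve (~58 ns) costs from the i = 8 row above and the ~326 ns direct \ timing:

```julia
# Assumed per-call timings in ns from the 8x8 measurements above.
factor, solve, direct = 300.0, 58.0, 326.0
two_uses_lu     = factor + 2solve  # factor once, solve twice
two_uses_direct = 2direct          # full dense solve each time
@show two_uses_lu two_uses_direct  # 416.0 vs 652.0: lu wins after a single reuse
```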

@ChrisRackauckas
Member Author

Okay so let's update this.

@oscardssmith
Member

Will do. JuliaArrays/ArrayInterface.jl#449 is pretty much a prerequisite, since otherwise lu_instance is pretty expensive.

@ChrisRackauckas ChrisRackauckas merged commit d2ab676 into master Sep 14, 2024
2 checks passed
@ChrisRackauckas ChrisRackauckas deleted the stiffagain branch September 14, 2024 13:45