Skip to content

Conversation

hyeoksu-lee
Copy link
Contributor

@hyeoksu-lee hyeoksu-lee commented Oct 6, 2025

User description

Description

This PR refactors time_stepper routines by integrating s_1st_order_tvd_rk, s_2nd_order_tvd_rk, s_3rd_order_tvd_rk, s_strang_splitting into a single subroutine s_tvd_rk.

Type of change

  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensure the new features performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement


Description

  • Consolidate four separate TVD RK time-stepping subroutines into single unified s_tvd_rk

  • Replace conditional calls with unified interface accepting nstage parameter

  • Add Runge-Kutta coefficient matrix for different time-stepping orders

  • Integrate adaptive time-stepping for bubble dynamics into unified routine


Diagram Walkthrough

flowchart LR
  A["Four separate TVD RK subroutines"] --> B["Single s_tvd_rk subroutine"]
  C["Conditional time_stepper calls"] --> D["Unified interface with nstage parameter"]
  E["RK coefficient matrix"] --> B
  F["Adaptive dt integration"] --> B
Loading

File Walkthrough

Relevant files
Enhancement
m_start_up.fpp
Simplify time-stepper invocation logic                                     

src/simulation/m_start_up.fpp

  • Replace conditional calls to separate TVD RK subroutines with single
    s_tvd_rk call
  • Pass time_stepper parameter to unified routine
+2/-8     
m_time_steppers.fpp
Consolidate TVD RK time-stepping implementations                 

src/simulation/m_time_steppers.fpp

  • Remove four separate TVD RK subroutines (s_1st_order_tvd_rk,
    s_2nd_order_tvd_rk, s_3rd_order_tvd_rk, s_strang_splitting)
  • Add unified s_tvd_rk subroutine with stage loop and RK coefficient
    matrix
  • Initialize RK coefficients for different time-stepping orders in setup
  • Integrate adaptive time-stepping directly into unified routine
+99/-673

@hyeoksu-lee hyeoksu-lee requested a review from a team as a code owner October 6, 2025 21:01
Copy link

qodo-merge-pro bot commented Oct 6, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The previous logic used Strang splitting when time_stepper==3 and adap_dt==true; the new unified call only triggers for time_stepper in {1,2,3} without checking adap_dt, potentially skipping Strang splitting behavior entirely.

if (any(time_stepper == (/1, 2, 3/))) then
    call s_tvd_rk(t_step, time_avg, time_stepper)
end if
Coeff Array Bounds

rk_coef is allocated as (time_stepper,3) but indexed with s=1..nstage; if nstage>time_stepper (e.g., 2nd/3rd order), indexing rk_coef(s,*) may go out of bounds unless nstage==time_stepper is guaranteed.

@:ALLOCATE (rk_coef(time_stepper, 3))
if (time_stepper == 1) then
    rk_coef(1, 1) = 1._wp
    rk_coef(1, 2) = 0._wp
    rk_coef(1, 3) = 1._wp
else if (time_stepper == 2) then
    rk_coef(1, 1) = 1._wp
    rk_coef(1, 2) = 0._wp
    rk_coef(1, 3) = 1._wp
    rk_coef(2, 1) = 0.5_wp
    rk_coef(2, 2) = 0.5_wp
    rk_coef(2, 3) = 0.5_wp
else if (time_stepper == 3) then
    rk_coef(1, 1) = 1._wp
    rk_coef(1, 2) = 0._wp
    rk_coef(1, 3) = 1._wp
    rk_coef(2, 1) = 1._wp/4._wp
    rk_coef(2, 2) = 3._wp/4._wp
    rk_coef(2, 3) = 1._wp/4._wp
    rk_coef(3, 1) = 2._wp/3._wp
    rk_coef(3, 2) = 1._wp/3._wp
    rk_coef(3, 3) = 2._wp/3._wp
end if
State Storage Logic

The use of stor=1 for 1st order and 2 otherwise, with copying only when s==1 and time_stepper/=1, may not mirror prior GPU memory paths/dest handling; verify that q_cons_ts and pb/mv staging matches original semantics across compilers.

        if (any(time_stepper == (/1, 2, 3/))) then
            ! Time stepper index
            if (time_stepper == 1) then
                stor = 1
            else
                stor = 2
            end if

            @:ALLOCATE (rk_coef(time_stepper, 3))
            if (time_stepper == 1) then
                rk_coef(1, 1) = 1._wp
                rk_coef(1, 2) = 0._wp
                rk_coef(1, 3) = 1._wp
            else if (time_stepper == 2) then
                rk_coef(1, 1) = 1._wp
                rk_coef(1, 2) = 0._wp
                rk_coef(1, 3) = 1._wp
                rk_coef(2, 1) = 0.5_wp
                rk_coef(2, 2) = 0.5_wp
                rk_coef(2, 3) = 0.5_wp
            else if (time_stepper == 3) then
                rk_coef(1, 1) = 1._wp
                rk_coef(1, 2) = 0._wp
                rk_coef(1, 3) = 1._wp
                rk_coef(2, 1) = 1._wp/4._wp
                rk_coef(2, 2) = 3._wp/4._wp
                rk_coef(2, 3) = 1._wp/4._wp
                rk_coef(3, 1) = 2._wp/3._wp
                rk_coef(3, 2) = 1._wp/3._wp
                rk_coef(3, 3) = 2._wp/3._wp
            end if
        end if

    end subroutine s_initialize_time_steppers_module

    impure subroutine s_tvd_rk(t_step, time_avg, nstage)
#ifdef _CRAYFTN
        !DIR$ OPTIMIZE (-haggress)
#endif
        integer, intent(in) :: t_step
        real(wp), intent(inout) :: time_avg
        integer, intent(in) :: nstage

        integer :: i, j, k, l, q, s !< Generic loop iterator
        real(wp) :: start, finish
        integer :: dest

        call cpu_time(start)
        call nvtxStartRange("TIMESTEP")

        ! Adaptive dt: initial stage
        if (adap_dt) call s_adaptive_dt_bubble(1)

        do s = 1, nstage
            call s_compute_rhs(q_cons_ts(1)%vf, q_T_sf, q_prim_vf, bc_type, rhs_vf, pb_ts(1)%sf, rhs_pb, mv_ts(1)%sf, rhs_mv, t_step, time_avg, s)

            if (s == 1 .and. run_time_info) then
                if (igr) then
                    call s_write_run_time_information(q_cons_ts(1)%vf, t_step)
                else
                    call s_write_run_time_information(q_prim_vf, t_step)
                end if
            end if

            if (probe_wrt) then
                call s_time_step_cycling(t_step)
            end if

            if (cfl_dt) then
                if (mytime >= t_stop) return
            else
                if (t_step == t_step_stop) return
            end if

            if (bubbles_lagrange .and. .not. adap_dt) call s_update_lagrange_tdv_rk(stage=s)

            $:GPU_PARALLEL_LOOP(collapse=4)
            do i = 1, sys_size
                do l = 0, p
                    do k = 0, n
                        do j = 0, m
                            if (s == 1 .and. time_stepper /= 1) then
                                q_cons_ts(stor)%vf(i)%sf(j, k, l) = &
                                    q_cons_ts(1)%vf(i)%sf(j, k, l)
                            end if
                            q_cons_ts(1)%vf(i)%sf(j, k, l) = &
                                rk_coef(s, 1)*q_cons_ts(1)%vf(i)%sf(j, k, l) &
                                + rk_coef(s, 2)*q_cons_ts(stor)%vf(i)%sf(j, k, l) &
                                + rk_coef(s, 3)*dt*rhs_vf(i)%sf(j, k, l)
                        end do
                    end do
                end do
            end do

            !Evolve pb and mv for non-polytropic qbmm
            if (qbmm .and. (.not. polytropic)) then
                $:GPU_PARALLEL_LOOP(collapse=5)
                do i = 1, nb
                    do l = 0, p
                        do k = 0, n
                            do j = 0, m
                                do q = 1, nnode
                                    if (s == 1 .and. time_stepper /= 1) then
                                        pb_ts(stor)%sf(j, k, l, q, i) = &
                                            pb_ts(1)%sf(j, k, l, q, i)
                                        mv_ts(stor)%sf(j, k, l, q, i) = &
                                            mv_ts(1)%sf(j, k, l, q, i)
                                    end if
                                    pb_ts(1)%sf(j, k, l, q, i) = &
                                        rk_coef(s, 1)*pb_ts(1)%sf(j, k, l, q, i) &
                                        + rk_coef(s, 2)*pb_ts(stor)%sf(j, k, l, q, i) &
                                        + rk_coef(s, 3)*dt*rhs_pb(j, k, l, q, i)
                                    mv_ts(1)%sf(j, k, l, q, i) = &
                                        rk_coef(s, 1)*mv_ts(1)%sf(j, k, l, q, i) &
                                        + rk_coef(s, 2)*mv_ts(stor)%sf(j, k, l, q, i) &
                                        + rk_coef(s, 3)*dt*rhs_mv(j, k, l, q, i)

if (bodyForces) call s_apply_bodyforces(q_cons_ts(dest)%vf, q_prim_vf, rhs_vf, 2._wp*dt/3._wp)

if (grid_geometry == 3) call s_apply_fourier_filter(q_cons_ts(dest)%vf)
if (bodyForces) call s_apply_bodyforces(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, rk_coef(s, 3)*dt)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The time step argument for s_apply_bodyforces has changed for the 2nd order scheme due to the refactoring. Use a select case block to apply the correct time step coefficients for each scheme and stage to maintain the original numerical behavior. [possible issue, importance: 8]

Suggested change
if (bodyForces) call s_apply_bodyforces(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, rk_coef(s, 3)*dt)
if (bodyForces) then
real(wp) :: bf_dt
select case (time_stepper)
case (1) ! 1st order
bf_dt = dt
case (2) ! 2nd order
if (s == 1) then
bf_dt = dt
else
bf_dt = 0.5_wp * dt
end if
case (3) ! 3rd order
if (s == 1) then
bf_dt = dt
else if (s == 2) then
bf_dt = 0.25_wp * dt
else
bf_dt = (2._wp/3._wp) * dt
end if
end select
call s_apply_bodyforces(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, bf_dt)
end if

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hyeoksu-lee did you introduce a body forces bug? can't quite tell without looking closer

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ask

@hyeoksu-lee did you introduce a body forces bug? can't quite tell without looking closer

Answer:

Likely yes: the refactor changes the time increment passed to s_apply_bodyforces in multi-stage schemes, altering numerical behavior.

What changed

  • Previously, each TVD-RK routine applied body forces with scheme-specific dt weights:
    • 1st order: dt
    • 2nd order:
      • Stage 1: dt
      • Stage 2: 2/3 * dt (see call s_apply_bodyforces(..., 2._wp*dt/3._wp))
    • 3rd order:
      • Stage 1: dt
      • Stage 2: dt/4
      • Stage 3: 2/3 * dt
  • After refactor, s_tvd_rk unconditionally calls:
    • call s_apply_bodyforces(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, rk_coef(s, 3)*dt)
  • The rk_coef(s,3) you set equals:
    • 1st order: [1]
    • 2nd order: [1, 1/2]
    • 3rd order: [1, 1/4, 2/3]
  • Mismatch:
    • 2nd order stage 2: old was 2/3dt, new is 1/2dt
    • 3rd order matches the old weights (dt, dt/4, 2/3 dt)
    • 1st order matches.

Impact

  • For the 2nd order TVD-RK, body force contribution is reduced at stage 2 (0.5 vs 0.666... factor), which can alter accuracy and possibly stability for problems where body forces are significant.

Suggested fix

  • Don’t tie body force dt to rk_coef(s,3). Restore scheme-specific weights for body forces:
    • 1st order: dt
    • 2nd order: stage 1 dt, stage 2 2/3*dt
    • 3rd order: stage 1 dt, stage 2 dt/4, stage 3 2/3*dt
  • Example patch inline with your loop:
    • if (bodyForces) then
      • select case (time_stepper)
        • case (1); bf_dt = dt
        • case (2); bf_dt = merge(dt, 2._wp*dt/3._wp, s == 1)
        • case (3); bf_dt = merge(dt, merge(0.25_wpdt, 2._wpdt/3._wp, s == 3), s == 1)
      • end select
      • call s_apply_bodyforces(q_cons_ts(1)%vf, q_prim_vf, rhs_vf, bf_dt)
    • end if

Bottom line

  • Yes, there is a regression for time_stepper == 2: stage-2 body force dt should be 2/3dt, not 1/2dt.

Copy link
Contributor Author

@hyeoksu-lee hyeoksu-lee Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbryngelson Ah yes one coefficient I added is wrong. let me fix it.

By the way, why does the second-order Runge-Kutta method have a body force dt that is inconsistent with the time advancement of other variables? Who can I ask for more details?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wilfonba can comment. could be a bug, not sure. we don't often use rk2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it could be a bug introduced by copying and pasting it from RK3... what s_apply_bodyforces does is basically q_cons_vf = q_cons_vf + dt * rhs_vf, so body_forces should advance the same time step I believe.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question:

I guess it could be a bug introduced by copying and pasting it from RK3... what s_apply_bodyforces does is basically q_cons_vf = q_cons_vf + dt * rhs_vf, so body_forces should advance the same time step I believe.

Answer:

To reduce future false positives and steer Qodo Merge suggestions toward your project’s RK time-stepping rules, add targeted extra instructions for the improve/review tools. For example, encode your scheme-specific body-force dt policy so the bot won’t propose conflicting coefficients:

[pr_code_suggestions]
extra_instructions = """\
Time-integration policy (do not change unless explicitly requested):
- s_apply_bodyforces advances q_cons_vf via q += dt*rhs over the exact stage dt used for other state updates.
- For TVD-RK:
  * 1st order: dt
  * 2nd order: stage1 dt, stage2 2/3*dt (or project standard)
  * 3rd order: stage1 dt, stage2 1/4*dt, stage3 2/3*dt
Reject suggestions that tie body-force dt to generic rk_coef indices unless they reproduce these exact weights.
"""

[pr_reviewer]
extra_instructions = """\
Validate RK stage coefficients against project-standard time-stepping:
- Body forces must use the same effective stage dt as conserved variables.
- Flag any change that alters numerical behavior by mixing RK weight arrays with body-force dt.
"""

If you keep best practices centrally, also capture this as a rule in best_practices.md so the guidance is consistently applied across PRs.

Relevant Sources:

@sbryngelson
Copy link
Member

I love a good red line PR! Will review soon

@hyeoksu-lee
Copy link
Contributor Author

Some of test suite failed (with the error close to tolerance, though), so set this PR as draft for now

@hyeoksu-lee hyeoksu-lee marked this pull request as draft October 6, 2025 21:25
@sbryngelson
Copy link
Member

small errors when refactoring the time steppers was also seen by @danieljvickers... so long as i'm convinced the code is right we might need a new goldenfile that satiates all the time steppers (or figure out the offending case and set looser tolerance).

@hyeoksu-lee hyeoksu-lee marked this pull request as ready for review October 6, 2025 21:29
Copy link

qodo-merge-pro bot commented Oct 6, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The unified call removes the previous branching that selected Strang splitting when time_stepper == 3 .and. adap_dt. Now it calls s_tvd_rk(time_stepper) only if 1..3, but there is no handling for adap_dt here, potentially changing behavior for adaptive-DT third-order runs.

if (any(time_stepper == (/1, 2, 3/))) then
    call s_tvd_rk(t_step, time_avg, time_stepper)
end if
Coeff Init

rk_coef is allocated with shape (time_stepper,3) each init, but accesses rk_coef(s, :) inside s_tvd_rk for s=1..nstage. Ensure nstage == time_stepper always; otherwise out-of-bounds. Consider allocating (nstage,3) or validating inputs.

@:ALLOCATE (rk_coef(time_stepper, 3))
if (time_stepper == 1) then
    rk_coef(1, 1) = 1._wp
    rk_coef(1, 2) = 0._wp
    rk_coef(1, 3) = 1._wp
else if (time_stepper == 2) then
    rk_coef(1, 1) = 1._wp
    rk_coef(1, 2) = 0._wp
    rk_coef(1, 3) = 1._wp
    rk_coef(2, 1) = 0.5_wp
    rk_coef(2, 2) = 0.5_wp
    rk_coef(2, 3) = 0.5_wp
else if (time_stepper == 3) then
    rk_coef(1, 1) = 1._wp
    rk_coef(1, 2) = 0._wp
    rk_coef(1, 3) = 1._wp
    rk_coef(2, 1) = 1._wp/4._wp
    rk_coef(2, 2) = 3._wp/4._wp
    rk_coef(2, 3) = 1._wp/4._wp
    rk_coef(3, 1) = 2._wp/3._wp
    rk_coef(3, 2) = 1._wp/3._wp
    rk_coef(3, 3) = 2._wp/3._wp
end if
State Storage

The storage index stor is set to 1 for 1st order and 2 for others, and stage-1 copies reference q_cons_ts(stor) and pb_ts(stor), mv_ts(stor). Verify pools are sized and initialized for stor==2; also confirm device annotations (GPU_DECLARE) include rk_coef and any needed buffers for consistent GPU behavior.

if (any(time_stepper == (/1, 2, 3/))) then
    ! Time stepper index
    if (time_stepper == 1) then
        stor = 1
    else
        stor = 2
    end if

    @:ALLOCATE (rk_coef(time_stepper, 3))
    if (time_stepper == 1) then
        rk_coef(1, 1) = 1._wp
        rk_coef(1, 2) = 0._wp
        rk_coef(1, 3) = 1._wp
    else if (time_stepper == 2) then
        rk_coef(1, 1) = 1._wp
        rk_coef(1, 2) = 0._wp
        rk_coef(1, 3) = 1._wp
        rk_coef(2, 1) = 0.5_wp
        rk_coef(2, 2) = 0.5_wp
        rk_coef(2, 3) = 0.5_wp
    else if (time_stepper == 3) then
        rk_coef(1, 1) = 1._wp
        rk_coef(1, 2) = 0._wp
        rk_coef(1, 3) = 1._wp
        rk_coef(2, 1) = 1._wp/4._wp
        rk_coef(2, 2) = 3._wp/4._wp
        rk_coef(2, 3) = 1._wp/4._wp
        rk_coef(3, 1) = 2._wp/3._wp
        rk_coef(3, 2) = 1._wp/3._wp
        rk_coef(3, 3) = 2._wp/3._wp

@hyeoksu-lee
Copy link
Contributor Author

Oh I see.. I put it back to ready for review. Please review the change and let me know if you have any question or concerns!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level Suggestion

To improve readability and maintainability, define the Runge-Kutta coefficients as named constants or parameters instead of hardcoding them in a large if-else block. [High-level, importance: 6]

Solution Walkthrough:

Before:

subroutine s_initialize_time_steppers_module
    ...
    if (any(time_stepper == (/1, 2, 3/))) then
        @:ALLOCATE (rk_coef(time_stepper, 3))
        if (time_stepper == 1) then
            rk_coef(1, 1) = 1._wp
            rk_coef(1, 2) = 0._wp
            rk_coef(1, 3) = 1._wp
        else if (time_stepper == 2) then
            rk_coef(1, 1) = 1._wp; rk_coef(1, 2) = 0._wp; rk_coef(1, 3) = 1._wp
            rk_coef(2, 1) = 0.5_wp; rk_coef(2, 2) = 0.5_wp; rk_coef(2, 3) = 0.5_wp
        else if (time_stepper == 3) then
            ...
        end if
    end if
end subroutine

After:

module m_time_steppers
    ...
    real(wp), parameter :: RK1_COEFS(1,3) = reshape((/ 1.0_wp, 0.0_wp, 1.0_wp /), (/1,3/))
    real(wp), parameter :: RK2_COEFS(2,3) = reshape((/ 1.0_wp, 0.0_wp, 1.0_wp, &
                                                      0.5_wp, 0.5_wp, 0.5_wp /), (/2,3/))
    ...
contains
    subroutine s_initialize_time_steppers_module
        ...
        if (time_stepper == 1) then
            rk_coef = RK1_COEFS
        else if (time_stepper == 2) then
            rk_coef = RK2_COEFS
        ...
        end if
    end subroutine

hyeoksu-lee and others added 2 commits October 6, 2025 14:50
Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>
Co-authored-by: qodo-merge-pro[bot] <151058649+qodo-merge-pro[bot]@users.noreply.github.com>
@danieljvickers
Copy link
Member

Oh yeah, I tried to do something similar a few months ago, but kept getting weird compiler-dependent bugs that I couldn't reproduce locally. I like this PR. Looks like it is reducing a lot of the redundant code in the time_stepper. Seems odd that you are failing so many tests by so little. I would be interested to also see how far off of the golden files you are for the tests that pass, and see if they are similar.

@sbryngelson
Copy link
Member

@hyeoksu-lee status on this? was working i guess, but now everything is failing?

@hyeoksu-lee
Copy link
Contributor Author

@sbryngelson All failing CPU tests are due to error of the order of 1e-10, which is tolerance level. I am looking into the code to see if there is any systematic issue. And I am waiting for @wilfonba to review body force input for RK2.

In the meantime, I need to fix GPU bug, which seems not a simple tolerance-type issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants