LD prune test is failing due to differences with scikit-allel #864

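Hypothesis is finding cases where sgkit's LD pruning and scikit-allel's `allel.locate_unlinked` disagree on which variants to drop.

For example, in the failure below the dosage matrix is all zeros apart from a single `2`, and the threshold is 0.0. scikit-allel drops no variants, whereas sgkit drops 74: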

<details>

```
=================================== FAILURES ===================================
_______________________________ test_vs_skallel ________________________________
    @given(args=ld_prune_args())  # pylint: disable=no-value-for-parameter
>   @settings(max_examples=50, deadline=None, phases=PHASES_NO_SHRINK)
sgkit/tests/test_ld.py:158: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
args = (array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0... 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=uint8), 27, 2, 0.0, 19)
    @given(args=ld_prune_args())  # pylint: disable=no-value-for-parameter
    @settings(max_examples=50, deadline=None, phases=PHASES_NO_SHRINK)
    @example(args=(np.array([[1, 1], [1, 1]], dtype="uint8"), 1, 1, 0.0, -1))
    def test_vs_skallel(args):
        x, size, step, threshold, chunks = args
        ds = simulate_genotype_call_dataset(n_variant=x.shape[0], n_sample=x.shape[1])
        ds["dosage"] = (["variants", "samples"], da.asarray(x).rechunk({0: chunks}))
        ds = window_by_variant(ds, size=size, step=step)
        ldm = ld_matrix(ds, threshold=threshold)
        has_duplicates = ldm.compute().duplicated(subset=["i", "j"]).any()
        assert not has_duplicates
        idx_drop_ds = maximal_independent_set(ldm)
        idx_drop = np.sort(idx_drop_ds.ld_prune_index_to_drop.data)
        m = allel.locate_unlinked(x, size=size, step=step, threshold=threshold)
        idx_drop_ska = np.sort(np.argwhere(~m).squeeze(axis=1))
>       npt.assert_equal(idx_drop_ska, idx_drop)
E       AssertionError: 
E       Arrays are not equal
E       
E       (shapes (0,), (74,) mismatch)
E        x: array([], dtype=int64)
E        y: array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
E              18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35,
E              36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,...

sgkit/tests/test_ld.py:176: AssertionError
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_vs_skallel(
    args=(array([[0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 2, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]], dtype=uint8), 27, 2, 0.0, 19),
)
You can reproduce this example by temporarily adding @reproduce_failure('6.47.0', b'AEsCAQEBAgADABoBAQMBAAAAAAAAAAAACQ==') as a decorator on your test case
```
</details>

From https://github.com/pystatgen/sgkit/runs/6813432367?check_suite_focus=true#step:8:1124.

At a cursory glance this looks like a precision problem, which we've had with Rogers-Huff calculations before. I'm not sure why it has started happening now, though; perhaps a more recent Hypothesis version has a different search strategy?
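
To illustrate how this kind of input can be numerically awkward, here is a minimal sketch in plain Python. It is not sgkit's or scikit-allel's actual code: `pearson_r2` is a hypothetical helper standing in for a correlation-based r² statistic, and returning NaN for constant rows is an assumption made for illustration. It shows that an all-zero row has zero variance, so the correlation is 0/0, and that two seemingly equivalent comparisons against a threshold of 0.0 then disagree.

```python
import math


def pearson_r2(a, b):
    """Squared Pearson correlation of two rows (hypothetical helper).

    Returns NaN when either row is constant, because the denominator
    (product of the two sums of squared deviations) is then zero.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = var_a * var_b
    if denom == 0:
        return math.nan  # 0/0: undefined for a constant row
    return cov * cov / denom


constant_row = [0, 0, 0, 0]  # like almost every row in the falsifying example
variable_row = [0, 2, 0, 0]  # the single non-constant row

threshold = 0.0
r2 = pearson_r2(constant_row, constant_row)

# Two ways of asking "is this pair at or above the threshold?"
linked_ge = r2 >= threshold            # NaN >= 0.0 evaluates to False
linked_not_lt = not (r2 < threshold)   # not (NaN < 0.0) evaluates to True
```

If the two libraries treat undefined or near-zero values differently at `threshold=0.0`, one could mark almost every pair as linked while the other marks none, which would be consistent with the 74-versus-0 mismatch above. Whether that is what actually happens here still needs to be checked against both implementations.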
