Open
Labels: bug (Something isn't working)
Description
Hypothesis is finding cases where sgkit's LD pruning and scikit-allel's locate_unlinked disagree about which variants to drop.
For example:
=================================== FAILURES ===================================
_______________________________ test_vs_skallel ________________________________
@given(args=ld_prune_args()) # pylint: disable=no-value-for-parameter
> @settings(max_examples=50, deadline=None, phases=PHASES_NO_SHRINK)
sgkit/tests/test_ld.py:158:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
args = (array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0... 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=uint8), 27, 2, 0.0, 19)
@given(args=ld_prune_args()) # pylint: disable=no-value-for-parameter
@settings(max_examples=50, deadline=None, phases=PHASES_NO_SHRINK)
@example(args=(np.array([[1, 1], [1, 1]], dtype="uint8"), 1, 1, 0.0, -1))
def test_vs_skallel(args):
    x, size, step, threshold, chunks = args
    ds = simulate_genotype_call_dataset(n_variant=x.shape[0], n_sample=x.shape[1])
    ds["dosage"] = (["variants", "samples"], da.asarray(x).rechunk({0: chunks}))
    ds = window_by_variant(ds, size=size, step=step)
    ldm = ld_matrix(ds, threshold=threshold)
    has_duplicates = ldm.compute().duplicated(subset=["i", "j"]).any()
    assert not has_duplicates
    idx_drop_ds = maximal_independent_set(ldm)
    idx_drop = np.sort(idx_drop_ds.ld_prune_index_to_drop.data)
    m = allel.locate_unlinked(x, size=size, step=step, threshold=threshold)
    idx_drop_ska = np.sort(np.argwhere(~m).squeeze(axis=1))
>   npt.assert_equal(idx_drop_ska, idx_drop)
E AssertionError:
E Arrays are not equal
E
E (shapes (0,), (74,) mismatch)
E x: array([], dtype=int64)
E y: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
E 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35,
E 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,...
sgkit/tests/test_ld.py:176: AssertionError
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_vs_skallel(
args=(array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=uint8), 27, 2, 0.0, 19),
)
You can reproduce this example by temporarily adding @reproduce_failure('6.47.0', b'AEsCAQEBAgADABoBAQMBAAAAAAAAAAAACQ==') as a decorator on your test case
From https://github.com/pystatgen/sgkit/runs/6813432367?check_suite_focus=true#step:8:1124.
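For reproducing this outside the Hypothesis search, the falsifying input can be rebuilt compactly. Reading the report, it is 77 variants by 4 samples, all zero except a dosage of 2 at variant 64, sample 1; the sketch below assumes that reading of the log is correct. The helper name falsifying_dosages is hypothetical, not part of sgkit.

```python
import numpy as np


def falsifying_dosages() -> np.ndarray:
    """Hypothetical helper: the Hypothesis falsifying array, rebuilt by hand.

    77 variants x 4 samples, every dosage 0 except a single 2.
    """
    x = np.zeros((77, 4), dtype="uint8")
    x[64, 1] = 2  # the lone "[0, 2, 0, 0]" row in the Hypothesis report
    return x


x = falsifying_dosages()
# (x, size, step, threshold, chunks), in the order the test unpacks them
args = (x, 27, 2, 0.0, 19)

# Variants whose dosage is identical across all samples have zero variance.
monomorphic = (x == x[:, :1]).all(axis=1)
print(x.shape, int(monomorphic.sum()))  # (77, 4) 76
```

Passing this tuple through an @example(args=...) decorator, as the test already does for its 2x2 case, would pin it as a regression example without relying on the @reproduce_failure blob.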
In this case scikit-allel drops no variants while sgkit drops 74 of the 77, and every variant except one is all-zero (monomorphic). At a cursory glance this looks like it could be a precision problem, which we've had with the Rogers-Huff calculations before. I'm not sure why it has started happening now, though; perhaps a more recent Hypothesis version has a different search strategy?
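One way a precision difference could flip the result here: with threshold 0.0, any pair whose r² is even slightly above zero counts as linked, but for a monomorphic variant r is 0/0 in exact arithmetic. The sketch below assumes Rogers-Huff r is essentially the Pearson correlation of dosage vectors and that a pair is treated as linked when r² > threshold; it does not confirm what either library actually computes for zero-variance variants, nor whether sgkit compares with > or >= (which would also matter at a threshold of exactly 0.0).

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation between two dosage vectors.

    Assumed here to stand in for Rogers-Huff r. If either vector has zero
    variance (a monomorphic variant), r is 0/0 and this returns NaN.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    dev_a, dev_b = a - a.mean(), b - b.mean()
    num = (dev_a * dev_b).sum()
    den = np.sqrt((dev_a**2).sum() * (dev_b**2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


threshold = 0.0
mono = np.zeros(4)             # an all-zero variant, as in the failing input
poly = np.array([0, 2, 0, 0])  # the single non-constant variant

r = pearson_r(mono, poly)
# NaN compares False with everything, so the pair is treated as unlinked.
print(np.isnan(r), bool(r**2 > threshold))  # True False

# If a code path instead produced a tiny nonzero r through rounding
# (value chosen purely for illustration), the same test says "linked".
r_rounded = 1e-17
print(bool(r_rounded**2 > threshold))  # True
```

Under these assumptions, NaN-vs-tiny-residual handling alone would be enough to turn "drop nothing" into "drop almost everything" on an input that is mostly monomorphic.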