Skip to content

genotype_count_spec is inconsistent with VCF spec #911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
timothymillar opened this issue Sep 26, 2022 · 5 comments · Fixed by #1002
Closed

genotype_count_spec is inconsistent with VCF spec #911

timothymillar opened this issue Sep 26, 2022 · 5 comments · Fixed by #1002
Assignees
Milestone

Comments

@timothymillar
Copy link
Collaborator

Just a minor inconsistency I noted given that we seem to follow the VCF spec in most cases. The genotype_count_spec states "the 3 columns contain heterozygous, homozygous reference, and homozygous alternate counts (in that order)". The VCF genotype ordering would typically be homozygous reference, heterozygous, homozygous alternate for a diploid biallelic variant as described for the "GL" field in the spec.

@timothymillar
Copy link
Collaborator Author

Just to add to that, If we were to change the ordering to match VCF ordering then this variable could easily be generalized to multi-allelic variants and autopolyploids.

@tomwhite
Copy link
Collaborator

tomwhite commented Sep 29, 2022

+1 to introducing a generalized ordering like GL.

It looks like the genotype_count variable is only used in hardy_weinberg_test - and it's optional. Does HWE generalize to multi-allelic/autopolyploids? If not, then we might leave HWE as it is, and rename its variable to hwe_genotype_count_spec or similar. If it does then we might want to consider changing it in HWE too, but may need to think about backwards compatibility (as it's optional it's possible it's not being used anywhere). I don't know why that order was used in HWE (@eric-czech might do).

BTW here is the relevant test: https://github.com/pystatgen/sgkit/blob/68b983c51c2c7e08914d64f83cd7b3fd05616f1a/sgkit/tests/test_hwe.py#L139

@timothymillar
Copy link
Collaborator Author

Does HWE generalize to multi-allelic/autopolyploids?

Yes in general although I'm not familiar with implementation details (something on my long term todo list).

@timothymillar timothymillar self-assigned this Jan 18, 2023
@timothymillar
Copy link
Collaborator Author

@tomwhite I'm not sure how urgently you want to release 0.6.0 but I can try sort this tomorrow.

@tomwhite
Copy link
Collaborator

Thanks Tim, that would be great - I'll try to release next week.

@timothymillar timothymillar added this to the 0.6.0 milestone Jan 18, 2023
timothymillar added a commit to timothymillar/sgkit that referenced this issue Jan 19, 2023
- calculate variant_genotype_count if absent
- remove variable genotype_count from the spec
timothymillar added a commit to timothymillar/sgkit that referenced this issue Jan 24, 2023
- calculate variant_genotype_count if absent
- remove variable genotype_count from the spec
timothymillar added a commit to timothymillar/sgkit that referenced this issue Jan 26, 2023
- calculate variant_genotype_count if absent
- remove variable genotype_count from the spec
timothymillar added a commit to timothymillar/sgkit that referenced this issue Jan 27, 2023
- calculate variant_genotype_count if absent
- remove variable genotype_count from the spec
@mergify mergify bot closed this as completed in #1002 Jan 30, 2023
mergify bot pushed a commit that referenced this issue Jan 30, 2023
- calculate variant_genotype_count if absent
- remove variable genotype_count from the spec
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants