sgkit.io.vcf.vcf_to_zarr() fails to convert VCFs that contain INFO/CSQ annotations. It raises:
ValueError: INFO field 'CSQ' is defined as Number '.', which is not supported.
Tested on sgkit v0.6.0.
Presumably, the method will also fail for any VCF containing an annotation with unbounded size. INFO/CSQ contains variant effect predictions from VEP. There can be multiple predictions for each allele, one for every transcript that the allele overlaps, and the predictions are separated by commas. The number of predictions per allele is not known in advance, so the INFO/CSQ field is declared with unbounded size ("Number=.") in the header. For example:
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|...">
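To make the layout concrete, here is a minimal sketch in plain Python of how one raw CSQ value breaks down: commas separate the per-transcript predictions, and pipes separate the fields named in the header's Format string. `parse_csq` is a hypothetical helper, not part of sgkit, and the sample value is made up. Only the `Allele` and `Consequence` field names come from the header above; the rest of the Format string is elided there.

```python
def parse_csq(value, field_names):
    """Split one raw INFO/CSQ string into a list of per-transcript dicts."""
    predictions = []
    for entry in value.split(","):      # one entry per overlapping transcript
        parts = entry.split("|")        # fields follow the header's Format order
        predictions.append(dict(zip(field_names, parts)))
    return predictions


# Only these two names appear in the header above; real VEP output has more
# fields, so this list is an assumption for illustration.
FIELD_NAMES = ["Allele", "Consequence"]

# Made-up CSQ value with two predictions for the same allele.
raw = "T|missense_variant,T|stop_gained"
predictions = parse_csq(raw, FIELD_NAMES)
```

The length of `predictions` depends on how many transcripts the allele overlaps, so it varies from variant to variant. That is why the header cannot declare a fixed Number.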
It would be very useful to be able to filter a Zarr store for variants that are deemed clinically relevant according to their annotations, such as loss-of-function variants.
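As a rough illustration of the kind of filter meant here, the sketch below builds a boolean mask over variants from their raw CSQ strings. It is plain Python and does not use any sgkit API. `has_lof`, the choice of consequence terms in `LOF_TERMS`, the assumption that Consequence is the second pipe-separated field, and the sample values are all assumptions for illustration.

```python
# Assumed set of consequence terms treated as loss of function; the right
# set depends on the analysis.
LOF_TERMS = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def has_lof(csq_value, consequence_index=1):
    """Return True if any prediction in a raw CSQ string carries a LoF term."""
    for entry in csq_value.split(","):          # one entry per transcript
        fields = entry.split("|")
        if len(fields) <= consequence_index:
            continue                            # malformed or empty entry
        # VEP joins multiple consequence terms for one transcript with "&".
        terms = set(fields[consequence_index].split("&"))
        if terms & LOF_TERMS:
            return True
    return False


# Made-up CSQ values, one string per variant.
csq_per_variant = [
    "T|missense_variant",
    "A|stop_gained,A|intron_variant",
    "G|splice_donor_variant&intron_variant",
]
mask = [has_lof(value) for value in csq_per_variant]
```

A mask like this could then be used to select variants, provided the CSQ strings are available in the converted dataset in the first place.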