Cannot round-trip explicitly set missing INFO values in VCF #1197

@jeromekelleher

The all_fields.vcf file contains lots of examples where we explicitly state that an INFO key is missing, rather than omitting the key, e.g. II1=. and II2=.,. here. This was handled before #1190 because we were treating non-present INFO keys as PAD values and only these explicit "key=." values as missing.

I don't think it's a useful distinction, and it's likely to cause more problems downstream if we distinguish between these two types of missingness. I'm fairly clear that regarding missing keys as dimension padding isn't helpful, in any case.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  s1      s2
1       1       .       G       A,C     .       PASS    IB0     .       .       .
1       2       .       A       G,G     .       PASS    II1=126 .       .       .
1       3       .       A       G,G     .       PASS    II1=.   .       .       .
1       4       .       T       A,C     .       PASS    II2=459,-140    .       .       .
1       5       .       T       A,C     .       PASS    II2=.,-140      .       .       .
1       6       .       T       A,C     .       PASS    II2=459,.       .       .       .
1       7       .       T       A,C     .       PASS    II2=.,. .       .       .

However, it seems that bcftools at least does make this distinction, and losslessly roundtrips this VCF through BCF.
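To make the two kinds of missingness concrete, here is a minimal Python sketch that classifies how a key appears in a single INFO column. The function name `classify_info` and its return labels are hypothetical, made up for illustration; this is not how bcftools or any particular library implements it.

```python
def classify_info(info_column, key):
    """Classify how `key` appears in one VCF INFO column string.

    Hypothetical helper for illustration only. Returns one of:
    'absent', 'flag', 'all_missing', 'partly_missing', 'present'.
    """
    if info_column == ".":
        # A bare "." means the whole INFO column is empty.
        return "absent"
    for entry in info_column.split(";"):
        name, sep, value = entry.partition("=")
        if name != key:
            continue
        if not sep:
            # Key with no "=" is a flag, e.g. IB0.
            return "flag"
        values = value.split(",")
        if all(v == "." for v in values):
            # Explicitly stated as missing, e.g. II1=. or II2=.,.
            return "all_missing"
        if any(v == "." for v in values):
            # Some elements missing, e.g. II2=.,-140
            return "partly_missing"
        return "present"
    # Key was never mentioned on this line.
    return "absent"
```

On the rows above, `classify_info("II1=.", "II1")` gives `'all_missing'`, while `classify_info("II2=459,-140", "II1")` gives `'absent'`. Those are the two cases that would be treated as the same thing if the distinction is dropped.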

My suggestion here is that we just edit the all_fields.vcf file to remove all-missing values. This seems like a pretty niche problem, and probably something we'd need to deal with explicitly at the spec level rather than here. It's not worth getting bogged down on, I think.
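If editing the file by hand gets tedious, something along these lines could do it. This is only a rough sketch: `strip_all_missing_info` is a hypothetical helper, and it assumes tab-delimited records with INFO in the eighth column. Entries where only some elements are missing (e.g. II2=.,-140) are left alone.

```python
def strip_all_missing_info(line):
    """Drop INFO entries whose values are all '.' from one VCF line.

    Hypothetical helper; assumes tab-delimited fields with INFO
    as the eighth column. Header lines are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        # Not a full data record; leave it untouched.
        return line
    kept = []
    for entry in fields[7].split(";"):
        name, sep, value = entry.partition("=")
        if sep and all(v == "." for v in value.split(",")):
            # Explicit all-missing value such as II1=. or II2=.,.
            continue
        kept.append(entry)
    # An INFO column with nothing left is written as a single ".".
    fields[7] = ";".join(kept) if kept else "."
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

Applied line by line to the file, this would turn the II1=. and II2=.,. rows into rows with an empty INFO column, which is the representation that round-trips without needing the extra distinction.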
