This repository was archived by the owner on Oct 29, 2023. It is now read-only.

Contributor

calbach commented Mar 6, 2015

See various caveats and disclaimers in comments: this is a limited sample application.

One thing which may need revision is the output; right now it's really only human readable (at best). Open to suggestions on a better output format.

Contributor Author

@deflaux let me know if you'd prefer I fork into my own options at this point. Not sure how much we want to jam into this object.

Contributor

I think this is a fine place for it for now.

calbach force-pushed the variant-annotation branch from 4dded6d to 6798954 on March 6, 2015 17:10
Contributor

To make sure I'm understanding this correctly, are these assumptions true?

  • By default, this pipeline will yield an output record for every alternate allele in 1,000 Genomes within BRCA1 that is a SNP and has an effect other than synonymous.
  • For 1,000 Genomes, restricting to sample HG00261 has no bearing on the output of this pipeline, since all samples have calls for all variants (and we are also not retrieving or looking at the genotype within the call).
  • If we change the job parameters to run on Platinum Genomes and a callSetId within it, we will only annotate the variants that the specified callSetId has.
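
To make the first assumption concrete, here is a minimal sketch of the filter it describes. It is plain Java rather than the pipeline's actual Dataflow code, and `AlleleAnnotation`, `AnnotationFilter`, and the effect labels are hypothetical names introduced only for illustration; the check for a SNP (one reference base replaced by one alternate base) is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one annotated alternate allele; not a type from this PR. */
final class AlleleAnnotation {
  final String referenceBases;
  final String alternateBases;
  final String effect; // illustrative labels such as "SYNONYMOUS" or "NONSYNONYMOUS"

  AlleleAnnotation(String referenceBases, String alternateBases, String effect) {
    this.referenceBases = referenceBases;
    this.alternateBases = alternateBases;
    this.effect = effect;
  }
}

final class AnnotationFilter {
  private AnnotationFilter() {}

  /** Assumption: a SNP is a single reference base replaced by a single alternate base. */
  static boolean isSnp(AlleleAnnotation a) {
    return a.referenceBases.length() == 1 && a.alternateBases.length() == 1;
  }

  /** Keep SNP alleles whose effect is anything other than synonymous. */
  static boolean isReported(AlleleAnnotation a) {
    return isSnp(a) && !"SYNONYMOUS".equals(a.effect);
  }

  /** Returns the alleles that would produce an output record under these assumptions. */
  static List<AlleleAnnotation> reported(List<AlleleAnnotation> alleles) {
    List<AlleleAnnotation> kept = new ArrayList<>();
    for (AlleleAnnotation a : alleles) {
      if (isReported(a)) {
        kept.add(a);
      }
    }
    return kept;
  }
}
```

Under this sketch, a synonymous SNP and a non-synonymous deletion are both dropped, and only the non-synonymous SNP is kept.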

Contributor Author

That's correct. People typically run a variant annotation program on a single VCF, so I think the behavior is reasonably well aligned with a user's expectations.


CH is right, but you still want to keep track of what you're annotating, since metadata remains important if you combine datasets or compare them. If you can cache them, that will save you time later on.

Contributor

deflaux commented Mar 7, 2015

This looks good to me - merge it at your convenience.

Contributor

Is this necessary? Why not convert the list of Contigs to a PCollection directly?

Contributor Author

Done

@coveralls

Coverage Status

Coverage increased (+0.78%) to 24.63% when pulling 64aa02f on variant-annotation into efd9219 on master.

calbach force-pushed the variant-annotation branch from 64aa02f to 6b46e28 on March 19, 2015 20:47
Contributor Author

calbach commented Mar 19, 2015

Rebased, made some performance changes, and added some timing information. The end result is that it currently works well on small regions but performs quite poorly on whole variant sets, on account of SearchVariants throughput. This should improve over time.

@coveralls

Coverage Status

Coverage increased (+0.61%) to 24.46% when pulling 6b46e28 on variant-annotation into efd9219 on master.

deflaux added a commit that referenced this pull request Mar 19, 2015
Implement sample variant annotation dataflow pipeline
deflaux merged commit 1efa5db into master Mar 19, 2015
Contributor

deflaux commented Mar 19, 2015

Nice sample, CH!

deflaux deleted the variant-annotation branch March 30, 2015 20:47
jiridanek pushed a commit to jiridanek/dataflow-java that referenced this pull request Jan 18, 2016
…tation

Implement sample variant annotation dataflow pipeline
5 participants