-
Notifications
You must be signed in to change notification settings - Fork 29
Sharded BAM reader and a sample read counting pipeline #34
Conversation
|
Congrats Ilia! This is a huge step forward! ~p |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo - GATK
|
Hey Ilia, Travis is not happy :( Paul |
|
@pgrosu Yeah, I see that. Looking |
|
@iliat, Cool - yeah they're minor fixes - thanks man :) |
0a661e4 to
447361c
Compare
|
Awesome! Thanks Ilia :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there really no builtin operation for this? Rats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not find anything here: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/package-summary
The only other alternative is to use a Coders and TextIO.Write.withCoder
but unfortunately the only prepackaged coder that can help is
TextualIntegerCoder and its not suitable for Longs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, thanks for the context - I concur there's no ToString transform that I can find at present. I've checked in with Jeff G from the Dataflow team to make sure I'm not missing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why not just take this out and put it into a common directory for all the extended functionality for Dataflow until it is implemented to make the code more readable? I get the feeling it will be reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo in storgae
|
LGTM - let's get the unit tests in a separate commit, as you say. Thanks for updating comments on the subtle question of why this code does find all the reads. Many thanks, and nice work - it's outstanding to have a full BAM reader available! Looking forward to experimenting with this further and perhaps iterating on the code design a bit once there are tests available. |
|
Definitely +1 :) |
Sharded BAM reader and a sample read counting pipeline
|
Hurray! Thank you! On Thu, Mar 12, 2015 at 3:59 PM, Ilia Tulchinsky [email protected]
|
|
Things should get very interesting :) |
Sharded BAM reader and a sample read counting pipeline
@wbrockman @jean-philippe-martin @dionloy - here is the first version of the sharded BAM reader. Works in the cloud with a workaround for credentials factory.
Also added dependency on HTSJDK in pom.xml