This repository was archived by the owner on Oct 29, 2023. It is now read-only.

Conversation

@deflaux
Contributor

@deflaux deflaux commented Mar 11, 2015

Also:

  • Update option descriptions.
  • Bump genomics API version.
  • Ensure that we have one task per core.

@coveralls

Coverage Status

Coverage decreased (-1.06%) to 30.11% when pulling 78d61e6 on deflaux:master into af2db8d on googlegenomics:master.

@pgrosu

pgrosu commented Mar 11, 2015

Hi Nicole,

On line 122 in DataflowWorkarounds.java it is currently written as:

if(3 == machineNameParts.length) {

I would prefer it written as follows, or with the expected length assigned to a named variable:

if(machineNameParts.length == 3) {

~p

@pgrosu

pgrosu commented Mar 11, 2015

On line 125 of the same file, it is currently written as:

numWorkers *= numCores;

Could we expand it, just for clarity:

numWorkers = numWorkers * numCores;

~p
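
For readers without the diff open, the two lines discussed above appear to derive a core count from the machine type name and scale the worker count by it, matching the "one task per core" item in the PR description. The following is a minimal sketch of that idea, not the code in DataflowWorkarounds.java: the class name `TasksPerCore`, the method names, and the fallback to one core are assumptions made for illustration. It assumes machine type names of the form `n1-standard-4`, where the third dash-separated part is the core count.

```java
public final class TasksPerCore {

    /**
     * Returns the core count encoded in a machine type such as "n1-standard-4".
     * Falls back to 1 (an assumption of this sketch) when the name does not
     * have exactly three dash-separated parts or the last part is not a number.
     */
    public static int coresFor(String machineType) {
        if (machineType == null) {
            return 1;
        }
        String[] machineNameParts = machineType.split("-");
        if (machineNameParts.length == 3) {
            try {
                // Guard against "0" or negative values in the name.
                return Math.max(1, Integer.parseInt(machineNameParts[2]));
            } catch (NumberFormatException e) {
                return 1;
            }
        }
        return 1;
    }

    /** One task per core: the worker count multiplied by the cores per worker. */
    public static int tasksFor(int numWorkers, String machineType) {
        return numWorkers * coresFor(machineType);
    }

    public static void main(String[] args) {
        System.out.println(tasksFor(10, "n1-standard-4")); // prints 40
        System.out.println(tasksFor(10, "f1-micro"));      // prints 10
    }
}
```

Writing the multiplication out as `numWorkers * coresFor(...)` or as `numWorkers *= numCores` gives the same result; the choice is purely stylistic.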

@pgrosu

pgrosu commented Mar 11, 2015

Looks nice, I'll try it once merged.

Thanks,
~p

deflaux added 5 commits March 12, 2015 20:30
It's now in the codelabs repository.
--machineType does not have a default of n1-standard-4 in all contexts.
Now that we are doing client-side filtering for strict shard boundaries, we need to ensure that we are requesting the field that the filter will check.
@coveralls

Coverage Status

Coverage increased (+0.36%) to 23.92% when pulling cb269a9 on deflaux:master into 70bc9f7 on googlegenomics:master.

@pgrosu

pgrosu commented Mar 13, 2015

Hi Nicole,

Unfortunately it throws an error, since I cannot give it a range on a specific chromosome :( It wants a string for the reference, and I only want to count the reads within a specific range. Is there a solution?

$ java -cp target/google-genomics-dataflow-v1beta2-0.5-SNAPSHOT.jar com.google.cloud.genomics.dataflow.pipelines.CountReads --readGroupSetId=CMvnhpKTFhDq9e2Yy9G-Bg --references=1:1000:10000  --genomicsSecretsFile=client_secrets.json --project=<redacted>  --datasetId=10473108253681171589  --numWorkers=10  --output=counts.txt
Mar 13, 2015 3:57:30 AM com.google.cloud.genomics.dataflow.utils.DataflowWorkarounds registerGenomicsCoders
INFO: Registering coders for genomics classes
Mar 13, 2015 3:57:30 AM org.reflections.Reflections scan
INFO: Reflections took 144 ms to scan 10 urls, producing 1 keys and 77 values
...
INFO: Turning 1 options into 10 workers
Mar 13, 2015 3:57:30 AM com.google.cloud.genomics.dataflow.utils.DataflowWorkarounds getPCollection
INFO: Adding collection with 0 to 1
Mar 13, 2015 3:57:30 AM com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner run
INFO: Executing pipeline using the DirectPipelineRunner.
Mar 13, 2015 3:57:30 AM com.google.cloud.genomics.dataflow.readers.ReadReader processApiCall
INFO: Starting Reads read loop
Exception in thread "main" java.lang.RuntimeException: com.google.cloud.genomics.utils.Paginator$SearchException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "The given readGroupSets are not aligned to any reference with referenceName \"1:1000:10000\". Wanted one of [\"1\" \"2\" \"3\" \"4\" \"5\" \"6\" \"7\" \"8\" \"9\" \"10\" \"11\" \"12\" \"13\" \"14\" \"15\" \"16\" \"17\" \"18\" \"19\" \"20\" \"21\" \"22\" \"X\" \"Y\" \"MT\" \"GL000207.1\" \"GL000226.1\" \"GL000229.1\" \"GL000231.1\" \"GL000210.1\" \"GL000239.1\" \"GL000235.1\" \"GL000201.1\" \"GL000247.1\" \"GL000245.1\" \"GL000197.1\" \"GL000203.1\" \"GL000246.1\" \"GL000249.1\" \"GL000196.1\" \"GL000248.1\" \"GL000244.1\" \"GL000238.1\" \"GL000202.1\" \"GL000234.1\" \"GL000232.1\" \"GL000206.1\" \"GL000240.1\" \"GL000236.1\" \"GL000241.1\" \"GL000243.1\" \"GL000242.1\" \"GL000230.1\" \"GL000237.1\" \"GL000233.1\" \"GL000204.1\" \"GL000198.1\" \"GL000208.1\" \"GL000191.1\" \"GL000227.1\" \"GL000228.1\" \"GL000214.1\" \"GL000221.1\" \"GL000209.1\" \"GL000218.1\" \"GL000220.1\" \"GL000213.1\" \"GL000211.1\" \"GL000199.1\" \"GL000217.1\" \"GL000216.1\" \"GL000215.1\" \"GL000205.1\" \"GL000219.1\" \"GL000224.1\" \"GL000223.1\" \"GL000195.1\" \"GL000212.1\" \"GL000222.1\" \"GL000200.1\" \"GL000193.1\" \"GL000194.1\" \"GL000225.1\" \"GL000192.1\" \"NC_007605\" \"hs37d5\" \"*\"]",
    "reason" : "invalidArgument"
  } ],
  "message" : "The given readGroupSets are not aligned to any reference with referenceName \"1:1000:10000\". Wanted one of [\"1\" \"2\" \"3\" \"4\" \"5\" \"6\" \"7\" \"8\" \"9\" \"10\" \"11\" \"12\" \"13\" \"14\" \"15\" \"16\" \"17\" \"18\" \"19\" \"20\" \"21\" \"22\" \"X\" \"Y\" \"MT\" \"GL000207.1\" \"GL000226.1\" \"GL000229.1\" \"GL000231.1\" \"GL000210.1\" \"GL000239.1\" \"GL000235.1\" \"GL000201.1\" \"GL000247.1\" \"GL000245.1\" \"GL000197.1\" \"GL000203.1\" \"GL000246.1\" \"GL000249.1\" \"GL000196.1\" \"GL000248.1\" \"GL000244.1\" \"GL000238.1\" \"GL000202.1\" \"GL000234.1\" \"GL000232.1\" \"GL000206.1\" \"GL000240.1\" \"GL000236.1\" \"GL000241.1\" \"GL000243.1\" \"GL000242.1\" \"GL000230.1\" \"GL000237.1\" \"GL000233.1\" \"GL000204.1\" \"GL000198.1\" \"GL000208.1\" \"GL000191.1\" \"GL000227.1\" \"GL000228.1\" \"GL000214.1\" \"GL000221.1\" \"GL000209.1\" \"GL000218.1\" \"GL000220.1\" \"GL000213.1\" \"GL000211.1\" \"GL000199.1\" \"GL000217.1\" \"GL000216.1\" \"GL000215.1\" \"GL000205.1\" \"GL000219.1\" \"GL000224.1\" \"GL000223.1\" \"GL000195.1\" \"GL000212.1\" \"GL000222.1\" \"GL000200.1\" \"GL000193.1\" \"GL000194.1\" \"GL000225.1\" \"GL000192.1\" \"NC_007605\" \"hs37d5\" \"*\"]"
}
        at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:151)
        at com.google.cloud.genomics.dataflow.pipelines.CountReads.main(CountReads.java:127)
Caused by: com.google.cloud.genomics.utils.Paginator$SearchException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "The given readGroupSets are not aligned to any reference with referenceName \"1:1000:10000\". Wanted one of [\"1\" \"2\" \"3\" \"4\" \"5\" \"6\" \"7\" \"8\" \"9\" \"10\" \"11\" \"12\" \"13\" \"14\" \"15\" \"16\" \"17\" \"18\" \"19\" \"20\" \"21\" \"22\" \"X\" \"Y\" \"MT\" \"GL000207.1\" \"GL000226.1\" \"GL000229.1\" \"GL000231.1\" \"GL000210.1\" \"GL000239.1\" \"GL000235.1\" \"GL000201.1\" \"GL000247.1\" \"GL000245.1\" \"GL000197.1\" \"GL000203.1\" \"GL000246.1\" \"GL000249.1\" \"GL000196.1\" \"GL000248.1\" \"GL000244.1\" \"GL000238.1\" \"GL000202.1\" \"GL000234.1\" \"GL000232.1\" \"GL000206.1\" \"GL000240.1\" \"GL000236.1\" \"GL000241.1\" \"GL000243.1\" \"GL000242.1\" \"GL000230.1\" \"GL000237.1\" \"GL000233.1\" \"GL000204.1\" \"GL000198.1\" \"GL000208.1\" \"GL000191.1\" \"GL000227.1\" \"GL000228.1\" \"GL000214.1\" \"GL000221.1\" \"GL000209.1\" \"GL000218.1\" \"GL000220.1\" \"GL000213.1\" \"GL000211.1\" \"GL000199.1\" \"GL000217.1\" \"GL000216.1\" \"GL000215.1\" \"GL000205.1\" \"GL000219.1\" \"GL000224.1\" \"GL000223.1\" \"GL000195.1\" \"GL000212.1\" \"GL000222.1\" \"GL000200.1\" \"GL000193.1\" \"GL000194.1\" \"GL000225.1\" \"GL000192.1\" \"NC_007605\" \"hs37d5\" \"*\"]",
    "reason" : "invalidArgument"
  } ],
  "message" : "The given readGroupSets are not aligned to any reference with referenceName \"1:1000:10000\". Wanted one of [\"1\" \"2\" \"3\" \"4\" \"5\" \"6\" \"7\" \"8\" \"9\" \"10\" \"11\" \"12\" \"13\" \"14\" \"15\" \"16\" \"17\" \"18\" \"19\" \"20\" \"21\" \"22\" \"X\" \"Y\" \"MT\" \"GL000207.1\" \"GL000226.1\" \"GL000229.1\" \"GL000231.1\" \"GL000210.1\" \"GL000239.1\" \"GL000235.1\" \"GL000201.1\" \"GL000247.1\" \"GL000245.1\" \"GL000197.1\" \"GL000203.1\" \"GL000246.1\" \"GL000249.1\" \"GL000196.1\" \"GL000248.1\" \"GL000244.1\" \"GL000238.1\" \"GL000202.1\" \"GL000234.1\" \"GL000232.1\" \"GL000206.1\" \"GL000240.1\" \"GL000236.1\" \"GL000241.1\" \"GL000243.1\" \"GL000242.1\" \"GL000230.1\" \"GL000237.1\" \"GL000233.1\" \"GL000204.1\" \"GL000198.1\" \"GL000208.1\" \"GL000191.1\" \"GL000227.1\" \"GL000228.1\" \"GL000214.1\" \"GL000221.1\" \"GL000209.1\" \"GL000218.1\" \"GL000220.1\" \"GL000213.1\" \"GL000211.1\" \"GL000199.1\" \"GL000217.1\" \"GL000216.1\" \"GL000215.1\" \"GL000205.1\" \"GL000219.1\" \"GL000224.1\" \"GL000223.1\" \"GL000195.1\" \"GL000212.1\" \"GL000222.1\" \"GL000200.1\" \"GL000193.1\" \"GL000194.1\" \"GL000225.1\" \"GL000192.1\" \"NC_007605\" \"hs37d5\" \"*\"]"
}
        at com.google.cloud.genomics.utils.Paginator$5$1$1.apply(Paginator.java:1051)
        at com.google.cloud.genomics.utils.Paginator$5$1$1.apply(Paginator.java:1038)
        at com.google.common.base.Present.transform(Present.java:71)
        at com.google.cloud.genomics.utils.Paginator$5$1.computeNext(Paginator.java:1036)
        at com.google.cloud.genomics.utils.Paginator$5$1.computeNext(Paginator.java:1034)
        at com.google.common.collect.AbstractSequentialIterator.next(AbstractSequentialIterator.java:77)
        at com.google.common.collect.Iterators.advance(Iterators.java:909)
        at com.google.common.collect.Iterables$10.iterator(Iterables.java:865)
        at com.google.common.collect.Iterables$8.iterator(Iterables.java:713)
        at com.google.common.collect.Iterables$8.iterator(Iterables.java:713)
        at com.google.common.collect.Iterables.iterators(Iterables.java:508)
        at com.google.common.collect.Iterables.access$100(Iterables.java:60)
        at com.google.common.collect.Iterables$2.iterator(Iterables.java:498)
        at com.google.cloud.genomics.dataflow.readers.ReadReader.processApiCall(ReadReader.java:58)
        at com.google.cloud.genomics.dataflow.readers.ReadReader.processApiCall(ReadReader.java:29)
        at com.google.cloud.genomics.dataflow.readers.GenomicsApiReader.processElement(GenomicsApiReader.java:58)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "The given readGroupSets are not aligned to any reference with referenceName \"1:1000:10000\". Wanted one of [\"1\" \"2\" \"3\" \"4\" \"5\" \"6\" \"7\" \"8\" \"9\" \"10\" \"11\" \"12\" \"13\" \"14\" \"15\" \"16\" \"17\" \"18\" \"19\" \"20\" \"21\" \"22\" \"X\" \"Y\" \"MT\" \"GL000207.1\" \"GL000226.1\" \"GL000229.1\" \"GL000231.1\" \"GL000210.1\" \"GL000239.1\" \"GL000235.1\" \"GL000201.1\" \"GL000247.1\" \"GL000245.1\" \"GL000197.1\" \"GL000203.1\" \"GL000246.1\" \"GL000249.1\" \"GL000196.1\" \"GL000248.1\" \"GL000244.1\" \"GL000238.1\" \"GL000202.1\" \"GL000234.1\" \"GL000232.1\" \"GL000206.1\" \"GL000240.1\" \"GL000236.1\" \"GL000241.1\" \"GL000243.1\" \"GL000242.1\" \"GL000230.1\" \"GL000237.1\" \"GL000233.1\" \"GL000204.1\" \"GL000198.1\" \"GL000208.1\" \"GL000191.1\" \"GL000227.1\" \"GL000228.1\" \"GL000214.1\" \"GL000221.1\" \"GL000209.1\" \"GL000218.1\" \"GL000220.1\" \"GL000213.1\" \"GL000211.1\" \"GL000199.1\" \"GL000217.1\" \"GL000216.1\" \"GL000215.1\" \"GL000205.1\" \"GL000219.1\" \"GL000224.1\" \"GL000223.1\" \"GL000195.1\" \"GL000212.1\" \"GL000222.1\" \"GL000200.1\" \"GL000193.1\" \"GL000194.1\" \"GL000225.1\" \"GL000192.1\" \"NC_007605\" \"hs37d5\" \"*\"]",
    "reason" : "invalidArgument"
  } ],
  "message" : "The given readGroupSets are not aligned to any reference with referenceName \"1:1000:10000\". Wanted one of [\"1\" \"2\" \"3\" \"4\" \"5\" \"6\" \"7\" \"8\" \"9\" \"10\" \"11\" \"12\" \"13\" \"14\" \"15\" \"16\" \"17\" \"18\" \"19\" \"20\" \"21\" \"22\" \"X\" \"Y\" \"MT\" \"GL000207.1\" \"GL000226.1\" \"GL000229.1\" \"GL000231.1\" \"GL000210.1\" \"GL000239.1\" \"GL000235.1\" \"GL000201.1\" \"GL000247.1\" \"GL000245.1\" \"GL000197.1\" \"GL000203.1\" \"GL000246.1\" \"GL000249.1\" \"GL000196.1\" \"GL000248.1\" \"GL000244.1\" \"GL000238.1\" \"GL000202.1\" \"GL000234.1\" \"GL000232.1\" \"GL000206.1\" \"GL000240.1\" \"GL000236.1\" \"GL000241.1\" \"GL000243.1\" \"GL000242.1\" \"GL000230.1\" \"GL000237.1\" \"GL000233.1\" \"GL000204.1\" \"GL000198.1\" \"GL000208.1\" \"GL000191.1\" \"GL000227.1\" \"GL000228.1\" \"GL000214.1\" \"GL000221.1\" \"GL000209.1\" \"GL000218.1\" \"GL000220.1\" \"GL000213.1\" \"GL000211.1\" \"GL000199.1\" \"GL000217.1\" \"GL000216.1\" \"GL000215.1\" \"GL000205.1\" \"GL000219.1\" \"GL000224.1\" \"GL000223.1\" \"GL000195.1\" \"GL000212.1\" \"GL000222.1\" \"GL000200.1\" \"GL000193.1\" \"GL000194.1\" \"GL000225.1\" \"GL000192.1\" \"NC_007605\" \"hs37d5\" \"*\"]"
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:312)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1049)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:410)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:343)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:460)
        at com.google.cloud.genomics.utils.RetryPolicy.execute(RetryPolicy.java:102)
        at com.google.cloud.genomics.utils.Paginator$5$1$1.apply(Paginator.java:1042)
        at com.google.cloud.genomics.utils.Paginator$5$1$1.apply(Paginator.java:1038)
        at com.google.common.base.Present.transform(Present.java:71)
        at com.google.cloud.genomics.utils.Paginator$5$1.computeNext(Paginator.java:1036)
        at com.google.cloud.genomics.utils.Paginator$5$1.computeNext(Paginator.java:1034)
        at com.google.common.collect.AbstractSequentialIterator.next(AbstractSequentialIterator.java:77)
        at com.google.common.collect.Iterators.advance(Iterators.java:909)
        at com.google.common.collect.Iterables$10.iterator(Iterables.java:865)
        at com.google.common.collect.Iterables$8.iterator(Iterables.java:713)
        at com.google.common.collect.Iterables$8.iterator(Iterables.java:713)
        at com.google.common.collect.Iterables.iterators(Iterables.java:508)
        at com.google.common.collect.Iterables.access$100(Iterables.java:60)
        at com.google.common.collect.Iterables$2.iterator(Iterables.java:498)
        at com.google.cloud.genomics.dataflow.readers.ReadReader.processApiCall(ReadReader.java:58)
        at com.google.cloud.genomics.dataflow.readers.ReadReader.processApiCall(ReadReader.java:29)
        at com.google.cloud.genomics.dataflow.readers.GenomicsApiReader.processElement(GenomicsApiReader.java:58)
        at com.google.cloud.dataflow.sdk.util.DoFnRunner.processElement(DoFnRunner.java:126)
        at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateHelper(ParDo.java:1058)
        at com.google.cloud.dataflow.sdk.transforms.ParDo.evaluateSingleHelper(ParDo.java:963)
        at com.google.cloud.dataflow.sdk.transforms.ParDo.access$000(ParDo.java:441)
        at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:951)
        at com.google.cloud.dataflow.sdk.transforms.ParDo$1.evaluate(ParDo.java:946)
        at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:611)
        at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:200)
        at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:196)
        at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:109)
        at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:204)
        at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:584)
        at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:328)
        at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:70)
        at com.google.cloud.dataflow.sdk.Pipeline.run(Pipeline.java:145)
        at com.google.cloud.genomics.dataflow.pipelines.CountReads.main(CountReads.java:127)
$

Thanks,
Paul

@iliat
Contributor

iliat commented Mar 13, 2015

@pgrosu The API flavor of this pipeline was written as a very simple variant that takes only a reference name, not ranges. Its main purpose was to try out the BAM file reading, not so much the API access. I will make the API-side handling more sophisticated so that it is compatible with what the BAM reading part accepts.
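
One way the API side could accept the same `reference:start:end` form is to split the value on the client, so that only the bare reference name (for example `1`) is sent as `referenceName` and the range is applied separately. The sketch below shows only the parsing step and is an illustration under stated assumptions: `RegionSpec` is a hypothetical class that is not part of this repository, and treating a bare name as "the whole reference" is a choice made here rather than documented behavior.

```java
public final class RegionSpec {
    public final String referenceName;
    public final long start;
    public final long end;

    private RegionSpec(String referenceName, long start, long end) {
        this.referenceName = referenceName;
        this.start = start;
        this.end = end;
    }

    /**
     * Parses either "name" or "name:start:end".
     * A bare name is mapped to the range [0, Long.MAX_VALUE), which is an
     * assumption of this sketch meaning "the whole reference".
     */
    public static RegionSpec parse(String spec) {
        if (spec == null || spec.isEmpty()) {
            throw new IllegalArgumentException("Empty region specification");
        }
        String[] parts = spec.split(":");
        if (parts.length == 1) {
            return new RegionSpec(parts[0], 0L, Long.MAX_VALUE);
        }
        if (parts.length == 3 && !parts[0].isEmpty()) {
            // Long.parseLong throws NumberFormatException, a subclass of
            // IllegalArgumentException, when a coordinate is not numeric.
            long start = Long.parseLong(parts[1]);
            long end = Long.parseLong(parts[2]);
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("Invalid range in: " + spec);
            }
            return new RegionSpec(parts[0], start, end);
        }
        throw new IllegalArgumentException(
            "Expected 'name' or 'name:start:end' but got: " + spec);
    }

    public static void main(String[] args) {
        RegionSpec region = RegionSpec.parse("1:1000:10000");
        System.out.println(region.referenceName + " " + region.start + " " + region.end);
    }
}
```

With the value from the failing command, `parse("1:1000:10000")` yields reference name `1` with start 1000 and end 10000, so the request would name a reference that is in the "Wanted one of" list from the error message.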

@pgrosu

pgrosu commented Mar 13, 2015

Aha, thanks Ilia - I'll wait :) It was this part of the script that caused me to think otherwise initially:

if [ "$1" = "bam" ]; then
  bam_argument="--BAMFilePath=$BAM_FILE_PATH"
fi
if [ "$2" = "cloud" ]; then
  additional_arguments="--stagingLocation=${STAGING} --numWorkers=1 --runner=BlockingDataflowPipelineRunner"
else
  additional_arguments="--numWorkers=1"
fi

Thanks,
Paul

deflaux added a commit that referenced this pull request Mar 13, 2015
Add Genomics API counters for Dataflow UI display.
@deflaux deflaux merged commit efd9219 into googlegenomics:master Mar 13, 2015
@pgrosu

pgrosu commented Mar 13, 2015

Thanks Nicole :)

jiridanek pushed a commit to jiridanek/dataflow-java that referenced this pull request Jan 18, 2016
Add Genomics API counters for Dataflow UI display.

4 participants