Missing method when running data-pipeline/caids/get_caids.py #1620

@ignatiusm

I'm trying to run data-pipeline/caids/get_caids.py with a different dataset, but am encountering an issue.

When running lines 139 to 147 of the script (which build all_part_urls and completed_part_urls), Hail fails with the following error:

    await f.url() async for f in await fs.listfiles(sharded_vcf_url) if f.name().startswith("part-")
                                                                        ^^^^^^
AttributeError: 'GoogleStorageFileListEntry' object has no attribute 'name'. Did you mean: '_name'?

Looking at the Hail source code (line 483 of hail/python/hailtop/aiocloud/aiogoogle/client/storage_client.py), I can see that the GoogleStorageFileListEntry class indeed has no name method; as the error suggests, it only has a private _name attribute.

It seems you were able to run these scripts earlier this year to create the gnomad_v4 version of the CAIDS dataset. I note this PR, which updated the get_caids.py script and mentions that there have been "a number of Hail utils that have either been changed, removed or replaced since its last update". I'd be grateful for any suggestions you have for addressing this 😃

I'm using GCP infrastructure with Python v3.11 and Hail v0.2.132.
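
One possible workaround, sketched under assumptions rather than taken from the gnomad-browser or Hail code: since the traceback shows that `await f.url()` is available on each list entry, the "part-" filter could be applied to the last path component of that URL instead of calling `f.name()`. The helper names `basename_from_url` and `list_part_urls`, and the `FakeEntry` / `fake_listfiles` stand-ins used to exercise them without GCP access, are hypothetical. Some Hail versions may also expose a public replacement for `name()` on the entry class; that is an assumption to check against the installed source.

```python
import asyncio
from typing import AsyncIterator, List


def basename_from_url(url: str) -> str:
    """Return the last path component of a URL, ignoring a trailing slash."""
    return url.rstrip("/").rsplit("/", 1)[-1]


async def list_part_urls(entries: AsyncIterator) -> List[str]:
    """Collect URLs of entries whose last path component starts with 'part-'.

    `entries` is any async iterator of objects with an async `url()` method,
    e.g. the result of `await fs.listfiles(sharded_vcf_url)` in the script.
    """
    part_urls = []
    async for entry in entries:
        url = await entry.url()
        if basename_from_url(url).startswith("part-"):
            part_urls.append(url)
    return part_urls


class FakeEntry:
    """Hypothetical stand-in for a Hail file list entry (only `url()` is modelled)."""

    def __init__(self, url: str) -> None:
        self._url = url

    async def url(self) -> str:
        return self._url


async def fake_listfiles(urls: List[str]) -> AsyncIterator[FakeEntry]:
    """Hypothetical stand-in for `fs.listfiles`, yielding one entry per URL."""
    for url in urls:
        yield FakeEntry(url)
```

In the script itself, the failing comprehension would then become something like `all_part_urls = await list_part_urls(await fs.listfiles(sharded_vcf_url))`, assuming `fs.listfiles` still returns an async iterator as the traceback implies.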
