Commit 1f116c6

feat(frontend): add ClinVar hail table documentation and downloads
1 parent 74fb1e1 commit 1f116c6

7 files changed

+306
-0
lines changed

browser/help/helpPageTableOfContents.ts

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ const helpPageTableOfContents: { topics: string[]; faq: FaqTopic[] } = {
     'hgdp-1kg-annotations',
     'v4-hts',
     'v4-browser-hts',
+    'clinvar-hts',
     'exome-capture-tech',
     'combined-freq-stats',
     'allele-count-zero',

browser/help/topics/clinvar-hts.md

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+---
+id: clinvar-hts
+title: 'ClinVar Hail Tables'
+---
+
+### Overview
+
+We release two data tables underlying the ClinVar data displayed in the gnomAD browser. On a bi-monthly basis, we process the ClinVar monthly VCV XML release into two Hail Tables, one for GRCh38 and one for GRCh37. These tables enable our users to more easily incorporate ClinVar data into external pipelines in a manner consistent with what they see in the browser.
+
+These tables are stored in a requester-pays GCS bucket, so you must provide a billing project when accessing or downloading the data with `gsutil`, e.g.
+
+```
+gsutil -u YOUR_PROJECT -m cp -r \
+  gs://gnomad-browser-clinvar/gnomad_clinvar_grch38.ht \
+  gs://YOUR-BUCKET/YOUR-OPTIONAL-NESTED-BUCKETS/gnomad_clinvar_grch38.ht
+```
+
+<br />
+
+#### ClinVar GRCh38 Hail Table annotations
+
+Global Fields:
+
+- `clinvar_release_date`: Release date of the ClinVar VCV XML file used to generate this Hail Table.
+- `mane_select_version`: MANE Select version used to annotate variants.
+
+Row Fields:
+
+- `locus`: Variant locus. Contains contig and position information.
+- `alleles`: Variant alleles.
+- `clinvar_variation_id`: Unique ClinVar variant ID.
+- `rsid`: dbSNP reference SNP identification (rsID) number.
+- `review_status`: ClinVar review status for this variant association.
+- `gold_stars`: Number of gold stars assigned to this variant association.
+- `clinical_significance`: Clinical significance of this variant association.
+- `last_evaluated`: Date when this variant association was last evaluated.
+- `submissions`: Array containing all association submissions for this variant.
+  - `id`: Unique ID of this variant association submission.
+  - `submitter_name`: Group or individual that submitted this variant association submission.
+  - `clinical_significance`: Clinical significance of this variant association submission.
+  - `last_evaluated`: Date when this variant association submission was last evaluated.
+  - `review_status`: ClinVar review status for this variant association submission.
+  - `conditions`: An array containing conditions associated with this variant submission.
+    - `name`: Name of the condition.
+    - `medgen_id`: MedGen ID of the condition.
+- `variant_id`: gnomAD format variant ID.
+- `reference_genome`: Reference genome of this variant.
+- `chrom`: Chromosome which this variant is in.
+- `pos`: Position of this variant in the chromosome.
+- `ref`: Reference allele for this variant.
+- `alt`: Alternate allele for this variant.
+- `transcript_consequences`: Array containing variant transcript consequence information.
+  - `biotype`: Transcript biotype.
+  - `consequence_terms`: Array of predicted functional consequences.
+  - `domains`: Set containing protein domains affected by variant.
+  - `gene_id`: Unique ID of gene associated with transcript.
+  - `gene_symbol`: Symbol of gene associated with transcript.
+  - `hgvsc`: HGVS coding sequence notation for variant.
+  - `hgvsp`: HGVS protein notation for variant.
+  - `is_canonical`: Whether transcript is the canonical transcript.
+  - `lof_filter`: Variant LoF filters (from [LOFTEE](https://github.com/konradjk/loftee)).
+  - `lof_flags`: LOFTEE flags.
+  - `lof`: Variant LOFTEE status (high confidence `HC` or low confidence `LC`).
+  - `major_consequence`: Primary consequence associated with transcript.
+  - `transcript_id`: Unique transcript ID.
+  - `transcript_version`: Transcript version.
+  - `polyphen_prediction`: [Score](https://www.nature.com/articles/nmeth0410-248) that predicts the possible impact of an amino acid substitution on the structure and function of a human protein, ranging from 0.0 (tolerated) to 1.0 (deleterious).
+  - `sift_prediction`: [Score](https://www.nature.com/articles/nprot.2009.86) reflecting the scaled probability of the amino acid substitution being tolerated, ranging from 0 to 1. Scores below 0.05 are predicted to impact protein function.
+  - `gene_version`: Gene version.
+  - `is_mane_select`: Whether transcript is the MANE Select transcript.
+  - `is_mane_select_version`: MANE Select version; has a value if this transcript is the MANE Select transcript.
+  - `refseq_id`: RefSeq ID associated with transcript.
+  - `refseq_version`: RefSeq version.
+- `in_gnomad`: Whether or not this variant is in gnomAD.
+- `gnomad`: Struct containing variant information from gnomAD.
+  - `exome`: Struct containing exome information from gnomAD for this variant.
+    - `filters`: Set containing variant QC filters. See `filters` description on the v4 Hail Tables [help page](v4-hts#filters).
+    - `ac`: Allele count for this variant in exomes.
+    - `an`: Allele number for this variant in exomes.
+  - `genome`: Struct containing genome information from gnomAD for this variant.
+    - `filters`: Set containing variant QC filters. See `filters` description on the v4 Hail Tables [help page](v4-hts#filters).
+    - `ac`: Allele count for this variant in genomes.
+    - `an`: Allele number for this variant in genomes.
+
+<br />
+
+#### ClinVar GRCh37 Hail Table annotations
+
+This table has a schema nearly identical to that of the ClinVar GRCh38 table, with the exceptions noted below, but uses GRCh37 as its reference genome.
+
+The GRCh37 table does not include these fields:
+
+Global Fields:
+
+- `mane_select_version`
+
+Row Fields:
+
+- Under the `transcript_consequences` array:
+  - `is_mane_select`
+  - `is_mane_select_version`
+  - `refseq_id`
+  - `refseq_version`
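
As an illustration of how the documented `gnomad` struct might be consumed downstream, the following is a minimal sketch in plain Python rather than Hail. It assumes a row has been converted to a nested dictionary with the field names listed in the help text, and that the `exome` or `genome` struct may be missing (`None`) when the variant is absent from that data type. The function name `allele_frequency` and the example row are hypothetical and are not part of this commit.

```python
from typing import Optional


def allele_frequency(row: dict, data_type: str) -> Optional[float]:
    """Return ac / an for the 'exome' or 'genome' struct of a row.

    Returns None when the struct is missing or the allele number is zero
    (assumed behaviour for this sketch, not taken from the commit).
    """
    gnomad = row.get("gnomad") or {}
    counts = gnomad.get(data_type)
    if not counts or not counts.get("an"):
        return None
    return counts["ac"] / counts["an"]


# Hypothetical row shaped like the documented schema (values are made up).
example_row = {
    "variant_id": "1-12345-A-G",
    "gold_stars": 2,
    "clinical_significance": "Pathogenic",
    "in_gnomad": True,
    "gnomad": {
        "exome": {"filters": set(), "ac": 3, "an": 1000},
        "genome": None,
    },
}

print(allele_frequency(example_row, "exome"))
print(allele_frequency(example_row, "genome"))
```

Guarding against a missing struct and a zero allele number keeps the helper safe for variants where `in_gnomad` is false or only one data type is present.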

browser/src/DataPage/GnomadV2Downloads.tsx

Lines changed: 20 additions & 0 deletions
@@ -458,6 +458,26 @@ const GnomadV2Downloads = () => {
         </FileList>
       </DownloadsSection>
 
+      <DownloadsSection>
+        <SectionTitle id="v2-clinvar-grch37">ClinVar</SectionTitle>
+        <p>
+          For more information about these files, see the{' '}
+          <Link to="/help/clinvar-hts">help text</Link>.
+        </p>
+
+        <FileList>
+          {/* @ts-expect-error TS(2745) FIXME: This JSX tag's 'children' prop expects type 'never... Remove this comment to see the full error message */}
+          <ListItem>
+            <GetUrlButtons
+              gcsBucket="gnomad-browser-clinvar"
+              label="ClinVar GRCh37 Browser Hail Table"
+              path="/gnomad_clinvar_grch37.ht"
+              includeAWS={false}
+            />
+          </ListItem>
+        </FileList>
+      </DownloadsSection>
+
       <DownloadsSection>
         <SectionTitle id="v2-resources">Resources</SectionTitle>
         <FileList>

browser/src/DataPage/GnomadV4Downloads.tsx

Lines changed: 20 additions & 0 deletions
@@ -553,6 +553,26 @@ const GnomadV4Downloads = () => {
         </FileList>
       </DownloadsSection>
 
+      <DownloadsSection>
+        <SectionTitle id="v4-clinvar-grch38">ClinVar</SectionTitle>
+        <p>
+          For more information about these files, see the{' '}
+          <Link to="/help/clinvar-hts">help text</Link>.
+        </p>
+
+        <FileList>
+          {/* @ts-expect-error TS(2745) FIXME: This JSX tag's 'children' prop expects type 'never... Remove this comment to see the full error message */}
+          <ListItem>
+            <GetUrlButtons
+              gcsBucket="gnomad-browser-clinvar"
+              label="ClinVar GRCh38 Browser Hail Table"
+              path="/gnomad_clinvar_grch38.ht"
+              includeAWS={false}
+            />
+          </ListItem>
+        </FileList>
+      </DownloadsSection>
+
       <DownloadsSection>
         <SectionTitle id="v4-resources">Resources</SectionTitle>
         <FileList>

browser/src/DataPage/__snapshots__/DataPage.spec.tsx.snap

Lines changed: 122 additions & 0 deletions
@@ -7077,6 +7077,67 @@ exports[`Data Page has no unexpected changes 1`] = `
       </li>
     </ul>
   </section>
+  <section
+    className="c16"
+  >
+    <span
+      className="c8"
+    >
+      <h2
+        className="c17"
+      >
+        <a
+          aria-hidden="true"
+          className="c10 c11"
+          href="#v4-clinvar-grch38"
+          id="v4-clinvar-grch38"
+        >
+          <img
+            alt=""
+            aria-hidden="true"
+            height={12}
+            src="test-file-stub"
+            width={12}
+          />
+        </a>
+        ClinVar
+      </h2>
+    </span>
+    <p>
+      For more information about these files, see the
+
+      <a
+        className="-Link c14"
+        href="/help/clinvar-hts"
+        onClick={[Function]}
+      >
+        help text
+      </a>
+      .
+    </p>
+    <ul
+      className="c20"
+    >
+      <li
+        className="c21"
+      >
+        <span>
+          ClinVar GRCh38 Browser Hail Table
+        </span>
+        <br />
+        Show URL for
+
+        <button
+          aria-label="Show Google URL for ClinVar GRCh38 Browser Hail Table"
+          className="c22"
+          onClick={[Function]}
+          type="button"
+        >
+          Google
+        </button>
+      </li>
+    </ul>
+  </section>
   <section
     className="c16"
   >
@@ -21127,6 +21188,67 @@ exports[`Data Page has no unexpected changes 1`] = `
       </li>
     </ul>
   </section>
+  <section
+    className="c16"
+  >
+    <span
+      className="c8"
+    >
+      <h2
+        className="c17"
+      >
+        <a
+          aria-hidden="true"
+          className="c10 c11"
+          href="#v2-clinvar-grch37"
+          id="v2-clinvar-grch37"
+        >
+          <img
+            alt=""
+            aria-hidden="true"
+            height={12}
+            src="test-file-stub"
+            width={12}
+          />
+        </a>
+        ClinVar
+      </h2>
+    </span>
+    <p>
+      For more information about these files, see the
+
+      <a
+        className="-Link c14"
+        href="/help/clinvar-hts"
+        onClick={[Function]}
+      >
+        help text
+      </a>
+      .
+    </p>
+    <ul
+      className="c20"
+    >
+      <li
+        className="c21"
+      >
+        <span>
+          ClinVar GRCh37 Browser Hail Table
+        </span>
+        <br />
+        Show URL for
+
+        <button
+          aria-label="Show Google URL for ClinVar GRCh37 Browser Hail Table"
+          className="c22"
+          onClick={[Function]}
+          type="button"
+        >
+          Google
+        </button>
+      </li>
+    </ul>
+  </section>
   <section
     className="c16"
   >

browser/src/help/__snapshots__/HelpPage.spec.tsx.snap

Lines changed: 20 additions & 0 deletions
@@ -771,6 +771,15 @@ exports[`Help Page has no unexpected changes 1`] = `
       v4-browser-hts
     </a>
   </li>
+  <li>
+    <a
+      className="-Link c3"
+      href="/help/clinvar-hts"
+      onClick={[Function]}
+    >
+      clinvar-hts
+    </a>
+  </li>
   <li>
     <a
       className="-Link c3"
@@ -1089,6 +1098,17 @@ exports[`Help Page has no unexpected changes 1`] = `
       v4-browser-hts
     </a>
   </li>
+  <li
+    className="c10"
+  >
+    <a
+      className="-Link c3"
+      href="/help/clinvar-hts"
+      onClick={[Function]}
+    >
+      clinvar-hts
+    </a>
+  </li>
   <li
     className="c10"
   >

deploy/docs/UpdateClinvarVariants.md

Lines changed: 20 additions & 0 deletions
@@ -115,3 +115,23 @@
    ```
    curl -u "elastic:$ELASTICSEARCH_PASSWORD" -XPUT 'http://localhost:9200/_snapshot/backups/%3Csnapshot-%7Bnow%7BYYYY.MM.dd.HH.mm%7D%7D%3E'
    ```
+
+8. Update the public ClinVar buckets
+
+   We release these final Hail Tables in a requester-pays bucket, `gs://gnomad-browser-clinvar`. Use `gsutil rsync` to keep the files in sync.
+
+   GRCh37
+
+   ```
+   gsutil -u gnomadev -m rsync -r \
+     gs://gnomad-v4-data-pipeline/output/clinvar/clinvar_grch37_annotated_2.ht/ \
+     gs://gnomad-browser-clinvar/gnomad_clinvar_grch37.ht
+   ```
+
+   GRCh38
+
+   ```
+   gsutil -u gnomadev -m rsync -r \
+     gs://gnomad-v4-data-pipeline/output/clinvar/clinvar_grch38_annotated_2.ht/ \
+     gs://gnomad-browser-clinvar/gnomad_clinvar_grch38.ht
+   ```
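
The two `gsutil rsync` invocations above differ only in the reference genome, so they could be generated from a single helper. The sketch below is one possible way to do that. The bucket paths, the `_annotated_2` suffix, and the `gnomadev` billing project are copied from the commands in this step; the function name `build_rsync_command` is hypothetical, and the script only prints the commands rather than running them.

```python
import shlex
from typing import List

# Paths copied from the deploy doc; adjust if the pipeline output moves.
PIPELINE_OUTPUT = "gs://gnomad-v4-data-pipeline/output/clinvar"
PUBLIC_BUCKET = "gs://gnomad-browser-clinvar"


def build_rsync_command(reference_genome: str, billing_project: str = "gnomadev") -> List[str]:
    """Build the gsutil rsync argument list for one reference genome."""
    build = reference_genome.lower()
    if build not in ("grch37", "grch38"):
        raise ValueError(f"Unsupported reference genome: {reference_genome}")
    return [
        "gsutil", "-u", billing_project, "-m", "rsync", "-r",
        f"{PIPELINE_OUTPUT}/clinvar_{build}_annotated_2.ht/",
        f"{PUBLIC_BUCKET}/gnomad_clinvar_{build}.ht",
    ]


# Print the commands for review instead of executing them.
for genome in ("GRCh37", "GRCh38"):
    print(shlex.join(build_rsync_command(genome)))
```

Returning an argument list rather than a shell string means the result can be passed directly to `subprocess.run` without additional quoting, should the step ever be automated.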
