documentation for google symptoms new signals #815
Merged
9 commits
d8f032f
documentation for google symptoms new signals
e98871b
updating signal names to lower case
085ab52
Update docs/api/covidcast-signals/google-symptoms.md
nloliveira a505cb2
Update docs/api/covidcast-signals/google-symptoms.md
nloliveira 4347694
Update docs/api/covidcast-signals/google-symptoms.md
nloliveira 789e8e1
Update docs/api/covidcast-signals/google-symptoms.md
nloliveira da547fd
Update docs/api/covidcast-signals/google-symptoms.md
nloliveira 662b2f2
Update docs/api/covidcast-signals/google-symptoms.md
nloliveira 0d15c83
emphasizing when signals are comparable and when they are not
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,8 +9,8 @@ grand_parent: COVIDcast Epidata API | |
|
||
* **Source name:** `google-symptoms` | ||
* **Earliest issue available:** November 30, 2020 | ||
* **Number of data revisions since May 19, 2020:** 0 | ||
* **Date of last change:** Never | ||
* **Number of data revisions since May 19, 2020:** 1 | ||
* **Date of last change:** January 20, 2022 | ||
* **Available for:** county, MSA, HRR, state, HHS, nation (see [geography coding docs](../covidcast_geography.md)) | ||
* **Time type:** day (see [date format docs](../covidcast_times.md)) | ||
* **License:** To download or use the data, you must agree to the Google [Terms of Service](https://policies.google.com/terms) | ||
|
@@ -19,23 +19,45 @@ grand_parent: COVIDcast Epidata API | |
|
||
This data source is based on the [COVID-19 Search Trends symptoms | ||
dataset](http://goo.gle/covid19symptomdataset). Using | ||
this search data, we estimate the volume of searches mapped to symptoms related | ||
to COVID-19 such as _anosmia_ (lack of smell) and _ageusia_(lack of taste). The | ||
resulting daily dataset for each region shows the relative frequency of searches | ||
for each symptom. The signals are measured in arbitrary units that are | ||
normalized for overall search users in the region and scaled by the maximum value of the normalized | ||
popularity within a geographic region across a specific time range. **Thus, | ||
values are NOT comparable across geographic regions**. Larger numbers represent | ||
increased releative popularity of symptom-related searches. | ||
this search data, we estimate the volume of searches mapped to symptom sets related | ||
to COVID-19. The resulting daily dataset for each region shows the average relative frequency of searches for each symptom set. The signals are measured in arbitrary units that are normalized for overall search users in the region and scaled by the maximum value of the normalized popularity within a geographic region across a specific time range. **Thus, values are NOT comparable across geographic regions**. Larger numbers represent increased relative popularity of symptom-related searches. | ||
|
||
#### Symptom sets | ||
|
||
* _s01_: Cough, Phlegm, Sputum, Upper respiratory tract infection | ||
* _s02_: Nasal congestion, Post nasal drip, Rhinorrhea, Sinusitis, Rhinitis, Common cold | ||
|
||
* _s03_: Fever, Hyperthermia, Chills, Shivering, Low grade fever | ||
* _s05_: Shortness of breath, Wheeze, Croup, Pneumonia, Asthma, Crackles, Acute bronchitis, Bronchitis | ||
* _s06_: Anosmia, Dysgeusia, Ageusia | ||
* _s08_: Laryngitis, Sore throat, Throat irritation | ||
|
||
* _scontrol_: Type 2 diabetes, Urinary tract infection, Hair loss, Candidiasis, Weight gain | ||
|
||
The symptoms were combined into sets that showed positive correlation with cases, especially after Omicron was declared a variant of concern by the WHO. Note that the symptoms in _scontrol_ are not related to COVID-19, so this symptom set can be used as a negative control. | ||
|
||
|
||
Until January 20, 2022, we had separate signals for the symptoms anosmia and ageusia, and a signal for their sum. | ||
|
||
|
||
| Signal | Description | | ||
| --- | --- | | ||
| `anosmia_raw_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users <br/> **Earliest date available:** 2020-02-13 | | ||
| `anosmia_smoothed_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average <br/> **Earliest date available:** 2020-02-20 | | ||
| `ageusia_raw_search` | Google search volume for ageusia-related searches, in arbitrary units that are normalized for overall search users <br/> **Earliest date available:** 2020-02-13 | | ||
| `ageusia_smoothed_search` | Google search volume for ageusia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average <br/> **Earliest date available:** 2020-02-20 | | ||
| `sum_anosmia_ageusia_raw_search` | The sum of Google search volume for anosmia and ageusia related searches, in an arbitrary units that are normalized for overall search users <br/> **Earliest date available:** 2020-02-13 | | ||
| `sum_anosmia_ageusia_smoothed_search` | The sum of Google search volume for anosmia and ageusia related searches, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average <br/> **Earliest date available:** 2020-02-20 | | ||
| `s01_raw_search` | The average of Google search volume for related searches of symptom set _s01_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 | | ||
| `s01_smoothed_search` | The average of Google search volume for related searches of symptom set _s01_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 | | ||
| `s02_raw_search` | The average of Google search volume for related searches of symptom set _s02_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 | | ||
| `s02_smoothed_search` | The average of Google search volume for related searches of symptom set _s02_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 | | ||
| `s03_raw_search` | The average of Google search volume for related searches of symptom set _s03_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 | | ||
| `s03_smoothed_search` | The average of Google search volume for related searches of symptom set _s03_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 | | ||
| `s05_raw_search` | The average of Google search volume for related searches of symptom set _s05_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 | | ||
| `s05_smoothed_search` | The average of Google search volume for related searches of symptom set _s05_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 | | ||
| `s06_raw_search` | The average of Google search volume for related searches of symptom set _s06_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 | | ||
| `s06_smoothed_search` | The average of Google search volume for related searches of symptom set _s06_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 | | ||
| `s08_raw_search` | The average of Google search volume for related searches of symptom set _s08_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 | | ||
| `s08_smoothed_search` | The average of Google search volume for related searches of symptom set _s08_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 | | ||
| `scontrol_raw_search` | The average of Google search volume for related searches of symptom set _scontrol_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 | | ||
| `scontrol_smoothed_search` | The average of Google search volume for related searches of symptom set _scontrol_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-02-20 | | ||
| `anosmia_raw_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users. _This signal is no longer updated as of January 20, 2022._ <br/> **Earliest date available:** 2020-02-13 | | ||
| `anosmia_smoothed_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. _This signal is no longer updated as of January 20, 2022._ <br/> **Earliest date available:** 2020-02-20 | | ||
| `ageusia_raw_search` | Google search volume for ageusia-related searches, in arbitrary units that are normalized for overall search users. _This signal is no longer updated as of January 20, 2022._ <br/> **Earliest date available:** 2020-02-13 | | ||
| `ageusia_smoothed_search` | Google search volume for ageusia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. _This signal is no longer updated as of January 20, 2022._ <br/> **Earliest date available:** 2020-02-20 | | ||
| `sum_anosmia_ageusia_raw_search` | The sum of Google search volume for anosmia- and ageusia-related searches, in arbitrary units that are normalized for overall search users. _This signal is no longer updated as of January 20, 2022._ <br/> **Earliest date available:** 2020-02-13 | | ||
| `sum_anosmia_ageusia_smoothed_search` | The sum of Google search volume for anosmia- and ageusia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. _This signal is no longer updated as of January 20, 2022._ <br/> **Earliest date available:** 2020-02-20 | | ||
|
||
|
||
## Table of Contents | ||
|
@@ -45,22 +67,24 @@ increased releative popularity of symptom-related searches. | |
{:toc} | ||
|
||
## Estimation | ||
The `sum_anosmia_ageusia_raw_search` signals are simply the raw sum of the | ||
values of `anosmia_raw_search` and `ageusia_raw_search`, but not the union of | ||
anosmia and ageusia related searches. This is because the data volume is | ||
calculated based on search queries. A single search query can be mapped to more | ||
than one symptom. Currently, Google does not provide _intersection/union_ | ||
Each signal is the average of the | ||
search trend values of the symptoms in its symptom set. For example, `s06_raw_search` is the average of the search trend values of anosmia, ageusia, and dysgeusia. Note that this is different from the union of | ||
anosmia-, ageusia-, and dysgeusia-related searches divided by 3, because the data volume for each symptom is calculated from search queries, and a single search query can be mapped to more than one symptom. Currently, Google does not provide _intersection/union_ | ||
data. Users should be careful when interpreting such signals. | ||
|
||
For each symptom set: when search trends for all symptoms are missing, the signal is reported as missing. When search trends are available for at least one of the symptoms, we fill the missing trends for the other symptoms with 0 and compute the average. The same approach is used for smoothed signals: a 7-day moving average is used, and missing raw signals are filled with 0 as long as at least one day is available within the 7-day window. We use this approach because the missing observations in the Google Symptoms search trends dataset are not missing at random; they represent low popularity and are not reported due to quality and/or privacy reasons. | ||
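The missing-value rule described above can be sketched in a few lines of Python. This is an illustration only, not the indicator's actual code: the function names `symptom_set_average` and `smoothed` are hypothetical, missing values are represented as `None`, and the window is assumed to be trailing (the current day plus the six days before it), with no smoothed value reported until a full window exists.

```python
from typing import List, Optional, Sequence


def symptom_set_average(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average one day's search trends across the symptoms in a set.

    Returns None (missing) when every symptom is missing. Otherwise
    missing symptoms are filled with 0 before averaging.
    """
    if all(v is None for v in values):
        return None
    return sum(0.0 if v is None else v for v in values) / len(values)


def smoothed(raw: Sequence[Optional[float]], window: int = 7) -> List[Optional[float]]:
    """Moving average of daily set values with the same fill-with-0 rule.

    Assumption: the window is trailing, and days before the first full
    window are reported as missing.
    """
    out: List[Optional[float]] = []
    for i in range(len(raw)):
        if i + 1 < window:
            out.append(None)  # not enough history for a full window
            continue
        chunk = raw[i + 1 - window : i + 1]
        if all(v is None for v in chunk):
            out.append(None)  # whole window missing: stay missing
        else:
            out.append(sum(0.0 if v is None else v for v in chunk) / window)
    return out
```

Because missing symptoms count as 0 rather than being dropped, a set with one reported symptom out of three has its average divided by 3, not by 1. That matches the reasoning above that missing values indicate low popularity.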
|
||
|
||
|
||
|
||
## Geographical Aggregation | ||
The state-level and county-level `raw_search` signals for specific symptoms such | ||
as _anosmia_ and _ageusia_ are taken directly from the [COVID-19 Search Trends | ||
symptoms | ||
dataset](https://github.com/google-research/open-covid-19-data/tree/master/data/exports/search_trends_symptoms_dataset) | ||
without changes. | ||
without changes. | ||
|
||
|
||
We aggregate county and state data to other geographic levels using | ||
population-weighted averaging. | ||
population-weighted averaging. | ||
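As a rough illustration of population-weighted averaging, the sketch below combines values from several source regions into one aggregated value. The function name `population_weighted_average` and the example numbers are hypothetical, and the sketch assumes every source region has a reported value; how missing regions are handled in the real pipeline is not specified here.

```python
from typing import Sequence


def population_weighted_average(
    values: Sequence[float], populations: Sequence[float]
) -> float:
    """Weight each region's value by its population, then normalize."""
    if len(values) != len(populations):
        raise ValueError("values and populations must have the same length")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(v * p for v, p in zip(values, populations))
    return weighted_sum / total_population


# Two hypothetical counties rolled up to one larger region.
aggregated = population_weighted_average([10.0, 20.0], [1000, 3000])
```

The larger county dominates the result, so the aggregated value sits closer to 20 than to 10.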
|
||
| Source level | Aggregated level | | ||
| ------------ | ---------------- | | ||
|
@@ -80,9 +104,9 @@ Each update will usually extend the coverage to within three days of the day of | |
As a result the delay can range from 3 to 10 days or even more. We check for | ||
updates every day and provide the most up-to-date data. | ||
|
||
## Limitations | ||
## Limitations | ||
When daily volume in a region does not meet quality or privacy thresholds, set | ||
by Google, no daily value is reported. Weekly data may be available from Google | ||
by Google, no daily value is reported. Weekly data may be available from Google | ||
in these cases, but we do not yet support importation using weekly data. | ||
|
||
Google uses differential privacy, which adds artificial noise to the raw | ||
|
@@ -92,14 +116,13 @@ quality of results. | |
Google normalizes and scales time series values to determine the relative | ||
popularity of symptoms in searches within each geographical region individually. | ||
This means that the resulting values of symptom popularity are **NOT** | ||
comparable across geographic regions. | ||
comparable across geographic regions. | ||
Review comment: maybe mention comparing different signals within a region here again
Reply: Done!
||
|
||
More details about the limitations of this dataset are available in [Google's Search | ||
More details about the limitations of this dataset are available in [Google's Search | ||
Trends symptoms dataset documentation](https://storage.googleapis.com/gcp-public-data-symptom-search/COVID-19%20Search%20Trends%20symptoms%20dataset%20documentation%20.pdf). | ||
|
||
## Source and Licensing | ||
This dataset is based on Google's [COVID-19 Search Trends symptoms dataset](http://goo.gle/covid19symptomdataset), which is licensed under Google's [Terms of Service](https://policies.google.com/terms). | ||
|
||
To learn more about the source data, how it is generated and its limitations, | ||
To learn more about the source data, how it is generated and its limitations, | ||
read [Google's Search Trends symptoms dataset documentation](https://storage.googleapis.com/gcp-public-data-symptom-search/COVID-19%20Search%20Trends%20symptoms%20dataset%20documentation%20.pdf). | ||
|