From fa42040770dbf601c2824937c7e487f3689b6952 Mon Sep 17 00:00:00 2001 From: MarinaMartin Date: Wed, 15 May 2013 22:00:01 -0400 Subject: [PATCH 01/23] Clarified "distribution" guidance Addressing Issue #16. --- schema.md | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/schema.md b/schema.md index f54ece14..2eca86e8 100644 --- a/schema.md +++ b/schema.md @@ -306,9 +306,23 @@ Field | title **Cardinality** | (0,n) **Required** | No **Accepted Values** | See Usage Notes -**Usage Notes** | Distribution is a concatenation, as appropriate, of the following elements: download url, format, endpoint, language, size. An example of this this model is: +**Usage Notes** | Distribution is a concatenation, as appropriate, of the following elements: Download URL, Format, Size. If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate entry with a comma, as seen below: - "distribution": [`{"accessURL": "http://data.mcc.gov/example_resource/data.json", "format":"JSON", "size":"22mb"}`,`{"accessURL":"http://data.mcc.gov/example_/data.xml", "format":"XML", "size":"24mb"}`] + "distribution": [ + { + "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.csv?accessType=DOWNLOAD", + "format": "csv", + "size": "200MB" + }, + { + "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.json?accessType=DOWNLOAD", + "format": "json" + }, + { + "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.xml?accessType=DOWNLOAD", + "format": "xml" + } + ] **Example** | - From 0f1e7d58710bc26fc3c6686cc8e2d51e06a002a7 Mon Sep 17 00:00:00 2001 From: MarinaMartin Date: Wed, 15 May 2013 23:07:36 -0300 Subject: [PATCH 02/23] Fixed "theme" naming error The Category field guidance incorrectly referred to the metadata name as "category" when it should be "theme." 
--- schema.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/schema.md b/schema.md index 2eca86e8..546ade05 100644 --- a/schema.md +++ b/schema.md @@ -283,13 +283,13 @@ Field | title **Example** | `{"dataQuality":"true"}` {.table .table-striped} -**Field** | **category** +**Field** | **theme** ----- | ----- **Cardinality** | (0,n) **Required** | No **Accepted Values** | String **Usage Notes** | Separate multiple categories with a comma. Could include [ISO Topic Categories](http://www.isotopicmaps.org/). -**Example** | `{"category":"vegetables"}` +**Example** | `{"theme":"vegetables"}` {.table .table-striped} **Field** | **references** From 802f0d81fd0fc5a4b6d67764e84c0e60992d6366 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 17:21:18 -0400 Subject: [PATCH 03/23] Mass changes to schema --- schema.md | 225 ++++++++++++++++++++++++------------------------------ 1 file changed, 98 insertions(+), 127 deletions(-) diff --git a/schema.md b/schema.md index 546ade05..3cd1e7d6 100644 --- a/schema.md +++ b/schema.md @@ -1,19 +1,22 @@ --- +published: true layout: default title: Common Core Metadata Schema permalink: /schema/ filename: schema.md id: schema + --- -This section contains guidance to support the use of the [common core metadata](http://project-open-data.github.io/schema/) to list agency datasets and application programming interfaces (APIs) as hosted at agency.gov/data. +This section contains guidance to support the use of the common core metadata to list agency datasets and application programming interfaces (APIs) as hosted at agency.gov/data. + +Updates to the metadata schema can be found in the [changelog](/metadata-changelog). Current metadata version: 1.0 FINAL as of 8/25/13. Standard Metadata Vocabulary ---------------------------- Metadata are selected fields or elements which describe data. 
The challenge is to define the standard metadata fields and the names of those fields so that the consumer of the data has sufficient information to process and understand the data. The more information that can be conveyed in a standardized regular format, the more valuable data becomes. Metadata can range from basic to advanced, from allowing one to discover the mere fact that a certain data asset exists and is about a general subject all the way to providing detailed semantic information that enables a high degree of machine readability. Making the metadata machine readable greatly increases its openness and utility. -Establishing a common vocabulary is the key to any communication, including communication between machines. [DCAT](http://www.w3.org/TR/vocab-dcat/) is a hierarchical vocabulary specific to datasets that serves as the basis for the **[common core metadata](http://project-open-data.github.io/schema/)** required in this memorandum. The standard consists of a number of schemas (hierarchical vocabulary terms) that represent things that are most often looked for on the web, with [mappings](http://project-open-data.github.io/metadata-resources#common_core_required_fields_equivalents) to their equivalents in other standards. - +Establishing a common vocabulary is the key to any communication, including communication between machines. [DCAT](http://www.w3.org/TR/vocab-dcat/) is a hierarchical vocabulary specific to datasets that serves as the basis for the **common core metadata** required in this memorandum. The standard consists of a number of schemas (hierarchical vocabulary terms) that represent things that are most often looked for on the web, with [mappings](http://project-open-data.github.io/metadata-resources/#common_core_required_fields_equivalents) to their equivalents in other standards. 
What to Document -- Datasets and APIs ------------------------------------- @@ -25,46 +28,47 @@ The catalog file should list all of an agency's datasets that can be made public Metadata File Format -- JSON --------------------------------------- -The [Implemention Guidance](http://project-open-data.github.io/implementation-guide/) available as a part of Project Open Data describes Agency requirements for the development of metadata as per the Open Data Policy. A quick primer on the file format involved: + +The [Implementation Guidance](/implementation-guide/) available as a part of Project Open Data describes Agency requirements for the development of metadata as per the Open Data Policy. A quick primer on the file format involved: [JSON](http://www.json.org) is a lightweight data-exchange format that is very easy to read, parse and generate. Based on a subset of the JavaScript programming language, JSON is a text format that is optimized for data interchange. JSON is built on two structures: (1) a collection of name/value pairs; and (2) an ordered list of values. -Links to downloadable examples of metadata files developed in this and other formats in [the metadata resources](http://project-open-data.github.io/metadata-resources/). Tools to help agencies produce and maintain their data inventories are [available on GitHub](http://www.github.com/project-open-data) and hosted at [Labs.Data.gov](http://labs.data.gov). +Links to downloadable examples of metadata files developed in this and other formats in [the metadata resources](/metadata-resources/). Tools to help agencies produce and maintain their data inventories are [available on GitHub](http://www.github.com/project-open-data) and hosted at [Labs.Data.gov](http://labs.data.gov). 
"Common Core" Required Fields ----------------------------- The following "common core" fields are required, to be used to describe each entry: -*(Consult the 'Further Metadata Field Guidance' section lower in the page to learn more about the use of each element, including the range of valid entries where appropriate. Consult the [schema maps](http://project-open-data.github.io/metadata-resources/#common_core_required_fields_equivalents) to find the equivalent Data.gov, RDFa Lite, and CKAN fields.)* +*(Consult the 'Further Metadata Field Guidance' section lower in the page to learn more about the use of each element, including the range of valid entries where appropriate. Consult the [schema maps](/metadata-resources/#common_core_required_fields_equivalents) to find the equivalent Data.gov, RDFa Lite, and CKAN fields.)* -{.table .table-striped} -Field | Definition |JSON +{: .table .table-striped} +Field | Definition |JSON ------- | --------------- | -------------- -Title | Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery. | title -Description | Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest. | description -Tags | Tags (or keywords) help users discover your dataset, please include terms that would be used by technical and non-technical users. | keyword -Last Update | Most recent date on which the dataset was changed, updated or modified. | modified -Publisher | The publishing agency. | publisher -Contact Name | Contact person's name for the asset. | person -Contact Email | Contact person's email address. | mbox -Unique Identifier | A unique identifier for the dataset or API as maintained within an Agency catalog or database. | identifier -Public Access Level | The degree to which this dataset **could** be made publicly-available, *regardless of whether it has been made available*. 
Choices: Public (is or *could be* made publicly available), Restricted (available under certain conditions), or Private (never able to be made publicly available) | accessLevel +Title | Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery. | title +Description | Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest. | description +Tags | Tags (or keywords) help users discover your dataset, please include terms that would be used by technical and non-technical users. | keywords +Last Update | Most recent date on which the dataset was changed, updated or modified. | modified +Publisher | The publishing agency. | publisher +Contact Name | Contact person's name for the asset. | person +Contact Email | Contact person's email address. | mbox +Unique Identifier | A unique identifier for the dataset or API as maintained within an Agency catalog or database. | identifier +Public Access Level | The degree to which this dataset **could** be made publicly-available, *regardless of whether it has been made available*. Choices: public (is or *could be* made publicly available), restricted public (available under certain conditions), or non-public (never able to be made publicly available) | accessLevel "Common Core" Required-if-Applicable Fields ------------------------------------------- The following fields must be used to describe each dataset if they are applicable: -{.table .table-striped} -Field | Definition |JSON +{: .table .table-striped} +Field | Definition |JSON ------- | --------------- | -------------- -Data Dictionary | URL to the data dictionary for the dataset or API. Note that documentation other than a Data Dictionary can be referenced using Related Documents as shown in the expanded fields. | dataDictionary -Download URL | URL providing direct access to the downloadable distribution of a dataset. 
| accessURL -Endpoint | Endpoint of web service to access dataset. | webService -Format | The file format or API type of the distribution. | format -License | The license dataset or API is published with. See [Open Licenses](http://project-open-data.github.io/open-licenses/) for more information. | license -Spatial | The range of spatial applicability of a dataset. Could include a spatial region like a bounding box or a named place. | spatial -Temporal | The range of temporal applicability of a dataset (i.e., a start and end date of applicability for the data). | temporal +Access Level Comment | Explanation of how to access a restricted public dataset, or why a non-public dataset cannot be released. | accessLevelComment +Download URL | URL providing direct access to the downloadable distribution of a dataset. | accessURL +Endpoint | Endpoint of web service to access dataset. | webService +Format | The file format or API type of the distribution. | format +License | The license the dataset or API is published with. See [Open Licenses](/open-licenses/) for more information. | license +Spatial | The range of spatial applicability of a dataset. Could include a spatial region like a bounding box or a named place. | spatial +Temporal | The range of temporal applicability of a dataset (i.e., a start and end date of applicability for the data). | temporal Beyond Common Core -- Extending the Schema ------------------------------------------ @@ -74,26 +78,24 @@ Expanded Fields --------------- Agencies are encouraged to use the following expanded fields when appropriate. Agencies may freely augment these fields with their own. -{.table .table-striped} -Field | Definition | JSON ------- | ------ | ---- -Release Date | Date of formal issuance. | issued -Frequency | Frequency with which dataset is published. | accrualPeriodicity -Language | The language of the dataset. | language -Granularity | Level of granularity of the dataset.
| granularity -Data Quality | Whether the dataset meets the agency's Information Quality Guidelines (true/false). | dataQuality -Category | Main thematic category of the dataset. | theme -Related Documents | Related documents such as technical information about a dataset, developer documentation, etc. | references -Size | The size of the downloadable dataset. | size -Homepage URL | Alternative landing page used to redirect user to a contextual, Agency-hosted "homepage" for the Dataset or API when selecting this resource from the Data.gov user interface. | landingPage -RSS Feed | URL for an RSS feed that provides access to the dataset. | feed -System of Records | URL to the System of Records related to this dataset. | systemOfRecords +{: .table .table-striped} +Field | Definition | JSON +------ | ------ | ---- +Category | Main thematic category of the dataset. | theme +Data Dictionary | URL to the data dictionary for the dataset or API. Note that documentation other than a data dictionary can be referenced using Related Documents as shown in the expanded fields. | dataDictionary +Data Quality | Whether the dataset meets the agency's Information Quality Guidelines (true/false). | dataQuality +Frequency | Frequency with which the dataset is published. | accrualPeriodicity +Language | The language of the dataset. | language +Homepage URL | Alternative landing page used to redirect user to a contextual, Agency-hosted "homepage" for the Dataset or API when selecting this resource from the Data.gov user interface. | landingPage +Related Documents | Related documents such as technical information about a dataset, developer documentation, etc. | references +Release Date | Date of formal issuance. | issued +System of Records | If the system is designated as a system of records under the Privacy Act of 1974, provide the URL to the System of Records Notice related to this dataset.
| systemOfRecords Further Metadata Field Guidance ------------------------------- -{.table .table-striped} +{: .table .table-striped} Field | title ----- | ----- **Cardinality** | (1,1) @@ -102,7 +104,7 @@ Field | title **Usage Notes** | Acronyms should be avoided. **Example** | `{"title":"Types of Vegetables"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **description** ----- | ----- **Cardinality** | (1,1) @@ -111,52 +113,52 @@ Field | title **Usage Notes** | This should be human-readable and understandable to an average person. **Example** | `{"description":"This dataset contains a list of vegetables, including nutrition information and seasonality. Includes details on tomatoes, which are really fruit but considered a vegetable in this dataset."}` -{.table .table-striped} +{: .table .table-striped} **Field** | **dataDictionary** ----- | ----- -**Cardinality** | (1,1) -**Required** | Yes, if there is corresponding data dictionary online. (Documentation that is not specifically a data dictionary belongs in "references") +**Cardinality** | (0,1) +**Required** | No (Documentation that is not specifically a data dictionary belongs in "references") **Accepted Values** | URL **Usage Notes** | - **Example** | `{"dataDictionary":"http://www.agency.gov/vegetables/dictionary.html"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **accessURL** ----- | ----- -**Cardinality** | (1,n) +**Cardinality** | (0,1) **Required** | Yes, if the file is available for public download. **Accepted Values** | URL -**Usage Notes** | This must be the **direct** download URL. Use **homepage** for landing or disambiguation pages, or **dataDictionary** for documentation pages. +**Usage Notes** | This must be the **direct** download URL. Use **homepage** for landing or disambiguation pages, or **references** for documentation pages. For multiple downloads, use **distribution** to include as many **accessURL** entries as you need. 
**Example** | `{"accessURL":"http://www.agency.gov/vegetables/listofvegetables.csv"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **format** ----- | ----- -**Cardinality** | (1,n) +**Cardinality** | (0,1) **Required** | Yes, if the file is available for public download. **Accepted Values** | String **Usage Notes** | This must describe the exact file available at **accessURL** using file extensions (e.g., CSV, XLS, XSLX, TSV, JSON, XML). For example, if the download file is a ZIP containing a CSV, the entry here is "ZIP". **Example** | `{"format":"csv"}` -{.table .table-striped} -**Field** | **keyword** +{: .table .table-striped} +**Field** | **keywords** ----- | ----- **Cardinality** | (1,n) **Required** | Yes, always -**Accepted Values** | String -**Usage Notes** | Separate keywords with commas. -**Example** | `{"keyword":"squash,vegetables,veggies,greens,leafy,spinach,kale,nutrition,tomatoes,tomatos"}` +**Accepted Values** | Array of strings +**Usage Notes** | Surround each keyword with quotes. Separate keywords with commas. +**Example** | `{"keywords": ["squash","vegetables","veggies","greens","leafy","spinach","kale","nutrition","tomatoes","tomatos"]}` -{.table .table-striped} +{: .table .table-striped} **Field** | **modified** ----- | ----- **Cardinality** | (1,1) **Required** | Yes, always **Accepted Values** | Date (YYYY-MM-DD) -**Usage Notes** | Dates should be formatted as YYYY-MM-DD. Specify "01" as the day if unknown. If this file is brand-new, enter the **issued** date here as well. +**Usage Notes** | Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. If this file is brand-new, enter the **issued** date here as well. 
**Example** | `{"modified":"2012-01-15"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **publisher** ----- | ----- **Cardinality** | (1,1) @@ -165,7 +167,7 @@ Field | title **Usage Notes** | Departments and multi-unit agencies may use this field to describe which subordinate agency published this dataset. **Example** | `{"publisher":"U.S. Department of Education"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **person** ----- | ----- **Cardinality** | (1,1) @@ -174,7 +176,7 @@ Field | title **Usage Notes** | Name should be formatted as Last, First **Example** | `{"person":"Brown, John"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **mbox** ----- | ----- **Cardinality** | (1,1) @@ -183,7 +185,7 @@ Field | title **Usage Notes** | - **Example** | `{"mbox":"joe@agency.gov"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **identifier** ----- | ----- **Cardinality** | (1,1) @@ -192,16 +194,16 @@ Field | title **Usage Notes** | This field allows third parties to maintain a consistent record for datasets even if title or URLs are updated. Agencies may integrate an existing system for maintaining unique identifiers or enter arbitrary characters for this field. However, each identifier **must** be unique across the agency's catalog and remain fixed. Characters should be alphanumeric. **Example** | `{"identifier":"1344"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **accessLevel** ----- | ----- **Cardinality** | (1,1) **Required** | Yes, always -**Accepted Values** | Must be one of the following: Public, Restricted, Private -**Usage Notes** | This field refers to degree to which this dataset *could be made available* to the public, regardless of whether it is currently available to the public. For example, if a member of the public can walk into your agency and obtain a dataset, that entry is **public** even if there are no files online. 
A *restricted* dataset is one only available under certain conditions or to certain audiences (such as researchers who sign a waiver). A private dataset is one that could never be made available to the public for privacy, security, or other reasons as determined by your agency. +**Accepted Values** | Must be one of the following: public, restricted public, non-public +**Usage Notes** | This field refers to the degree to which this dataset *could be made available* to the public, regardless of whether it is currently available to the public. For example, if a member of the public can walk into your agency and obtain a dataset, that entry is **public** even if there are no files online. A *restricted public* dataset is one only available under certain conditions or to certain audiences (such as researchers who sign a waiver). A *non-public* dataset is one that could never be made available to the public for privacy, security, or other reasons as determined by your agency. **Example** | `{"accessLevel":"public"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **webService** ----- | ----- **Cardinality** | (0,1) @@ -210,7 +212,7 @@ Field | title **Usage Notes** | This field will serve to delineate the web services offered by an agency and will be used to aggregate cross-government API catalogs. **Example** | `{"webService":"http://www.agency.gov/vegetables/vegetables.json"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **license** ----- | ----- **Cardinality** | (0,1) @@ -219,25 +221,25 @@ Field | title **Usage Notes** | See list of licenses.
**Example** | `{"license":""}` -{.table .table-striped} +{: .table .table-striped} **Field** | **spatial** ----- | ----- **Cardinality** | (0,1) **Required** | Yes, if the dataset is spatial **Accepted Values** | See Usage Notes -**Usage Notes** | This field should contain one of the following types of content: (1) a bounding coordinate box for the dataset represented in latitude / longitude pairs where the coordinates are specified in decimal degrees and in the order of: minimum longitude, minimum latitude, maximum longitude, maximum latitude; (2) a latitude / longitude pair (in decimal degrees) representing a point where the dataset is relevant; (3) a geographic feature expressed in [Geography Markup Language using the Simple Features Profile](http://www.ogcnetwork.net/gml-sf); or (4) a geographic feature from the [GeoNames database](www.geonames.org). +**Usage Notes** | This field should contain one of the following types of content: (1) a bounding coordinate box for the dataset represented in latitude / longitude pairs where the coordinates are specified in decimal degrees and in the order of: minimum longitude, minimum latitude, maximum longitude, maximum latitude; (2) a latitude / longitude pair (in decimal degrees) representing a point where the dataset is relevant; (3) a geographic feature expressed in [Geography Markup Language using the Simple Features Profile](http://www.ogcnetwork.net/gml-sf); or (4) a geographic feature from the [GeoNames database](http://www.geonames.org). **Example** | `{"spatial":"Lincoln, Nebraska"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **temporal** ----- | ----- **Cardinality** | (0,1) **Required** | Yes, if applicable **Accepted Values** | See Usage Notes -**Usage Notes** | This field should contain an interval of time defined by start and end dates. 
Dates should be formatted as pairs of {start date, end date} in the format YYYY-MM-DD hh:mm:ss using 24 hour clock time notation (e.g., 2011-02-14 12:00:00, 2013-02-14 12:00:00). +**Usage Notes** | This field should contain an interval of time defined by start and end dates. Dates should be formatted as pairs of {start date, end date} in the format YYYY-MM-DD hh:mm:ss using 24 hour clock time notation (e.g., 2011-02-14 12:00:00, 2013-02-14 12:00:00). **Example** | `{"temporal":"2000-01-15 00:45:00,2010-01-15 00:06:00"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **issued** ----- | ----- **Cardinality** | (0,1) @@ -246,73 +248,63 @@ Field | title **Usage Notes** | - **Example** | `{"issued":"2001-01-15"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **accrualPeriodicity** ----- | ----- **Cardinality** | (0,1) **Required** | No -**Accepted Values** | Must be one of the following: hourly, daily, weekly, yearly, other +**Accepted Values** | Must be a value from [DCCDAccrualPeriodicity](http://www.ukoln.ac.uk/metadata/dcmi/collection-DCCDAccrualPeriodicity/): "Annual","Bimonthly","Semiweekly","Daily","Biweekly","Semiannual","Biennial","Triennial","Three times a week","Three times a month","Continuously updated","Monthly","Quarterly","Semimonthly","Three times a year","Weekly","Completely irregular" **Usage Notes** | - **Example** | `{"accrualPeriodicity":"yearly"}` -{.table .table-striped} +{: .table .table-striped} **Field** | **language** ----- | ----- **Cardinality** | (0,n) **Required** | No -**Accepted Values** | String -**Usage Notes** | - -**Example** | `{"language":"English"}` +**Accepted Values** | Array of strings +**Usage Notes** | Must be a value from [RFC 5646](http://tools.ietf.org/html/rfc5646) +**Example** | `{"language":["en","es"]}` -{.table .table-striped} -**Field** | **granularity** ------ | ----- -**Cardinality** | (0,1) -**Required** | No -**Accepted Values** | String -**Usage Notes** | Typically 
geographical or temporal. -**Example** | `{"granularity":"vegetables"}` - -{.table .table-striped} +{: .table .table-striped} **Field** | **dataQuality** ----- | ----- **Cardinality** | (0,1) **Required** | No -**Accepted Values** | Must be one of the following: true, false -**Usage Notes** | Indicates whether a dataset -**Example** | `{"dataQuality":"true"}` +**Accepted Values** | Must be a boolean value of `true` or `false` (not contained within quote marks) +**Usage Notes** | Indicates whether a dataset conforms to the agency's information quality guidelines. +**Example** | `{"dataQuality":true}` -{.table .table-striped} +{: .table .table-striped} **Field** | **theme** ----- | ----- **Cardinality** | (0,n) **Required** | No -**Accepted Values** | String +**Accepted Values** | Array of strings **Usage Notes** | Separate multiple categories with a comma. Could include [ISO Topic Categories](http://www.isotopicmaps.org/). -**Example** | `{"theme":"vegetables"}` +**Example** | `{"theme":["vegetables","produce"]}` -{.table .table-striped} +{: .table .table-striped} **Field** | **references** ----- | ----- **Cardinality** | (0,n) **Required** | No -**Accepted Values** | URL -**Usage Notes** | Separate multiple URLs with a comma. -**Example** | `{"references":"http://www.agency.gov/fruits/fruits.csv,http://www.agency.gov/legumes/legumes.csv"}` +**Accepted Values** | Array of strings (URLs) +**Usage Notes** | Enclose each URL within strings. Separate multiple URLs with a comma. 
+**Example** | `{"references":["http://www.agency.gov/fruits/fruits.csv","http://www.agency.gov/legumes/legumes_directions.html","http://www.agency.gov/fruits/fruits_directions.html"]}` -{.table .table-striped} +{: .table .table-striped} **Field** | **distribution** ----- | ----- **Cardinality** | (0,n) **Required** | No **Accepted Values** | See Usage Notes -**Usage Notes** | Distribution is a concatenation, as appropriate, of the following elements: Download URL, Format, Size. If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate entry with a comma, as seen below: +**Usage Notes** | Distribution is a concatenation, as appropriate, of the following elements: **Download URL** and **Format**. If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate the entries with commas, as seen below: "distribution": [ { "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.csv?accessType=DOWNLOAD", - "format": "csv", - "size": "200MB" + "format": "csv" }, { "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.json?accessType=DOWNLOAD", @@ -322,20 +314,8 @@ Field | title "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.xml?accessType=DOWNLOAD", "format": "xml" } - ] + ] -**Example** | - - -{.table .table-striped} -**Field** | **size** ------ | ----- -**Cardinality** | (0,n) -**Required** | No -**Accepted Values** | See Usage Notes -**Usage Notes** | Sizes should be formatted as (e.g.), 52kb, 140mb, 2gb. -**Example** | `{"size":"3mb"}` - -{.table .table-striped} **Field** | **landingPage** ----- | ----- **Cardinality** | (0,1) @@ -344,21 +324,12 @@ Field | title **Usage Notes** | This field is not intended for an agency's homepage (e.g.
www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users should be directed to for all resources tied to the dataset. This allows agencies to better specify what a visitor receives after selecting one of the agency's datasets on Data.gov or in third-party mashups. **Example** | `{"landingPage":"http://www.agency.gov/vegetables"}` -{.table .table-striped} -**Field** | **feed** ------ | ----- -**Cardinality** | (0,n) -**Required** | No -**Accepted Values** | URL -**Usage Notes** | These RSS feeds will be used to create a cross-agency RSS feed search tool. -**Example** | `{"feed":"http://www.agency.gov/vegetables/vegetables.rss"}` - Rationale for Metadata Nomenclature ---------------------- We sought to be platform-independent and to align as much as possible with existing open standards. -To that end, our JSON key names are directly drawn from [DCAT](http://www.w3.org/TR/vocab-dcat/), with two exceptions. +To that end, our JSON key names are directly drawn from [DCAT](http://www.w3.org/TR/vocab-dcat/), with two exceptions. We added the new **accessLevel** field to help easily sort datasets into our three existing categories: public, restricted, and private. This field means an agency can run a basic filter against its enterprise data catalog to generate a public-facing list of datasets that are, or *could one day be*, made publicly available (or, in the case of restricted data, available under certain conditions). This field also makes it easy for anyone to generate a list of datasets that *could* be made available but have not yet been released by filtering **accessLevel** to *public* and **accessURL** to *blank*. 
@@ -373,6 +344,6 @@ Additional Information Examples -------- -* [JSON](http://project-open-data.github.io/metadata-resources/) -* [RDFa Lite](http://project-open-data.github.io/metadata-resources/) -* [XML](http://project-open-data.github.io/metadata-resources/) +* [JSON](/metadata-resources/) +* [RDFa Lite](/metadata-resources/) +* [XML](/metadata-resources/) From 08a6683b7f5fd8b11d6211ce6c8fefd3bf1d987e Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 17:27:31 -0400 Subject: [PATCH 04/23] Updated metadata rationale --- schema.md | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/schema.md b/schema.md index 3cd1e7d6..f7631559 100644 --- a/schema.md +++ b/schema.md @@ -329,12 +329,21 @@ Rationale for Metadata Nomenclature ---------------------- We sought to be platform-independent and to align as much as possible with existing open standards. -To that end, our JSON key names are directly drawn from [DCAT](http://www.w3.org/TR/vocab-dcat/), with two exceptions. +To that end, our JSON key names are directly drawn from [DCAT](http://www.w3.org/TR/vocab-dcat/), with a few exceptions. -We added the new **accessLevel** field to help easily sort datasets into our three existing categories: public, restricted, and private. This field means an agency can run a basic filter against its enterprise data catalog to generate a public-facing list of datasets that are, or *could one day be*, made publicly available (or, in the case of restricted data, available under certain conditions). This field also makes it easy for anyone to generate a list of datasets that *could* be made available but have not yet been released by filtering **accessLevel** to *public* and **accessURL** to *blank*. +We added the new **accessLevel** field to help easily sort datasets into our three existing categories: public, restricted public, and non-public. 
This field means an agency can run a basic filter against its enterprise data catalog to generate a public-facing list of datasets that are, or *could one day be*, made publicly available (or, in the case of restricted data, available under certain conditions). This field also makes it easy for anyone to generate a list of datasets that *could* be made available but have not yet been released by filtering **accessLevel** to *public* and **accessURL** to *blank*. + +We added the new **accessLevelComment** field for data stewards to explain how to access restricted public datasets, and for agencies to have a place to record (even if only internally) the reason for not releasing a non-public dataset. + +We added the new **systemOfRecords** field for data stewards to optionally link to a relevant System of Records Notice URL. A System of Records is a group of any records under the control of any agency from which information is retrieved by the name of the individual or by some identifying number, symbol, or other identifier assigned to the individual. + +We added the new **bureauCode** field to ensure every dataset is connected in a standard way with an agency bureau. + +We added the new **programOffice** field to ensure that when applicable, every dataset is connected in a standard way with an agency program office. With respect to [dcat:dataQuality](http://www.w3.org/TR/vocab-dcat/#property--data-quality), we intentionally did **not** use this field and instead chose a boolean. At the time of this memo's release, DCAT had no specific guidance on the use of this field, and we actually do: whether or not the data meets an agency’s Information Quality Guidelines. 
+ Additional Information ---------------------- * [Schema.org](http://schema.org) From e3891ad59c5ffc4317be22b67887c24954b0ec63 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 17:30:25 -0400 Subject: [PATCH 05/23] Format adheres to MIME types --- schema.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/schema.md b/schema.md index f7631559..81377f6b 100644 --- a/schema.md +++ b/schema.md @@ -55,6 +55,7 @@ Contact Email | Contact person's email address. Unique Identifier | A unique identifier for the dataset or API as maintained within an Agency catalog or database. | identifier Public Access Level | The degree to which this dataset **could** be made publicly-available, *regardless of whether it has been made available*. Choices: public (is or *could be* made publicly available), restricted public (available under certain conditions), or non-public (never able to be made publicly available) | accessLevel + "Common Core" Required-if-Applicable Fields ------------------------------------------- The following fields must be used to describe each dataset if they are applicable: @@ -137,7 +138,7 @@ Field | title **Cardinality** | (0,1) **Required** | Yes, if the file is available for public download. **Accepted Values** | String -**Usage Notes** | This must describe the exact file available at **accessURL** using file extensions (e.g., CSV, XLS, XSLX, TSV, JSON, XML). For example, if the download file is a ZIP containing a CSV, the entry here is "ZIP". +**Usage Notes** | This must adhere to [MIME format types](http://www.iana.org/assignments/media-types) where possible. Describe the exact file available at **accessURL** using file extensions (e.g., CSV, XLS, XSLX, TSV, JSON, XML). For example, if the download file is a ZIP containing a CSV, the entry here is "ZIP". 
**Example** | `{"format":"csv"}` {: .table .table-striped} From 733d67c3693e56a621a1ac2d6ec56b4a0d8c1334 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 17:35:22 -0400 Subject: [PATCH 06/23] All date fields now ISO 8601 compliant --- schema.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/schema.md b/schema.md index 81377f6b..6d8a77ab 100644 --- a/schema.md +++ b/schema.md @@ -10,7 +10,7 @@ id: schema This section contains guidance to support the use of the common core metadata to list agency datasets and application programming interfaces (APIs) as hosted at agency.gov/data. -Updates to the metadata schema can be found in the [changelog](/metadata-changelog). Current metadata version: 1.0 FINAL as of 8/25/13. +Updates to the metadata schema can be found in the [changelog](/metadata-changelog). Current metadata version: 1.0 FINAL as of 8/26/13. Standard Metadata Vocabulary ---------------------------- @@ -200,7 +200,7 @@ Field | title ----- | ----- **Cardinality** | (1,1) **Required** | Yes, always -**Accepted Values** | Must be one of the following: public, restricted public, private +**Accepted Values** | Must be one of the following: public, restricted public, non-public **Usage Notes** | This field refers to degree to which this dataset *could be made available* to the public, regardless of whether it is currently available to the public. For example, if a member of the public can walk into your agency and obtain a dataset, that entry is **public** even if there are no files online. A *restricted public* dataset is one only available under certain conditions or to certain audiences (such as researchers who sign a waiver). A *non-public* dataset is one that could never be made available to the public for privacy, security, or other reasons as determined by your agency. 
**Example** | `{"accessLevel":"public"}` @@ -237,16 +237,16 @@ Field | title **Cardinality** | (0,1) **Required** | Yes, if applicable **Accepted Values** | See Usage Notes -**Usage Notes** | This field should contain an interval of time defined by start and end dates. Dates should be formatted as pairs of {start date, end date} in the format YYYY-MM-DD hh:mm:ss using 24 hour clock time notation (e.g., 2011-02-14 12:00:00, 2013-02-14 12:00:00). -**Example** | `{"temporal":"2000-01-15 00:45:00,2010-01-15 00:06:00"}` +**Usage Notes** | This field should contain an interval of time defined by start and end dates. Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. Use a solidus (/) to separate times. +**Example** | `{"temporal":"2000-01-15T00:45:00/2010-01-15T00:06:00"}` {: .table .table-striped} **Field** | **issued** ----- | ----- **Cardinality** | (0,1) **Required** | No -**Accepted Values** | Date (YYYY-MM-DD) -**Usage Notes** | - +**Accepted Values** | See Usage Notes +**Usage Notes** | Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. **Example** | `{"issued":"2001-01-15"}` {: .table .table-striped} From 17960f66372f42b87b0d102fc2469deaf665a760 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 17:36:19 -0400 Subject: [PATCH 07/23] Update distribution cardinality Technically there should never be more than one distribution array for a dataset.
--- schema.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/schema.md b/schema.md index 6d8a77ab..40b07143 100644 --- a/schema.md +++ b/schema.md @@ -297,7 +297,7 @@ Field | title {: .table .table-striped} **Field** | **distribution** ----- | ----- -**Cardinality** | (0,n) +**Cardinality** | (0,1) **Required** | No **Accepted Values** | See Usage Notes **Usage Notes** | Distribution is a concatenation, as appropriate, of the following elements: **Download URL** and **Format**. If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate entry with a comma, as seen below: From 65b04c01faf8ee35759f8314837fb4648ca9ff61 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 17:37:30 -0400 Subject: [PATCH 08/23] Clarified temporal guidance --- schema.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/schema.md b/schema.md index 40b07143..6820c125 100644 --- a/schema.md +++ b/schema.md @@ -237,7 +237,7 @@ Field | title **Cardinality** | (0,1) **Required** | Yes, if applicable **Accepted Values** | See Usage Notes -**Usage Notes** | This field should contain an interval of time defined by start and end dates. Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. Use a solidus (/) to separate times. +**Usage Notes** | This field should contain an interval of time defined by start and end dates. Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. Use a solidus (/) to separate start and end. 
**Example** | `{"temporal":"2000-01-15T00:45:00/2010-01-15T00:06:00"}` {: .table .table-striped} From fdf0d2bafcaf0fa0e0c663ca340eef5b44ce232e Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 21:30:42 -0400 Subject: [PATCH 09/23] Added Distribution to top table of optional fields, fixed ABC order --- schema.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/schema.md b/schema.md index 6820c125..ff0f03c8 100644 --- a/schema.md +++ b/schema.md @@ -82,12 +82,14 @@ Agencies are encouraged to use the following expanded fields when appropriate. A {: .table .table-striped} Field | Definition | JSON ------ | ------ | ---- + Category | Main thematic category of the dataset. | theme Data Dictionary | URL to the data dictionary for the dataset or API. Note that documentation other than a data dictionary can be referenced using Related Documents as shown in the expanded fields. | dataDictionary Data Quality | Whether the dataset meets the agency's Information Quality Guidelines (true/false). | dataQuality +Distribution | Holds multiple download URLs for datasets composed of multiple files and/or file types | distribution Frequency | Frequency with which dataset is published. | accrualPeriodicity -Language | The language of the dataset. | language Homepage URL | Alternative landing page used to redirect user to a contextual, Agency-hosted "homepage" for the Dataset or API when selecting this resource from the Data.gov user interface. | landingPage +Language | The language of the dataset. | language Related Documents | Related documents such as technical information about a dataset, developer documentation, etc. | references Release Date | Date of formal issuance. | issued System of Records | If the systems is designated as a system of records under the Privacy Act of 1974, provide the URL to the System of Records Notice related to this dataset. 
| systemOfRecords From 23b63f66a4e043e2ba29d358bfeafd8e1610372d Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 21:35:35 -0400 Subject: [PATCH 10/23] Clarified that URL values should be strings. --- schema.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/schema.md b/schema.md index ff0f03c8..f2841273 100644 --- a/schema.md +++ b/schema.md @@ -121,7 +121,7 @@ Field | title ----- | ----- **Cardinality** | (0,1) **Required** | No (Documentation that is not specifically a data dictionary belongs in "references") -**Accepted Values** | URL +**Accepted Values** | String (URL) **Usage Notes** | - **Example** | `{"dataDictionary":"http://www.agency.gov/vegetables/dictionary.html"}` @@ -130,7 +130,7 @@ Field | title ----- | ----- **Cardinality** | (0,1) **Required** | Yes, if the file is available for public download. -**Accepted Values** | URL +**Accepted Values** | String (URL) **Usage Notes** | This must be the **direct** download URL. Use **homepage** for landing or disambiguation pages, or **references** for documentation pages. For multiple downloads, use **distribution** to include as many **accessURL** entries as you need. **Example** | `{"accessURL":"http://www.agency.gov/vegetables/listofvegetables.csv"}` @@ -202,7 +202,7 @@ Field | title ----- | ----- **Cardinality** | (1,1) **Required** | Yes, always -**Accepted Values** | Must be one of the following: public, restricted public, non-public +**Accepted Values** | Must be one of the following: "public", "restricted public", "non-public" **Usage Notes** | This field refers to degree to which this dataset *could be made available* to the public, regardless of whether it is currently available to the public. For example, if a member of the public can walk into your agency and obtain a dataset, that entry is **public** even if there are no files online. 
A *restricted public* dataset is one only available under certain conditions or to certain audiences (such as researchers who sign a waiver). A *non-public* dataset is one that could never be made available to the public for privacy, security, or other reasons as determined by your agency. **Example** | `{"accessLevel":"public"}` @@ -211,7 +211,7 @@ Field | title ----- | ----- **Cardinality** | (0,1) **Required** | Yes, if the dataset has an API -**Accepted Values** | URL +**Accepted Values** | String (URL) **Usage Notes** | This field will serve to delineate the web services offered by an agency and will be used to aggregate cross-government API catalogs. **Example** | `{"webService":"http://www.agency.gov/vegetables/vegetables.json"}` @@ -302,7 +302,7 @@ Field | title **Cardinality** | (0,1) **Required** | No **Accepted Values** | See Usage Notes -**Usage Notes** | Distribution is a concatenation, as appropriate, of the following elements: **Download URL** and **Format**. If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate entry with a comma, as seen below: +**Usage Notes** | Distribution is a concatenation, as appropriate, of the following elements: **accessURL** and **format**. If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate entries with a comma, as seen below: "distribution": [ { @@ -319,11 +319,12 @@ Field | title } ] +{: .table .table-striped} **Field** | **landingPage** ----- | ----- **Cardinality** | (0,1) **Required** | No -**Accepted Values** | URL +**Accepted Values** | String (URL) **Usage Notes** | This field is not intended for an agency's homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users should be directed to for all resources tied to the dataset. 
This allows agencies to better specify what a visitor receives after selecting one of the agency's datasets on Data.gov or in third-party mashups. **Example** | `{"landingPage":"http://www.agency.gov/vegetables"}` From 208d145b3e15126c430b25e693462db4bcf479a5 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 21:38:53 -0400 Subject: [PATCH 11/23] Updated format to take MIME Type Closes #117 --- schema.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/schema.md b/schema.md index f2841273..550ad964 100644 --- a/schema.md +++ b/schema.md @@ -140,8 +140,8 @@ Field | title **Cardinality** | (0,1) **Required** | Yes, if the file is available for public download. **Accepted Values** | String -**Usage Notes** | This must adhere to [MIME format types](http://www.iana.org/assignments/media-types) where possible. Describe the exact file available at **accessURL** using file extensions (e.g., CSV, XLS, XSLX, TSV, JSON, XML). For example, if the download file is a ZIP containing a CSV, the entry here is "ZIP". -**Example** | `{"format":"csv"}` +**Usage Notes** | This must describe the exact files available at **accessURL** using [MIME Types](http://en.wikipedia.org/wiki/Internet_media_type), represented as a list. +**Example** | `{"format": ['application/json'] }` `{"format": ['application/json', 'application/pdf', application/zip'] {: .table .table-striped} **Field** | **keywords** From b9263af24b1e64990e63c8be12acdb776e85d0c5 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 21:44:52 -0400 Subject: [PATCH 12/23] Clarified temporal usage notes. Taken from #116. Thanks Sean! 
--- schema.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/schema.md b/schema.md index 550ad964..c1e46642 100644 --- a/schema.md +++ b/schema.md @@ -239,8 +239,8 @@ Field | title **Cardinality** | (0,1) **Required** | Yes, if applicable **Accepted Values** | See Usage Notes -**Usage Notes** | This field should contain an interval of time defined by start and end dates. Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. Use a solidus (/) to separate start and end. -**Example** | `{"temporal":"2000-01-15T00:45:00/2010-01-15T00:06:00"}` +**Usage Notes** | This field should contain an interval of time defined by start and end dates. Dates should be formatted as pairs of {start datetime/end datetime} in the [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format. ISO 8601 specifies that datetimes can be formatted in a number of ways, including a simple four-digit year (eg. 2013) to a much more specific YYYY-MM-DDTHH:MM:SSZ, where the T specifies a seperator between the date and time and time is expressed in 24 hour notation in the UTC (Zulu) time zone. (e.g., 2011-02-14T12:00:00Z/2013-07-04T19:34:00Z). Use a solidus ("/") to separate start and end times. +**Example** | `{"temporal":"2000-01-15T00:45:00Z/2010-01-15T00:06:00Z"}` {: .table .table-striped} **Field** | **issued** From fe8774b17e899c9f59a2eb67404ac8310c33872e Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 21:46:03 -0400 Subject: [PATCH 13/23] Fixed accrualPeriodicity example Thanks Sean! 
Taken from #116 --- schema.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/schema.md b/schema.md index c1e46642..a9748b50 100644 --- a/schema.md +++ b/schema.md @@ -258,7 +258,7 @@ Field | title **Required** | No **Accepted Values** | Must be a value from [DCCDAccrualPeriodicity](http://www.ukoln.ac.uk/metadata/dcmi/collection-DCCDAccrualPeriodicity/): "Annual","Bimonthly","Semiweekly","Daily","Biweekly","Semiannual","Biennial","Triennial","Three times a week","Three times a month","Continuously updated","Monthly","Quarterly","Semimonthly","Three times a year","Weekly","Completely irregular" **Usage Notes** | - -**Example** | `{"accrualPeriodicity":"yearly"}` +**Example** | `{"accrualPeriodicity":"annual"}` {: .table .table-striped} **Field** | **language** From eb44f4fe1c8817155af6d03ec05df89bbcb5c2ce Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 21:52:43 -0400 Subject: [PATCH 14/23] Updated language guidance Thanks Sean! This incorporates #100 --- schema.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/schema.md b/schema.md index a9748b50..627e1bed 100644 --- a/schema.md +++ b/schema.md @@ -266,8 +266,8 @@ Field | title **Cardinality** | (0,n) **Required** | No **Accepted Values** | Array of strings -**Usage Notes** | Must be a value from [RFC 5646](http://tools.ietf.org/html/rfc5646) -**Example** | `{"language":["en","es"]}` +**Usage Notes** | This should adhere to the [RFC 5646](http://tools.ietf.org/html/rfc5646) standard. http://rishida.net/utils/subtags/ provides a good tool for checking and verifying language codes. A language tag is comprised of either one or two parts, the language subtag (such as en for English, sp for Spanish, wo for Wolof) and the regional subtag (such as US for United States, GB for Great Britain, MX for Mexico), separated by a hyphen. Regional subtags should only be provided when needed to distinguish a language tag from another one (such as American vs. 
British English). +**Examples** | `{"language":"en-US"}` `{"language":"en-GB"}` `{"language":"jp"}` `{"language":"es-MX, wo, nv, en-US"}` {: .table .table-striped} **Field** | **dataQuality** From 5f7b3e9cd437c6ddb8ed72c85e4294f96f9655d0 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 22:00:39 -0400 Subject: [PATCH 15/23] Added PrimaryITInvestmentUII Closes #91 --- schema.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/schema.md b/schema.md index 627e1bed..e8ce642d 100644 --- a/schema.md +++ b/schema.md @@ -93,6 +93,7 @@ Language | The language of the dataset. Related Documents | Related documents such as technical information about a dataset, developer documentation, etc. | references Release Date | Date of formal issuance. | issued System of Records | If the systems is designated as a system of records under the Privacy Act of 1974, provide the URL to the System of Records Notice related to this dataset. | systemOfRecords +Primary IT Investment UII | For linking a dataset with an IT Unique Investment Identifier (UII) | PrimaryITInvestmentUII Further Metadata Field Guidance @@ -328,6 +329,15 @@ Field | title **Usage Notes** | This field is not intended for an agency's homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users should be directed to for all resources tied to the dataset. This allows agencies to better specify what a visitor receives after selecting one of the agency's datasets on Data.gov or in third-party mashups. **Example** | `{"landingPage":"http://www.agency.gov/vegetables"}` +{: .table .table-striped} +**Field** | **PrimaryITInvestmentUII** +----- | ----- +**Cardinality** | (0,1) +**Required** | No +**Accepted Values** | String +**Usage Notes** | Use to link a given dataset with its related IT Unique Investment Identifier. 
+**Example** | `{"PrimaryITInvestmentUII":"123456"}` + Rationale for Metadata Nomenclature ---------------------- From 1403ab018bde4a9134b37baafb88bc9aa15544ea Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 22:20:48 -0400 Subject: [PATCH 16/23] Clarified date for issued/modified This addresses #65. --- schema.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/schema.md b/schema.md index e8ce642d..16150bd8 100644 --- a/schema.md +++ b/schema.md @@ -158,13 +158,13 @@ Field | title ----- | ----- **Cardinality** | (1,1) **Required** | Yes, always -**Accepted Values** | Date (YYYY-MM-DD) +**Accepted Values** | ISO 8601 Date **Usage Notes** | Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. If this file is brand-new, enter the **issued** date here as well. **Example** | `{"modified":"2012-01-15"}` {: .table .table-striped} **Field** | **publisher** ------ | ----- +----- | -----is **Cardinality** | (1,1) **Required** | Yes, always **Accepted Values** | String @@ -248,7 +248,7 @@ Field | title ----- | ----- **Cardinality** | (0,1) **Required** | No -**Accepted Values** | See Usage Notes +**Accepted Values** | ISO 8601 Date **Usage Notes** | Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. 
**Example** | `{"issued":"2001-01-15"}` From bd6a8bb37d12f0825d5ed193b7deea11f1e9475f Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 23:12:30 -0400 Subject: [PATCH 17/23] Alphabetized guidance section --- schema.md | 271 +++++++++++++++++++++++++++--------------------------- 1 file changed, 136 insertions(+), 135 deletions(-) diff --git a/schema.md b/schema.md index 16150bd8..c202915a 100644 --- a/schema.md +++ b/schema.md @@ -90,32 +90,41 @@ Distribution | Holds multiple download URLs for datasets composed of mult Frequency | Frequency with which dataset is published. | accrualPeriodicity Homepage URL | Alternative landing page used to redirect user to a contextual, Agency-hosted "homepage" for the Dataset or API when selecting this resource from the Data.gov user interface. | landingPage Language | The language of the dataset. | language +Primary IT Investment UII | For linking a dataset with an IT Unique Investment Identifier (UII) | PrimaryITInvestmentUII Related Documents | Related documents such as technical information about a dataset, developer documentation, etc. | references Release Date | Date of formal issuance. | issued System of Records | If the systems is designated as a system of records under the Privacy Act of 1974, provide the URL to the System of Records Notice related to this dataset. | systemOfRecords -Primary IT Investment UII | For linking a dataset with an IT Unique Investment Identifier (UII) | PrimaryITInvestmentUII Further Metadata Field Guidance ------------------------------- {: .table .table-striped} -Field | title ------ | ----- +**Field** | **accessLevel** +----- | ----- **Cardinality** | (1,1) -**Required** | Yes, always -**Accepted Values** | String -**Usage Notes** | Acronyms should be avoided. 
-**Example** | `{"title":"Types of Vegetables"}` +**Required** | Yes, always +**Accepted Values** | Must be one of the following: "public", "restricted public", "non-public" +**Usage Notes** | This field refers to degree to which this dataset *could be made available* to the public, regardless of whether it is currently available to the public. For example, if a member of the public can walk into your agency and obtain a dataset, that entry is **public** even if there are no files online. A *restricted public* dataset is one only available under certain conditions or to certain audiences (such as researchers who sign a waiver). A *non-public* dataset is one that could never be made available to the public for privacy, security, or other reasons as determined by your agency. +**Example** | `{"accessLevel":"public"}` {: .table .table-striped} -**Field** | **description** +**Field** | **accessURL** ----- | ----- -**Cardinality** | (1,1) -**Required** | Yes, always -**Accepted Values** | String -**Usage Notes** | This should be human-readable and understandable to an average person. -**Example** | `{"description":"This dataset contains a list of vegetables, including nutrition information and seasonality. Includes details on tomatoes, which are really fruit but considered a vegetable in this dataset."}` +**Cardinality** | (0,1) +**Required** | Yes, if the file is available for public download. +**Accepted Values** | String (URL) +**Usage Notes** | This must be the **direct** download URL. Use **homepage** for landing or disambiguation pages, or **references** for documentation pages. For multiple downloads, use **distribution** to include as many **accessURL** entries as you need. 
+**Example** | `{"accessURL":"http://www.agency.gov/vegetables/listofvegetables.csv"}` + +{: .table .table-striped} +**Field** | **accrualPeriodicity** +----- | ----- +**Cardinality** | (0,1) +**Required** | No +**Accepted Values** | Must be a value from [DCCDAccrualPeriodicity](http://www.ukoln.ac.uk/metadata/dcmi/collection-DCCDAccrualPeriodicity/): "Annual","Bimonthly","Semiweekly","Daily","Biweekly","Semiannual","Biennial","Triennial","Three times a week","Three times a month","Continuously updated","Monthly","Quarterly","Semimonthly","Three times a year","Weekly","Completely irregular" +**Usage Notes** | - +**Example** | `{"accrualPeriodicity":"annual"}` {: .table .table-striped} **Field** | **dataDictionary** @@ -127,14 +136,46 @@ Field | title **Example** | `{"dataDictionary":"http://www.agency.gov/vegetables/dictionary.html"}` {: .table .table-striped} -**Field** | **accessURL** +**Field** | **dataQuality** ----- | ----- **Cardinality** | (0,1) -**Required** | Yes, if the file is available for public download. -**Accepted Values** | String (URL) -**Usage Notes** | This must be the **direct** download URL. Use **homepage** for landing or disambiguation pages, or **references** for documentation pages. For multiple downloads, use **distribution** to include as many **accessURL** entries as you need. -**Example** | `{"accessURL":"http://www.agency.gov/vegetables/listofvegetables.csv"}` +**Required** | No +**Accepted Values** | Must be a boolean value of `true` or `false` (not contained within quote marks) +**Usage Notes** | Indicates whether a dataset conforms to the agency's information quality guidelines. +**Example** | `{"dataQuality":true}` +{: .table .table-striped} +**Field** | **description** +----- | ----- +**Cardinality** | (1,1) +**Required** | Yes, always +**Accepted Values** | String +**Usage Notes** | This should be human-readable and understandable to an average person. 
+**Example** | `{"description":"This dataset contains a list of vegetables, including nutrition information and seasonality. Includes details on tomatoes, which are really fruit but considered a vegetable in this dataset."}` + +{: .table .table-striped} +**Field** | **distribution** +----- | ----- +**Cardinality** | (0,1) +**Required** | No +**Accepted Values** | See Usage Notes +**Usage Notes** | Distribution is a concatenation, as appropriate, of the following elements: **accessURL** and **format**. If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate entries with a comma, as seen below: + + "distribution": [ + { + "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.csv?accessType=DOWNLOAD", + "format": "csv" + }, + { + "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.json?accessType=DOWNLOAD", + "format": "json" + }, + { + "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.xml?accessType=DOWNLOAD", + "format": "xml" + } + ] + {: .table .table-striped} **Field** | **format** ----- | ----- @@ -142,7 +183,25 @@ Field | title **Required** | Yes, if the file is available for public download. **Accepted Values** | String **Usage Notes** | This must describe the exact files available at **accessURL** using [MIME Types](http://en.wikipedia.org/wiki/Internet_media_type), represented as a list. -**Example** | `{"format": ['application/json'] }` `{"format": ['application/json', 'application/pdf', application/zip'] +**Example** | `{"format": ['application/json'] }` `{"format": ['application/json', 'application/pdf', application/zip']} + +{: .table .table-striped} +**Field** | **identifier** +----- | ----- +**Cardinality** | (1,1) +**Required** | Yes, always +**Accepted Values** | String +**Usage Notes** | This field allows third parties to maintain a consistent record for datasets even if title or URLs are updated. 
Agencies may integrate an existing system for maintaining unique identifiers or enter arbitrary characters for this field. However, each identifier **must** be unique across the agency's catalog and remain fixed. Characters should be alphanumeric. +**Example** | `{"identifier":"1344"}` + +{: .table .table-striped} +**Field** | **issued** +----- | ----- +**Cardinality** | (0,1) +**Required** | No +**Accepted Values** | ISO 8601 Date +**Usage Notes** | Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. +**Example** | `{"issued":"2001-01-15"}` {: .table .table-striped} **Field** | **keywords** @@ -154,31 +213,31 @@ Field | title **Example** | `{"keywords": ["squash","vegetables","veggies","greens","leafy","spinach","kale","nutrition","tomatoes","tomatos"]}` {: .table .table-striped} -**Field** | **modified** +**Field** | **landingPage** ----- | ----- -**Cardinality** | (1,1) -**Required** | Yes, always -**Accepted Values** | ISO 8601 Date -**Usage Notes** | Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. If this file is brand-new, enter the **issued** date here as well. -**Example** | `{"modified":"2012-01-15"}` +**Cardinality** | (0,1) +**Required** | No +**Accepted Values** | String (URL) +**Usage Notes** | This field is not intended for an agency's homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users should be directed to for all resources tied to the dataset. This allows agencies to better specify what a visitor receives after selecting one of the agency's datasets on Data.gov or in third-party mashups. 
+**Example** | `{"landingPage":"http://www.agency.gov/vegetables"}` {: .table .table-striped} -**Field** | **publisher** ------ | -----is -**Cardinality** | (1,1) -**Required** | Yes, always -**Accepted Values** | String -**Usage Notes** | Departments and multi-unit agencies may use this field to describe which subordinate agency published this dataset. -**Example** | `{"publisher":"U.S. Department of Education"}` +**Field** | **language** +----- | ----- +**Cardinality** | (0,n) +**Required** | No +**Accepted Values** | Array of strings +**Usage Notes** | This should adhere to the [RFC 5646](http://tools.ietf.org/html/rfc5646) standard. http://rishida.net/utils/subtags/ provides a good tool for checking and verifying language codes. A language tag consists of one or two parts: the language subtag (such as en for English, es for Spanish, wo for Wolof) and, optionally, the regional subtag (such as US for United States, GB for Great Britain, MX for Mexico), separated by a hyphen. Regional subtags should only be provided when needed to distinguish a language tag from another one (such as American vs. British English). +**Examples** | `{"language":["en-US"]}` `{"language":["en-GB"]}` `{"language":["ja"]}` `{"language":["es-MX","wo","nv","en-US"]}` {: .table .table-striped} -**Field** | **person** +**Field** | **license** ----- | ----- -**Cardinality** | (1,1) -**Required** | Yes, always -**Accepted Values** | String -**Usage Notes** | Name should be formatted as Last, First -**Example** | `{"person":"Brown, John"}` +**Cardinality** | (0,1) +**Required** | No +**Accepted Values** | - +**Usage Notes** | See list of licenses. 
+**Example** | `{"license":""}` {: .table .table-striped} **Field** | **mbox** @@ -190,40 +249,50 @@ Field | title **Example** | `{"mbox":"joe@agency.gov"}` {: .table .table-striped} -**Field** | **identifier** +**Field** | **modified** ----- | ----- **Cardinality** | (1,1) **Required** | Yes, always -**Accepted Values** | String -**Usage Notes** | This field allows third parties to maintain a consistent record for datasets even if title or URLs are updated. Agencies may integrate an existing system for maintaining unique identifiers or enter arbitrary characters for this field. However, each identifier **must** be unique across the agency's catalog and remain fixed. Characters should be alphanumeric. -**Example** | `{"identifier":"1344"}` +**Accepted Values** | ISO 8601 Date +**Usage Notes** | Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. If this file is brand-new, enter the **issued** date here as well. +**Example** | `{"modified":"2012-01-15"}` {: .table .table-striped} -**Field** | **accessLevel** +**Field** | **person** ----- | ----- **Cardinality** | (1,1) **Required** | Yes, always -**Accepted Values** | Must be one of the following: "public", "restricted public", "non-public" -**Usage Notes** | This field refers to degree to which this dataset *could be made available* to the public, regardless of whether it is currently available to the public. For example, if a member of the public can walk into your agency and obtain a dataset, that entry is **public** even if there are no files online. A *restricted public* dataset is one only available under certain conditions or to certain audiences (such as researchers who sign a waiver). A *non-public* dataset is one that could never be made available to the public for privacy, security, or other reasons as determined by your agency. 
-**Example** | `{"accessLevel":"public"}` +**Accepted Values** | String +**Usage Notes** | Name should be formatted as Last, First +**Example** | `{"person":"Brown, John"}` {: .table .table-striped} -**Field** | **webService** +**Field** | **PrimaryITInvestmentUII** ----- | ----- **Cardinality** | (0,1) -**Required** | Yes, if the dataset has an API -**Accepted Values** | String (URL) -**Usage Notes** | This field will serve to delineate the web services offered by an agency and will be used to aggregate cross-government API catalogs. -**Example** | `{"webService":"http://www.agency.gov/vegetables/vegetables.json"}` +**Required** | No +**Accepted Values** | String +**Usage Notes** | Use to link a given dataset with its related IT Unique Investment Identifier. +**Example** | `{"PrimaryITInvestmentUII":"123456"}` {: .table .table-striped} -**Field** | **license** +**Field** | **publisher** +----- | ----- +**Cardinality** | (1,1) +**Required** | Yes, always +**Accepted Values** | String +**Usage Notes** | Departments and multi-unit agencies may use this field to describe which subordinate agency published this dataset. +**Example** | `{"publisher":"U.S. Department of Education"}` + +{: .table .table-striped} +**Field** | **references** ----- | ----- -**Cardinality** | (0,1) +**Cardinality** | (0,n) **Required** | No -**Accepted Values** | - -**Usage Notes** | See list of licenses. -**Example** | `{"license":""}` +**Accepted Values** | Array of strings (URLs) +**Usage Notes** | Enclose each URL in its own quoted string. Separate multiple URLs with a comma. +**Example** | `{"references":["http://www.agency.gov/fruits/fruits.csv","http://www.agency.gov/legumes/legumes_directions.html","http://www.agency.gov/fruits/fruits_directions.html"]}` + {: .table .table-striped} **Field** | **spatial** @@ -243,42 +312,6 @@ Field | title **Usage Notes** | This field should contain an interval of time defined by start and end dates. 
Dates should be formatted as pairs of {start datetime/end datetime} in the [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) format. ISO 8601 specifies that datetimes can be formatted in a number of ways, ranging from a simple four-digit year (e.g., 2013) to a much more specific YYYY-MM-DDTHH:MM:SSZ, where the T is a separator between the date and the time, and the time is expressed in 24-hour notation in the UTC (Zulu) time zone (e.g., 2011-02-14T12:00:00Z/2013-07-04T19:34:00Z). Use a solidus ("/") to separate start and end times. **Example** | `{"temporal":"2000-01-15T00:45:00Z/2010-01-15T00:06:00Z"}` -{: .table .table-striped} -**Field** | **issued** ------ | ----- -**Cardinality** | (0,1) -**Required** | No -**Accepted Values** | ISO 8601 Date -**Usage Notes** | Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. -**Example** | `{"issued":"2001-01-15"}` - -{: .table .table-striped} -**Field** | **accrualPeriodicity** ------ | ----- -**Cardinality** | (0,1) -**Required** | No -**Accepted Values** | Must be a value from [DCCDAccrualPeriodicity](http://www.ukoln.ac.uk/metadata/dcmi/collection-DCCDAccrualPeriodicity/): "Annual","Bimonthly","Semiweekly","Daily","Biweekly","Semiannual","Biennial","Triennial","Three times a week","Three times a month","Continuously updated","Monthly","Quarterly","Semimonthly","Three times a year","Weekly","Completely irregular" -**Usage Notes** | - -**Example** | `{"accrualPeriodicity":"annual"}` - -{: .table .table-striped} -**Field** | **language** ------ | ----- -**Cardinality** | (0,n) -**Required** | No -**Accepted Values** | Array of strings -**Usage Notes** | This should adhere to the [RFC 5646](http://tools.ietf.org/html/rfc5646) standard. http://rishida.net/utils/subtags/ provides a good tool for checking and verifying language codes. 
A language tag is comprised of either one or two parts, the language subtag (such as en for English, sp for Spanish, wo for Wolof) and the regional subtag (such as US for United States, GB for Great Britain, MX for Mexico), separated by a hyphen. Regional subtags should only be provided when needed to distinguish a language tag from another one (such as American vs. British English). -**Examples** | `{"language":"en-US"}` `{"language":"en-GB"}` `{"language":"jp"}` `{"language":"es-MX, wo, nv, en-US"}` - -{: .table .table-striped} -**Field** | **dataQuality** ------ | ----- -**Cardinality** | (0,1) -**Required** | No -**Accepted Values** | Must be a boolean value of `true` or `false` (not contained within quote marks) -**Usage Notes** | Indicates whether a dataset conforms to the agency's information quality guidelines. -**Example** | `{"dataQuality":true}` - {: .table .table-striped} **Field** | **theme** ----- | ----- @@ -289,54 +322,22 @@ Field | title **Example** | `{"theme":["vegetables","produce"]}` {: .table .table-striped} -**Field** | **references** ------ | ----- -**Cardinality** | (0,n) -**Required** | No -**Accepted Values** | Array of strings (URLs) -**Usage Notes** | Enclose each URL within strings. Separate multiple URLs with a comma. -**Example** | `{"references":["http://www.agency.gov/fruits/fruits.csv,http://www.agency.gov/legumes/legumes_directions.html",""http://www.agency.gov/fruits/fruits.csv,http://www.agency.gov/fruits/fruits_directions.html""]}` - -{: .table .table-striped} -**Field** | **distribution** ------ | ----- -**Cardinality** | (0,1) -**Required** | No -**Accepted Values** | See Usage Notes -**Usage Notes** | Distribution is a concatenation, as appropriate, of the following elements: **accessURL** and **format**. 
If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate entries with a comma, as seen below: - - "distribution": [ - { - "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.csv?accessType=DOWNLOAD", - "format": "csv" - }, - { - "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.json?accessType=DOWNLOAD", - "format": "json" - }, - { - "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.xml?accessType=DOWNLOAD", - "format": "xml" - } - ] +Field | title +----- | ----- +**Cardinality** | (1,1) +**Required** | Yes, always +**Accepted Values** | String +**Usage Notes** | Acronyms should be avoided. +**Example** | `{"title":"Types of Vegetables"}` {: .table .table-striped} -**Field** | **landingPage** +**Field** | **webService** ----- | ----- **Cardinality** | (0,1) -**Required** | No +**Required** | Yes, if the dataset has an API **Accepted Values** | String (URL) -**Usage Notes** | This field is not intended for an agency's homepage (e.g. www.agency.gov), but rather if a dataset has a human-friendly hub or landing page that users should be directed to for all resources tied to the dataset. This allows agencies to better specify what a visitor receives after selecting one of the agency's datasets on Data.gov or in third-party mashups. -**Example** | `{"landingPage":"http://www.agency.gov/vegetables"}` - -{: .table .table-striped} -**Field** | **PrimaryITInvestmentUII** ------ | ----- -**Cardinality** | (0,1) -**Required** | No -**Accepted Values** | String -**Usage Notes** | Use to link a given dataset with its related IT Unique Investment Identifier. -**Example** | `{"PrimaryITInvestmentUII":"123456"}` +**Usage Notes** | This field will serve to delineate the web services offered by an agency and will be used to aggregate cross-government API catalogs. 
+**Example** | `{"webService":"http://www.agency.gov/vegetables/vegetables.json"}` Rationale for Metadata Nomenclature From bc0797055587c6b97655ab4c5591f518da9a0776 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 23:19:41 -0400 Subject: [PATCH 18/23] Added bureauCode --- schema.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/schema.md b/schema.md index c202915a..0cb3a2e0 100644 --- a/schema.md +++ b/schema.md @@ -48,6 +48,7 @@ Field | Definition Title | Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery. | title Description | Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest. | description Tags | Tags (or keywords) help users discover your dataset, please include terms that would be used by technical and non-technical users. | keywords +Bureau Code | Combined agency and bureau code from [OMB A-11, Appendix C](http://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/app_c.pdf) | bureauCode Last Update | Most recent date on which the dataset was changed, updated or modified. | modified Publisher | The publishing agency. | publisher Contact Name | Contact person's name for the asset. 
| person @@ -122,10 +123,20 @@ Further Metadata Field Guidance ----- | ----- **Cardinality** | (0,1) **Required** | No -**Accepted Values** | Must be a value from [DCCDAccrualPeriodicity](http://www.ukoln.ac.uk/metadata/dcmi/collection-DCCDAccrualPeriodicity/): "Annual","Bimonthly","Semiweekly","Daily","Biweekly","Semiannual","Biennial","Triennial","Three times a week","Three times a month","Continuously updated","Monthly","Quarterly","Semimonthly","Three times a year","Weekly","Completely irregular" -**Usage Notes** | - +**Accepted Values** | See usage notes +**Usage Notes** | Must be a value from [DCCDAccrualPeriodicity](http://www.ukoln.ac.uk/metadata/dcmi/collection-DCCDAccrualPeriodicity/): "Annual","Bimonthly","Semiweekly","Daily","Biweekly","Semiannual","Biennial","Triennial","Three times a week","Three times a month","Continuously updated","Monthly","Quarterly","Semimonthly","Three times a year","Weekly","Completely irregular" **Example** | `{"accrualPeriodicity":"annual"}` +{: .table .table-striped} +**Field** | **bureauCode** +----- | ----- +**Cardinality** | (1,n) +**Required** | Yes, always +**Accepted Values** | Array of strings +**Usage Notes** | Represent each bureau responsible for the dataset according to the codes found in [OMB Circular A-11, Appendix C](http://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/app_c.pdf). Start with the agency code, then a colon, then the bureau code. 
+**Example** | The Office of the Solicitor (86) at the Department of the Interior (010) would be: `{"bureauCode":["010:86"]}` + + {: .table .table-striped} **Field** | **dataDictionary** ----- | ----- From 12244ea0a84b7c9468f6a2e62ab1a82848ee2033 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sat, 24 Aug 2013 23:27:26 -0400 Subject: [PATCH 19/23] Added programOffice --- schema.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/schema.md b/schema.md index 0cb3a2e0..a34207f0 100644 --- a/schema.md +++ b/schema.md @@ -69,6 +69,7 @@ Download URL | URL providing direct access to the downloadable distributi Endpoint | Endpoint of web service to access dataset. | webService Format | The file format or API type of the distribution. | format License | The license dataset or API is published with. See [Open Licenses](/open-licenses/) for more information. | license +Program Office | The program office responsible for the dataset | programOffice Spatial | The range of spatial applicability of a dataset. Could include a spatial region like a bounding box or a named place. | spatial Temporal | The range of temporal applicability of a dataset (i.e., a start and end date of applicability for the data). | temporal @@ -286,6 +287,15 @@ Further Metadata Field Guidance **Usage Notes** | Use to link a given dataset with its related IT Unique Investment Identifier. **Example** | `{"PrimaryITInvestmentUII":"123456"}` +{: .table .table-striped} +**Field** | **programOffice** +----- | ----- +**Cardinality** | (0,n) +**Required** | Yes, if a program office owns or co-owns this dataset. +**Accepted Values** | Array of strings +**Usage Notes** | Enter the name of the program office responsible for the dataset, as found in the [Federal Program Inventory](http://goals.performance.gov/federalprograminventory). +**Example** | `{"programOffice":["2.31. 
Survivors’ and Dependents’ Educational Assistance"]}` + {: .table .table-striped} **Field** | **publisher** ----- | ----- @@ -333,7 +343,7 @@ Further Metadata Field Guidance **Example** | `{"theme":["vegetables","produce"]}` {: .table .table-striped} -Field | title +**Field** | **title** ----- | ----- **Cardinality** | (1,1) **Required** | Yes, always From 974a0d4989f3d1b97f6cf5916d219f68682cce80 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sun, 25 Aug 2013 13:28:24 -0400 Subject: [PATCH 20/23] Renamed person to contactPoint We may as well go all-in on DCAT alignment while making other changes. This closes #133. --- schema.md | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/schema.md b/schema.md index a34207f0..3cd14e83 100644 --- a/schema.md +++ b/schema.md @@ -51,7 +51,7 @@ Tags | Tags (or keywords) help users discover your dataset, pleas Bureau Code | Combined agency and bureau code from [OMB A-11, Appendix C](http://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/app_c.pdf) | bureauCode Last Update | Most recent date on which the dataset was changed, updated or modified. | modified Publisher | The publishing agency. | publisher -Contact Name | Contact person's name for the asset. | person +Contact Name | Contact person's name for the asset. | contactPoint Contact Email | Contact person's email address. | mbox Unique Identifier | A unique identifier for the dataset or API as maintained within an Agency catalog or database. | identifier Public Access Level | The degree to which this dataset **could** be made publicly-available, *regardless of whether it has been made available*. 
Choices: public (is or *could be* made publicly available), restricted public (available under certain conditions), or non-public (never able to be made publicly available) | accessLevel @@ -137,6 +137,14 @@ Further Metadata Field Guidance **Usage Notes** | Represent each bureau responsible for the dataset according to the codes found in [OMB Circular A-11, Appendix C](http://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/app_c.pdf). Start with the agency code, then a colon, then the bureau code. **Example** | The Office of the Solicitor (86) at the Department of the Interior (010) would be: `{"bureauCode":["010:86"]}` +{: .table .table-striped} +**Field** | **contactPoint** +----- | ----- +**Cardinality** | (1,1) +**Required** | Yes, always +**Accepted Values** | String +**Usage Notes** | Name should be formatted as Last, First +**Example** | `{"contactPoint":"Brown, John"}` {: .table .table-striped} **Field** | **dataDictionary** @@ -269,15 +277,6 @@ Further Metadata Field Guidance **Usage Notes** | Dates should be [ISO 8601](http://www.w3.org/TR/NOTE-datetime) of least resolution. In other words, as much of YYYY-MM-DDThh:mm:ss.sTZD as is relevant to this dataset. If this file is brand-new, enter the **issued** date here as well. 
**Example** | `{"modified":"2012-01-15"}` -{: .table .table-striped} -**Field** | **person** ------ | ----- -**Cardinality** | (1,1) -**Required** | Yes, always -**Accepted Values** | String -**Usage Notes** | Name should be formatted as Last, First -**Example** | `{"person":"Brown, John"}` - {: .table .table-striped} **Field** | **PrimaryITInvestmentUII** ----- | ----- From ff83878ae803a92d0d23e6393d343bcef9fe3cff Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Sun, 25 Aug 2013 22:02:14 -0400 Subject: [PATCH 21/23] Reverting keyword field to its original singular In response to discussion in #44 --- schema.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/schema.md b/schema.md index 3cd14e83..f396a9a0 100644 --- a/schema.md +++ b/schema.md @@ -47,7 +47,7 @@ Field | Definition ------- | --------------- | -------------- Title | Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery. | title Description | Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest. | description -Tags | Tags (or keywords) help users discover your dataset, please include terms that would be used by technical and non-technical users. | keywords +Tags | Tags (or keywords) help users discover your dataset, please include terms that would be used by technical and non-technical users. | keyword Bureau Code | Combined agency and bureau code from [OMB A-11, Appendix C](http://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/app_c.pdf) | bureauCode Last Update | Most recent date on which the dataset was changed, updated or modified. | modified Publisher | The publishing agency. 
| publisher @@ -224,13 +224,13 @@ Further Metadata Field Guidance **Example** | `{"issued":"2001-01-15"}` {: .table .table-striped} -**Field** | **keywords** +**Field** | **keyword** ----- | ----- **Cardinality** | (1,n) **Required** | Yes, always **Accepted Values** | Array of strings **Usage Notes** | Surround each keyword with quotes. Separate keywords with commas. -**Example** | `{"keywords": ["squash","vegetables","veggies","greens","leafy","spinach","kale","nutrition","tomatoes","tomatos"]}` +**Example** | `{"keyword": ["squash","vegetables","veggies","greens","leafy","spinach","kale","nutrition","tomatoes","tomatos"]}` {: .table .table-striped} **Field** | **landingPage** From dfbe8cef3c895f3cdcc31ab52beab143ad0144b0 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Mon, 9 Sep 2013 19:21:07 -0400 Subject: [PATCH 22/23] Small verbiage changes, moved programOffice to required --- schema.md | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/schema.md b/schema.md index f396a9a0..0e795ff5 100644 --- a/schema.md +++ b/schema.md @@ -47,14 +47,15 @@ Field | Definition ------- | --------------- | -------------- Title | Human-readable name of the asset. Should be in plain English and include sufficient detail to facilitate search and discovery. | title Description | Human-readable description (e.g., an abstract) with sufficient detail to enable a user to quickly understand whether the asset is of interest. | description -Tags | Tags (or keywords) help users discover your dataset, please include terms that would be used by technical and non-technical users. | keyword -Bureau Code | Combined agency and bureau code from [OMB A-11, Appendix C](http://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/app_c.pdf) | bureauCode +Tags | Tags (or keywords) help users discover your dataset; please include terms that would be used by technical and non-technical users. 
| keyword +Bureau Code | Combined agency and bureau code from [OMB Circular A-11, Appendix C](http://www.whitehouse.gov/sites/default/files/omb/assets/a11_current_year/app_c.pdf) in the format of “015:010”. | bureauCode Last Update | Most recent date on which the dataset was changed, updated or modified. | modified Publisher | The publishing agency. | publisher Contact Name | Contact person's name for the asset. | contactPoint Contact Email | Contact person's email address. | mbox Unique Identifier | A unique identifier for the dataset or API as maintained within an Agency catalog or database. | identifier -Public Access Level | The degree to which this dataset **could** be made publicly-available, *regardless of whether it has been made available*. Choices: public (is or *could be* made publicly available), restricted public (available under certain conditions), or non-public (never able to be made publicly available) | accessLevel +Program Office | Primary program related to this data asset, from the Federal Program Inventory on Performance.gov. | programOffice +Public Access Level | The degree to which this dataset **could** be made publicly-available, *regardless of whether it has been made available*. Choices: public (Data asset is or could be made publicly available to all without restrictions), restricted public (Data asset is available under certain use restrictions), or non-public (Data asset is not available to members of the public) | accessLevel "Common Core" Required-if-Applicable Fields @@ -64,12 +65,11 @@ The following fields must be used to describe each dataset if they are applicabl {: .table .table-striped} Field | Definition |JSON ------- | --------------- | -------------- -Access Level Comment | Explanation of how to access a restricted public dataset, or why a private dataset cannot be released. 
| accessLevelComment +Access Level Comment | An explanation for the selected **accessLevel** including instructions forof how to access a restricted file, if applicable, or explanation for why “non-public” or “restricted public” data assets is not “public,” if applicable. | accessLevelComment Download URL | URL providing direct access to the downloadable distribution of a dataset. | accessURL Endpoint | Endpoint of web service to access dataset. | webService Format | The file format or API type of the distribution. | format License | The license dataset or API is published with. See [Open Licenses](/open-licenses/) for more information. | license -Program Office | The program office responsible for the dataset | programOffice Spatial | The range of spatial applicability of a dataset. Could include a spatial region like a bounding box or a named place. | spatial Temporal | The range of temporal applicability of a dataset (i.e., a start and end date of applicability for the data). | temporal @@ -110,6 +110,15 @@ Further Metadata Field Guidance **Usage Notes** | This field refers to degree to which this dataset *could be made available* to the public, regardless of whether it is currently available to the public. For example, if a member of the public can walk into your agency and obtain a dataset, that entry is **public** even if there are no files online. A *restricted public* dataset is one only available under certain conditions or to certain audiences (such as researchers who sign a waiver). A *non-public* dataset is one that could never be made available to the public for privacy, security, or other reasons as determined by your agency. 
**Example** | `{"accessLevel":"public"}` +{: .table .table-striped} +**Field** | **accessLevelComment** +----- | ----- +**Cardinality** | (0,1) +**Required** | Yes, if accessLevel is "restricted public" or "non-public" +**Accepted Values** | String +**Usage Notes** | An explanation for the selected “accessLevel,” including instructions for how to access a restricted file, if applicable, or an explanation for why a “non-public” or “restricted public” data asset is not “public,” if applicable. +**Example** | `{"accessLevelComment":"This dataset contains Personally Identifiable Information and could not be released for public access. A statistical analysis of the data contained herein, stripped of all personal identifiers, is available at http://another.website.gov/dataset."}` + {: .table .table-striped} **Field** | **accessURL** ----- | ----- @@ -197,6 +206,7 @@ Further Metadata Field Guidance ] {: .table .table-striped} + **Field** | **format** ----- | ----- **Cardinality** | (0,1) @@ -290,9 +300,9 @@ Further Metadata Field Guidance **Field** | **programOffice** ----- | ----- **Cardinality** | (0,n) -**Required** | Yes, if a program office owns or co-owns this dataset. +**Required** | Yes, always **Accepted Values** | Array of strings -**Usage Notes** | Enter the name of the program office responsible for the dataset, as found in the [Federal Program Inventory](http://goals.performance.gov/federalprograminventory). +**Usage Notes** | Enter the name of the program office responsible for the dataset, as found in the [Federal Program Inventory](http://goals.performance.gov/federalprograminventory). Enter "None" if not assigned to a program office. **Example** | `{"programOffice":["2.31. Survivors’ and Dependents’ Educational Assistance"]}` {: .table .table-striped} From e09f2fd354f990528c6dcb4f85ba33046f8dac09 Mon Sep 17 00:00:00 2001 From: Marina Martin Date: Mon, 9 Sep 2013 20:53:50 -0400 Subject: [PATCH 23/23] Changed contactPoint from Last, First to free text. 
--- schema.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/schema.md b/schema.md index 0e795ff5..e3427c6e 100644 --- a/schema.md +++ b/schema.md @@ -152,8 +152,8 @@ Further Metadata Field Guidance **Cardinality** | (1,1) **Required** | Yes, always **Accepted Values** | String -**Usage Notes** | Name should be formatted as Last, First -**Example** | `{"contactPoint":"Brown, John"}` +**Usage Notes** | +**Example** | `{"contactPoint":"John Brown"}` {: .table .table-striped} **Field** | **dataDictionary**