diff --git a/SPEC.md b/SPEC.md
index db22851..291ddb2 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -19,18 +19,18 @@ We welcome your participation and appreciate your patience as we finalize the pl
   - [Input](#input)
     - [Which files to analyze](#which-files-to-analyze)
   - [Output](#output)
-  - [Data types](#data-types)
-    - [Issues](#issues)
-      - [Descriptions](#descriptions)
-      - [Categories](#categories)
-      - [Remediation points](#remediation-points)
-    - [Locations](#locations)
-      - [Positions](#positions)
-    - [Contents](#contents)
-    - [Source code traces](#source-code-traces)
   - [Resource restrictions](#resource-restrictions)
   - [Security restrictions](#security-restrictions)
 - [Engine specification file](#engine-specification-file)
+- [Data types](#data-types)
+  - [Issues](#issues)
+    - [Descriptions](#descriptions)
+    - [Categories](#categories)
+    - [Remediation points](#remediation-points)
+  - [Locations](#locations)
+    - [Positions](#positions)
+  - [Contents](#contents)
+  - [Source code traces](#source-code-traces)
 - [Packaging](#packaging)
   - [Naming convention](#naming-convention)
 
@@ -72,7 +72,7 @@ The `include_paths` key will always be present in `config.json`, and must be use
 
 ### Output
 
-* Engines must stream Issues to `STDOUT` in JSON format.
+* Engines must stream [Issues](#issues) to `STDOUT` in JSON format.
 * When possible, results should be emitted as soon as they are computed (streamed, not buffered).
 * Each issue must be terminated by the [null character][null] (`\0` in most programming languages), but can additionally be separated by newlines.
 * Unstructured information can be printed on `STDERR` for the purposes of aiding debugging.
@@ -81,9 +81,65 @@ The `include_paths` key will always be present in `config.json`, and must be use
   * *Note that an engine finding and emitting issues is expected, and not a fatal error - this means that if your engine finds issues, it should still in all cases exit with code zero.*
   * *Note that all results will be discard and the analysis failed if an engine exits with a non-zero exit code.*
 
-### Data Types
 
-#### Issues
+
+### Resource restrictions
+
+In order to ensure analysis runs reliably across a variety of systems, Engines
+must conform to some basic resource restrictions:
+
+* The Docker image for an Engine must not exceed 512MB, including all layers.
+* The combined total RSS memory usage by all processes within the Docker container must not exceed 1GB at any time.
+* All Engines must complete and exit within 10 minutes.
+
+### Security restrictions
+
+Engines run in a secured runtime environment, within container-based virtualization
+provided by Docker.
+
+* The `/code` directory, containing the files to analyze, & `/config.json`, containing configuration for the engine to use, are mounted read-only.
+* Engines run with no network access (`--net=none` in Docker). They must not rely on making any external network calls.
+* Engines run with the minimal set of Linux capabilities (`--cap-drop all` in Docker)
+* Engines are always run as a user `app` with UID and GID of 9000, and never `root`.
+
+## Engine specification file
+
+All engines must include an `engine.json` file at `/engine.json`. This file includes information that is necessary for the analysis runtime and metadata about the engine. Here is an example specification:
+
+```json
+{
+  "name": "govet",
+  "description": "govet was created by the Go team at Google, and examines Go source code and reports suspicious constructs, and potential bugs.",
+  "maintainer": {
+    "name": "Michael R. Bernstein",
+    "email": "mrb@codeclimate.com"
+  },
+  "languages" : ["Go"],
+  "version": "da5a2077",
+  "spec_version": "0.0.1"
+}
+```
+
+The following fields are declared in the specification file, and all are required:
+
+* `name` (`String`) - the name of the package
+* `description` (`String`) - a description of the engine
+* `maintainer` (`Object`) - data about the engine maintainer
+  * `name` (`String`) - the name of the maintainer
+  * `email` (`String`) - the email address of the maintainer
+* `languages` (`[String]`) - an array of programming languages that this engine is meant to analyze. **See note about possible values for `languages` below**
+* `version` (`String`) - engine version, an arbitrary string maintained by the engine maintainer
+* `spec_version` (`String`) - the version of the specification which this engine supports
+
+The `languages` key can have the following values:
+- `*` - all possible languages, for language agnostic analysis engines
+- Any language listed as keys in the `github/linguist` repository's data file, which [can be found here](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml).
+- Note that we follow these spellings exactly, so while [`JavaScript` is a valid spelling of that language](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml#L1642), `javascript` is not.
+- Some commonly used languages spelled properly are: `CSS, Clojure, CoffeeScript, Go, Haskell, Java, JavaScript, PHP, Python, Ruby, SCSS, Scala, Shell`
+
+## Data Types
+
+### Issues
 
 An `issue` represents a single instance of a real or potential code problem, detected by a static analysis Engine.
 
@@ -113,13 +169,13 @@ An `issue` represents a single instance of a real or potential code problem, det
 * `severity` -- **Optional**. A `Severity` string (`info`, `minor`, `major`, `critical`, or `blocker`) describing the potential impact of the issue found.
 * `fingerprint` -- **Optional**. A unique, deterministic identifier for the specific issue being reported to allow a user to exclude it from future analyses.
 
-##### Descriptions
+#### Descriptions
 
 Descriptions must be a single line of text (no newlines), with no HTML formatting contained within. Ideally, descriptions should be fewer than 70 characters long, but this is not a requirement.
 
 Descriptions support one type of basic Markdown formatting, which is the use of backticks to produce inline <code> tags that are rendered in a fixed width font. Identifiers like class, method and variable names should be wrapped within backticks whenever possible for optimal rendering by tools that consume Engines data.
 
-##### Categories
+#### Categories
 
 Issues must be associated with one or more categories. Valid issue `categories` are:
 
@@ -132,7 +188,7 @@ Issues must be associated with one or more categories. Valid issue `categories`
 - `Security` -- TODO describe me
 - `Style` -- TODO describe me
 
-##### Remediation points
+#### Remediation points
 
 Remediation points are an abstract, relative scale to express the estimated time it would take for a developer to resolve an issue. They are abstract because they do not map directly to absolute time durations like minutes and hours. Providing remediation points is optional, but they can be useful to certain tools that consume Engines data and generate reports related to the level of effort required to improve a codebase (like CodeClimate.com).
 
@@ -143,7 +199,7 @@ Here are some guidelines to compute appropriate remediation points values for an
 
 The baseline remediation points value is 50,000, which is the time it takes to fix a trivial code style issue like a missing semicolon on a single line, including the time for the developer to open the code, make the change, and confidently commit the fix. All other remediation points values are expressed in multiples of that Basic Remediation Point Value.
 
-#### Locations
+### Locations
 
 Locations refer to ranges of a source code file. A Location contains a `path`, a source range, (expressed as `lines` or `positions`), and an optional array of `other_locations`. Here's an example location:
 
@@ -180,7 +236,7 @@ Locations of the first form (_line-based_ locations) emit a beginning and end li
 
 Locations in the second form (_position-based_ locations) allow more precision by including references to the specific characters that form the source code range representing the issue.
 
-##### Positions
+#### Positions
 
 Positions refer to specific characters within a source file, and can be expressed in two ways:
 
@@ -210,7 +266,7 @@ line of the file. Offsets, however are 0-based. A Position of `{ "offset": 4 }`
 represents the _fifth_ character in the file. Importantly, the `offset` is from
 the beginning of the file, not the beginning of a line. Newline characters (and
 all characters) count when computing an offset.
-#### Contents
+### Contents
 
 Content gives more information about the issue's check, including a description of the issue, how to fix it, and relevant links. They are expressed as a hash with a `body` key. The value of this key should be a [Markdown](http://daringfireball.net/projects/markdown/) document. For example:
 
@@ -219,7 +275,7 @@ Content gives more information about the issue's check, including a description
   "body": "This cop checks that the ABC size of methods is not higher than the configured maximum. The ABC size is based on assignments, branches (method calls), and conditions. See [this page](http://c2.com/cgi/wiki?AbcMetric) for more information on ABC size."
 }
 ```
-#### Source code traces
+### Source code traces
 
 Some engines require the ability to refer to other source locations in describing an issue. For this reason, an `Issue` object can have an associated `Trace`, a data structure meant to represent ordered or unordered lists of source code locations. A `Trace` has the following fields:
 
@@ -249,59 +305,6 @@ An example trace:
 ```
 
 
-### Resource restrictions
-
-In order to ensure analysis runs reliably across a variety of systems, Engines
-must conform to some basic resource restrictions:
-
-* The Docker image for an Engine must not exceed 512MB, including all layers.
-* The combined total RSS memory usage by all processes within the Docker container must not exceed 1GB at any time.
-* All Engines must complete and exit within 10 minutes.
-
-### Security restrictions
-
-Engines run in a secured runtime environment, within container-based virtualization
-provided by Docker.
-
-* The `/code` directory, containing the files to analyze, & `/config.json`, containing configuration for the engine to use, are mounted read-only.
-* Engines run with no network access (`--net=none` in Docker). They must not rely on making any external network calls.
-* Engines run with the minimal set of Linux capabilities (`--cap-drop all` in Docker)
-* Engines are always run as a user `app` with UID and GID of 9000, and never `root`.
-
-## Engine specification file
-
-All engines must include an `engine.json` file at `/engine.json`. This file includes information that is necessary for the analysis runtime and metadata about the engine. Here is an example specification:
-
-```json
-{
-  "name": "govet",
-  "description": "govet was created by the Go team at Google, and examines Go source code and reports suspicious constructs, and potential bugs.",
-  "maintainer": {
-    "name": "Michael R. Bernstein",
-    "email": "mrb@codeclimate.com"
-  },
-  "languages" : ["Go"],
-  "version": "da5a2077",
-  "spec_version": "0.0.1"
-}
-```
-
-The following fields are declared the specification file, and all are required:
-
-* `name` (`String`) - the name of the package
-* `description` (`String`) - a description of the engine
-* `maintainer` (`Object`) - data about the engine maintainer
-  * `name` (`String`) - the name of the maintainer
-  * `email` (`String`) - the email address of the maintainer
-* `languages` (`[String]`) - an array of programming languages that this engine is meant to analyze. **See note about possible values for `languages` below**
-* `version` (`String`) - engine version, an arbitrary string maintained by the engine maintainer
-* `spec_version` (`String`) - the version of the specification which this engine supports
-
-The `languages` key can have the following values:
-- `*` - all possible languages, for language agnostic analysis engines
-- Any language listed as keys in the `github/linguist` repository's data file, which [can be found here](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml).
-- Note that we follow these spellings exactly, so while [`JavaScript` is a valid spelling of that language](https://github.com/github/linguist/blob/master/lib/linguist/languages.yml#L1642), `javascript` is not.
-- Some commonly used languages spelled properly are: `CSS, Clojure, CoffeeScript, Go, Haskell, Java, JavaScript, PHP, Python, Ruby, SCSS, Scala, Shell`
 
 ## Packaging
 
@@ -345,3 +348,4 @@ Your `Docker` image must be built with the name `codeclimate/codeclimate-YOURENG
 
 
 [null]: http://en.wikipedia.org/wiki/Null_character
+