From 82fce65d8ad00083777278c94372f378b499f864 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 10:33:09 -0500 Subject: [PATCH 01/25] Add files via upload readme changed --- 01_README.md | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) create mode 100644 01_README.md diff --git a/01_README.md b/01_README.md new file mode 100644 index 0000000..63c6e90 --- /dev/null +++ b/01_README.md @@ -0,0 +1,31 @@ +[[https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png]] + +#🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 + + +Singer is an open source ETL tool. In case you're unfamiliar ETL stands for Extract, Transform, and Load. It's a term used in the data warehousing world. If you happen to be moving data or needing to pull data we want to help! + +###Singer sets a standard for moving data between databases, web APIs, files, queues, or just about anything else you can think of. _(Except penguins 🐧 and candy 🍬. We haven't figured out how to move those yet unfortunately.)_ + +In this documentation we'll take you through a number of scenarios. + +- 🍺 If you'd like to pull or extract data check out [TAPS](02_EXTRACT_WITH_TAPS.md) +- 🎯 If you'd like to send or load data check out [TARGETS](03_LOAD_WITH_TARGETS.md) +- 📝 If you want to dive in some technical goodness check out our [SPECS](07_SPEC.md) +- 😎✅ Once you've created your own tap or target be sure to let us know and join our kool kids [UNOFFICIAL](04_COOL_UNOFFICIAL_CLUB.md) club or learn how to submit to be part of the super cool [OFFICIAL](05_MAKE_IT_OFFICIAL.md) integrations. +- 💯 Check this out to learn more about [BEST PRACTICES](06_BEST_PRACTICES.md) +- 🤝 And above all please respect our [CODE OF CONDUCT](09_CODE_OF_CONDUCT.md) + + +###Communication +If you're feeling social we'd love to chat. 
Pick your poison(s): +- [Slack](https://singer-slackin.herokuapp.com/) +- [Twitter](https://twitter.com/singer_io) +- [Our Public Roadmap on Trello](https://trello.com/b/BMNRnIoU/singer-roadmap) +- Feel free to create an issue on any repo's for specific questions +- [Carrier pigeon](beta) + + +--- + +Copyright © 2017 Stitch From 2d2d3be0a67397598837e2c3fa88438cf87d1cd5 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 10:34:38 -0500 Subject: [PATCH 02/25] Update 01_README.md --- 01_README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/01_README.md b/01_README.md index 63c6e90..99905dd 100644 --- a/01_README.md +++ b/01_README.md @@ -1,4 +1,4 @@ -[[https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png]] +![Singer Logo](https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png) #🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 From eb35ab5ca1a9f265ee4e8b8e7e6cc9697b7a7fdd Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 10:35:05 -0500 Subject: [PATCH 03/25] Update 01_README.md --- 01_README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/01_README.md b/01_README.md index 99905dd..7e476e4 100644 --- a/01_README.md +++ b/01_README.md @@ -1,11 +1,11 @@ ![Singer Logo](https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png) -#🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 +# 🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 Singer is an open source ETL tool. In case you're unfamiliar ETL stands for Extract, Transform, and Load. It's a term used in the data warehousing world. 
If you happen to be moving data or needing to pull data we want to help! -###Singer sets a standard for moving data between databases, web APIs, files, queues, or just about anything else you can think of. _(Except penguins 🐧 and candy 🍬. We haven't figured out how to move those yet unfortunately.)_ +### Singer sets a standard for moving data between databases, web APIs, files, queues, or just about anything else you can think of. _(Except penguins 🐧 and candy 🍬. We haven't figured out how to move those yet unfortunately.)_ In this documentation we'll take you through a number of scenarios. @@ -17,7 +17,7 @@ In this documentation we'll take you through a number of scenarios. - 🤝 And above all please respect our [CODE OF CONDUCT](09_CODE_OF_CONDUCT.md) -###Communication +### Communication If you're feeling social we'd love to chat. Pick your poison(s): - [Slack](https://singer-slackin.herokuapp.com/) - [Twitter](https://twitter.com/singer_io) From 54c99d4802469672739abd629886cdf02cf6bfe7 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 10:35:52 -0500 Subject: [PATCH 04/25] Update 01_README.md --- 01_README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/01_README.md b/01_README.md index 7e476e4..8d1f765 100644 --- a/01_README.md +++ b/01_README.md @@ -23,7 +23,7 @@ If you're feeling social we'd love to chat. 
Pick your poison(s): - [Twitter](https://twitter.com/singer_io) - [Our Public Roadmap on Trello](https://trello.com/b/BMNRnIoU/singer-roadmap) - Feel free to create an issue on any repo's for specific questions -- [Carrier pigeon](beta) +- Carrier pigeon (beta) --- From 096e299934bacb7ad1e1f4561d065ea89ba264a3 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:15:44 -0400 Subject: [PATCH 05/25] Add files via upload first draft of changes --- 01_README.md | 8 +- 02_EXTRACT_WITH_TAPS.md | 107 ++++++++++ 03_SEND_TO_TARGETS.md | 31 +++ 04_COOL_TAPS_CLUB.md | 64 ++++++ 05_MAKE_IT_OFFICIAL.md | 18 ++ 06_BEST_PRACTICES.md | 424 ++++++++++++++++++++++++++++++++++++++++ 07_SPEC.md | 259 ++++++++++++++++++++++++ 08_PROPOSALS.md | 96 +++++++++ 8 files changed, 1003 insertions(+), 4 deletions(-) create mode 100644 02_EXTRACT_WITH_TAPS.md create mode 100644 03_SEND_TO_TARGETS.md create mode 100644 04_COOL_TAPS_CLUB.md create mode 100644 05_MAKE_IT_OFFICIAL.md create mode 100644 06_BEST_PRACTICES.md create mode 100644 07_SPEC.md create mode 100644 08_PROPOSALS.md diff --git a/01_README.md b/01_README.md index 8d1f765..ce44a7b 100644 --- a/01_README.md +++ b/01_README.md @@ -1,9 +1,9 @@ -![Singer Logo](https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png) +[[https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png]] -# 🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 +#🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 -Singer is an open source ETL tool. In case you're unfamiliar ETL stands for Extract, Transform, and Load. It's a term used in the data warehousing world. If you happen to be moving data or needing to pull data we want to help! +## Singer is an open source ETL tool. 
In case you're unfamiliar ETL stands for Extract, Transform, and Load. It's a term used in the data warehousing world. If you happen to be moving data or needing to pull data we want to help! ### Singer sets a standard for moving data between databases, web APIs, files, queues, or just about anything else you can think of. _(Except penguins 🐧 and candy 🍬. We haven't figured out how to move those yet unfortunately.)_ @@ -23,7 +23,7 @@ If you're feeling social we'd love to chat. Pick your poison(s): - [Twitter](https://twitter.com/singer_io) - [Our Public Roadmap on Trello](https://trello.com/b/BMNRnIoU/singer-roadmap) - Feel free to create an issue on any repo's for specific questions -- Carrier pigeon (beta) +- 🐦 Carrier pigeon (beta) --- diff --git a/02_EXTRACT_WITH_TAPS.md b/02_EXTRACT_WITH_TAPS.md new file mode 100644 index 0000000..108f7b0 --- /dev/null +++ b/02_EXTRACT_WITH_TAPS.md @@ -0,0 +1,107 @@ +# 🍺 All about TAPS 🍺 + +## Taps extract data from any source and write that data to a standard stream in a JSON-based format. + +Check out our [official](05_MAKE_IT_OFFICIAL.md) and [unofficial](04_COOL_UNOFFICIAL_CLUB.md) pages before creating your own, since it might save you some time in the long run. + +### Making Taps + +If a tap for your use case doesn't exist yet, have no fear! This documentation will help. Let's get started: + +### 👩🏽‍💻 👨🏻‍💻 Hello, world + +A Tap is just a program, written in any language, that outputs data to `stdout` according to the [Singer spec](07_SPEC.md). + +In fact, your first Tap can be written from the command line, without any programming at all: + +```bash +› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n' +``` + +This writes the datapoint `{"value":"world"}` to the *hello* stream along with a schema indicating that `value` is a string. 
+ +That data can be piped into any Target, like the [Google Sheets Target], over `stdin`: + +```bash +› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n' | target-gsheet -c config.json +``` + +### 🐍🐍🐍 A Python Tap + +To move beyond *Hello, world* you'll need a real programming language. Although any language will do, we have built a Python library to help you get up and running quickly. This is because Python is the de facto standard for data engineers or folks interested in moving data like yourself. + +If you need help ramping up or getting started with Python there's fantastic community support [here](https://www.python.org/about/gettingstarted/). + +Let's write a Tap called `tap_ip.py` that retrieves the current IP using icanhazip.com, and writes that data with a timestamp. + +First, install the Singer helper library with `pip`: + +```bash +› pip install singer-python +``` + +Then, open up a new file called `tap_ip.py` in your favorite editor. + +```python +import singer +import urllib.request +from datetime import datetime, timezone +``` + +We'll use the `datetime` module to get the current timestamp, the +`singer` module to write data to `stdout` in the correct format, and +the `urllib.request` module to make a request to icanhazip.com. + +```python +now = datetime.now(timezone.utc).isoformat() +schema = { + 'properties': { + 'ip': {'type': 'string'}, + 'timestamp': {'type': 'string', 'format': 'date-time'}, + }, +} + +``` + +This sets up some of the data we'll need - the current time, and the +schema of the data we'll be writing to the stream formatted as a [JSON +Schema]. 
+ +```python +with urllib.request.urlopen('http://icanhazip.com') as response: + ip = response.read().decode('utf-8').strip() + singer.write_schema('my_ip', schema, 'timestamp') + singer.write_records('my_ip', [{'timestamp': now, 'ip': ip}]) +``` + +Finally, we make the HTTP request, parse the response, and then make +two calls to the `singer` library: + + - `singer.write_schema` which writes the schema of the `my_ip` stream and defines its primary key + - `singer.write_records` to write a record to that stream + +We can send this data to Google Sheets as an example by running our new Tap +with the Google Sheets Target: + +```bash +› python tap_ip.py | target-gsheet -c config.json +``` + +Alternatively, you could send it to a CSV file just as easily by doing this: + +```bash +› python tap_ip.py | target-csv -c config.json +``` + +To summarize, the formula for pulling with a tap and sending to a target is: + +```bash +› python YOUR_TAP_FILE.py -c TAP_CONFIG_FILE_HERE.json | TARGET-TYPE -c TARGET_CONFIG_FILE_HERE.json +``` + +You might not always need config files, in which case it would just be: + +```bash +› python YOUR_TAP_FILE.py | TARGET-TYPE +``` + diff --git a/03_SEND_TO_TARGETS.md b/03_SEND_TO_TARGETS.md new file mode 100644 index 0000000..4fd9a3b --- /dev/null +++ b/03_SEND_TO_TARGETS.md @@ -0,0 +1,31 @@ +# 🎯 All about TARGETS 🎯 + +## Targets are very similar to TAPS in that they still adhere to the JSON Schema standard. + +Right now there are targets made to send to a csv, Google Sheets, Magento, or Stitch but the possibilities are endless. 
To send your tap to a target here is an example using Google Sheets: + + +```bash +› python tap_ip.py | target-gsheet -c config.json +``` + +Alternatively, you could send it to a CSV file just as easily by doing this: + +```bash +› python tap_ip.py | target-csv -c config.json +``` + +To summarize, the formula for pulling with a tap and sending to a target is: + +```bash +› python YOUR_TAP_FILE.py -c TAP_CONFIG_FILE_HERE.json | TARGET-TYPE -c TARGET_CONFIG_FILE_HERE.json +``` + +You might not always need config files, in which case it would just be: + +```bash +› python YOUR_TAP_FILE.py | TARGET-TYPE +``` +See? Easy. + +## If you'd like to create your own TARGET just follow the same tutorial for making a tap diff --git a/04_COOL_TAPS_CLUB.md b/04_COOL_TAPS_CLUB.md new file mode 100644 index 0000000..e7cdf1f --- /dev/null +++ b/04_COOL_TAPS_CLUB.md @@ -0,0 +1,64 @@ +## THE OFFICIAL (AND UNOFFICIAL) COLLECTION OF TAPS + +### Defining Official and Unofficial + +Singer supports official Taps through [Stitch](https://stitchdata.com). Stitch is software that moves all your data into your data warehouse and keeps everything nice and tidy. Official taps are maintained and integrated by the team at Stitch. This is why not all unofficial taps become official. Sometimes the team is building something else and other times the integration isn't high on the roadmap priorities. + +Regardless, this doesn't mean that you can't move all the data you want. That is why unofficial taps are so important and why we value them so much! If you've created a tap or target, be sure to submit a pull request to our unofficial table and show the world your work. 
Also be sure to drop us a line so we can send you *EPIC SWAG* 🎁 + +Without further ado here are the unofficial taps: + +| TYPE | NAME + REPO | USER 👨🏽‍💻 👩🏻‍💻 👑 | +| -------- |----------------------------------------------------------------------------|------------------------------------------------------------------| +| tap | [tap-csv](https://github.com/robertjmoore/tap-csv) | [robertjmoore](https://github.com/robertjmoore/) | +| tap | [tap-clubhouse](https://github.com/envoy/tap-clubhouse) | [envoy](https://github.com/envoy) | +| tap | [tap-outbrain](https://github.com/fishtown-analytics/tap-outbrain) | [fishtown-analytics](https://github.com/fishtown-analytics) | +| tap | [tap-shippo](https://github.com/robertjmoore/tap-shippo) | [robertjmoore](https://github.com/robertjmoore/) | +| tap | [singer-airtable](https://github.com/StantonVentures/singer-airtable) | [StantonVentures](https://github.com/stantonventures) | +| tap | [tap-harvest](https://github.com/FacetInteractive/tap-harvest) | [FacetInteractive](https://github.com/facetinteractive) | +| tap | [tap-s3-csv](https://github.com/fishtown-analytics/tap-s3-csv) | [fishtown-analytics](https://github.com/fishtown-analytics) | +| tap | [tap-jsonfeed](https://github.com/briansloane/tap-jsonfeed) | [briansloane](https://github.com/briansloane/) | +| tap | [tap-taboola](https://github.com/fishtown-analytics/tap-taboola) | [fishtown-analytics](https://github.com/fishtown-analytics) | +| tap | [tap-reviewscouk](https://github.com/onedox/tap-reviewscouk) | [onedox](https://github.com/onedox) | +| tap | [tap-fake-users](https://github.com/bengarvey/tap-fake-users) | [bengarvey](https://github.com/bengarvey) | +| tap | [tap-awin](https://github.com/onedox/tap-awin) | [onedox](https://github.com/onedox) | +| tap | [marvel-tap](https://github.com/ashhath/marvel-tap) | [ashhath](https://github.com/ashhath) | +| tap | [tap-mixpanel](https://github.com/Kierchon/tap-mixpanel) | 
[kierchon](https://github.com/kierchon) | +| target | [target-magentobi](https://github.com/robertjmoore/target-magentobi) | [robertjmoore](https://github.com/robertjmoore/) | +| tap | [tap-appsflyer](https://github.com/ezcater/tap-appsflyer) | [ezcater](https://github.com/ezcater) | +| tap | [tap-fullstory](https://github.com/expectedbehavior/tap-fullstory) | [expectedbehavior](https://github.com/expectedbehavior) | +| tap | [stitch-stream-deputy](https://github.com/DeputyApp/stitch-stream-deputy) | [deputyapp](https://github.com/deputyapp) | + +And then just in case here's a tidy list of the official ones supported by Stitch: + +| TYPE | NAME + REPO | +| -------- |-----------------------------------------------------------------------------| +| tap | [Hubspot](https://github.com/singer-io/tap-hubspot) | +| tap | [Marketo](https://github.com/singer-io/tap-marketo) | +| tap | [Shippo](https://github.com/singer-io/tap-shippo) | +| tap | [GitHub](https://github.com/singer-io/tap-github) | +| tap | [Close.io](https://github.com/singer-io/tap-closeio) | +| tap | [Referral SaaSquatch](https://github.com/singer-io/tap-referral-saasquatch) | +| tap | [Freshdesk](https://github.com/singer-io/tap-freshdesk) | +| tap | [Braintree](https://github.com/singer-io/tap-braintree) | +| tap | [GitLab](https://github.com/singer-io/tap-gitlab) | +| tap | [Wootric](https://github.com/singer-io/tap-wootric) | +| tap | [Fixer.io](https://github.com/singer-io/tap-fixerio) | +| tap | [Outbrain](https://github.com/singer-io/tap-outbrain) | +| tap | [Harvest](https://github.com/singer-io/tap-harvest) | +| tap | [Taboola](https://github.com/singer-io/tap-taboola) | +| tap | [Urban Airship](https://github.com/envoy/tap-clubhouse) | +| tap | [Facebook](https://github.com/singer-io/tap-facebook) | +| tap | [Google AdWords](https://github.com/singer-io/tap-adwords) | +| target | [Stitch](https://github.com/singer-io/target-stitch) | +| target | [CSV](https://github.com/singer-io/target-csv) | +| 
target | [Google Sheets](https://github.com/singer-io/target-gsheet) | +| target | [Magento BI](https://github.com/robertjmoore/target-magentobi) | + + + + + + + + diff --git a/05_MAKE_IT_OFFICIAL.md b/05_MAKE_IT_OFFICIAL.md new file mode 100644 index 0000000..faa082f --- /dev/null +++ b/05_MAKE_IT_OFFICIAL.md @@ -0,0 +1,18 @@ +# BECOME OFFICIALLY COOL + +So you've built a tap or a target have you? We think that's pretty groovy. To submit a tap for integration with Stitch and become official we ask that it follow a set standard. If you're interested in submitting to be an official tap we're mighty obliged and created a checklist so you can increase your chances of integration. + +### Check out the [BEST PRACTICES](06_BEST_PRACTICES.md) doc which will have all the instructions and way more in depth details of the following: +- [ ] Your work has a `start_date` field in the config +- [ ] Your work accepts a `user_agent` field in the config +- [ ] Your work respects API rate limits +- [ ] Your work doesn't impose memory constraints +- [ ] Your dates are all in RFC3339 format +- [ ] All states are in date format +- [ ] All data is streamed in ascending order if possible +- [ ] Your work doesn't contain any sensitive info like API keys, client work, etc. +- [ ] Please keep your schemas stored in a schema folder +- [ ] You've tested your work +- [ ] Please run pylint on your work +- [ ] Your work shows metrics +- [ ] Message [@BrianSloan](brian@stitchdata.com) or [@Ash_Hathaway](ashley@stitchdata.com) or reach out to them on [Slack](https://singer-slackin.herokuapp.com/) and let them know you'd like some swag, please. diff --git a/06_BEST_PRACTICES.md b/06_BEST_PRACTICES.md new file mode 100644 index 0000000..12e4f5c --- /dev/null +++ b/06_BEST_PRACTICES.md @@ -0,0 +1,424 @@ +# BEST PRACTICES + +Language +-------- + +Currently all Singer Taps are written in Python. This Best Practices guide +contains some Python-specific recommendations. 
+ +Recommended Config Fields +------------------------- + +The Singer Specification does not define any standard fields in the +configuration. We recommend supporting the following fields: + +* `start_date` - A tap should take a `start_date` field in the config that + Indicates how far back an integration should sync data in the absence of + a state entry. + +* `user_agent` - A tap should accept a `user_agent` field in the config + and pass it along in request headers to the API. The caller of a Tap + should include an email address in the `user_agent` field to allow the + API provider to contact you if there's an issue with the tap or your + usage of their API. + +Rate Limiting +------------- + +Most APIs enforce rate limits. Taps should be written in a way that respects these rate limits. + +Rate limits can take the form of quotas (X requests per day), in which case the tap should leave +room for other use of the API, or short-term limits (X requests per Y seconds). For short-term +limits, the entire quota can be used up and the tap should sleep when rate limited. + +The singer-python library's utils namespace has a rate limiting decorator for use. + +Memory Constraints +------------------ + +Taps should not rely on large volumes of RAM being available during run and should strive to keep +as little data in memory as required. Use iterators/generators when available. + + +Dates +----- + +All dates should use the RFC3339 format (which includes timezone offsets). UTC is the preferred +timezone for all data and should be used when possible. + +Good: + - 2017-01-01T00:00:00Z (January 1, 2017 12:00:00AM UTC) + - 2017-01-01T00:00:00-05:00 (January 1, 2017, 12:00:00AM America/New_York) + +Bad: + - 2017-01-01 00:00:00 + +State +----- + +States that are datetimes will all be in RFC3339 format. Using ids or other identifiers for state +is possible but could cause issues if the entities are not immutable and change out of order. 
If +the API supports filtering by created timestamps but is not immutable (meaning a record can be +updated after creation), DO NOT use the created timestamp for saving state. States using created +timestamps or ids are a sure way to have data discrepancies. + +Write state as early and often as possible, but no sooner and no more often than is required. When a +state is written, no data prior to that state will be synced, so do not update the state until all +possible exceptions will be thrown. + +Endpoints that do not support filtering by updated timestamps or include updated timestamps do not +support saving state. + +If the API supports filtering by updated timestamp, use that for filtering. If the API doesn't +support filtering but does return updated timestamps with the data, filter by the timestamp before +streaming the data. + +When streaming data in, stream the data in ascending order when possible. + +Keep in mind that jobs can be interrupted at any point. States should never be in an invalid +state. Interrupted jobs that save state too early will have data missing. Interrupted jobs that +save state too late will cause an increase in duplicate rows being replicated. + +The Tap's config file can include a `start_date` field indicating the default state. + +### Stream bookmark format + +State records serve to indicate a Tap's progress through a data source, but they can also provide +more granular information about progress through individual streams. + +If a Tap's streams can each have an individual state, the state output by the Tap should conform to +the following format: the state object contains a top-level property `"bookmarks"` that maps to an +object. The bookmarks object contains stream identifiers as property names, each of which maps to an +object describing the respective stream's state. 
As a concrete example: + +``` +{ + "bookmarks": { + "orders": { + "last_record": "2017-07-07T10:20:00Z" + }, + "customers": { + "last_record": 123 + } + } +} +``` + +In the above example, there are two streams that have been bookmarked: `"orders"` and `"customers"`. +Each of those streams has some replication key field, like an `updated_at` timestamp or a sequential +ID for append-only sources. The replication key value persisted as part of the state message allows +the Tap to: + +- bookmark its progress through the stream +- serve as a query parameter upon subsequent invocations, (e.g. the equivalent of `SELECT * from + customers where id >= 123`) + +The state record above indicates that the Tap last output an entry from the `"orders"` stream where +the source record's replication key field had the value "2017-07-07T10:20:00Z", and that it last +output an entry from the `"customers"` stream where the source record's replication key field had +the value 123. + + +Logging and Exception Handling +------------------------------ + +Log every URL + params that will be requested, but be sure to exclude any sensitive information +such as api keys and client secrets. Log the progress of the sync (e.g. Starting entity 1, +Starting entity 2, etc.) When the API returns with an error, log the status code and body. + +Allow exceptions to be bubbled up and interrupt the tap. DO NOT wrap the tap code in one large +try/except block and log the exception message. The stack trace is much more useful than the error +message. + +If an intermittent error is detected from the API, retry using an exponential backoff (try using +`backoff` library for Python). If an unrecoverable error is detected, exit the script with a +non-zero error code (raise an exception or use `sys.exit(1)`) + + +Module Structure +---------------- + +Source code should be in a module (folder with `__init__.py` file) and not just a script (`module.py` +file). 
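As a rough illustration of the retry advice under Logging and Exception Handling above, here is a minimal standard-library sketch of exponential backoff. It is not the `backoff` library's API; `TransientError`, `retry_with_backoff`, and the parameters `max_tries` and `base_delay` are hypothetical names chosen for this example. Anything other than a `TransientError` is allowed to bubble up untouched, so the stack trace still interrupts the tap.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. an HTTP 429 or 503)."""


def retry_with_backoff(func, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff.

    Any other exception propagates immediately with its stack trace intact.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_tries:
                raise  # out of retries: let the exception interrupt the tap
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; a real tap would leave it at its default and wrap the function that performs the HTTP request.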
+ + +Schemas +------- + +Schemas should be stored in a `schemas` folder under the module directory +as JSON files rather than as native dicts in the source code. + +Always stream the entity's schema before streaming any records for that +entity. + +If the API you're using does not publish a schema, you can use the +`singer-infer-schema` program in [singer-tools] to generate a base schema +to start from. + + +Testing +------- + +Use `singer-tools`'s `singer-check-tap` command to validate that a tap's +output matches the expected format and validate the output against its +schema. + +Code Quality +------------ + +We recommend running pylint on your Tap and making sure that you +understand any issues it reports. You should get your code to the point +where pylint does not report any error-level messages. + +When we import your Tap, we'll run it in CircleCI, and we'll include +Pylint in the test phase. The CircleCI build will fail if Pylint finds any +issues, so we will need to account for all issues by either fixing them or +disabling the message in Pylint. We are flexible about which Pylint issues +are acceptable, but we generally run Pylint with some of the more pedantic +messages disabled. For example, we typically use the following circle.yml +config: + +```yaml +machine: + python: + version: 3.4.4 + +dependencies: + pre: + - pip install pylint + +test: + post: + - pylint tap_outbrain -d missing-docstring -d logging-format-interpolation -d too-many-locals -d too-many-arguments +``` + +Allowing Users to Select Streams to Sync +---------------------------------------- + +For some data sources, it won't make sense to pull every stream and +property available. For example, suppose we had a Tap for a Postgres +database, and a user only wanted to pull a subset of columns from a +subset of tables. It would be inconvenient if the Tap emitted data +from all tables and columns. + +To address this, we recommend extending the Tap to use a document called a +_catalog_. 
A _catalog_ is a mechanism for a Tap to indicate what streams +it makes available, and for the user to select certain streams and +properties within those streams. + +A Tap that supports a catalog should provide two additional options: + +* `--discover` - indicates that the Tap should not sync data, but should + just write its catalog to stdout. + +* `--catalog CATALOG` - the Tap should sync data, based on the selections + made in the provided CATALOG file. + +### Catalog Format + +The format of the catalog is as follows. The top level is an object, +with a single key called `"streams"` that points to an array of +objects, each having the following fields: + +| Property | type | required? | Description | +| ----------------- |--------------------|-----------|--------------------------------| +| `tap_stream_id` | string | required | The unique identifier for the stream. This is allowed to be different from the name of the stream in order to allow for sources that have duplicate stream names. | +| `stream` | string | required | The name that will be used for the stream in the data produced by this Tap. | +| `key_properties` | array of strings | optional | List of key properties. | +| `schema` | object | required | The JSON schema for the stream. | +| `replication_key` | string | optional | The name of a property in the source to use as a "bookmark". For example, this will often be an "updated-at" field or an auto-incrementing primary key. | +| `is_view` | boolean | optional | For a database source, indicates that the source is a view. | +| `database_name` | string | optional | For a database source, the name of the database. | +| `table_name` | string | optional | For a database source, the name of the table. | +| `row_count` | integer | optional | The number of rows in the source data, for taps that have access to that information. 
| + +### JSON Schema Extensions + +In order to allow a Tap to indicate which fields are selectable and to +allow the user to make their selections, we extend JSON Schema by adding +two additional properties. Note that since JSON Schema is recursive, these +properties may appear on the top-level schema or on properties within the +schema: + +* `inclusion`: Either `available`, `automatic`, or `unsupported`. + + * `"available"` means that the field is available for selection, and that + the Tap will only emit values for that field if it is marked with + `"selected": true`. + * `"automatic"` means that the Tap may emit values for the field, but it + is not up to the user to select it. + * `"unsupported"` means that the field exists in the source data but the + Tap is unable to provide it. +* `selected`: Either `true` or `false`. For a top-level schema, `true` + indicates that the stream should be synced, and `false` indicates it + should be omitted entirely. For a property within a stream, `true` + means include the property, `false` means leave it out. 
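The `inclusion` and `selected` rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `selected_fields` and `filter_record` are hypothetical helper names, only top-level properties are handled (nested objects would need the same check applied recursively), and a property with no `inclusion` marker is treated as not emitted, which the text above does not specify.

```python
def selected_fields(schema):
    """Return the names of top-level properties a tap should emit for a stream."""
    fields = []
    for name, prop in schema.get("properties", {}).items():
        inclusion = prop.get("inclusion")
        if inclusion == "automatic":
            # Emitted regardless of the user's choices.
            fields.append(name)
        elif inclusion == "available" and prop.get("selected") is True:
            # Emitted only because the user opted in.
            fields.append(name)
        # "unsupported" properties are never emitted; unmarked ones are
        # skipped too (an assumption made for this sketch).
    return fields


def filter_record(record, schema):
    """Drop every key from record that the schema does not select."""
    keep = set(selected_fields(schema))
    return {key: value for key, value in record.items() if key in keep}
```

A tap would apply `filter_record` to each row just before writing it, and would skip the stream entirely when the top-level schema does not carry `"selected": true`.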
+ +Here's an example of a discovered catalog + +```javascript +{ + "streams": [ + { + "stream": "users", + "tap_stream_id": "users", + "schema": { + "type": "object", + "properties": { + "id": { + "type": "integer", + "inclusion": "automatic" + }, + "name": { + "type": "object", + "inclusion": "available", + "properties": { + "first_name": {"type": "string", "inclusion": "available"}, + "last_name": {"type": "string", "inclusion": "available"} + }, + }, + "addresses": { + "type": "array", + "inclusion": "available", + "items": { + "type": "object", + "inclusion": "available", + "properties": { + "addr1": {"type": "string", "inclusion": "available"}, + "addr2": {"type": "string", "inclusion": "available"}, + "city": {"type": "string", "inclusion": "available"}, + "state": {"type": "string", "inclusion": "available"}, + "zip": {"type": "string", "inclusion": "available"}, + } + } + } + } + } + }, + { + "stream": "orders", + "tap_stream_id": "orders", + "schema": { + "type": "object", + "properties": { + "id": {"type": "integer"}, + "user_id": {"type": "integer", "inclusion": "available"}, + "amount": {"type": "number", "inclusion": "available"}, + "credit_card_number": {"type": "string", "inclusion": "available"}, + } + } + } + ] +} +``` + +### Discovery Mode + +A Tap that wants to support property selection should add an optional +`--discover` flag. When the `--discover` flag is supplied, the Tap +should connect to its data source, find the list of streams available, +and print out the catalog to stdout. The discovery output should go to +STDOUT, and it should be the only thing written to STDOUT. If the +`--discover` flag is supplied, a tap should not emit any RECORD, +SCHEMA, or STATE messages. + +### Sync Mode + +A tap that supports property selection should accept an optional +`--catalog CATALOG` option. `CATALOG` should point to a file containing +the catalog, annotated with the user's "selected" choices. 
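One way to wire up the two modes is with `argparse`. The sketch below is an assumption-laden outline rather than the singer-python implementation: `discover` returns a placeholder empty catalog, and `selected_streams` and `main` are hypothetical names. It shows discovery mode writing only the catalog to stdout, and sync mode reading the `CATALOG` file and picking out the streams whose schema is marked `"selected": true`.

```python
import argparse
import json
import sys


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Example tap entry point")
    parser.add_argument("--discover", action="store_true",
                        help="write the catalog to stdout and exit")
    parser.add_argument("--catalog", metavar="CATALOG",
                        help="path to a catalog file annotated with selections")
    return parser.parse_args(argv)


def discover():
    """Placeholder: a real tap would inspect its data source here."""
    return {"streams": []}


def selected_streams(catalog):
    """Return the tap_stream_id of every stream whose schema is selected."""
    return [stream["tap_stream_id"]
            for stream in catalog["streams"]
            if stream["schema"].get("selected") is True]


def main(argv=None, out=None):
    if out is None:
        out = sys.stdout
    args = parse_args(argv)
    if args.discover:
        # Discovery mode: the catalog is the only thing written to stdout.
        json.dump(discover(), out)
        out.write("\n")
        return []
    if args.catalog:
        with open(args.catalog) as handle:
            return selected_streams(json.load(handle))
    return []
```

In a real tap, the list returned in sync mode would drive the loop that emits SCHEMA, RECORD, and STATE messages for each selected stream.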
+ +The Tap should attempt to sync every stream that is listed in the +catalog file where the "selected" property of the stream's schema is +`true`. For each of those streams, it should include all the properties +that are marked as selected for that stream. If the requested schema +contains a property that does not exist in the data source, a Tap may fail +and exit with a non-zero status or it may omit the requested field from +the output. The Tap may include additional properties that are not +included in the catalog, if those properties are always provided by the +data source. The tap should not include in its output any streams that are +not present in the catalog. + +Metrics +------- + +A Tap should periodically emit structured log messages containing metrics +about read operations. Consumers of the Tap logs can parse these metrics +out of the logs for monitoring or analysis. + +``` +INFO METRIC: <metrics-json> +``` + +`<metrics-json>` should be a JSON object with the following keys: + +* `type` - The type of the metric. Indicates how consumers of the data + should interpret the `value` field. There are two types of metrics: + + * `counter` - The value should be interpreted as a number that is added + to a cumulative or running total. + + * `timer` - The value is the duration in seconds of some operation. + +* `metric` - The name of the metric. This should consist only of letters, + numbers, underscore, and dash characters. For example, + `"http_request_duration"`. + +* `value` - The value of the datapoint, either an integer or a float. For + example, `1234` or `1.234`. + +* `tags` - Mapping of tags describing the data. The keys can be any + strings consisting solely of letters, numbers, underscores, and dashes. + For consistency's sake, we recommend using the following tags when they + are relevant. + + * `endpoint` - For a Tap that pulls data from an HTTP API, this should + be a descriptive name for the endpoint, such as `"users"` or `"deals"` + or `"orders"`.
+ + * `http_status_code` - The HTTP status code. For example, `200` or + `500`. + + * `job_type` - For a process that we are timing, some description of + the type of the job. For example, if we have a Tap that does a POST + to an HTTP API to generate a report and then polls with a GET until + the report is done, we could use a job type of `"run_report"`. + + * `status` - Either `"succeeded"` or `"failed"`. + + Note that for many metrics, many of those tags will _not_ be relevant. + +### Examples + +Here are some examples of metrics and how those metrics should be +interpreted. + +#### Example 1: Timer for Successful HTTP GET + +``` +INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 1.23, "tags": {"endpoint": "orders", "http_status_code": 200, "status": "succeeded"}} +``` + +> We made an HTTP request to an "orders" endpoint that took 1.23 seconds +> and succeeded with a status code of 200. + +#### Example 2: Timer for Failed HTTP GET + +``` +INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 30.01, "tags": {"endpoint": "orders", "http_status_code": 500, "status": "failed"}} +``` + +> We made an HTTP request to an "orders" endpoint that took 30.01 seconds +> and failed with a status code of 500. + +#### Example 3: Counter for Records + +``` +INFO METRIC: {"type": "counter", "metric": "record_count", "value": 100, "tags": {"endpoint": "orders"}} +INFO METRIC: {"type": "counter", "metric": "record_count", "value": 100, "tags": {"endpoint": "orders"}} +INFO METRIC: {"type": "counter", "metric": "record_count", "value": 100, "tags": {"endpoint": "orders"}} +INFO METRIC: {"type": "counter", "metric": "record_count", "value": 14, "tags": {"endpoint": "orders"}} + +``` + +> We fetched a total of 314 records from an "orders" endpoint.
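As a rough illustration of how a consumer of the Tap logs might use these lines, here is a minimal Python sketch that picks out metric lines and sums `counter` values per metric and endpoint. The function names are hypothetical, and it assumes each metric line starts exactly with the `INFO METRIC: ` prefix shown in the examples; a real log format may add timestamps or other fields before it.

```python
import json

# Assumed prefix, taken from the examples in this section.
PREFIX = "INFO METRIC: "


def parse_metric(line):
    """Return the metric as a dict, or None if this is not a metric line."""
    if not line.startswith(PREFIX):
        return None
    return json.loads(line[len(PREFIX):])


def total_counters(lines):
    """Sum counter values, keyed by (metric name, endpoint tag)."""
    totals = {}
    for line in lines:
        metric = parse_metric(line.strip())
        if metric is None or metric.get("type") != "counter":
            continue  # timers are durations, not running totals
        key = (metric["metric"], metric.get("tags", {}).get("endpoint"))
        totals[key] = totals.get(key, 0) + metric["value"]
    return totals
```

Feeding it four `record_count` lines with values 100, 100, 100 and 14 for the `orders` endpoint gives a total of 314 under the key `("record_count", "orders")`.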
diff --git a/07_SPEC.md b/07_SPEC.md new file mode 100644 index 0000000..abf8aff --- /dev/null +++ b/07_SPEC.md @@ -0,0 +1,259 @@ +# Singer Specification + +### Version 0.1 + +A *Tap* is an application that takes a *configuration* file and an +optional *state* file as input and produces an ordered stream of *record*, +*state* and *schema* messages as output. A *record* is json-encoded data +of any kind. A *state* message is used to persist information between +invocations of a Tap. A *schema* message describes the datatypes of +the *record*s in the stream. A Tap may be implemented in any +programming language. + +Taps are designed to produce a stream of data from sources like +databases and web service APIs for use in a data integration or ETL +pipeline. + +## Synopsis + +``` +tap --config CONFIG [--state STATE] + +CONFIG is a required argument that points to a JSON file containing any +configuration parameters the Tap needs. + +STATE is an optional argument pointing to a JSON file that the +Tap can use to remember information from the previous invocation, +like, for example, the point where it left off. + +``` + +## Input + +### Configuration + +The configuration contains whatever parameters the Tap needs in order +to pull data from the source. Typically this will include the credentials +for the API or data source. + +#### Special Fields + +`start_date` should be used on first sync to indicate how far back to grab +records. Start dates should conform to the +[RFC3339](https://www.ietf.org/rfc/rfc3339.txt) specification. + +`user_agent` should be set to something that includes a contact email +address should the API provider need to contact you for any reason. +(e.g. "Stitch (+support@stitchdata.com)") + +#### Examples + +The format of the configuration will vary by Tap, but it must be +JSON-encoded and the root of the configuration must be an object. For +many sources, the configuration may just be a single value like an API +key. 
This should still be encoded as JSON. For example: + +```json +{ + "api_key" : "ABC123ASDF5432", + "start_date" : "2017-01-01T00:00:00Z", + "user_agent" : "Stitch (+support@stitchdata.com)" +} +``` + +### State + +The state is used to persist information between invocations of a +Tap. The state must be encoded in JSON, but beyond that the +format of the state is determined wholly by the Tap. A Tap +that wishes to persist state should periodically write STATE messages +to stdout as it processes the stream, and should expect the file named +by the `--state STATE` argument to have the same format as the value +of the STATE messages it emits. + +A common use case of state is to record the spot in the stream where the +last invocation left off. For this use case, the state will typically +contain values like timestamps that correspond to "last-updated-at" +fields from the source. If the Tap is invoked without a `--state +STATE` argument, it should start at the beginning of the stream or at some +appropriate default position. If it is invoked with a `--state STATE` +argument it should read in the state file and start from the corresponding +position in the stream. + +### Example invocations + +Sync from the beginning + +```bash +$ ./tap --config config.json +``` + +Sync starting from a stored state + +```bash +$ ./tap --config config.json --state state.json +``` + +## Output + +A Tap outputs structured messages to `stdout` in JSON format, one +message per line. Logs and other information can be emitted to `stderr` +for aiding debugging. A Tap exits with a zero exit code on success, +non-zero on failure. + +The body contains messages encoded as a JSON map, one message per +line. Each message must contain a `type` attribute. Any message `type` +is permitted, and `type`s are interpreted case-insensitively. The +following `type`s have specific meaning: + +### RECORD + +RECORD messages contain the data from the data stream.
They must have +the following properties: + + - `record` **Required**. A JSON map containing a streamed data point + + - `stream` **Required**. The string name of the stream + +A single Tap may output RECORD messages with different stream +names. A single RECORD entry may only contain records for a single +stream. + +Example: + +```json +{"type": "RECORD", "stream": "users", "record": {"id": 0, "name": "Chris"}} +``` + +### SCHEMA + +SCHEMA messages describe the datatypes of data in the stream. They +must have the following properties: + + - `schema` **Required**. A [JSON Schema] describing the + `record` property of RECORDs from the same `stream` + + - `stream` **Required**. The string name of the stream that this + schema describes + + - `key_properties` **Required**. A list of strings indicating which + properties make up the primary key for this stream. Each item in the + list must be the name of a top-level property defined in the schema. A + value for `key_properties` must be provided, but it may be an empty + list to indicate that there is no primary key. + +A single Tap may output SCHEMA messages with different stream +names. If a RECORD message from a stream is not preceded by a +`SCHEMA` message for that stream, it is assumed to be schema-less. + +Example: + +```json +{"type": "SCHEMA", + "stream": "users", + "schema": {"properties":{"id":{"type":"integer"}}}, + "key_properties": ["id"]} +``` + +### STATE + +STATE messages contain the state that the Tap wishes to persist. +STATE messages have the following properties: + + + - `value` **Required**. The JSON formatted state value + +The semantics of a STATE value are not part of the specification, +and should be determined independently by each Tap.
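To make the three message types concrete, here is a small Python sketch that builds SCHEMA, RECORD and STATE messages with the required properties listed above and writes each one as a single line of JSON. It uses only the standard library; the helper names are hypothetical and are not the API of any Singer library.

```python
import json
import sys


def schema_message(stream, schema, key_properties):
    """Build a SCHEMA message; key_properties may be an empty list."""
    return {"type": "SCHEMA", "stream": stream,
            "schema": schema, "key_properties": key_properties}


def record_message(stream, record):
    """Build a RECORD message carrying one data point for one stream."""
    return {"type": "RECORD", "stream": stream, "record": record}


def state_message(value):
    """Build a STATE message; the shape of value is up to the Tap."""
    return {"type": "STATE", "value": value}


def format_message(message):
    """Serialize a message as one line of JSON (no embedded newlines)."""
    return json.dumps(message)


def emit(message, out=sys.stdout):
    """Write one message per line and flush so consumers see it promptly."""
    out.write(format_message(message) + "\n")
    out.flush()
```

A Tap built this way would typically call `emit(schema_message(...))` once per stream before that stream's records, and `emit(state_message(...))` periodically as it makes progress.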
+ +## Example: + +``` +{"type": "SCHEMA", "stream": "users", "key_properties": ["id"], "schema": {"required": ["id"], "type": "object", "properties": {"id": {"type": "integer"}}}} +{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Chris"}} +{"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Mike"}} +{"type": "SCHEMA", "stream": "locations", "key_properties": ["id"], "schema": {"required": ["id"], "type": "object", "properties": {"id": {"type": "integer"}}}} +{"type": "RECORD", "stream": "locations", "record": {"id": 1, "name": "Philadelphia"}} +{"type": "STATE", "value": {"users": 2, "locations": 1}} +``` + +## Versioning + +A Tap's API encompasses its input and output - including its +configuration, how it interprets state, and how the data it +produces is structured and interpreted. Taps should follow +[Semantic Versioning], meaning that breaking changes to any of +these should be a new MAJOR version, and backwards-compatible changes +should be a new MINOR version. + +[JSON Schema]: http://json-schema.org/ "JSON Schema" +[Semantic Versioning]: http://semver.org/ "Semantic Versioning" + + + +# Data Types and Schemas + +JSON is used to represent data because it is ubiquitous, readable, and +especially appropriate for the large universe of sources that expose data +as JSON like web APIs. However, JSON is far from perfect: + + - it has a limited type system, without support for common types like + dates, and no distinction between integers and floating point numbers + + - while its flexibility makes it easy to use, it can also cause + compatibility problems + +*Schemas* are used to solve these problems. Generally speaking, a schema +is anything that describes how data is structured. In Streams, schemas are +written by streamers in *SCHEMA* messages, formatted following the +[JSON Schema] spec. + +Schemas solve the limited data types problem by providing more information +about how to interpret JSON's basic types. 
For example, the [JSON Schema] +spec distinguishes between `integer` and `number` types, where the latter +is appropriately interpreted as a floating point. Additionally, it +defines a string format called `date-time` that can be used to indicate +when a data point is expected to be a +[properly formatted](https://tools.ietf.org/html/rfc3339) timestamp +string. + +Schemas mitigate JSON's compatibility problem by providing an easy way to +validate the structure of a set of data points. Streams deploys this +concept by encouraging use of only a single schema for each substream, and +validating each data point against its schema prior to persistence. This +forces the streamer author to think about how to resolve schema evolution +and compatibility questions, placing that responsibility as close to the +original data source as possible, and freeing downstream systems from +making uninformed assumptions to resolve these issues. + +Schemas are required, but they can be defined in the broadest terms - a +JSON Schema of '{}' validates all data points. However, it is a best +practice for streamer authors to define schemas as narrowly as possible.
+ +## Schemas in Stitch + +The Stitch persister and Stitch API use schemas as follows: + + - the Stitch persister fails when it encounters a data point that doesn't + validate against its stream's latest schema + - schemas must be an 'object' at the top level + - Stitch supports schemas with objects nested to any depth, and arrays of + objects nested to any depth - more info in the + [Stitch docs](https://www.stitchdata.com/docs/data-structure/nested-data-structures-row-count-impact) + - properties of type `string` and format `date-time` are converted to + the appropriate timestamp or datetime type in the destination database + - properties of type `integer` are converted to integer in the destination + database + - properties of type `number` are converted to decimal or numeric in the + destination database + - (soon) the `maxLength` parameter of a property of type `string` is used + to define the width of the corresponding varchar column in the + destination database + - when Stitch encounters a schema for a stream that is incompatible with + the table that stream is to be loaded into in the destination database, + it adds the data to the + [reject pile](https://www.stitchdata.com/docs/data-structure/identifying-rejected-records) + + +[JSON Schema]: http://json-schema.org/ + diff --git a/08_PROPOSALS.md b/08_PROPOSALS.md new file mode 100644 index 0000000..415eebb --- /dev/null +++ b/08_PROPOSALS.md @@ -0,0 +1,96 @@ +# Proposals + +These are changes to the spec that may be added in the future. + +## Add structure detection and selection + +### Use case + +For some data sources, it would be desirable to provide a mode where the +tap can detect what tables or fields are available, and to allow a user to +select which fields to retrieve. + +### Solution + +Add an optional `--discover` flag that causes the tap to print out a +"STRUCTURE" message describing the data structures that are available to +the tap.
Add a `--structure STRUCTURE` option that indicates that the tap +should read the STRUCTURE file in and use that to determine which data structures to sync. + +Support for these two options should be optional, and a tap should fail if +it is invoked with options it doesn't support. + +#### Structure File + +The structure is an optional configuration file that can be used to +configure which data structures the streamer captures in sync +mode. The structure must be encoded as JSON following the same format +as STRUCTURE messages, defined below, with an additional "is_synced" +key on every node. If the "is_synced" value of a node is true, the +streamer should emit the data corresponding to that structure node +during sync, and it should not emit the data otherwise. A streamer +that supports structure specification should fail if an invalid +structure is supplied. + +#### Structure Message + +STRUCTURE messages describe the data available to the streamer. They +must have the following properties: + + - `structure` **Required**. A JSON object with string keys naming the + top of the structure hierarchy, and values as objects with keys: + - type: a string containing one of "database", "table", "column" or "other" + - description: a string with descriptive information about the node + - supported: a boolean indicating whether or not the streamer can sync this node + - children: a map containing the next lower level of structure, formatted in the same way as the root + +A STRUCTURE message should describe the entirety of the structure +available to the streamer. 
+ +Example, for a database: + +```json +{"type": "STRUCTURE", + "structure": { + "my_first_database": { + "type": "database", + "description": "", + "supported": true, + "children": { + "my_first_databases_first_table": { + "type": "table", + "description": "", + "supported": true, + "children": { + "id_column": { + "type": "column", + "description": "INT PRIMARY KEY", + "supported": true + }, + "string_column": { + "type": "column", + "description": "VARCHAR(255)", + "supported": true + } + } + } + } + } + } +} +``` + +## Add connection check support + +### Use case + +For some sources it might be good to provide an option to quickly test +whether the authentication details provided in the config file are valid. +This would allow a user to determine whether they can authenticate without +actually initiating a sync. + +### Solution + +Add an optional `--check` flag. If the `--check` flag is provided, the tap +should attempt to authenticate, then print the details of the +authentication attempt to stderr and exit 0 on success or 1 on failure. From 94753ab6e4ef404f141f55eebc060eac2fb02691 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:17:20 -0400 Subject: [PATCH 06/25] Delete SPEC.md --- SPEC.md | 190 -------------------------------------------------------- 1 file changed, 190 deletions(-) delete mode 100644 SPEC.md diff --git a/SPEC.md b/SPEC.md deleted file mode 100644 index 7961874..0000000 --- a/SPEC.md +++ /dev/null @@ -1,190 +0,0 @@ -# Singer Specification - -### Version 0.1 - -A *Tap* is an application that takes a *configuration* file and an -optional *state* file as input and produces an ordered stream of *record*, -*state* and *schema* messages as output. A *record* is json-encoded data -of any kind. A *state* message is used to persist information between -invocations of a Tap. A *schema* message describes the datatypes of -the *record*s in the stream. A Tap may be implemented in any -programming language. 
- -Taps are designed to produce a stream of data from sources like -databases and web service APIs for use in a data integration or ETL -pipeline. - -## Synopsis - -``` -tap --config CONFIG [--state STATE] - -CONFIG is a required argument that points to a JSON file containing any -configuration parameters the Tap needs. - -STATE is an optional argument pointing to a JSON file that the -Tap can use to remember information from the previous invocation, -like, for example, the point where it left off. - -``` - -## Input - -### Configuration - -The configuration contains whatever parameters the Tap needs in order -to pull data from the source. Typically this will include the credentials -for the API or data source. - -#### Special Fields - -`start_date` should be used on first sync to indicate how far back to grab -records. Start dates should conform to the -[RFC3339](https://www.ietf.org/rfc/rfc3339.txt) specification. - -`user_agent` should be set to something that includes a contact email -address should the API provider need to contact you for any reason. -(e.g. "Stitch (+support@stitchdata.com)") - -#### Examples - -The format of the configuration will vary by Tap, but it must be -JSON-encoded and the root of the configuration must be an object. For -many sources, the configuration may just be a single value like an API -key. This should still be encoded as JSON. For example: - -```json -{ - "api_key" : "ABC123ASDF5432", - "start_date" : "2017-01-01T00:00:00Z", - "user_agent" : "Stitch (+support@stitchdata.com)" -} -``` - -### State - -The state is used to persist information between invocations of a -Tap. The state must be encoded in JSON, but beyond that the -format of the state is determined wholely by the Tap. A Tap -that wishes to persist state should periodically write STATE messages -to stdout as it processes the stream, and should expect the file named -by the `--state STATE` argument to have the same format as the value -of the STATE messages it emits. 
- -A common use case of state is to record the spot in the stream where the -last invocation left off. For this use case, the state will typically -contain values like timestamps that correspond to "last-updated-at" -fields from the source. If the Tap is invoked without a `--state -STATE` argument, it should start at the beginning of the stream or at some -appropriate default position. If it is invoked with a `--state STATE` -argument it should read in the state file and start from the corresponding -position in the stream. - -### Example invocations - -Sync from the beginning - -```bash -$ ./tap --config config.json -``` - -Sync starting from a stored state - -```bash -$ ./tap --config config.json --state state.json -``` - -## Output - -A Tap outputs structured messages to `stdout` in JSON format, one -message per line. Logs and other information can be emitted to `stderr` -for aiding debugging. A streamer exits with a zero exit code on success, -non-zero on failure. - -The body contains messages encoded as a JSON map, one message per -line. Each message must contain a `type` attribute. Any message `type` -is permitted, and `type`s are interpreted case-insensitively. The -following `type`s have specific meaning: - -### RECORD - -RECORD messages contain the data from the data stream. They must have -the following properties: - - - `record` **Required**. A JSON map containing a streamed data point - - - `stream` **Required**. The string name of the stream - -A single Tap may output RECORDs messages with different stream -names. A single RECORD entry may only contain records for a single -stream. - -Example: - -```json -{"type": "RECORD", "stream": "users", "record": {"id": 0, "name": "Chris"}} -``` - -### SCHEMA - -SCHEMA messages describe the datatypes of data in the stream. They -must have the following properties: - - - `schema` **Required**. A [JSON Schema] describing the - `data` property of RECORDs from the same `stream` - - - `stream` **Required**. 
The string name of the stream that this - schema describes - - - `key_properties` **Required**. A list of strings indicating which - properties make up the primary key for this stream. Each item in the - list must be the name of a top-level property defined in the schema. A - value for `key_properties` must be provided, but it may be an empty - list to indicate that there is no primary key. - -A single Tap may output SCHEMA messages with different stream -names. If a RECORD message from a stream is not preceded by a -`SCHEMA` message for that stream, it is assumed to be schema-less. - -Example: - -```json -{"type": "SCHEMA", - "stream": "users", - "schema": {"properties":{"id":{"type":"integer"}}}, "record": {"id": 0, "name": "Chris"}, - "key_properties": ["id"]} -``` - -### STATE - -STATE messages contain the state that the Tap wishes to persist. -STATE messages have the following properties: - - - - `value` **Required**. The JSON formatted state value - -The semantics of a STATE value are not part of the specification, -and should be determined independently by each Tap. - -## Example: - -``` -{"type": "SCHEMA", "stream": "users", "key_properties": ["id"], "schema": {"required": ["id"], "type": "object", "properties": {"id": {"type": "integer"}}}} -{"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Chris"}} -{"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Mike"}} -{"type": "SCHEMA", "stream": "locations", "key_properties": ["id"], "schema": {"required": ["id"], "type": "object", "properties": {"id": {"type": "integer"}}}} -{"type": "RECORD", "stream": "locations", "record": {"id": 1, "name": "Philadelphia"}} -{"type": "STATE", "value": {"users": 2, "locations": 1}} -``` - -## Versioning - -A Tap's API encompasses its input and output - including its -configuration, how it interprets state, and how the data it -produces is structured and interpreted. 
Taps should follow -[Semantic Versioning], meaning that breaking changes to any of -these should be a new MAJOR version, and backwards-compatible changes -should be a new MINOR version. - -[JSON Schema]: http://json-schema.org/ "JSON Schema" -[Semantic Versioning]: http://semver.org/ "Semantic Versioning" From cf2d7ddfcab710f8854a96552bcfe844f9c49467 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:17:32 -0400 Subject: [PATCH 07/25] Delete SCHEMAS.md --- SCHEMAS.md | 65 ------------------------------------------------------ 1 file changed, 65 deletions(-) delete mode 100644 SCHEMAS.md diff --git a/SCHEMAS.md b/SCHEMAS.md deleted file mode 100644 index 7dda1d3..0000000 --- a/SCHEMAS.md +++ /dev/null @@ -1,65 +0,0 @@ -# Data Types and Schemas - -JSON is used to represent data because it is ubiquitous, readable, and -especially appropriate for the large universe of sources that expose data -as JSON like web APIs. However, JSON is far from perfect: - - - it has a limited type system, without support for common types like - dates, and no distinction between integers and floating point numbers - - - while its flexibility makes it easy to use, it can also cause - compatibility problems - -*Schemas* are used to solve these problems. Generally speaking, a schema -is anything that describes how data is structured. In Streams, schemas are -written by streamers in *SCHEMA* messages, formatted following the -[JSON Schema] spec. - -Schemas solve the limited data types problem by providing more information -about how to interpret JSON's basic types. For example, the [JSON Schema] -spec distinguishes between `integer` and `number` types, where the latter -is appropriately interpretted as a floating point. Additionally, it -defines a string format called `date-time` that can be used to indicate -when a data point is expected to be a -[properly formatted](https://tools.ietf.org/html/rfc3339) timestamp -string. 
- -Schemas mitigate JSON's compatibility problem by providing an easy way to -validate the structure of a set of data points. Streams deploys this -concept by encouraging use of only a single schema for each substream, and -validating each data point against its schema prior to persistence. This -forces the streamer author to think about how to resolve schema evolution -and compatibility questions, placing that responsibility as close to the -original data source as possible, and freeing downstream systems from -making uninformed assumptions to resolve these issues. - -Schemas are required, but they can be defined in the broadest terms - a -JSON Schema of '{}' validates all data points. However, it is a best -practice for streamer authors to define schemas as narrowly as possible. - -## Schemas in Stitch - -The Stitch persister and Stitch API use schemas as follows: - - - the Stitch persister fails when it encounters a data point that doesn't - validate against its stream's latest schema - - schemas must be an 'object' at the top level - - Stitch supports schemas with objects nested to any depth, and arrays of - objects nested to any depth - more info in the - [Stitch docs](https://www.stitchdata.com/docs/data-structure/nested-data-structures-row-count-impact) - - properties of type `string` and format `date-time` are converted to - the appropriate timestamp or datetime type in the destination database - - properties of type `integer` are converted to integer in the destination - database - - properties of type `number` are converted to decimal or numeric in the - destination database - - (soon) the `maxLength` parameter of a property of type `string` is used - to define the width of the corresponding varchar column in the - destination database - - when Stitch encounters a schema for a stream that is incompatible with - the table that stream is to be loaded into in the destination database, - it adds the data to the - [reject 
pile](https://www.stitchdata.com/docs/data-structure/identifying-rejected-records) - - -[JSON Schema]: http://json-schema.org/ From 4fc21d2d13569c5459b669db661aa4edc18a7bb5 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:17:45 -0400 Subject: [PATCH 08/25] Delete README.md --- README.md | 274 ------------------------------------------------------ 1 file changed, 274 deletions(-) delete mode 100644 README.md diff --git a/README.md b/README.md deleted file mode 100644 index e4f38a1..0000000 --- a/README.md +++ /dev/null @@ -1,274 +0,0 @@ -# Getting Started with Singer - -Singer is an open source standard for moving data between databases, -web APIs, files, queues, and just about anything else you can think -of. The [Singer spec] describes how data extraction scripts - called -“Taps” - and data loading scripts - called “Targets” - should -communicate using a standard JSON-based data format over `stdout`. By -conforming to this spec, Taps and Targets can be used in any -combination to move data from any source to any destination. - -**Topics** - - - [Using Singer to populate Google Sheets](#using-singer-to-populate-google-sheets) - - [Developing a Tap](#developing-a-tap) - - [Additional Resources](#additional-resources) - -## Using Singer to populate Google Sheets - -The [Google Sheets Target] can be combined with any Singer Tap to -populate a Google Sheet with data. This example will use currency -exchange rate data from the [Fixer.io Tap]. [Fixer.io] is a free API for -current and historical foreign exchange rates published by the -European Central Bank. - -The steps are: - 1. [Activate the Google Sheets API](#step-1---activate-the-google-sheets-api) - 1. [Configure the Target](#step-2---configure-the-target) - 1. [Install](#step-3---install) - 1. [Run](#step-4---run) - 1.
[Save State (optional)](#step-5---save-state-optional) - -### Step 1 - Activate the Google Sheets API - - (originally found in the [Google API - docs](https://developers.google.com/sheets/api/quickstart/python)) - - 1. Use [this - wizard](https://console.developers.google.com/start/api?id=sheets.googleapis.com) - to create or select a project in the Google Developers Console and - activate the Sheets API. Click Continue, then Go to credentials. - - 1. On the **Add credentials to your project** page, click the - **Cancel** button. - - 1. At the top of the page, select the **OAuth consent screen** - tab. Select an **Email address**, enter a **Product name** if not - already set, and click the **Save** button. - - 1. Select the **Credentials** tab, click the **Create credentials** - button and select **OAuth client ID**. - - 1. Select the application type **Other**, enter the name "Singer - Sheets Target", and click the **Create** button. - - 1. Click **OK** to dismiss the resulting dialog. - - 1. Click the Download button to the right of the client ID. - - 1. Move this file to your working directory and rename it - `client_secret.json`. - -### Step 2 - Configure the Target - -Created a file called `config.json` in your working directory, -following [config.sample.json](https://github.com/singer-io/target-gsheet/blob/master/config.sample.json). The required -`spreadsheet_id` parameter is the value between the "/d/" and the -"/edit" in the URL of your spreadsheet. For example, consider the -following URL that references a Google Sheets spreadsheet: - -``` -https://docs.google.com/spreadsheets/d/1qpyC0XzvTcKT6EISywvqESX3A0MwQoFDE8p-Bll4hps/edit#gid=0 -``` - -The ID of this spreadsheet is -`1qpyC0XzvTcKT6EISywvqESX3A0MwQoFDE8p-Bll4hps`. - - -### Step 3 - Install - -First, make sure Python 3 is installed on your system or follow these -installation instructions for [Mac](python-mac) or -[Ubuntu](python-ubuntu). 
- -`target-gsheet` can be run with any [Singer Tap] to move data from -sources like [Braintree], [Freshdesk] and [Hubspot] to Google -Sheets. We'll use the [Fixer.io Tap] - which pulls currency exchange -rate data from a public data set - as an example. - -We recommend installing each Tap and Target in a separate Python virtual -environment. This will insure that you won't have conflicting dependencies -between any Taps and Targets. - -These commands will install `tap-fixerio` and `target-gsheet` with pip in -their own virtual environments: - -```bash -# Install tap-fixerio in its own virtualenv -virtualenv -p python3 tap-fixerio -tap-fixerio/bin/pip install tap-fixerio - -# Install target-gsheet in its own virtualenv -virtualenv -p python3 target-gsheet -target-gsheet/bin/pip install target-gsheet -``` - -### Step 4 - Run - -This command will pipe the output of `tap-fixerio` to `target-gsheet`, -using the configuration file created in Step 2: - -```bash -› tap-fixerio/bin/tap-fixerio | target-gsheet/bin/target-gsheet -c config.json - INFO Replicating the latest exchange rate data from fixer.io - INFO Tap exiting normally -``` - -`target-gsheet` will attempt to open a new window or tab in your -default browser to perform authentication. If this fails, copy the URL -from the console and manually open it in your browser. - -If you are not already logged into your Google account, you will be -prompted to log in. If you are logged into multiple Google accounts, -you will be asked to select one account to use for the -authorization. Click the **Accept** button to allow `target-gsheet` to -access your Google Sheet. You can close the tab after the signup flow -is complete. - -Each stream generated by the Tap will be written to a different sheet -in your Google Sheet. For the [Fixer.io Tap] you'll see a single sheet -named `exchange_rate`.
- -### Step 5 - Save State (optional) - -When `target-gsheet` is run as above it writes log lines to `stderr`, -but `stdout` is reserved for outputting **State** messages. A State -message is a JSON-formatted line with data that the Tap wants -persisted between runs - often "high water mark" information that the -Tap can use to pick up where it left off on the next run. Read more -about State messages in the [Singer spec]. - -Targets write State messages to `stdout` once all data that appeared -in the stream before the State message has been processed by the -Target. Note that although the State message is sent into the target, -in most cases the target's process won't actually store it anywhere or -do anything with it other than repeat it back to `stdout`. - -Taps like the [Fixer.io Tap] can also accept a `--state` argument -that, if present, points to a file containing the last persisted State -value. This enables Taps to work incrementally - the State -checkpoints the last value that was handled by the Target, and the -next time the Tap is run it should pick up from that point. - -To run the [Fixer.io Tap] incrementally, point it to a State file and -capture the persister's `stdout` like this: - -```bash -› tap-fixerio --state state.json | target-gsheet -c config.json >> state.json -› tail -1 state.json > state.json.tmp && mv state.json.tmp state.json -(rinse and repeat) -``` - -## Developing a Tap - -If you can't find an existing Tap for your data source, then it's time -to build your own. - -**Topics**: - - [Hello, world](#hello-world) - - [A Python Tap](#a-python-tap) - -### Hello, world - -A Tap is just a program, written in any language, that outputs data to -`stdout` according to the [Singer spec].
In fact, your first Tap can -be written from the command line, without any programming at all: - -```bash -› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n' -``` - -This writes the datapoint `{"value":"world"}` to the *hello* -stream along with a schema indicating that `value` is a string. -That data can be piped into any Target, like the [Google Sheets -Target], over `stdin`: - -```bash -› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n' | target-gsheet -c config.json -``` - -### A Python Tap - -To move beyond *Hello, world* you'll need a real programming language. -Although any language will do, we have built a Python library to help -you get up and running quickly. - -Let's write a Tap called `tap_ip.py` that retrieves the current - IP using icanhazip.com, and writes that data with a timestamp. - -First, install the [Singer helper library] with `pip`: - -```bash -› pip install singer-python -``` - -Then, open up a new file called `tap_ip.py` in your favorite editor. - -```python -import singer -import urllib.request -from datetime import datetime, timezone -``` - -We'll use the `datetime` module to get the current timestamp, the -`singer` module to write data to `stdout` in the correct format, and -the `urllib.request` module to make a request to icanhazip.com. - -```python -now = datetime.now(timezone.utc).isoformat() -schema = { - 'properties': { - 'ip': {'type': 'string'}, - 'timestamp': {'type': 'string', 'format': 'date-time'}, - }, -} - -``` - -This sets up some of the data we'll need - the current time, and the -schema of the data we'll be writing to the stream formatted as a [JSON -Schema].
- -```python -with urllib.request.urlopen('http://icanhazip.com') as response: - ip = response.read().decode('utf-8').strip() - singer.write_schema('my_ip', schema, 'timestamp') - singer.write_records('my_ip', [{'timestamp': now, 'ip': ip}]) -``` - -Finally, we make the HTTP request, parse the response, and then make -two calls to the `singer` library: - - - `singer.write_schema` which writes the schema of the `my_ip` stream and defines its primary key - - `singer.write_records` to write a record to that stream - -We can send this data to Google Sheets by running our new Tap -with the [Google Sheets Target]: - -```bash -› python tap_ip.py | target-gsheet -c config.json -``` - -## Additional Resources - -Join the [Singer Slack channel] to get help from members of the Singer -community. - ---- - -Copyright © 2017 Stitch - -[Singer spec]: SPEC.md -[Singer Tap]: https://singer.io -[Braintree]: https://github.com/singer-io/tap-braintree -[Freshdesk]: https://github.com/singer-io/tap-freshdesk -[Hubspot]: https://github.com/singer-io/tap-hubspot -[Fixer.io Tap]: https://github.com/singer-io/tap-fixerio -[Fixer.io]: http://fixer.io -[python-mac]: http://docs.python-guide.org/en/latest/starting/install3/osx/ -[python-ubuntu]: https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04 -[Google Sheets Target]: https://github.com/singer-io/target-gsheet -[Singer helper library]: https://github.com/singer-io/singer-python -[JSON Schema]: http://json-schema.org/ -[Singer Slack channel]: https://singer-slackin.herokuapp.com/ - From 8695f4d96d568206643e97dddf904f3799ebeefd Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:17:55 -0400 Subject: [PATCH 09/25] Delete BEST_PRACTICES.md --- BEST_PRACTICES.md | 425 ---------------------------------------------- 1 file changed, 425 deletions(-) delete mode 100644 BEST_PRACTICES.md diff --git a/BEST_PRACTICES.md b/BEST_PRACTICES.md deleted file
mode 100644 index 2bbffa8..0000000 --- a/BEST_PRACTICES.md +++ /dev/null @@ -1,425 +0,0 @@ -Best Practices for Building a Singer Tap -============================================ - -Language --------- - -Currently all Singer Taps are written in Python. This Best Practices guide -contains some python-specific recommendations. - -Recommended Config Fields -------------------------- - -The Singer Specification does not define any standard fields in the -configuration. We recommend supporting the following fields: - -* `start_date` - A tap should take a `start_date` field in the config that - Indicates how far back an integration should sync data in the absence of - a state entry. - -* `user_agent` - A tap should accept a `user_agent` field in the config - and pass it along in request headers to the API. The caller of a Tap - should include an email address in the `user_agent` field to allow the - API provider to contact you if there's an issue with the tap or your - usage of their API. - -Rate Limiting -------------- - -Most APIs enforce rate limits. Taps should be written in a way that respects these rate limits. - -Rate limits can take the form of quotas (X requests per day), in which case the tap should leave -room for other use of the API, or short-term limits (X requests per Y seconds). For short-term -limits, the entire quota can be used up and the tap should sleep when rate limited. - -The singer-python library's utils namespace has a rate limiting decorator for use. - -Memory Constraints ------------------- - -Taps should not rely on large volumes of RAM being available during run and should strive to keep -as little data in memory as required. Use iterators/generators when available. - - -Dates ------ - -All dates should use the RFC3339 format (which includes timezone offsets). UTC is the preferred -timezone for all data and should be used when possible. 
- -Good: - - 2017-01-01T00:00:00Z (January 1, 2017 12:00:00AM UTC) - - 2017-01-01T00:00:00-05:00 (January 1, 2017, 12:00:00AM America/New_York) - -Bad: - - 2017-01-01 00:00:00 - -State ------ - -States that are datetimes will all be in RFC3339 format. Using ids or other identifiers for state -is possible but could cause issues if the entities are not immutable and change out of order. If -the API supports filtering by created timestamps but is not immutable (meaning a record can be -updated after creation), DO NOT use the created timestamp for saving state. States using created -timestamps or ids are a sure way to have data discrepancies. - -Write state as early and often as possible, but no sooner and no more often than is required. When a -state is written, no data prior to that state will be synced, so do not update the state until all -possible exceptions will be thrown. - -Endpoints that do not support filtering by updated timestamps or include updated timestamps do not -support saving state. - -If the API supports filtering by updated timestamp, use that for filtering. If the API doesn't -support filtering but does return updated timestamps with the data, filter by the timestamp before -streaming the data. - -When streaming data in, stream the data in ascending order when possible. - -Keep in mind that jobs can be interrupted at any point. States should never be in an invalid -state. Interrupted jobs that save state too early will have data missing. Interrupted jobs that -save state too late will cause an increase in duplicate rows being replicated. - -The Tap's config file can include a `start_date` field indicating the default state. - -### Stream bookmark format - -State records serve to indicate a Tap's progress through a data source, but they can also provide -more granular information about progress through individual streams. 
- -If a Tap's streams can each have an individual state, the state output by the Tap should conform to -the following format: the state object contains a top-level property `"bookmarks"` that maps to an -object. The bookmarks object contains stream identifiers as property names, each of which maps to an -object describing the respective stream's state. As a concrete example: - -``` -{ - "bookmarks": { - "orders": { - "last_record": "2017-07-07T10:20:00Z" - }, - "customers": { - "last_record": 123 - } - } -} -``` - -In the above example, there are two streams that have been bookmarked: `"orders"` and `"customers"`. -Each of those streams has some replication key field, like an `updated_at` timestamp or a sequential -ID for append-only sources. The replication key value persisted as part of the state message allows -the Tap to: - -- bookmark its progress through the stream -- serve as a query parameter upon subsequent invocations, (e.g. the equivalent of `SELECT * from - customers where id >= 123`) - -The state record above indicates that the Tap last output an entry from the `"orders"` stream where -the source record's replication key field had the value "2017-07-07T10:20:00Z", and that it last -output an entry from the `"customers"` stream where the source record's replication key field had -the value 123. - - -Logging and Exception Handling ------------------------------- - -Log every URL + params that will be requested, but be sure to exclude any sensitive information -such as api keys and client secrets. Log the progress of the sync (e.g. Starting entity 1, -Starting entity 2, etc.) When the API returns with an error, log the status code and body. - -Allow exceptions to be bubbled up and interrupt the tap. DO NOT wrap the tap code in one large -try/except block and log the exception message. The stack trace is much more useful than the error -message. 
- -If an intermittent error is detected from the API, retry using an exponential backoff (try using -`backoff` library for Python). If an unrecoverable error is detected, exit the script with a -non-zero error code (raise an exception or use `sys.exit(1)`) - - -Module Structure ----------------- - -Source code should be in a module (folder with `__init__.py` file) and not just a script (`module.py` -file). - - -Schemas -------- - -Schemas should be stored in a `schemas` folder under the module directory -as JSON files rather than as native dicts in the source code. - -Always stream the entity's schema before streaming any records for that -entity. - -If the API you're using does not publish a schema, you can use the -`singer-infer-schema` program in [singer-tools] to generate a base schema -to start from. - - -Testing -------- - -Use `singer-tools`'s `singer-check-tap` command to validate that a tap's -output matches the expected format and validate the output against its -schema. - -Code Quality ------------- - -We recommend running pylint on your Tap and making sure that you -understand any issues it reports. You should get your code to the point -where pylint does not report any error-level messages. - -When we import your Tap, we'll run it in CircleCI, and we'll include -Pylint in the test phase. The CircleCI buld will fail if Pylint finds any -issues, so we will need to account for all issues by either fixing them or -disabling the message in Pylint. We are flexible about which Pylint issues -are acceptable, but we generally run Pylint with some of the more pedantic -messages disabled. 
For example, we typically use the following circle.yml -config: - -```yaml -machine: - python: - version: 3.4.4 - -dependencies: - pre: - - pip install pylint - -test: - post: - - pylint tap_outbrain -d missing-docstring -d logging-format-interpolation -d too-many-locals -d too-many-arguments -``` - -Allowing Users to Select Streams to Sync ----------------------------------------- - -For some data sources, it won't make sense to pull every stream and -property available. For example, suppose we had a Tap for a Postgres -database, and a user only wanted to pull a subset of columns from a -subset of tables. It would be inconvenient if the Tap emitted data -from all tables and columns. - -To address this, we recommend extending the Tap to use a document called a -_catalog_. A _catalog_ is a mechanism for a Tap to indicate what streams -it makes available, and for the user to select certain streams and -properties within those streams. - -A Tap that supports a catalog should provide two additional options: - -* `--discover` - indicates that the Tap should not sync data, but should - just write its catalog to stdout. - -* `--catalog CATALOG` - the Tap should sync data, based on the selections - made in the provided CATALOG file. - -### Catalog Format - -The format of the catalog is as follows. The top level is an object, -with a single key called `"streams"` that points to an array of -objects, each having the following fields: - -| Property | type | required? | Description | -| ----------------- |--------------------|-----------|--------------------------------| -| `tap_stream_id` | string | required | The unique identifier for the stream. This is allowed to be different from the name of the stream in order to allow for sources that have duplicate stream names. | -| `stream` | string | required | The name that will be used for the stream in the data produced by this Tap. | -| `key_properties` | array of strings | optional | List of key properties. 
| -| `schema` | object | required | The JSON schema for the stream. | -| `replication_key` | string | optional | The name of a property in the source to use as a "bookmark". For example, this will often be an "updated-at" field or an auto-incrementing primary key. | -| `is_view` | boolean | optional | For a database source, indicates that the source is a view. | -| `database_name` | string | optional | For a database source, the name of the database. | -| `table_name` | string | optional | For a database source, the name of the table. | -| `row_count` | integer | optional | The number of rows in the source data, for taps that have access to that information. | - -### JSON Schema Extensions - -In order to allow a Tap to indicate which fields are selectable and to -allow the user to make their selections, we extend JSON Schema by adding -two additional properties. Note that since JSON Schema is recursive, these -properties may appear on the top-level schema or on properties within the -schema: - -* `inclusion`: Either `available`, `automatic`, or `unsupported`. - - * `"available"` means that the field is available for selection, and that - the Tap will only emit values for that field if it is marked with - `"selected": true`. - * `"automatic"` means that the Tap may emit values for the field, but it - is not up to the user to select it. - * `"unsupported"` means that the field exists in the source data but the - Tap is unable to provide it. -* `selected`: Either `true` or `false`. For a top-level schema, `true` - indicates that the stream should be synced, and `false` indicates it - should be omitted entirely. For a property within a stream, `true` - means include the property, `false` means leave it out. 
- -Here's an example of a discovered catalog - -```javascript -{ - "streams": [ - { - "stream": "users", - "tap_stream_id": "users", - "schema": { - "type": "object", - "properties": { - "id": { - "type": "integer", - "inclusion": "automatic" - }, - "name": { - "type": "object", - "inclusion": "available", - "properties": { - "first_name": {"type": "string", "inclusion": "available"}, - "last_name": {"type": "string", "inclusion": "available"} - }, - }, - "addresses": { - "type": "array", - "inclusion": "available", - "items": { - "type": "object", - "inclusion": "available", - "properties": { - "addr1": {"type": "string", "inclusion": "available"}, - "addr2": {"type": "string", "inclusion": "available"}, - "city": {"type": "string", "inclusion": "available"}, - "state": {"type": "string", "inclusion": "available"}, - "zip": {"type": "string", "inclusion": "available"}, - } - } - } - } - } - }, - { - "stream": "orders", - "tap_stream_id": "orders", - "schema": { - "type": "object", - "properties": { - "id": {"type": "integer"}, - "user_id": {"type": "integer", "inclusion": "available"}, - "amount": {"type": "number", "inclusion": "available"}, - "credit_card_number": {"type": "string", "inclusion": "available"}, - } - } - } - ] -} -``` - -### Discovery Mode - -A Tap that wants to support property selection should add an optional -`--discover` flag. When the `--discover` flag is supplied, the Tap -should connect to its data source, find the list of streams available, -and print out the catalog to stdout. The discovery output should go to -STDOUT, and it should be the only thing written to STDOUT. If the -`--discover` flag is supplied, a tap should not emit any RECORD, -SCHEMA, or STATE messages. - -### Sync Mode - -A tap that supports property selection should accept an optional -`--catalog CATALOG` option. `CATALOG` should point to a file containing -the catalog, annotated with the user's "selected" choices. 
- -The Tap should attempt to sync every stream that is listed in the -PROPERTIES file where the "selected" property of the stream's schema is -`true`. For each of those streams, it should include all the properties -that are marked as selected for that stream. If the requested schema -contains a property that does not exist in the data source, a Tap may fail -and exit with a non-zero status or it may omit the requested field from -the output. The Tap may include additional properties that are not -included in the catalog, if those properties are always provided by the -data source. The tap should not include in its output any streams that are -not present in the catalog. - -Metrics -------- - -A Tap should periodically emit structured log messages containing metrics -about read operations. Consumers of the Tap logs can parse these metrics -out of the logs for monitoring or analysis. - -``` -INFO METRIC: -``` - -`` should be a JSON object with the following keys; - -* `type` - The type of the metric. Indicates how consumers of the data - should interpret the `value` field. There are two types of metrics: - - * `counter` - The value should be interpreted as a number that is added - to a cumulative or running total. - - * `timer` - The value is the duration in seconds of some operation. - -* `metric` - The name of the metric. This should consist only of letters, - numbers, underscore, and dash characters. For example, - `"http_request_duration"`. - -* `value` - The value of the datapoint, either an integer or a float. For - example, `1234` or `1.234`. - -* `tags` - Mapping of tags describing the data. The keys can be any - strings consisting solely of letters, numbers, underscores, and dashes. - For consistency's sake, we recommend using the following tags when they - are relevant. - - * `endpoint` - For a Tap that pulls data from an HTTP API, this should - be a descriptive name for the endpoint, such as `"users"` or `"deals"` - or `"orders"`. 
- - * `http_status_code` - The HTTP status code. For example, `200` or - `500`. - - * `job_type` - For a process that we are timing, some description of - the type of the job. For example, if we have a Tap that does a POST - to an HTTP API to generate a report and then polls with a GET until - the report is done, we could use a job type of `"run_report"`. - - * `status` - Either `"succeeded"` or `"failed"`. - - Note that for many metrics, many of those tags will _not_ be relevant. - -### Examples - -Here are some examples of metrics and how those metrics should be -interpreted. - -#### Example 1: Timer for Successful HTTP GET - -``` -INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 1.23, "tags": {"endpoint": "orders", "http_status_code": 200, "status": "succeeded"}} -``` - -> We made an HTTP request to an "orders" endpoint that took 1.23 seconds -> and succeeded with a status code of 200. - -#### Example 2: Timer for Failed HTTP GET - -``` -INFO METRIC: {"type": "timer", "metric": "http_request_duration", "value": 30.01, "tags": {"endpoint": "orders", "http_status_code": 500, "status": "failed"}} -``` - -> We made an HTTP request to an "orders" endpoint that took 30.01 seconds -> and failed with a status code of 500. - -#### Example 3: Counter for Records - -``` -INFO METRIC: {"type": "counter", "metric": "record_count", "value": 100, "tags": {"endpoint: "orders"}} -INFO METRIC: {"type": "counter", "metric": "record_count", "value": 100, "tags": {"endpoint: "orders"}} -INFO METRIC: {"type": "counter", "metric": "record_count", "value": 100, "tags": {"endpoint: "orders"}} -INFO METRIC: {"type": "counter", "metric": "record_count", "value": 14, "tags": {"endpoint: "orders"}} - -``` - -> We fetched a total of 314 records from an "orders" endpoint. 
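The metric format described above can be sketched in Python roughly as follows. This is a minimal illustration, not the singer-python API: the helper names `format_metric` and `total_counter` are hypothetical, and the only thing assumed about the log output is the `INFO METRIC: <json>` line shape shown in the examples.

```python
import json

METRIC_PREFIX = "INFO METRIC: "


def format_metric(metric_type, metric, value, tags=None):
    """Build one metric log line (hypothetical helper, not a singer-python call)."""
    if metric_type not in ("counter", "timer"):
        raise ValueError("type must be 'counter' or 'timer'")
    point = {"type": metric_type, "metric": metric, "value": value, "tags": tags or {}}
    return METRIC_PREFIX + json.dumps(point)


def total_counter(log_lines, metric):
    """Sum the values of every counter datapoint with the given metric name."""
    total = 0
    for line in log_lines:
        if not line.startswith(METRIC_PREFIX):
            continue  # ordinary log line, not a metric
        point = json.loads(line[len(METRIC_PREFIX):])
        if point["type"] == "counter" and point["metric"] == metric:
            total += point["value"]
    return total


# Four counter datapoints for the "orders" endpoint, plus one timer that
# the counter total must ignore.
lines = [format_metric("counter", "record_count", n, {"endpoint": "orders"})
         for n in (100, 100, 100, 14)]
lines.append(format_metric("timer", "http_request_duration", 1.23,
                           {"endpoint": "orders", "http_status_code": 200,
                            "status": "succeeded"}))
print(total_counter(lines, "record_count"))  # 314
```

Counters are summed by the consumer rather than by the Tap, so the Tap can emit small increments as it goes and never has to hold a running total across requests.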
From e36a4061d12224f9af66c22ed831b2ef817fa135 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:18:02 -0400 Subject: [PATCH 10/25] Delete PROPOSALS.md --- PROPOSALS.md | 96 ---------------------------------------------------- 1 file changed, 96 deletions(-) delete mode 100644 PROPOSALS.md diff --git a/PROPOSALS.md b/PROPOSALS.md deleted file mode 100644 index 415eebb..0000000 --- a/PROPOSALS.md +++ /dev/null @@ -1,96 +0,0 @@ -# Proposals - -These are changes to the spec that may be added in the future. - -## Add structure detection and selection - -### Use case - -For some data sources, it would be desirable to provide a more where the -tap can detect what tables or fields are available, and to allow a user to -select which fields to retrieve. - -### Solution - -Add an optional `--discover` flag that causes the tap to print out a -"STRUCTURE" message describing the data structures that are available to -the tap. Add a `--structure STRUCTURE` option that indicates that the tap -should read the STRUCTURE file in and use that to determine which data structures to sync. - -Support for these two options should be optional, and a tap should fail if -it is invoked with options it doesn't support. - -#### Structure File - -The structure is an optional configuration file that can be used to -configure which data structures the streamer captures in sync -mode. The structure must be encoded as JSON following the same format -as STRUCTURE messages, defined below, with an additional "is_synced" -key on every node. If the "is_synced" value of a node is true, the -streamer should emit the data corresponding to that structure node -during sync, and it should not emit the data otherwise. A streamer -that supports structure specification should fail if an invalid -structure is supplied. - -#### Structure Message - -STRUCTURE messages describe the data available to the streamer. They -must have the following properties: - - - `structure` **Required**. 
A JSON object with string keys naming the - top of the structure hierarchy, and values as objects with keys: - - type: a string containing one of "database", "table", "column" or "other" - - description: a string with descriptive information about the node - - supported: a boolean indicating whether or not the streamer can sync this node - - children: a map containing the next lower level of structure, formatted in the same way as the root - -A STRUCTURE message should describe the entirety of the structure -available to the streamer. - -Example, for a database: - -```json -{"type": "STRUCTURE", - "structure": { - "my_first_database": { - "type": "database", - "description": "", - "supported": true, - "children": { - "my_first_databases_first_table": { - "type": "table", - "description": "", - "supported": true, - "children": { - "id_column": { - "type": "column", - "description": "INT PRIMARY KEY", - "supported": true - }, - "string_column": { - "type": "column", - "description": "VARCHAR(255)", - "supported": true - } - } - } - } - } - } -} -``` - -## Add connection check support - -### Use case - -For some sources it might be good to provide an option to quickly test -whether the authentication details provided in the config file are valid. -This would allow a user to determine whether they can authenticate without -actually initiating a sync. - -### Solution - -Add an optional `--check` flag. If the `--check` flag is provided, the tap -should attempt to authenticate, then print the details of the -authentication attempt to stderr and exit 0 on success or 1 on failure. 
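One way a tap might wire up the proposed `--check` flag is sketched below. Since the flag is only a proposal, everything here is an assumption: the `authenticate` function is a placeholder for a real call to the source API, and the `api_key` config field is invented for illustration.

```python
import argparse
import json
import sys


def parse_args(argv):
    """Parse the tap's command line; --check is the proposed optional flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--config", required=True)
    parser.add_argument("--check", action="store_true")
    return parser.parse_args(argv)


def authenticate(config):
    """Placeholder credential test; a real tap would call its source API here.

    The "api_key" field is hypothetical.
    """
    return bool(config.get("api_key"))


def run_check(config):
    """Report the authentication attempt on stderr and return the exit code."""
    ok = authenticate(config)
    print("authentication " + ("succeeded" if ok else "failed"), file=sys.stderr)
    return 0 if ok else 1


def main(argv=None):
    args = parse_args(argv)
    with open(args.config) as config_file:
        config = json.load(config_file)
    if args.check:
        return run_check(config)  # no sync is started in check mode
    # ... the normal sync would run here ...
    return 0
```

A tap's entry point could then call `sys.exit(main())`, so that `tap-example -c config.json --check` (a hypothetical command name) exits 0 or 1 without writing anything to stdout.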
From 0b98bf6400400a04c0f24ed505a2bd70ef6c4d14 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:27:21 -0400 Subject: [PATCH 11/25] Update 01_README.md --- 01_README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/01_README.md b/01_README.md index ce44a7b..29ee701 100644 --- a/01_README.md +++ b/01_README.md @@ -1,4 +1,4 @@ -[[https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png]] +!(https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png) #🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 From f8574d464a92d41ae01d9e9c63b1cdb8261e28a6 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:27:39 -0400 Subject: [PATCH 12/25] Update 01_README.md --- 01_README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/01_README.md b/01_README.md index 29ee701..cadab00 100644 --- a/01_README.md +++ b/01_README.md @@ -1,4 +1,4 @@ -!(https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png) +![https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png] #🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 From 773c4b1bf589dc4ebf4ef1f464dded392ae013cf Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:28:05 -0400 Subject: [PATCH 13/25] Update 01_README.md --- 01_README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/01_README.md b/01_README.md index cadab00..e7da5e7 100644 --- a/01_README.md +++ b/01_README.md @@ -1,4 +1,4 @@
-![https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png] +![Singer Logo](https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png) #🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 From 4caf3cfa30d3558a1907a513703a343d459c1edf Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Tue, 18 Jul 2017 16:28:24 -0400 Subject: [PATCH 14/25] Update 01_README.md --- 01_README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/01_README.md b/01_README.md index e7da5e7..a9de1d0 100644 --- a/01_README.md +++ b/01_README.md @@ -1,7 +1,6 @@ ![Singer Logo](https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png) -#🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 - +# 🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 ## Singer is an open source ETL tool. In case you're unfamiliar ETL stands for Extract, Transform, and Load. It's a term used in the data warehousing world. If you happen to be moving data or needing to pull data we want to help!
From 7bfd7137f0de65cf7f533afdc0f60dd6dda38766 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 11:42:55 -0500 Subject: [PATCH 15/25] updated stuff --- 01_README.md | 31 ++ 02_EXTRACT_WITH_TAPS.md | 108 +++++ 03_SEND_TO_TARGETS.md | 48 +++ 04_COOL_TAPS_CLUB.md | 67 +++ 05_CONTRIBUTIONS.md | 502 ++++++++++++++++++++++ 06_MAKE_IT_OFFICIAL.md | 18 + BEST_PRACTICES.md => 07_BEST_PRACTICES.md | 3 +- SPEC.md => 08_SPEC.md | 69 +++ PROPOSALS.md => 09_PROPOSALS.md | 0 10_CODE_OF_CONDUCT.md | 80 ++++ README.md | 274 ------------ SCHEMAS.md | 65 --- 12 files changed, 924 insertions(+), 341 deletions(-) create mode 100644 01_README.md create mode 100644 02_EXTRACT_WITH_TAPS.md create mode 100644 03_SEND_TO_TARGETS.md create mode 100644 04_COOL_TAPS_CLUB.md create mode 100644 05_CONTRIBUTIONS.md create mode 100644 06_MAKE_IT_OFFICIAL.md rename BEST_PRACTICES.md => 07_BEST_PRACTICES.md (99%) rename SPEC.md => 08_SPEC.md (66%) rename PROPOSALS.md => 09_PROPOSALS.md (100%) create mode 100644 10_CODE_OF_CONDUCT.md delete mode 100644 README.md delete mode 100644 SCHEMAS.md diff --git a/01_README.md b/01_README.md new file mode 100644 index 0000000..29ee701 --- /dev/null +++ b/01_README.md @@ -0,0 +1,31 @@ +!(https://trello-attachments.s3.amazonaws.com/58c8696247956895aea87ef2/58d2d15c8baaf0c33f36c87f/a789e41241329c5b972a6e105e954543/Screen_Shot_2017-04-27_at_3.58.09_PM.png) + +#🎉👋🏽 Welcome to Singer: Open-source ETL 🎉👋🏽 + + +## Singer is an open source ETL tool. In case you're unfamiliar ETL stands for Extract, Transform, and Load. It's a term used in the data warehousing world. If you happen to be moving data or needing to pull data we want to help! + +### Singer sets a standard for moving data between databases, web APIs, files, queues, or just about anything else you can think of. _(Except penguins 🐧 and candy 🍬. We haven't figured out how to move those yet unfortunately.)_ + +In this documentation we'll take you through a number of scenarios.
+ +- 🍺 If you'd like to pull or extract data check out [TAPS](02_EXTRACT_WITH_TAPS.md) +- 🎯 If you'd like to send or load data check out [TARGETS](03_LOAD_WITH_TARGETS.md) +- πŸ“ If you want to dive in some technical goodness check out our [SPECS](07_SPEC.md) +- 😎✅ Once you've created your own tap or target be sure to let us know and join our kool kids [UNOFFICIAL](04_COOL_UNOFFICIAL_CLUB.md) club or learn how to submit to be part of the super cool [OFFICIAL](05_MAKE_IT_OFFICIAL.md) integrations. +- 💯 Check this out to learn more about [BEST PRACTICES](06_BEST_PRACTICES.md) +- 🤝 And above all please respect our [CODE OF CONDUCT](09_CODE_OF_CONDUCT.md) + + +### Communication +If you're feeling social we'd love to chat. Pick your poison(s): +- [Slack](https://singer-slackin.herokuapp.com/) +- [Twitter](https://twitter.com/singer_io) +- [Our Public Roadmap on Trello](https://trello.com/b/BMNRnIoU/singer-roadmap) +- Feel free to create an issue on any repo's for specific questions +- 🐦 Carrier pigeon (beta) + + +--- + +Copyright © 2017 Stitch diff --git a/02_EXTRACT_WITH_TAPS.md b/02_EXTRACT_WITH_TAPS.md new file mode 100644 index 0000000..3373923 --- /dev/null +++ b/02_EXTRACT_WITH_TAPS.md @@ -0,0 +1,108 @@ +# 🍺 All about TAPS 🍺 + +## Taps extract data from any source and write that data to a standard stream in a JSON-based format. + +Be Check out our [official](05_MAKE_IT_OFFICIAL.md) and [unofficial](04_COOL_UNOFFICIAL_CLUB.md) pages before creating your own since it might save you some time in the long run. + +### Making Taps + +If a tap for your use case doesn't exist yet have no fear! This documentation will help. Let's get started: + +### 👩🏽‍💻 👨🏻‍💻 Hello, world + +A Tap is just a program, written in any language, that outputs data to `stdout` according to the [Singer spec](07_SPEC.md).
+ +In fact, your first Tap can be written from the command line, without any programming at all: + +```bash +› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n' +``` + +This writes the datapoint `{"value":"world"}` to the *hello* stream along with a schema indicating that `value` is a string. + +That data can be piped into any Target, like the [Google Sheets Target], over `stdin`: + +```bash +› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n' | target-gsheet -c config.json +``` + +### 🐍🐍🐍 A Python Tap + +To move beyond *Hello, world* you'll need a real programming language. Although any language will do, we have built a Python library to help you get up and running quickly. This is because Python is the de facto standard for data engineers and for folks like yourself who are interested in moving data. + +If you need help ramping up or getting started with Python there's fantastic community support [here](https://www.python.org/about/gettingstarted/). + +Let's write a Tap called `tap_ip.py` that retrieves the current IP using icanhazip.com, and writes that data with a timestamp. + +First, install the Singer helper library with `pip`: + +```bash +› pip install singer-python +``` + +Then, open up a new file called `tap_ip.py` in your favorite editor. + +```python +import singer +import urllib.request +from datetime import datetime, timezone +``` + +We'll use the `datetime` module to get the current timestamp, the +`singer` module to write data to `stdout` in the correct format, and +the `urllib.request` module to make a request to icanhazip.com.
+ +```python +now = datetime.now(timezone.utc).isoformat() +schema = { + 'properties': { + 'ip': {'type': 'string'}, + 'timestamp': {'type': 'string', 'format': 'date-time'}, + }, +} + +``` + +This sets up some of the data we'll need - the current time, and the +schema of the data we'll be writing to the stream formatted as a [JSON +Schema]. + +```python +with urllib.request.urlopen('http://icanhazip.com') as response: + ip = response.read().decode('utf-8').strip() + singer.write_schema('my_ip', schema, 'timestamp') + singer.write_records('my_ip', [{'timestamp': now, 'ip': ip}]) +``` + +Finally, we make the HTTP request, parse the response, and then make +two calls to the `singer` library: + + - `singer.write_schema` which writes the schema of the `my_ip` stream and defines its primary key + - `singer.write_records` to write a record to that stream + +We can send this data to Google Sheets as an example by running our new Tap +with the Google Sheets Target: + +```bash +› python tap_ip.py | target-gsheet -c config.json +``` + +Alternatively, you could send it to a CSV file just as easily by doing this: + +```bash +› python tap_ip.py | target-csv -c config.json +``` + +## To summarize, the formula for pulling with a tap and sending to a target is: + +```bash +› python YOUR_TAP_FILE.py -c TAP_CONFIG_FILE_HERE.json | TARGET-NAME -c TARGET_CONFIG_FILE_HERE.json +``` + +You might not always need config files, in which case it would just be: + +```bash +› python YOUR_TAP_FILE.py | TARGET-NAME +``` + +This assumes your target is installed locally, which you can read more about by heading over to the [targets page](03_SEND_TO_TARGETS.md) diff --git a/03_SEND_TO_TARGETS.md b/03_SEND_TO_TARGETS.md new file mode 100644 index 0000000..fc128ab --- /dev/null +++ b/03_SEND_TO_TARGETS.md @@ -0,0 +1,48 @@ +# 🎯 All about TARGETS 🎯 + +## Targets are very similar to TAPS in that they still adhere to the Singer spec.
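Since a target sits on the receiving end of the same message stream, its core can be sketched as a loop over lines of JSON. The following is a minimal sketch, not an official target: it assumes each input line is one JSON message with a `type` of `SCHEMA` or `RECORD`, ignores anything else, and only counts records per stream instead of loading them anywhere. The names `consume` and `main` are hypothetical.

```python
import json
import sys


def consume(lines):
    """Collect each stream's schema and its record payloads."""
    schemas, records = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        message = json.loads(line)
        kind = message.get("type")
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
        elif kind == "RECORD":
            records.setdefault(message["stream"], []).append(message["record"])
        # Any other message type is ignored in this sketch.
    return schemas, records


def main(stdin=None, stdout=None):
    """Read messages from stdin and report how many records each stream had."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    _, records = consume(stdin)
    for stream, rows in records.items():
        stdout.write(f"{stream}: {len(rows)} record(s)\n")
```

Saved as a hypothetical `my_target.py` with a call to `main()` at the bottom, this could be placed after a pipe, for example `python tap_ip.py | python my_target.py`. A real target would replace the counting step with writes to its destination.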
+ +Right now there are targets made to send to a CSV file, Google Sheets, Magento, or Stitch, but the possibilities are endless. To send your tap to a target, here is an example using Google Sheets: + + +```bash +› tap-ip | target-gsheet -c config.json +``` + +Alternatively, you could send it to a CSV file just as easily by doing this: + +```bash +› tap-ip | target-csv -c config.json +``` + +To summarize, the formula for pulling with a tap and sending to a target is: + +```bash +› TAP-NAME -c TAP_CONFIG_FILE_HERE.json | TARGET-NAME -c TARGET_CONFIG_FILE_HERE.json +``` + +You might not always need config files, in which case it would just be: + +```bash +› TAP-NAME | TARGET-NAME +``` +See? Easy. + +## If you'd like to create your own TARGET just follow the same tutorial for making a tap + +Building a target is just like building a tap. +You consume the messages that a tap outputs and determine what to do with them and where to send them. + +Before the target is packaged up, you would run it in your dev environment like this: +```bash +› TAP-NAME -c TAP_CONFIG_FILE_HERE.json | python my_target.py -c TARGET_CONFIG_FILE_HERE.json +``` + +Once both are bundled as packages, you can install them via pip or your other favorite package system and run them like this. +You might not always need config files, in which case it would just be: + +```bash +› TAP-NAME | TARGET-NAME +``` + + diff --git a/04_COOL_TAPS_CLUB.md b/04_COOL_TAPS_CLUB.md new file mode 100644 index 0000000..04fd289 --- /dev/null +++ b/04_COOL_TAPS_CLUB.md @@ -0,0 +1,67 @@ +## THE OFFICIAL (AND UNOFFICIAL) COLLECTION OF TAPS + +### Defining Official and Unofficial + +For a tap or target to be official, the Singer team must have reviewed the work and confirmed that it conforms to the best practices. Official means the repo has a stamp of review and approval and has been moved into the Singer GitHub organization. + + + +Singer supports official Taps through [Stitch](https://stitchdata.com). Stitch is software that moves all your data into your data warehouse and keeps everything nice and tidy.
Official taps are maintained and integrated by the team at Stitch. This is why not all unofficial taps become official. Sometimes the team is building something else and other times the integration isn't high on the roadmap priorities. + +Regardless, this doesn't mean that you can't move all the data you want, which is why unofficial taps are so important and why we value them so much! If you've created a tap or target, be sure to submit a pull request to our unofficial table and show the world your work. Also be sure to drop us a line so we can send you *EPIC SWAG* 🎁 + +Without further ado here are the unofficial taps: + + +collection from around the internet + +| TYPE | NAME + REPO | USER 👨🏽‍💻 👩🏻‍💻 👑 | +| -------- |----------------------------------------------------------------------------|------------------------------------------------------------------| +| tap | [tap-csv](https://github.com/robertjmoore/tap-csv) | [robertjmoore](https://github.com/robertjmoore/) | +| tap | [tap-clubhouse](https://github.com/envoy/tap-clubhouse) | [envoy](https://github.com/envoy) | +| tap | [singer-airtable](https://github.com/StantonVentures/singer-airtable) | [StantonVentures](https://github.com/stantonventures) | +| tap | [tap-s3-csv](https://github.com/fishtown-analytics/tap-s3-csv) | [fishtown-analytics](https://github.com/fishtown-analytics) | +| tap | [tap-jsonfeed](https://github.com/briansloane/tap-jsonfeed) | [briansloane](https://github.com/briansloane/) | +| tap | [tap-reviewscouk](https://github.com/onedox/tap-reviewscouk) | [onedox](https://github.com/onedox) | +| tap | [tap-fake-users](https://github.com/bengarvey/tap-fake-users) | [bengarvey](https://github.com/bengarvey) | +| tap | [tap-awin](https://github.com/onedox/tap-awin) | [onedox](https://github.com/onedox) | +| tap | [marvel-tap](https://github.com/ashhath/marvel-tap) | [ashhath](https://github.com/ashhath) | +| tap |
[tap-mixpanel](https://github.com/Kierchon/tap-mixpanel) | [kierchon](https://github.com/kierchon) | +| tap | [tap-appsflyer](https://github.com/ezcater/tap-appsflyer) | [ezcater](https://github.com/ezcater) | +| tap | [tap-fullstory](https://github.com/expectedbehavior/tap-fullstory) | [expectedbehavior](https://github.com/expectedbehavior) | +| tap | [stitch-stream-deputy](https://github.com/DeputyApp/stitch-stream-deputy) | [deputyapp](https://github.com/deputyapp) | + +And then just in case here's a tidy list of the official ones supported by Stitch: + +| TYPE | NAME + REPO | CONTRIBUTOR | +| -------- |-----------------------------------------------------------------------------|---------------------------------------------------------| +| tap | [Hubspot](https://github.com/singer-io/tap-hubspot) | stitch +| tap | [Marketo](https://github.com/singer-io/tap-marketo) | stitch +| tap | [Shippo](https://github.com/singer-io/tap-shippo) | bob +| tap | [GitHub](https://github.com/singer-io/tap-github) | stitch +| tap | [Close.io](https://github.com/singer-io/tap-closeio) | stitch +| tap | [Referral SaaSquatch](https://github.com/singer-io/tap-referral-saasquatch) | stitch +| tap | [Freshdesk](https://github.com/singer-io/tap-freshdesk) | stitch +| tap | [Braintree](https://github.com/singer-io/tap-braintree) | stitch +| tap | [GitLab](https://github.com/singer-io/tap-gitlab) | stitch +| tap | [Wootric](https://github.com/singer-io/tap-wootric) | stitch +| tap | [Fixer.io](https://github.com/singer-io/tap-fixerio) | stitch +| tap | [Outbrain](https://github.com/singer-io/tap-outbrain) | FISHTOWN +| tap | [Harvest](https://github.com/singer-io/tap-harvest) | facet/ryan +| tap | [Taboola](https://github.com/singer-io/tap-taboola) | fishtown +| tap | [Urban Airship](https://github.com/envoy/tap-clubhouse) | stitch +| tap | [Facebook](https://github.com/singer-io/tap-facebook) | stitch +| tap | [Google AdWords](https://github.com/singer-io/tap-adwords) | stitch +| tap | 
[Fullstory](https://github.com/singer-io/tap-fullstory) | [expectedbehavior](https://github.com/expectedbehavior) | +| target | [Stitch](https://github.com/singer-io/target-stitch) | stitch +| target | [CSV](https://github.com/singer-io/target-csv) | stitch +| target | [Google Sheets](https://github.com/singer-io/target-gsheet) | stitch +| target | [Magento BI](https://github.com/robertjmoore/target-magentobi) | robert + + + + + + + + diff --git a/05_CONTRIBUTIONS.md b/05_CONTRIBUTIONS.md new file mode 100644 index 0000000..484252b --- /dev/null +++ b/05_CONTRIBUTIONS.md @@ -0,0 +1,502 @@ +# Contributing to SINGER + +:+1::tada: First off, thanks for taking the time to contribute! :tada::+1: + +The following is a set of guidelines for contributing to the Singer project. These are mostly guidelines, not rules. Use your best judgment, and feel free to propose changes to this document in a pull request. + +#### Table Of Contents + +[Code of Conduct](#code-of-conduct) + +[I don't want to read this whole thing, I just have a question!!!](#i-dont-want-to-read-this-whole-thing-i-just-have-a-question) + +[What should I know before I get started?](#what-should-i-know-before-i-get-started) + * [Atom and Packages](#atom-and-packages) + * [Atom Design Decisions](#design-decisions) + +[How Can I Contribute?](#how-can-i-contribute) + * [Reporting Bugs](#reporting-bugs) + * [Suggesting Enhancements](#suggesting-enhancements) + * [Your First Code Contribution](#your-first-code-contribution) + * [Pull Requests](#pull-requests) + +[Styleguides](#styleguides) + * [Git Commit Messages](#git-commit-messages) + * [JavaScript Styleguide](#javascript-styleguide) + * [CoffeeScript Styleguide](#coffeescript-styleguide) + * [Specs Styleguide](#specs-styleguide) + * [Documentation Styleguide](#documentation-styleguide) + +[Additional Notes](#additional-notes) + * [Issue and Pull Request Labels](#issue-and-pull-request-labels) + +## Code of Conduct + +This project and everyone participating in 
it is governed by the [Atom Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code. Please report unacceptable behavior to [atom@github.com](mailto:atom@github.com). + +## I don't want to read this whole thing I just have a question!!! + +> **Note:** [Please don't file an issue to ask a question.](http://blog.atom.io/2016/04/19/managing-the-deluge-of-atom-issues.html) You'll get faster results by using the resources below. + +We have an official message board with a detailed FAQ and where the community chimes in with helpful advice if you have questions. + +* [Discuss, the official Atom and Electron message board](https://discuss.atom.io) +* [Atom FAQ](https://discuss.atom.io/c/faq) + +If chat is more your speed, you can join the Atom and Electron Slack team: + +* [Join the Atom and Electron Slack Team](http://atom-slack.herokuapp.com/) + * Even though Slack is a chat service, sometimes it takes several hours for community members to respond — please be patient! + * Use the `#atom` channel for general questions or discussion about Atom + * Use the `#electron` channel for questions about Electron + * Use the `#packages` channel for questions or discussion about writing or contributing to Atom packages (both core and community) + * Use the `#ui` channel for questions and discussion about Atom UI and themes + * There are many other channels available, check the channel list + +## What should I know before I get started? + +### Atom and Packages + +Atom is a large open source project — it's made up of over [200 repositories](https://github.com/atom). When you initially consider contributing to Atom, you might be unsure about which of those 200 repositories implements the functionality you want to change or report a bug for. This section should help you with that. + +Atom is intentionally very modular. Nearly every non-editor UI element you interact with comes from a package, even fundamental things like tabs and the status-bar. 
These packages are packages in the same way that packages in the [Atom package repository](https://atom.io/packages) are packages, with one difference: they are bundled into the [default distribution](https://github.com/atom/atom/blob/10b8de6fc499a7def9b072739486e68530d67ab4/package.json#L58). + + + +![atom-packages](https://cloud.githubusercontent.com/assets/69169/10472281/84fc9792-71d3-11e5-9fd1-19da717df079.png) + +To get a sense for the packages that are bundled with Atom, you can go to Settings > Packages within Atom and take a look at the Core Packages section. + +Here's a list of the big ones: + +* [atom/atom](https://github.com/atom/atom) - Atom Core! The core editor component is responsible for basic text editing (e.g. cursors, selections, scrolling), text indentation, wrapping, and folding, text rendering, editor rendering, file system operations (e.g. saving), and installation and auto-updating. You should also use this repository for feedback related to the [Atom API](https://atom.io/docs/api/latest) and for large, overarching design proposals. +* [tree-view](https://github.com/atom/tree-view) - file and directory listing on the left of the UI. +* [fuzzy-finder](https://github.com/atom/fuzzy-finder) - the quick file opener. +* [find-and-replace](https://github.com/atom/find-and-replace) - all search and replace functionality. +* [tabs](https://github.com/atom/tabs) - the tabs for open editors at the top of the UI. +* [status-bar](https://github.com/atom/status-bar) - the status bar at the bottom of the UI. +* [markdown-preview](https://github.com/atom/markdown-preview) - the rendered markdown pane item. +* [settings-view](https://github.com/atom/settings-view) - the settings UI pane item. +* [autocomplete-plus](https://github.com/atom/autocomplete-plus) - autocompletions shown while typing. Some languages have additional packages for autocompletion functionality, such as [autocomplete-html](https://github.com/atom/autocomplete-html). 
+* [git-diff](https://github.com/atom/git-diff) - Git change indicators shown in the editor's gutter. +* [language-javascript](https://github.com/atom/language-javascript) - all bundled languages are packages too, and each one has a separate package `language-[name]`. Use these for feedback on syntax highlighting issues that only appear for a specific language. +* [one-dark-ui](https://github.com/atom/one-dark-ui) - the default UI styling for anything but the text editor. UI theme packages (i.e. packages with a `-ui` suffix) provide only styling and it's possible that a bundled package is responsible for a UI issue. There are other bundled UI themes, such as [one-light-ui](https://github.com/atom/one-light-ui). +* [one-dark-syntax](https://github.com/atom/one-dark-syntax) - the default syntax highlighting styles applied for all languages. There are other bundled syntax themes, such as [solarized-dark-syntax](https://github.com/atom/solarized-dark-syntax). You should use these packages for reporting issues that appear in many languages, but disappear if you change to another syntax theme. +* [apm](https://github.com/atom/apm) - the `apm` command line tool (Atom Package Manager). You should use this repository for any contributions related to the `apm` tool and to publishing packages. +* [atom.io](https://github.com/atom/atom.io) - the repository for feedback on the [Atom.io website](https://atom.io) and the [Atom.io package API](https://github.com/atom/atom/blob/master/docs/apm-rest-api.md) used by [apm](https://github.com/atom/apm). + +There are many more, but this list should be a good starting point. For more information on how to work with Atom's official packages, see [Contributing to Atom Packages](http://flight-manual.atom.io/hacking-atom/sections/contributing-to-official-atom-packages/).
+ +Also, because Atom is so extensible, it's possible that a feature you've become accustomed to in Atom or an issue you're encountering isn't coming from a bundled package at all, but rather a [community package](https://atom.io/packages) you've installed. Each community package has its own repository too, the [Atom FAQ](https://discuss.atom.io/c/faq) has instructions on how to [contact the maintainers of any Atom community package or theme.](https://discuss.atom.io/t/i-have-a-question-about-a-specific-atom-community-package-where-is-the-best-place-to-ask-it/25581) + +#### Package Conventions + +There are a few conventions that have developed over time around packages: + +* Packages that add one or more syntax highlighting grammars are named `language-[language-name]` + * Language packages can add other things besides just a grammar. Many offer commonly-used snippets. Try not to add too much though. +* Theme packages are split into two categories: UI and Syntax themes + * UI themes are named `[theme-name]-ui` + * Syntax themes are named `[theme-name]-syntax` + * Often themes that are designed to work together are given the same root name, for example: `one-dark-ui` and `one-dark-syntax` + * UI themes style everything outside of the editor pane — all of the green areas in the [packages image above](#atom-packages-image) + * Syntax themes style just the items inside the editor pane, mostly syntax highlighting +* Packages that add [autocomplete providers](https://github.com/atom/autocomplete-plus/wiki/Autocomplete-Providers) are named `autocomplete-[what-they-autocomplete]` — ex: [autocomplete-css](https://github.com/atom/autocomplete-css) + +### Design Decisions + +When we make a significant decision in how we maintain the project and what we can or cannot support, we will document it in the [atom/design-decisions repository](https://github.com/atom/design-decisions). If you have a question around how we do things, check to see if it is documented there. 
If it is *not* documented there, please open a new topic on [Discuss, the official Atom message board](https://discuss.atom.io) and ask your question. + +## How Can I Contribute? + +### Reporting Bugs + +This section guides you through submitting a bug report for Atom. Following these guidelines helps maintainers and the community understand your report :pencil:, reproduce the behavior :computer: :computer:, and find related reports :mag_right:. + +Before creating bug reports, please check [this list](#before-submitting-a-bug-report) as you might find out that you don't need to create one. When you are creating a bug report, please [include as many details as possible](#how-do-i-submit-a-good-bug-report). Fill out [the required template](ISSUE_TEMPLATE.md), the information it asks for helps us resolve issues faster. + +> **Note:** If you find a **Closed** issue that seems like it is the same thing that you're experiencing, open a new issue and include a link to the original issue in the body of your new one. + +#### Before Submitting A Bug Report + +* **Check the [debugging guide](http://flight-manual.atom.io/hacking-atom/sections/debugging/).** You might be able to find the cause of the problem and fix things yourself. Most importantly, check if you can reproduce the problem [in the latest version of Atom](http://flight-manual.atom.io/hacking-atom/sections/debugging/#update-to-the-latest-version), if the problem happens when you run Atom in [safe mode](http://flight-manual.atom.io/hacking-atom/sections/debugging/#check-if-the-problem-shows-up-in-safe-mode), and if you can get the desired behavior by changing [Atom's or packages' config settings](http://flight-manual.atom.io/hacking-atom/sections/debugging/#check-atom-and-package-settings). +* **Check the [FAQs on the forum](https://discuss.atom.io/c/faq)** for a list of common questions and problems. +* **Determine [which repository the problem should be reported in](#atom-and-packages)**. 
+* **Perform a [cursory search](https://github.com/issues?q=+is%3Aissue+user%3Aatom)** to see if the problem has already been reported. If it has **and the issue is still open**, add a comment to the existing issue instead of opening a new one. + +#### How Do I Submit A (Good) Bug Report? + +Bugs are tracked as [GitHub issues](https://guides.github.com/features/issues/). After you've determined [which repository](#atom-and-packages) your bug is related to, create an issue on that repository and provide the following information by filling in [the template](ISSUE_TEMPLATE.md). + +Explain the problem and include additional details to help maintainers reproduce the problem: + +* **Use a clear and descriptive title** for the issue to identify the problem. +* **Describe the exact steps which reproduce the problem** in as many details as possible. For example, start by explaining how you started Atom, e.g. which command exactly you used in the terminal, or how you started Atom otherwise. When listing steps, **don't just say what you did, but explain how you did it**. For example, if you moved the cursor to the end of a line, explain if you used the mouse, or a keyboard shortcut or an Atom command, and if so which one? +* **Provide specific examples to demonstrate the steps**. Include links to files or GitHub projects, or copy/pasteable snippets, which you use in those examples. If you're providing snippets in the issue, use [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines). +* **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior. +* **Explain which behavior you expected to see instead and why.** +* **Include screenshots and animated GIFs** which show you following the described steps and clearly demonstrate the problem. 
If you use the keyboard while following the steps, **record the GIF with the [Keybinding Resolver](https://github.com/atom/keybinding-resolver) shown**. You can use [this tool](http://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [this tool](https://github.com/colinkeenan/silentcast) or [this tool](https://github.com/GNOME/byzanz) on Linux. +* **If you're reporting that Atom crashed**, include a crash report with a stack trace from the operating system. On macOS, the crash report will be available in `Console.app` under "Diagnostic and usage information" > "User diagnostic reports". Include the crash report in the issue in a [code block](https://help.github.com/articles/markdown-basics/#multiple-lines), a [file attachment](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/), or put it in a [gist](https://gist.github.com/) and provide link to that gist. +* **If the problem is related to performance or memory**, include a [CPU profile capture](http://flight-manual.atom.io/hacking-atom/sections/debugging/#diagnose-runtime-performance) with your report. +* **If Chrome's developer tools pane is shown without you triggering it**, that normally means that you have a syntax error in one of your themes or in your `styles.less`. Try running in [Safe Mode](http://flight-manual.atom.io/hacking-atom/sections/debugging/#using-safe-mode) and using a different theme or comment out the contents of your `styles.less` to see if that fixes the problem. +* **If the problem wasn't triggered by a specific action**, describe what you were doing before the problem happened and share more information using the guidelines below. + +Provide more context by answering these questions: + +* **Can you reproduce the problem in [safe mode](http://flight-manual.atom.io/hacking-atom/sections/debugging/#diagnose-runtime-performance-problems-with-the-dev-tools-cpu-profiler)?** +* **Did the problem start happening recently** (e.g. 
after updating to a new version of Atom) or was this always a problem? +* If the problem started happening recently, **can you reproduce the problem in an older version of Atom?** What's the most recent version in which the problem doesn't happen? You can download older versions of Atom from [the releases page](https://github.com/atom/atom/releases). +* **Can you reliably reproduce the issue?** If not, provide details about how often the problem happens and under which conditions it normally happens. +* If the problem is related to working with files (e.g. opening and editing files), **does the problem happen for all files and projects or only some?** Does the problem happen only when working with local or remote files (e.g. on network drives), with files of a specific type (e.g. only JavaScript or Python files), with large files or files with very long lines, or with files in a specific encoding? Is there anything else special about the files you are using? + +Include details about your configuration and environment: + +* **Which version of Atom are you using?** You can get the exact version by running `atom -v` in your terminal, or by starting Atom and running the `Application: About` command from the [Command Palette](https://github.com/atom/command-palette). +* **What's the name and version of the OS you're using**? +* **Are you running Atom in a virtual machine?** If so, which VM software are you using and which operating systems and versions are used for the host and the guest? +* **Which [packages](#atom-and-packages) do you have installed?** You can get that list by running `apm list --installed`. +* **Are you using [local configuration files](http://flight-manual.atom.io/using-atom/sections/basic-customization/)** `config.cson`, `keymap.cson`, `snippets.cson`, `styles.less` and `init.coffee` to customize Atom? 
If so, provide the contents of those files, preferably in a [code block](https://help.github.com/articles/markdown-basics/#multiple-lines) or with a link to a [gist](https://gist.github.com/). +* **Are you using Atom with multiple monitors?** If so, can you reproduce the problem when you use a single monitor? +* **Which keyboard layout are you using?** Are you using a US layout or some other layout? + +### Suggesting Enhancements + +This section guides you through submitting an enhancement suggestion for Atom, including completely new features and minor improvements to existing functionality. Following these guidelines helps maintainers and the community understand your suggestion :pencil: and find related suggestions :mag_right:. + +Before creating enhancement suggestions, please check [this list](#before-submitting-an-enhancement-suggestion) as you might find out that you don't need to create one. When you are creating an enhancement suggestion, please [include as many details as possible](#how-do-i-submit-a-good-enhancement-suggestion). Fill in [the template](ISSUE_TEMPLATE.md), including the steps that you imagine you would take if the feature you're requesting existed. + +#### Before Submitting An Enhancement Suggestion + +* **Check the [debugging guide](http://flight-manual.atom.io/hacking-atom/sections/debugging/)** for tips; you might discover that the enhancement is already available. Most importantly, check if you're using [the latest version of Atom](http://flight-manual.atom.io/hacking-atom/sections/debugging/#update-to-the-latest-version) and if you can get the desired behavior by changing [Atom's or packages' config settings](http://flight-manual.atom.io/hacking-atom/sections/debugging/#check-atom-and-package-settings).
+* **Check if there's already [a package](https://atom.io/packages) which provides that enhancement.** +* **Determine [which repository the enhancement should be suggested in](#atom-and-packages).** +* **Perform a [cursory search](https://github.com/issues?q=+is%3Aissue+user%3Aatom)** to see if the enhancement has already been suggested. If it has, add a comment to the existing issue instead of opening a new one. + +#### How Do I Submit A (Good) Enhancement Suggestion? + +Enhancement suggestions are tracked as [GitHub issues](https://guides.github.com/features/issues/). After you've determined [which repository](#atom-and-packages) your enhancement suggestion is related to, create an issue on that repository and provide the following information: + +* **Use a clear and descriptive title** for the issue to identify the suggestion. +* **Provide a step-by-step description of the suggested enhancement** in as many details as possible. +* **Provide specific examples to demonstrate the steps**. Include copy/pasteable snippets which you use in those examples, as [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines). +* **Describe the current behavior** and **explain which behavior you expected to see instead** and why. +* **Include screenshots and animated GIFs** which help you demonstrate the steps or point out the part of Atom which the suggestion is related to. You can use [this tool](http://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [this tool](https://github.com/colinkeenan/silentcast) or [this tool](https://github.com/GNOME/byzanz) on Linux. +* **Explain why this enhancement would be useful** to most Atom users and isn't something that can or should be implemented as a [community package](#atom-and-packages). 
+* **List some other text editors or applications where this enhancement exists.** +* **Specify which version of Atom you're using.** You can get the exact version by running `atom -v` in your terminal, or by starting Atom and running the `Application: About` command from the [Command Palette](https://github.com/atom/command-palette). +* **Specify the name and version of the OS you're using.** + +### Your First Code Contribution + +Unsure where to begin contributing to Atom? You can start by looking through these `beginner` and `help-wanted` issues: + +* [Beginner issues][beginner] - issues which should only require a few lines of code, and a test or two. +* [Help wanted issues][help-wanted] - issues which should be a bit more involved than `beginner` issues. + +Both issue lists are sorted by total number of comments. While not perfect, number of comments is a reasonable proxy for impact a given change will have. + +If you want to read about using Atom or developing packages in Atom, the [Atom Flight Manual](http://flight-manual.atom.io) is free and available online. You can find the source to the manual in [atom/flight-manual.atom.io](https://github.com/atom/flight-manual.atom.io). + +#### Local development + +All packages can be developed locally, by checking out the corresponding repository and registering the package to Atom with `apm`: + +``` +$ git clone url-to-git-repository +$ cd path-to-package/ +$ apm link -d +$ atom -d . +``` + +By running Atom with the `-d` flag, you signal it to run with development packages installed. `apm link` makes sure that your local repository is loaded by Atom. + +### Pull Requests + +* Fill in [the required template](PULL_REQUEST_TEMPLATE.md) +* Do not include issue numbers in the PR title +* Include screenshots and animated GIFs in your pull request whenever possible. +* Follow the [JavaScript](#javascript-styleguide) and [CoffeeScript](#coffeescript-styleguide) styleguides. 
+* Include thoughtfully-worded, well-structured [Jasmine](http://jasmine.github.io/) specs in the `./spec` folder. Run them using `atom --test spec`. See the [Specs Styleguide](#specs-styleguide) below. +* Document new code based on the [Documentation Styleguide](#documentation-styleguide) +* End all files with a newline +* [Avoid platform-dependent code](http://flight-manual.atom.io/hacking-atom/sections/cross-platform-compatibility/) +* Place requires in the following order: + * Built in Node Modules (such as `path`) + * Built in Atom and Electron Modules (such as `atom`, `remote`) + * Local Modules (using relative paths) +* Place class properties in the following order: + * Class methods and properties (methods starting with a `@` in CoffeeScript or `static` in JavaScript) + * Instance methods and properties + +## Styleguides + +### Git Commit Messages + +* Use the present tense ("Add feature" not "Added feature") +* Use the imperative mood ("Move cursor to..." not "Moves cursor to...") +* Limit the first line to 72 characters or less +* Reference issues and pull requests liberally after the first line +* When only changing documentation, include `[ci skip]` in the commit description +* Consider starting the commit message with an applicable emoji: + * :art: `:art:` when improving the format/structure of the code + * :racehorse: `:racehorse:` when improving performance + * :non-potable_water: `:non-potable_water:` when plugging memory leaks + * :memo: `:memo:` when writing docs + * :penguin: `:penguin:` when fixing something on Linux + * :apple: `:apple:` when fixing something on macOS + * :checkered_flag: `:checkered_flag:` when fixing something on Windows + * :bug: `:bug:` when fixing a bug + * :fire: `:fire:` when removing code or files + * :green_heart: `:green_heart:` when fixing the CI build + * :white_check_mark: `:white_check_mark:` when adding tests + * :lock: `:lock:` when dealing with security + * :arrow_up: `:arrow_up:` when upgrading dependencies + 
* :arrow_down: `:arrow_down:` when downgrading dependencies + * :shirt: `:shirt:` when removing linter warnings + +### JavaScript Styleguide + +All JavaScript must adhere to [JavaScript Standard Style](http://standardjs.com/). + +* Prefer the object spread operator (`{...anotherObj}`) to `Object.assign()` +* Inline `export`s with expressions whenever possible + ```js + // Use this: + export default class ClassName { + + } + + // Instead of: + class ClassName { + + } + export default ClassName + ``` + +### CoffeeScript Styleguide + +* Set parameter defaults without spaces around the equal sign + * `clear = (count=1) ->` instead of `clear = (count = 1) ->` +* Use spaces around operators + * `count + 1` instead of `count+1` +* Use spaces after commas (unless separated by newlines) +* Use parentheses if it improves code clarity. +* Prefer alphabetic keywords to symbolic keywords: + * `a is b` instead of `a == b` +* Avoid spaces inside the curly-braces of hash literals: + * `{a: 1, b: 2}` instead of `{ a: 1, b: 2 }` +* Include a single line of whitespace between methods. +* Capitalize initialisms and acronyms in names, except for the first word, which + should be lower-case: + * `getURI` instead of `getUri` + * `uriToOpen` instead of `URIToOpen` +* Use `slice()` to copy an array +* Add an explicit `return` when your function ends with a `for`/`while` loop and + you don't want it to return a collected array. +* Use `this` instead of a standalone `@` + * `return this` instead of `return @` + +### Specs Styleguide + +- Include thoughtfully-worded, well-structured [Jasmine](http://jasmine.github.io/) specs in the `./spec` folder. +- Treat `describe` as a noun or situation. +- Treat `it` as a statement about state or how an operation changes state. 
+ +#### Example + +```coffee +describe 'a dog', -> + it 'barks', -> + # spec here + describe 'when the dog is happy', -> + it 'wags its tail', -> + # spec here +``` + +### Documentation Styleguide + +* Use [AtomDoc](https://github.com/atom/atomdoc). +* Use [Markdown](https://daringfireball.net/projects/markdown). +* Reference methods and classes in markdown with the custom `{}` notation: + * Reference classes with `{ClassName}` + * Reference instance methods with `{ClassName::methodName}` + * Reference class methods with `{ClassName.methodName}` + +#### Example + +```coffee +# Public: Disable the package with the given name. +# +# * `name` The {String} name of the package to disable. +# * `options` (optional) The {Object} with disable options (default: {}): +# * `trackTime` A {Boolean}, `true` to track the amount of time taken. +# * `ignoreErrors` A {Boolean}, `true` to catch and ignore errors thrown. +# * `callback` The {Function} to call after the package has been disabled. +# +# Returns `undefined`. +disablePackage: (name, options, callback) -> +``` + +## Additional Notes + +### Issue and Pull Request Labels + +This section lists the labels we use to help us track and manage issues and pull requests. Most labels are used across all Atom repositories, but some are specific to `atom/atom`. + +[GitHub search](https://help.github.com/articles/searching-issues/) makes it easy to use labels for finding groups of issues or pull requests you're interested in. For example, you might be interested in [open issues across `atom/atom` and all Atom-owned packages which are labeled as bugs, but still need to be reliably reproduced](https://github.com/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Abug+label%3Aneeds-reproduction) or perhaps [open pull requests in `atom/atom` which haven't been reviewed yet](https://github.com/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Apr+repo%3Aatom%2Fatom+comments%3A0). 
To help you find issues and pull requests, each label is listed with search links for finding open items with that label in `atom/atom` only and also across all Atom repositories. We encourage you to read about [other search filters](https://help.github.com/articles/searching-issues/) which will help you write more focused queries. + +The labels are loosely grouped by their purpose, but it's not required that every issue have a label from every group or that an issue can't have more than one label from the same group. + +Please open an issue on `atom/atom` if you have suggestions for new labels, and if you notice some labels are missing on some repositories, then please open an issue on that repository. + +#### Type of Issue and Issue State + +| Label name | `atom/atom` :mag_right: | `atom`‑org :mag_right: | Description | +| --- | --- | --- | --- | +| `enhancement` | [search][search-atom-repo-label-enhancement] | [search][search-atom-org-label-enhancement] | Feature requests. | +| `bug` | [search][search-atom-repo-label-bug] | [search][search-atom-org-label-bug] | Confirmed bugs or reports that are very likely to be bugs. | +| `question` | [search][search-atom-repo-label-question] | [search][search-atom-org-label-question] | Questions more than bug reports or feature requests (e.g. how do I do X). | +| `feedback` | [search][search-atom-repo-label-feedback] | [search][search-atom-org-label-feedback] | General feedback more than bug reports or feature requests. | +| `help-wanted` | [search][search-atom-repo-label-help-wanted] | [search][search-atom-org-label-help-wanted] | The Atom core team would appreciate help from the community in resolving these issues. | +| `beginner` | [search][search-atom-repo-label-beginner] | [search][search-atom-org-label-beginner] | Less complex issues which would be good first issues to work on for users who want to contribute to Atom.
| +| `more-information-needed` | [search][search-atom-repo-label-more-information-needed] | [search][search-atom-org-label-more-information-needed] | More information needs to be collected about these problems or feature requests (e.g. steps to reproduce). | +| `needs-reproduction` | [search][search-atom-repo-label-needs-reproduction] | [search][search-atom-org-label-needs-reproduction] | Likely bugs, but haven't been reliably reproduced. | +| `blocked` | [search][search-atom-repo-label-blocked] | [search][search-atom-org-label-blocked] | Issues blocked on other issues. | +| `duplicate` | [search][search-atom-repo-label-duplicate] | [search][search-atom-org-label-duplicate] | Issues which are duplicates of other issues, i.e. they have been reported before. | +| `wontfix` | [search][search-atom-repo-label-wontfix] | [search][search-atom-org-label-wontfix] | The Atom core team has decided not to fix these issues for now, either because they're working as intended or for some other reason. | +| `invalid` | [search][search-atom-repo-label-invalid] | [search][search-atom-org-label-invalid] | Issues which aren't valid (e.g. user errors). | +| `package-idea` | [search][search-atom-repo-label-package-idea] | [search][search-atom-org-label-package-idea] | Feature requests which might be good candidates for new packages, instead of extending Atom or core Atom packages. | +| `wrong-repo` | [search][search-atom-repo-label-wrong-repo] | [search][search-atom-org-label-wrong-repo] | Issues reported on the wrong repository (e.g. a bug related to the [Settings View package](https://github.com/atom/settings-view) was reported on [Atom core](https://github.com/atom/atom)). | + +#### Topic Categories + +| Label name | `atom/atom` :mag_right: | `atom`‑org :mag_right: | Description | +| --- | --- | --- | --- | +| `windows` | [search][search-atom-repo-label-windows] | [search][search-atom-org-label-windows] | Related to Atom running on Windows.
| +| `linux` | [search][search-atom-repo-label-linux] | [search][search-atom-org-label-linux] | Related to Atom running on Linux. | +| `mac` | [search][search-atom-repo-label-mac] | [search][search-atom-org-label-mac] | Related to Atom running on macOS. | +| `documentation` | [search][search-atom-repo-label-documentation] | [search][search-atom-org-label-documentation] | Related to any type of documentation (e.g. [API documentation](https://atom.io/docs/api/latest/) and the [flight manual](http://flight-manual.atom.io/)). | +| `performance` | [search][search-atom-repo-label-performance] | [search][search-atom-org-label-performance] | Related to performance. | +| `security` | [search][search-atom-repo-label-security] | [search][search-atom-org-label-security] | Related to security. | +| `ui` | [search][search-atom-repo-label-ui] | [search][search-atom-org-label-ui] | Related to visual design. | +| `api` | [search][search-atom-repo-label-api] | [search][search-atom-org-label-api] | Related to Atom's public APIs. | +| `uncaught-exception` | [search][search-atom-repo-label-uncaught-exception] | [search][search-atom-org-label-uncaught-exception] | Issues about uncaught exceptions, normally created from the [Notifications package](https://github.com/atom/notifications). | +| `crash` | [search][search-atom-repo-label-crash] | [search][search-atom-org-label-crash] | Reports of Atom completely crashing. | +| `auto-indent` | [search][search-atom-repo-label-auto-indent] | [search][search-atom-org-label-auto-indent] | Related to auto-indenting text. | +| `encoding` | [search][search-atom-repo-label-encoding] | [search][search-atom-org-label-encoding] | Related to character encoding. | +| `network` | [search][search-atom-repo-label-network] | [search][search-atom-org-label-network] | Related to network problems or working with remote files (e.g. on network drives). 
| +| `git` | [search][search-atom-repo-label-git] | [search][search-atom-org-label-git] | Related to Git functionality (e.g. problems with gitignore files or with showing the correct file status). | + +#### `atom/atom` Topic Categories + +| Label name | `atom/atom` :mag_right: | `atom`‑org :mag_right: | Description | +| --- | --- | --- | --- | +| `editor-rendering` | [search][search-atom-repo-label-editor-rendering] | [search][search-atom-org-label-editor-rendering] | Related to language-independent aspects of rendering text (e.g. scrolling, soft wrap, and font rendering). | +| `build-error` | [search][search-atom-repo-label-build-error] | [search][search-atom-org-label-build-error] | Related to problems with building Atom from source. | +| `error-from-pathwatcher` | [search][search-atom-repo-label-error-from-pathwatcher] | [search][search-atom-org-label-error-from-pathwatcher] | Related to errors thrown by the [pathwatcher library](https://github.com/atom/node-pathwatcher). | +| `error-from-save` | [search][search-atom-repo-label-error-from-save] | [search][search-atom-org-label-error-from-save] | Related to errors thrown when saving files. | +| `error-from-open` | [search][search-atom-repo-label-error-from-open] | [search][search-atom-org-label-error-from-open] | Related to errors thrown when opening files. | +| `installer` | [search][search-atom-repo-label-installer] | [search][search-atom-org-label-installer] | Related to the Atom installers for different OSes. | +| `auto-updater` | [search][search-atom-repo-label-auto-updater] | [search][search-atom-org-label-auto-updater] | Related to the auto-updater for different OSes. | +| `deprecation-help` | [search][search-atom-repo-label-deprecation-help] | [search][search-atom-org-label-deprecation-help] | Issues for helping package authors remove usage of deprecated APIs in packages.
| +| `electron` | [search][search-atom-repo-label-electron] | [search][search-atom-org-label-electron] | Issues that require changes to [Electron](https://electron.atom.io) to fix or implement. | + +#### Pull Request Labels + +| Label name | `atom/atom` :mag_right: | `atom`‑org :mag_right: | Description | +| --- | --- | --- | --- | +| `work-in-progress` | [search][search-atom-repo-label-work-in-progress] | [search][search-atom-org-label-work-in-progress] | Pull requests which are still being worked on, more changes will follow. | +| `needs-review` | [search][search-atom-repo-label-needs-review] | [search][search-atom-org-label-needs-review] | Pull requests which need code review, and approval from maintainers or Atom core team. | +| `under-review` | [search][search-atom-repo-label-under-review] | [search][search-atom-org-label-under-review] | Pull requests being reviewed by maintainers or Atom core team. | +| `requires-changes` | [search][search-atom-repo-label-requires-changes] | [search][search-atom-org-label-requires-changes] | Pull requests which need to be updated based on review comments and then reviewed again. | +| `needs-testing` | [search][search-atom-repo-label-needs-testing] | [search][search-atom-org-label-needs-testing] | Pull requests which need manual testing.
| + +[search-atom-repo-label-enhancement]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aenhancement +[search-atom-org-label-enhancement]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aenhancement +[search-atom-repo-label-bug]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Abug +[search-atom-org-label-bug]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Abug +[search-atom-repo-label-question]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aquestion +[search-atom-org-label-question]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aquestion +[search-atom-repo-label-feedback]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Afeedback +[search-atom-org-label-feedback]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Afeedback +[search-atom-repo-label-help-wanted]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Ahelp-wanted +[search-atom-org-label-help-wanted]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Ahelp-wanted +[search-atom-repo-label-beginner]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Abeginner +[search-atom-org-label-beginner]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Abeginner +[search-atom-repo-label-more-information-needed]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Amore-information-needed +[search-atom-org-label-more-information-needed]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Amore-information-needed +[search-atom-repo-label-needs-reproduction]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aneeds-reproduction +[search-atom-org-label-needs-reproduction]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aneeds-reproduction 
+[search-atom-repo-label-triage-help-needed]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Atriage-help-needed +[search-atom-org-label-triage-help-needed]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Atriage-help-needed +[search-atom-repo-label-windows]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Awindows +[search-atom-org-label-windows]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Awindows +[search-atom-repo-label-linux]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Alinux +[search-atom-org-label-linux]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Alinux +[search-atom-repo-label-mac]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Amac +[search-atom-org-label-mac]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Amac +[search-atom-repo-label-documentation]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Adocumentation +[search-atom-org-label-documentation]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Adocumentation +[search-atom-repo-label-performance]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aperformance +[search-atom-org-label-performance]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aperformance +[search-atom-repo-label-security]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Asecurity +[search-atom-org-label-security]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Asecurity +[search-atom-repo-label-ui]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aui +[search-atom-org-label-ui]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aui +[search-atom-repo-label-api]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aapi 
+[search-atom-org-label-api]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aapi +[search-atom-repo-label-crash]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Acrash +[search-atom-org-label-crash]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Acrash +[search-atom-repo-label-auto-indent]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aauto-indent +[search-atom-org-label-auto-indent]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aauto-indent +[search-atom-repo-label-encoding]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aencoding +[search-atom-org-label-encoding]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aencoding +[search-atom-repo-label-network]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Anetwork +[search-atom-org-label-network]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Anetwork +[search-atom-repo-label-uncaught-exception]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Auncaught-exception +[search-atom-org-label-uncaught-exception]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Auncaught-exception +[search-atom-repo-label-git]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Agit +[search-atom-org-label-git]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Agit +[search-atom-repo-label-blocked]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Ablocked +[search-atom-org-label-blocked]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Ablocked +[search-atom-repo-label-duplicate]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aduplicate +[search-atom-org-label-duplicate]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aduplicate 
+[search-atom-repo-label-wontfix]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Awontfix +[search-atom-org-label-wontfix]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Awontfix +[search-atom-repo-label-invalid]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Ainvalid +[search-atom-org-label-invalid]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Ainvalid +[search-atom-repo-label-package-idea]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Apackage-idea +[search-atom-org-label-package-idea]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Apackage-idea +[search-atom-repo-label-wrong-repo]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Awrong-repo +[search-atom-org-label-wrong-repo]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Awrong-repo +[search-atom-repo-label-editor-rendering]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aeditor-rendering +[search-atom-org-label-editor-rendering]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aeditor-rendering +[search-atom-repo-label-build-error]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Abuild-error +[search-atom-org-label-build-error]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Abuild-error +[search-atom-repo-label-error-from-pathwatcher]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aerror-from-pathwatcher +[search-atom-org-label-error-from-pathwatcher]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aerror-from-pathwatcher +[search-atom-repo-label-error-from-save]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aerror-from-save +[search-atom-org-label-error-from-save]: 
https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aerror-from-save +[search-atom-repo-label-error-from-open]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aerror-from-open +[search-atom-org-label-error-from-open]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aerror-from-open +[search-atom-repo-label-installer]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Ainstaller +[search-atom-org-label-installer]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Ainstaller +[search-atom-repo-label-auto-updater]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Aauto-updater +[search-atom-org-label-auto-updater]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aauto-updater +[search-atom-repo-label-deprecation-help]: https://github.com/issues?q=is%3Aopen+is%3Aissue+repo%3Aatom%2Fatom+label%3Adeprecation-help +[search-atom-org-label-deprecation-help]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Adeprecation-help +[search-atom-repo-label-electron]: https://github.com/issues?q=is%3Aissue+repo%3Aatom%2Fatom+is%3Aopen+label%3Aelectron +[search-atom-org-label-electron]: https://github.com/issues?q=is%3Aopen+is%3Aissue+user%3Aatom+label%3Aelectron +[search-atom-repo-label-work-in-progress]: https://github.com/pulls?q=is%3Aopen+is%3Apr+repo%3Aatom%2Fatom+label%3Awork-in-progress +[search-atom-org-label-work-in-progress]: https://github.com/pulls?q=is%3Aopen+is%3Apr+user%3Aatom+label%3Awork-in-progress +[search-atom-repo-label-needs-review]: https://github.com/pulls?q=is%3Aopen+is%3Apr+repo%3Aatom%2Fatom+label%3Aneeds-review +[search-atom-org-label-needs-review]: https://github.com/pulls?q=is%3Aopen+is%3Apr+user%3Aatom+label%3Aneeds-review +[search-atom-repo-label-under-review]: https://github.com/pulls?q=is%3Aopen+is%3Apr+repo%3Aatom%2Fatom+label%3Aunder-review +[search-atom-org-label-under-review]: 
https://github.com/pulls?q=is%3Aopen+is%3Apr+user%3Aatom+label%3Aunder-review +[search-atom-repo-label-requires-changes]: https://github.com/pulls?q=is%3Aopen+is%3Apr+repo%3Aatom%2Fatom+label%3Arequires-changes +[search-atom-org-label-requires-changes]: https://github.com/pulls?q=is%3Aopen+is%3Apr+user%3Aatom+label%3Arequires-changes +[search-atom-repo-label-needs-testing]: https://github.com/pulls?q=is%3Aopen+is%3Apr+repo%3Aatom%2Fatom+label%3Aneeds-testing +[search-atom-org-label-needs-testing]: https://github.com/pulls?q=is%3Aopen+is%3Apr+user%3Aatom+label%3Aneeds-testing + +[beginner]:https://github.com/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Aissue+label%3Abeginner+label%3Ahelp-wanted+user%3Aatom+sort%3Acomments-desc +[help-wanted]:https://github.com/issues?q=is%3Aopen+is%3Aissue+label%3Ahelp-wanted+user%3Aatom+sort%3Acomments-desc+-label%3Abeginner diff --git a/06_MAKE_IT_OFFICIAL.md b/06_MAKE_IT_OFFICIAL.md new file mode 100644 index 0000000..d25204a --- /dev/null +++ b/06_MAKE_IT_OFFICIAL.md @@ -0,0 +1,18 @@ +# 😎 BECOME OFFICIALLY C00L 😎 + +So you've built a tap or a target, have you? We think that's pretty groovy. To submit a tap for integration with Stitch and become official, we ask that it follow a set standard. If you're interested in submitting to be an official tap, we're mighty obliged, and we've created a checklist so you can increase your chances of integration. + +### Check out the [BEST PRACTICES](06_BEST_PRACTICES.md) doc which will have all the instructions and way more in-depth details of the following: +- [ ] Your work has a `start_date` field in the config +- [ ] Your work accepts a `user_agent` field in the config +- [ ] Your work respects API rate limits +- [ ] Your work doesn't impose memory constraints +- [ ] Your dates are all in RFC3339 format +- [ ] All states are in date format +- [ ] All data is streamed in ascending order if possible +- [ ] Your work doesn't contain any sensitive info like API keys, client work, etc.
+- [ ] Please keep your schemas stored in a schema folder +- [ ] You've tested your work +- [ ] Please run pylint on your work +- [ ] Your work shows metrics +- [ ] Message [@BrianSloane](mailto:brian@stitchdata.com) or [@Ash_Hathaway](mailto:ashley@stitchdata.com) or reach out to them on [Slack](https://singer-slackin.herokuapp.com/) and let them know you'd like some 🎁 swag 🎁, please. diff --git a/BEST_PRACTICES.md b/07_BEST_PRACTICES.md similarity index 99% rename from BEST_PRACTICES.md rename to 07_BEST_PRACTICES.md index 2bbffa8..12e4f5c 100644 --- a/BEST_PRACTICES.md +++ b/07_BEST_PRACTICES.md @@ -1,5 +1,4 @@ -Best Practices for Building a Singer Tap -============================================ +# BEST PRACTICES Language -------- diff --git a/SPEC.md b/08_SPEC.md similarity index 66% rename from SPEC.md rename to 08_SPEC.md index 7961874..abf8aff 100644 --- a/SPEC.md +++ b/08_SPEC.md @@ -188,3 +188,72 @@ should be a new MINOR version. [JSON Schema]: http://json-schema.org/ "JSON Schema" [Semantic Versioning]: http://semver.org/ "Semantic Versioning" + + + +# Data Types and Schemas + +JSON is used to represent data because it is ubiquitous, readable, and +especially appropriate for the large universe of sources that expose data +as JSON like web APIs. However, JSON is far from perfect: + + - it has a limited type system, without support for common types like + dates, and no distinction between integers and floating point numbers + + - while its flexibility makes it easy to use, it can also cause + compatibility problems + +*Schemas* are used to solve these problems. Generally speaking, a schema +is anything that describes how data is structured. In Streams, schemas are +written by streamers in *SCHEMA* messages, formatted following the +[JSON Schema] spec. + +Schemas solve the limited data types problem by providing more information +about how to interpret JSON's basic types.
For example, the [JSON Schema] +spec distinguishes between `integer` and `number` types, where the latter +is appropriately interpreted as a floating point. Additionally, it +defines a string format called `date-time` that can be used to indicate +when a data point is expected to be a +[properly formatted](https://tools.ietf.org/html/rfc3339) timestamp +string. + +Schemas mitigate JSON's compatibility problem by providing an easy way to +validate the structure of a set of data points. Streams deploys this +concept by encouraging use of only a single schema for each substream, and +validating each data point against its schema prior to persistence. This +forces the streamer author to think about how to resolve schema evolution +and compatibility questions, placing that responsibility as close to the +original data source as possible, and freeing downstream systems from +making uninformed assumptions to resolve these issues. + +Schemas are required, but they can be defined in the broadest terms - a +JSON Schema of '{}' validates all data points. However, it is a best +practice for streamer authors to define schemas as narrowly as possible.
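
To make the type and format rules above concrete, here is a minimal sketch in Python. It is not part of the spec and not a full JSON Schema validator: the `USER_SCHEMA` stream, the `validate` and `is_rfc3339` helpers, and the field names are all hypothetical, and only the `integer`, `number`, and `string` / `date-time` checks discussed above are covered. A real streamer would more likely rely on an existing JSON Schema library.

```python
from datetime import datetime

# Hypothetical schema for an illustrative "users" stream (not from the spec).
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "score": {"type": "number"},
        "signed_up": {"type": "string", "format": "date-time"},
    },
}


def is_rfc3339(value):
    """Rough RFC 3339 check: a full timestamp with 'Z' or a UTC offset."""
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    # Date-only or offset-less strings parse but carry no timezone.
    return parsed.tzinfo is not None


def validate(record, schema):
    """Return the names of properties that fail; an empty list means success."""
    problems = []
    for name, rules in schema.get("properties", {}).items():
        if name not in record:
            continue  # this sketch treats every property as optional
        value = record[name]
        expected = rules.get("type")
        if expected == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            ok = isinstance(value, int) and not isinstance(value, bool)
        elif expected == "number":
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif expected == "string":
            ok = isinstance(value, str)
            if ok and rules.get("format") == "date-time":
                ok = is_rfc3339(value)
        else:
            ok = True  # types outside this sketch are not checked
        if not ok:
            problems.append(name)
    return problems


good = {"id": 1, "score": 9.5, "signed_up": "2017-07-18T10:33:09Z"}
bad = {"id": 1.5, "score": "high", "signed_up": "2017-07-18"}
print(validate(good, USER_SCHEMA))
print(validate(bad, USER_SCHEMA))
print(validate({"anything": 1}, {}))
```

The last call uses the empty schema `{}`, which accepts every data point, as described above. The narrower `USER_SCHEMA` is what lets the second record be rejected before it is persisted.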
+ +## Schemas in Stitch + +The Stitch persister and Stitch API use schemas as follows: + + - the Stitch persister fails when it encounters a data point that doesn't + validate against its stream's latest schema + - schemas must be an 'object' at the top level + - Stitch supports schemas with objects nested to any depth, and arrays of + objects nested to any depth - more info in the + [Stitch docs](https://www.stitchdata.com/docs/data-structure/nested-data-structures-row-count-impact) + - properties of type `string` and format `date-time` are converted to + the appropriate timestamp or datetime type in the destination database + - properties of type `integer` are converted to integer in the destination + database + - properties of type `number` are converted to decimal or numeric in the + destination database + - (soon) the `maxLength` parameter of a property of type `string` is used + to define the width of the corresponding varchar column in the + destination database + - when Stitch encounters a schema for a stream that is incompatible with + the table that stream is to be loaded into in the destination database, + it adds the data to the + [reject pile](https://www.stitchdata.com/docs/data-structure/identifying-rejected-records) + + +[JSON Schema]: http://json-schema.org/ + diff --git a/PROPOSALS.md b/09_PROPOSALS.md similarity index 100% rename from PROPOSALS.md rename to 09_PROPOSALS.md diff --git a/10_CODE_OF_CONDUCT.md b/10_CODE_OF_CONDUCT.md new file mode 100644 index 0000000..cb31494 --- /dev/null +++ b/10_CODE_OF_CONDUCT.md @@ -0,0 +1,80 @@ +# Code of Conduct + +## 1. Purpose + +A primary goal of Singer is to be inclusive to the largest number of contributors, with the most varied and diverse backgrounds possible. As such, we are committed to providing a friendly, safe and welcoming environment for all, regardless of gender, sexual orientation, ability, ethnicity, socioeconomic status, and religion (or lack thereof). 
+ +This code of conduct outlines our expectations for all those who participate in our community, as well as the consequences for unacceptable behavior. + +We invite all those who participate in Singer to help us create safe and positive experiences for everyone. + +## 2. Open Source Citizenship + +A supplemental goal of this Code of Conduct is to increase open source citizenship by encouraging participants to recognize and strengthen the relationships between our actions and their effects on our community. + +Communities mirror the societies in which they exist and positive action is essential to counteract the many forms of inequality and abuses of power that exist in society. + +If you see someone who is making an extra effort to ensure our community is welcoming, friendly, and encourages all participants to contribute to the fullest extent, we want to know. + +## 3. Expected Behavior + +The following behaviors are expected and requested of all community members: + +* Participate in an authentic and active way. In doing so, you contribute to the health and longevity of this community. +* Exercise consideration and respect in your speech and actions. +* Attempt collaboration before conflict. +* Refrain from demeaning, discriminatory, or harassing behavior and speech. +* Be mindful of your surroundings and of your fellow participants. Alert community leaders if you notice a dangerous situation, someone in distress, or violations of this Code of Conduct, even if they seem inconsequential. +* Remember that community event venues may be shared with members of the public; please be respectful to all patrons of these locations. + +## 4. Unacceptable Behavior + +The following behaviors are considered harassment and are unacceptable within our community: + +* Violence, threats of violence or violent language directed against another person. +* Sexist, racist, homophobic, transphobic, ableist or otherwise discriminatory jokes and language. 
+* Posting or displaying sexually explicit or violent material. +* Posting or threatening to post other people’s personally identifying information ("doxing"). +* Personal insults, particularly those related to gender, sexual orientation, race, religion, or disability. +* Inappropriate photography or recording. +* Inappropriate physical contact. You should have someone’s consent before touching them. +* Unwelcome sexual attention. This includes, sexualized comments or jokes; inappropriate touching, groping, and unwelcomed sexual advances. +* Deliberate intimidation, stalking or following (online or in person). +* Advocating for, or encouraging, any of the above behavior. +* Sustained disruption of community events, including talks and presentations. + +## 5. Consequences of Unacceptable Behavior + +Unacceptable behavior from any community member, including sponsors and those with decision-making authority, will not be tolerated. + +Anyone asked to stop unacceptable behavior is expected to comply immediately. + +If a community member engages in unacceptable behavior, the community organizers may take any action they deem appropriate, up to and including a temporary ban or permanent expulsion from the community without warning (and without refund in the case of a paid event). + +## 6. Reporting Guidelines + +If you are subject to or witness unacceptable behavior, or have any other concerns, please notify a community organizer as soon as possible. ashley@stitchdata.com. + +Additionally, community organizers are available to help community members engage with local law enforcement or to otherwise help those experiencing unacceptable behavior feel safe. In the context of in-person events, organizers will also provide escorts as desired by the person experiencing distress. + +## 7. Addressing Grievances + +If you feel you have been falsely or unfairly accused of violating this Code of Conduct, you should notify Singer with a concise description of your grievance.
Your grievance will be handled in accordance with our existing governing policies. + +## 8. Scope + +We expect all community participants (contributors, paid or otherwise; sponsors; and other guests) to abide by this Code of Conduct in all community venues–online and in-person–as well as in all one-on-one communications pertaining to community business. + +This code of conduct and its related procedures also applies to unacceptable behavior occurring outside the scope of community activities when such behavior has the potential to adversely affect the safety and well-being of community members. + +## 9. Contact info + +ashley@stitchdata.com + +## 10. License and attribution + +This Code of Conduct is distributed under a [Creative Commons Attribution-ShareAlike license](http://creativecommons.org/licenses/by-sa/3.0/). + +Portions of text derived from the [Django Code of Conduct](https://www.djangoproject.com/conduct/) and the [Geek Feminism Anti-Harassment Policy](http://geekfeminism.wikia.com/wiki/Conference_anti-harassment/Policy). + +Retrieved on November 22, 2016 from [http://citizencodeofconduct.org/](http://citizencodeofconduct.org/) diff --git a/README.md b/README.md deleted file mode 100644 index e4f38a1..0000000 --- a/README.md +++ /dev/null @@ -1,274 +0,0 @@ -# Getting Started with Singer - -Singer is an open source standard for moving data between databases, -web APIs, files, queues, and just about anything else you can think -of. The [Singer spec] describes how data extraction scripts - called -“Taps” - and data loading scripts - called “Targets” - should -communicate using a standard JSON-based data format over `stdout`. By -conforming to this spec, Taps and Targets can be used in any -combination to move data from any source to any destination.
- -**Topics** - - - [Using Singer to populate Google Sheets](#using-singer-to-populate-google-sheets) - - [Developing a Tap](#developing-a-tap) - - [Additional Resources](#additional-resources) - -## Using Singer to populate Google Sheets - -The [Google Sheets Target] can be combined with any Singer Tap to -populate a Google Sheet with data. This example will use currency -exchange rate data from the [Fixer.io Tap]. [Fixer.io] is a free API for -current and historical foreign exchange rates published by the -European Central Bank. - -The steps are: - 1. [Activate the Google Sheets API](#step-1---activate-the-google-sheets-api) - 1. [Configure the Target](#step-2---configure-the-target) - 1. [Install](#step-3---install) - 1. [Run](#step-4---run) - 1. [Save State (optional)](#step-5---save-state-optional) - -### Step 1 - Activate the Google Sheets API - - (originally found in the [Google API - docs](https://developers.google.com/sheets/api/quickstart/python)) - - 1. Use [this - wizard](https://console.developers.google.com/start/api?id=sheets.googleapis.com) - to create or select a project in the Google Developers Console and - activate the Sheets API. Click Continue, then Go to credentials. - - 1. On the **Add credentials to your project** page, click the - **Cancel** button. - - 1. At the top of the page, select the **OAuth consent screen** - tab. Select an **Email address**, enter a **Product name** if not - already set, and click the **Save** button. - - 1. Select the **Credentials** tab, click the **Create credentials** - button and select **OAuth client ID**. - - 1. Select the application type **Other**, enter the name "Singer - Sheets Target", and click the **Create** button. - - 1. Click **OK** to dismiss the resulting dialog. - - 1. Click the Download button to the right of the client ID. - - 1. Move this file to your working directory and rename it - `client_secret.json`. 
- -### Step 2 - Configure the Target - -Created a file called `config.json` in your working directory, -following [config.sample.json](https://github.com/singer-io/target-gsheet/blob/master/config.sample.json). The required -`spreadsheet_id` parameter is the value between the "/d/" and the -"/edit" in the URL of your spreadsheet. For example, consider the -following URL that references a Google Sheets spreadsheet: - -``` -https://docs.google.com/spreadsheets/d/1qpyC0XzvTcKT6EISywvqESX3A0MwQoFDE8p-Bll4hps/edit#gid=0 -``` - -The ID of this spreadsheet is -`1qpyC0XzvTcKT6EISywvqESX3A0MwQoFDE8p-Bll4hps`. - - -### Step 3 - Install - -First, make sure Python 3 is installed on your system or follow these -installation instructions for [Mac](python-mac) or -[Ubuntu](python-ubuntu). - -`target-gsheet` can be run with any [Singer Tap] to move data from -sources like [Braintree], [Freshdesk] and [Hubspot] to Google -Sheets. We'll use the [Fixer.io Tap] - which pulls currency exchange -rate data from a public data set - as an example. - -We recommend installing each Tap and Target in a separate Python virtual -environment. This will insure that you won't have conflicting dependencies -between any Taps and Targets. 
- -These commands will install `tap-fixerio` and `target-gsheet` with pip in -their own virtual environments: - -```bash -# Install tap-fixerio in its own virtualenv -virtualenv -p python3 tap-fixerio -tap-fixerio/bin/pip install tap-fixerio - -# Install target-gsheet in its own virtualenv -virtualenv -p python3 target-gsheet -target-gsheet/bin/pip install target-gsheet -``` - -### Step 4 - Run - -This command will pipe the output of `tap-fixerio` to `target-gsheet`, -using the configuration file created in Step 2: - -```bash -› tap-fixerio/bin/tap-fixerio | target-gsheet/bin/target-gsheet -c config.json - INFO Replicating the latest exchange rate data from fixer.io - INFO Tap exiting normally -``` - -`target-gsheet` will attempt to open a new window or tab in your -default browser to perform authentication. If this fails, copy the URL -from the console and manually open it in your browser. - -If you are not already logged into your Google account, you will be -prompted to log in. If you are logged into multiple Google accounts, -you will be asked to select one account to use for the -authorization. Click the **Accept** button to allow `target-gsheet` to -access your Google Sheet. You can close the tab after the signup flow -is complete. - -Each stream generated by the Tap will be written to a different sheet -in your Google Sheet. For the [Fixer.io Tap] you'll see a single sheet -named `exchange_rate`. - -### Step 5 - Save State (optional) - -When `target-gsheet` is run as above it writes log lines to `stderr`, -but `stdout` is reserved for outputting **State** messages. A State -message is a JSON-formatted line with data that the Tap wants -persisted between runs - often "high water mark" information that the -Tap can use to pick up where it left off on the next run. Read more -about State messages in the [Singer spec].
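As a rough illustration of the State idea described above, the sketch below shows how a Tap might load a previously saved State file and emit a fresh State line. The `{"type": "STATE", "value": ...}` shape, the `last_date` bookmark key, and the helper names `load_state` and `format_state` are assumptions made for this example and do not come from this README; the [Singer spec] remains the authority on the message format.

```python
import json
import os
import sys


def load_state(path):
    """Return the saved State dict, or an empty dict on the first run."""
    if path is None or not os.path.exists(path):
        return {}
    with open(path) as f:
        text = f.read().strip()
    return json.loads(text) if text else {}


def format_state(value):
    """Serialize a State value as one JSON line (assumed STATE shape)."""
    return json.dumps({"type": "STATE", "value": value})


if __name__ == "__main__":
    # Hypothetical bookmark key; a real Tap chooses its own State layout.
    state = load_state(sys.argv[1] if len(sys.argv) > 1 else None)
    state["last_date"] = "2017-07-18"
    print(format_state(state))
```

Keeping the State value a plain JSON object means the same file can be written by one run and read back by the next without any extra parsing rules.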
- -Targets write State messages to `stdout` once all data that appeared -in the stream before the State message has been processed by the -Target. Note that although the State message is sent into the target, -in most cases the target's process won't actually store it anywhere or -do anything with it other than repeat it back to `stdout`. - -Taps like the [Fixer.io Tap] can also accept a `--state` argument -that, if present, points to a file containing the last persisted State -value. This enables Taps to work incrementally - the State -checkpoints the last value that was handled by the Target, and the -next time the Tap is run it should pick up from that point. - -To run the [Fixer.io Tap] incrementally, point it to a State file and -capture the persister's `stdout` like this: - -```bash -› tap-fixerio --state state.json | target-gsheet -c config.json >> state.json -› tail -1 state.json > state.json.tmp && mv state.json.tmp state.json -(rinse and repeat) -``` - -## Developing a Tap - -If you can't find an existing Tap for your data source, then it's time -to build your own. - -**Topics**: - - [Hello, world](#hello-world) - - [A Python Tap](#a-python-tap) - -### Hello, world - -A Tap is just a program, written in any language, that outputs data to -`stdout` according to the [Singer spec]. In fact, your first Tap can -be written from the command line, without any programming at all: - -```bash -› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n' -``` - -This writes the datapoint `{"value":"world"}` to the *hello* -stream along with a schema indicating that `value` is a string.
-That data can be piped into any Target, like the [Google Sheets -Target], over `stdin`: - -```bash -› printf '{"type":"SCHEMA", "stream":"hello","key_properties":[],"schema":{"type":"object", "properties":{"value":{"type":"string"}}}}\n{"type":"RECORD","stream":"hello","schema":"hello","record":{"value":"world"}}\n' | target-gsheet -c config.json -``` - -### A Python Tap - -To move beyond *Hello, world* you'll need a real programming language. -Although any language will do, we have built a Python library to help -you get up and running quickly. - -Let's write a Tap called `tap_ip.py` that retrieves the current - IP using icanhazip.com, and writes that data with a timestamp. - -First, install the [Singer helper library] with `pip`: - -```bash -› pip install singer-python -``` - -Then, open up a new file called `tap_ip.py` in your favorite editor. - -```python -import singer -import urllib.request -from datetime import datetime, timezone -``` - -We'll use the `datetime` module to get the current timestamp, the -`singer` module to write data to `stdout` in the correct format, and -the `urllib.request` module to make a request to icanhazip.com. - -```python -now = datetime.now(timezone.utc).isoformat() -schema = { - 'properties': { - 'ip': {'type': 'string'}, - 'timestamp': {'type': 'string', 'format': 'date-time'}, - }, -} - -``` - -This sets up some of the data we'll need - the current time, and the -schema of the data we'll be writing to the stream formatted as a [JSON -Schema].
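For reference, the kind of SCHEMA and RECORD lines a Tap writes can also be produced with the standard library alone. The sketch below is modeled on the *Hello, world* `printf` example rather than on the internals of `singer-python`; the helper names `schema_message` and `record_message` are hypothetical, and the IP address is a documentation placeholder instead of a real lookup.

```python
import json
from datetime import datetime, timezone


def schema_message(stream, schema, key_properties):
    """Build a SCHEMA line shaped like the one in the Hello, world example."""
    return json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "key_properties": key_properties,
        "schema": schema,
    })


def record_message(stream, record):
    """Build a RECORD line carrying a single data point."""
    return json.dumps({"type": "RECORD", "stream": stream, "record": record})


schema = {
    "type": "object",
    "properties": {
        "ip": {"type": "string"},
        "timestamp": {"type": "string", "format": "date-time"},
    },
}
now = datetime.now(timezone.utc).isoformat()

# 203.0.113.7 is a placeholder from a documentation-only address range.
print(schema_message("my_ip", schema, ["timestamp"]))
print(record_message("my_ip", {"timestamp": now, "ip": "203.0.113.7"}))
```

Each message is one JSON object per line, which is what lets the output be piped straight into a Target over `stdin`.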
- -```python -with urllib.request.urlopen('http://icanhazip.com') as response: - ip = response.read().decode('utf-8').strip() - singer.write_schema('my_ip', schema, 'timestamp') - singer.write_records('my_ip', [{'timestamp': now, 'ip': ip}]) -``` - -Finally, we make the HTTP request, parse the response, and then make -two calls to the `singer` library: - - - `singer.write_schema` which writes the schema of the `my_ip` stream and defines its primary key - - `singer.write_records` to write a record to that stream - -We can send this data to Google Sheets by running our new Tap -with the [Google Sheets Target]: - -```bash -› python tap_ip.py | target-gsheet -c config.json -``` - -## Additional Resources - -Join the [Singer Slack channel] to get help from members of the Singer -community. - ---- - -Copyright © 2017 Stitch - -[Singer spec]: SPEC.md -[Singer Tap]: https://singer.io -[Braintree]: https://github.com/singer-io/tap-braintree -[Freshdesk]: https://github.com/singer-io/tap-freshdesk -[Hubspot]: https://github.com/singer-io/tap-hubspot -[Fixer.io Tap]: https://github.com/singer-io/tap-fixerio -[Fixer.io]: http://fixer.io -[python-mac]: http://docs.python-guide.org/en/latest/starting/install3/osx/ -[python-ubuntu]: https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04 -[Google Sheets Target]: https://github.com/singer-io/target-gsheet -[Singer helper library]: https://github.com/singer-io/singer-python -[JSON Schema]: http://json-schema.org/ -[Singer Slack channel]: https://singer-slackin.herokuapp.com/ - diff --git a/SCHEMAS.md b/SCHEMAS.md deleted file mode 100644 index 7dda1d3..0000000 --- a/SCHEMAS.md +++ /dev/null @@ -1,65 +0,0 @@ -# Data Types and Schemas - -JSON is used to represent data because it is ubiquitous, readable, and -especially appropriate for the large universe of sources that expose data -as JSON like web APIs.
However, JSON is far from perfect: - - - it has a limited type system, without support for common types like - dates, and no distinction between integers and floating point numbers - - - while its flexibility makes it easy to use, it can also cause - compatibility problems - -*Schemas* are used to solve these problems. Generally speaking, a schema -is anything that describes how data is structured. In Streams, schemas are -written by streamers in *SCHEMA* messages, formatted following the -[JSON Schema] spec. - -Schemas solve the limited data types problem by providing more information -about how to interpret JSON's basic types. For example, the [JSON Schema] -spec distinguishes between `integer` and `number` types, where the latter -is appropriately interpretted as a floating point. Additionally, it -defines a string format called `date-time` that can be used to indicate -when a data point is expected to be a -[properly formatted](https://tools.ietf.org/html/rfc3339) timestamp -string. - -Schemas mitigate JSON's compatibility problem by providing an easy way to -validate the structure of a set of data points. Streams deploys this -concept by encouraging use of only a single schema for each substream, and -validating each data point against its schema prior to persistence. This -forces the streamer author to think about how to resolve schema evolution -and compatibility questions, placing that responsibility as close to the -original data source as possible, and freeing downstream systems from -making uninformed assumptions to resolve these issues. - -Schemas are required, but they can be defined in the broadest terms - a -JSON Schema of '{}' validates all data points. However, it is a best -practice for streamer authors to define schemas as narrowly as possible. 
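To make the type discussion above concrete, here is a minimal sketch of checking a data point against a schema. It covers only the `integer`, `number`, and `string` types and the `date-time` format mentioned in the text, it is not a full JSON Schema validator, and the function names are hypothetical. The timestamp check is deliberately rough and does not accept fractional seconds.

```python
from datetime import datetime


def matches_type(value, json_type):
    """Check one value against a single JSON Schema type name (subset only)."""
    if json_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if json_type == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if json_type == "string":
        return isinstance(value, str)
    raise ValueError("type not covered by this sketch: " + json_type)


def is_date_time(text):
    """Rough check for a timestamp such as 2017-07-18T10:33:09Z."""
    if not isinstance(text, str):
        return False
    try:
        datetime.strptime(text.replace("Z", "+0000"), "%Y-%m-%dT%H:%M:%S%z")
        return True
    except ValueError:
        return False


def validate(record, schema):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for name, rules in schema.get("properties", {}).items():
        if name not in record:
            continue  # this sketch does not enforce required properties
        value = record[name]
        if "type" in rules and not matches_type(value, rules["type"]):
            problems.append(name + ": expected " + rules["type"])
        elif rules.get("format") == "date-time" and not is_date_time(value):
            problems.append(name + ": not a date-time string")
    return problems
```

Note that an empty schema (`{}`) yields no problems for any record, which mirrors the point that the broadest schema validates every data point. A narrower schema is what lets a mistake such as a float in an `integer` field be caught close to the source.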
- -## Schemas in Stitch - -The Stitch persister and Stitch API use schemas as follows: - - - the Stitch persister fails when it encounters a data point that doesn't - validate against its stream's latest schema - - schemas must be an 'object' at the top level - - Stitch supports schemas with objects nested to any depth, and arrays of - objects nested to any depth - more info in the - [Stitch docs](https://www.stitchdata.com/docs/data-structure/nested-data-structures-row-count-impact) - - properties of type `string` and format `date-time` are converted to - the appropriate timestamp or datetime type in the destination database - - properties of type `integer` are converted to integer in the destination - database - - properties of type `number` are converted to decimal or numeric in the - destination database - - (soon) the `maxLength` parameter of a property of type `string` is used - to define the width of the corresponding varchar column in the - destination database - - when Stitch encounters a schema for a stream that is incompatible with - the table that stream is to be loaded into in the destination database, - it adds the data to the - [reject pile](https://www.stitchdata.com/docs/data-structure/identifying-rejected-records) - - -[JSON Schema]: http://json-schema.org/ From 52033160c3352df596e29afb82a1883ee283b74b Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:14:11 -0500 Subject: [PATCH 16/25] Rename 01_README.md to README.md --- 01_README.md => README.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename 01_README.md => README.md (100%) diff --git a/01_README.md b/README.md similarity index 100% rename from 01_README.md rename to README.md From a8902aa389a916d46f9f998cfe2e5504f460ba6e Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:38:02 -0500 Subject: [PATCH 17/25] Rename 02_EXTRACT_WITH_TAPS.md to 01_EXTRACT_WITH_TAPS.md --- 02_EXTRACT_WITH_TAPS.md => 01_EXTRACT_WITH_TAPS.md | 0 1 file changed, 0 
insertions(+), 0 deletions(-) rename 02_EXTRACT_WITH_TAPS.md => 01_EXTRACT_WITH_TAPS.md (100%) diff --git a/02_EXTRACT_WITH_TAPS.md b/01_EXTRACT_WITH_TAPS.md similarity index 100% rename from 02_EXTRACT_WITH_TAPS.md rename to 01_EXTRACT_WITH_TAPS.md From e7fa864620ee985382bbfe9a66aaa8eaa74c1f50 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:38:39 -0500 Subject: [PATCH 18/25] Rename 03_SEND_TO_TARGETS.md to 02_SEND_TO_TARGETS.md --- 03_SEND_TO_TARGETS.md => 02_SEND_TO_TARGETS.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename 03_SEND_TO_TARGETS.md => 02_SEND_TO_TARGETS.md (100%) diff --git a/03_SEND_TO_TARGETS.md b/02_SEND_TO_TARGETS.md similarity index 100% rename from 03_SEND_TO_TARGETS.md rename to 02_SEND_TO_TARGETS.md From 232386a50c6f8bca9c90d184a06a5dab8d7ab41c Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:39:11 -0500 Subject: [PATCH 19/25] Rename 04_COOL_TAPS_CLUB.md to 03_COOL_TAPS_CLUB.md --- 04_COOL_TAPS_CLUB.md => 03_COOL_TAPS_CLUB.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename 04_COOL_TAPS_CLUB.md => 03_COOL_TAPS_CLUB.md (100%) diff --git a/04_COOL_TAPS_CLUB.md b/03_COOL_TAPS_CLUB.md similarity index 100% rename from 04_COOL_TAPS_CLUB.md rename to 03_COOL_TAPS_CLUB.md From e61e8d6b6b19976ab8384fa840e97d743757e0e0 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:39:26 -0500 Subject: [PATCH 20/25] Rename 05_MAKE_IT_OFFICIAL.md to 04_MAKE_IT_OFFICIAL.md --- 05_MAKE_IT_OFFICIAL.md => 04_MAKE_IT_OFFICIAL.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename 05_MAKE_IT_OFFICIAL.md => 04_MAKE_IT_OFFICIAL.md (100%) diff --git a/05_MAKE_IT_OFFICIAL.md b/04_MAKE_IT_OFFICIAL.md similarity index 100% rename from 05_MAKE_IT_OFFICIAL.md rename to 04_MAKE_IT_OFFICIAL.md From f9be6d4f0e7d875766b4303f116a21095454cf98 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:39:51 -0500 Subject: [PATCH 21/25] Rename 06_BEST_PRACTICES.md to 
05_BEST_PRACTICES.md --- 06_BEST_PRACTICES.md => 05_BEST_PRACTICES.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename 06_BEST_PRACTICES.md => 05_BEST_PRACTICES.md (100%) diff --git a/06_BEST_PRACTICES.md b/05_BEST_PRACTICES.md similarity index 100% rename from 06_BEST_PRACTICES.md rename to 05_BEST_PRACTICES.md From b0ef16494f1f470bb3afd8d973cc89ea20342ec8 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:40:09 -0500 Subject: [PATCH 22/25] Rename 07_SPEC.md to 06_SPEC.md --- 07_SPEC.md => 06_SPEC.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename 07_SPEC.md => 06_SPEC.md (100%) diff --git a/07_SPEC.md b/06_SPEC.md similarity index 100% rename from 07_SPEC.md rename to 06_SPEC.md From 2a751ad5c922cfd4b565f01b9a79828fb22a5aef Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:40:29 -0500 Subject: [PATCH 23/25] Rename 08_PROPOSALS.md to 07_PROPOSALS.md --- 08_PROPOSALS.md => 07_PROPOSALS.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename 08_PROPOSALS.md => 07_PROPOSALS.md (100%) diff --git a/08_PROPOSALS.md b/07_PROPOSALS.md similarity index 100% rename from 08_PROPOSALS.md rename to 07_PROPOSALS.md From 4814f469604ce9e4b4c2f93de83c9fb9a91f76a0 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:40:48 -0500 Subject: [PATCH 24/25] Rename 09_CODE_OF_CONDUCT.md to 08_CODE_OF_CONDUCT.md --- 09_CODE_OF_CONDUCT.md => 08_CODE_OF_CONDUCT.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename 09_CODE_OF_CONDUCT.md => 08_CODE_OF_CONDUCT.md (100%) diff --git a/09_CODE_OF_CONDUCT.md b/08_CODE_OF_CONDUCT.md similarity index 100% rename from 09_CODE_OF_CONDUCT.md rename to 08_CODE_OF_CONDUCT.md From 8ddf81aabf15896821791887895b9098b6f5a238 Mon Sep 17 00:00:00 2001 From: Ash Hathaway Date: Mon, 24 Jul 2017 16:53:09 -0500 Subject: [PATCH 25/25] changed links one number back --- 01_EXTRACT_WITH_TAPS.md | 6 +++--- 04_MAKE_IT_OFFICIAL.md | 2 +- 06_SPEC.md | 7 ++----- README.md | 10 
+++++----- 4 files changed, 11 insertions(+), 14 deletions(-) diff --git a/01_EXTRACT_WITH_TAPS.md b/01_EXTRACT_WITH_TAPS.md index 5ac97e4..feb711b 100644 --- a/01_EXTRACT_WITH_TAPS.md +++ b/01_EXTRACT_WITH_TAPS.md @@ -2,7 +2,7 @@ ## Taps extract data from any source and write that data to a standard stream in a JSON-based format. -Be Check out our [official](05_MAKE_IT_OFFICIAL.md) and [unofficial](04_COOL_UNOFFICIAL_CLUB.md) pages before creating your own since it might save you some time in the long run. +Be Check out our [official](04_MAKE_IT_OFFICIAL.md) and [unofficial](03_COOL_TAPS_CLUB.md) pages before creating your own since it might save you some time in the long run. ### Making Taps @@ -10,7 +10,7 @@ If a tap for your use case doesn't exist yet have no fear! This documentation wi ### 👩🏽‍💻 👨🏻‍💻 Hello, world -A Tap is just a program, written in any language, that outputs data to `stdout` according to the [Singer spec](07_SPEC.md). +A Tap is just a program, written in any language, that outputs data to `stdout` according to the [Singer spec](06_SPEC.md). In fact, your first Tap can be written from the command line, without any programming at all: @@ -114,4 +114,4 @@ More simply the formula is: › python YOUR_TAP_FILE.py | TARGET-TYPE ``` -This assumes your target is intalled locally. Which you can read more about by heading over to the [targets page](03_SEND_TO_TARGETS). +This assumes your target is intalled locally. Which you can read more about by heading over to the [targets page](02_SEND_TO_TARGETS). diff --git a/04_MAKE_IT_OFFICIAL.md b/04_MAKE_IT_OFFICIAL.md index faa082f..6c8d293 100644 --- a/04_MAKE_IT_OFFICIAL.md +++ b/04_MAKE_IT_OFFICIAL.md @@ -2,7 +2,7 @@ So you've built a tap or a target have you? We think that's pretty groovy. To submit a tap for integration with Stitch an become official we ask that they follow a set standard.
If you're interested in submitting to be an official tap we're mighty obliged and created a checklist so you can increase your chances of integration. -### Check out the [BEST PRACTICES](06_BEST_PRACTICES.md) doc which will have all the instructions and way more in depth details of the following: +### Check out the [BEST PRACTICES](05_BEST_PRACTICES.md) doc which will have all the instructions and way more in depth details of the following: - [ ] Your work has a `start_date` field in the config - [ ] Your work accepts a `user_agent` field in the config - [ ] Your work respects API rate limits diff --git a/06_SPEC.md b/06_SPEC.md index abf8aff..a7d073c 100644 --- a/06_SPEC.md +++ b/06_SPEC.md @@ -130,7 +130,7 @@ Example: SCHEMA messages describe the datatypes of data in the stream. They must have the following properties: - - `schema` **Required**. A [JSON Schema] describing the + - `schema` **Required**. A [JSON Schema](http://json-schema.org/) describing the `data` property of RECORDs from the same `stream` - `stream` **Required**. The string name of the stream that this @@ -206,7 +206,7 @@ as JSON like web APIs. However, JSON is far from perfect: *Schemas* are used to solve these problems. Generally speaking, a schema is anything that describes how data is structured. In Streams, schemas are written by streamers in *SCHEMA* messages, formatted following the -[JSON Schema] spec. +[JSON Schema](http://json-schema.org/) spec. Schemas solve the limited data types problem by providing more information about how to interpret JSON's basic types. 
For example, the [JSON Schema] @@ -254,6 +254,3 @@ The Stitch persister and Stitch API use schemas as follows: it adds the data to the [reject pile](https://www.stitchdata.com/docs/data-structure/identifying-rejected-records) - -[JSON Schema]: http://json-schema.org/ - diff --git a/README.md b/README.md index 5871ab8..4cf3277 100644 --- a/README.md +++ b/README.md @@ -9,12 +9,12 @@ In this documentation we'll take you through a number of scenarios. -- 🍺 If you'd like to pull or extract data check out [TAPS](02_EXTRACT_WITH_TAPS.md) -- 🎯 If you'd like to send or load data check out [TARGETS](03_LOAD_WITH_TARGETS.md) +- 🍺 If you'd like to pull or extract data check out [TAPS](01_EXTRACT_WITH_TAPS.md) +- 🎯 If you'd like to send or load data check out [TARGETS](02_LOAD_WITH_TARGETS.md) - πŸ“ If you want to dive in some technical goodness check out our [SPECS](07_SPEC.md) -- 😎✅ Once you've created your own tap or target be sure to let us know and join our kool kids [UNOFFICIAL](04_COOL_UNOFFICIAL_CLUB.md) club or learn how to submit to be part of the super cool [OFFICIAL](05_MAKE_IT_OFFICIAL.md) integrations. -- 💯 Check this out to learn more about [BEST PRACTICES](06_BEST_PRACTICES.md) -- 🤝 And above all please respect our [CODE OF CONDUCT](09_CODE_OF_CONDUCT.md) +- 😎✅ Once you've created your own tap or target be sure to let us know and join our kool kids [UNOFFICIAL](03_COOL_UNOFFICIAL_CLUB.md) club or learn how to submit to be part of the super cool [OFFICIAL](04_MAKE_IT_OFFICIAL.md) integrations. +- 💯 Check this out to learn more about [BEST PRACTICES](05_BEST_PRACTICES.md) +- 🤝 And above all please respect our [CODE OF CONDUCT](08_CODE_OF_CONDUCT.md) ### Communication