diff --git a/docs/data-loaders.md b/docs/data-loaders.md
index dd8800c61..0c3d0c08c 100644
--- a/docs/data-loaders.md
+++ b/docs/data-loaders.md
@@ -16,7 +16,7 @@ Data loaders are polyglot: they can be written in any programming language. They
 A data loader can be as simple as a shell script that invokes [curl](https://curl.se/) to fetch recent earthquakes from the [USGS](https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php):
 
 ```sh
-curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
+curl -f https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
 ```
 
 Data loaders use [file-based routing](#routing), so assuming this shell script is named `quakes.json.sh`, a `quakes.json` file is then generated at build time. You can access this file from the client using [`FileAttachment`](./files):
@@ -230,7 +230,7 @@ If multiple requests are made concurrently for the same data loader, the data lo
 
 ## Output
 
-Data loaders must output to [standard output](). The first extension (such as `.csv`) does not affect the generated snapshot; the data loader is solely responsible for producing the expected output (such as CSV). If you wish to log additional information from within a data loader, be sure to log to standard error, say by using [`console.warn`](https://developer.mozilla.org/en-US/docs/Web/API/console/warn) or `process.stderr`; otherwise the logs will be included in the output file and sent to the client.
+Data loaders must output to [standard output](). The first extension (such as `.csv`) does not affect the generated snapshot; the data loader is solely responsible for producing the expected output (such as CSV). If you wish to log additional information from within a data loader, be sure to log to standard error, say by using [`console.warn`](https://developer.mozilla.org/en-US/docs/Web/API/console/warn) or `process.stderr`; otherwise the logs will be included in the output file and sent to the client. If you use `curl` as above, we recommend the `-f` flag (equivalently, the `--fail` option) to make the data loader return an error when the download fails.
 
 ## Building
@@ -247,7 +247,7 @@ Data loaders generate files at build time that live alongside other [static file
 Where `quakes.json.sh` is:
 
 ```sh
-curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
+curl -f https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
 ```
 
 This will produce the following output root:
diff --git a/docs/data/dft-road-collisions.csv.sh b/docs/data/dft-road-collisions.csv.sh
index b0e2f6c1f..a470b5c5a 100755
--- a/docs/data/dft-road-collisions.csv.sh
+++ b/docs/data/dft-road-collisions.csv.sh
@@ -5,7 +5,7 @@ TMPDIR="docs/.observablehq/cache/"
 
 # Download the data (if it’s not already in the cache).
 if [ ! -f "$TMPDIR/dft-collisions.csv" ]; then
-  curl "$URL" -o "$TMPDIR/dft-collisions.csv"
+  curl -f "$URL" -o "$TMPDIR/dft-collisions.csv"
 fi
 
 # Generate a CSV file using DuckDB.
diff --git a/docs/quakes.json.sh b/docs/quakes.json.sh
index 697489a9d..1ade37fc7 100644
--- a/docs/quakes.json.sh
+++ b/docs/quakes.json.sh
@@ -1 +1 @@
-curl https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
+curl -f https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson
diff --git a/examples/eia/src/data/eia-system-points.json.sh b/examples/eia/src/data/eia-system-points.json.sh
index 012bd6de2..a37346737 100644
--- a/examples/eia/src/data/eia-system-points.json.sh
+++ b/examples/eia/src/data/eia-system-points.json.sh
@@ -1,4 +1,4 @@
-curl 'https://www.eia.gov/electricity/930-api//respondents/data?type\[0\]=BA&type\[1\]=BR' \
+curl -f 'https://www.eia.gov/electricity/930-api//respondents/data?type\[0\]=BA&type\[1\]=BR' \
   -H 'Connection: keep-alive' \
   -A 'Chrome/123.0.0.0' \
   --compressed
diff --git a/examples/loader-census/src/data/ca.json.sh b/examples/loader-census/src/data/ca.json.sh
index 8d8c728fc..cc910c61d 100755
--- a/examples/loader-census/src/data/ca.json.sh
+++ b/examples/loader-census/src/data/ca.json.sh
@@ -1,6 +1,6 @@
 # Download the ZIP archive from the Census Bureau (if needed).
 if [ ! -f src/.observablehq/cache/cb_2023_06_cousub_500k.zip ]; then
-  curl -o src/.observablehq/cache/cb_2023_06_cousub_500k.zip 'https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_06_cousub_500k.zip'
+  curl -f -o src/.observablehq/cache/cb_2023_06_cousub_500k.zip 'https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_06_cousub_500k.zip'
 fi
 
 # Unzip the ZIP archive to extract the shapefile.
diff --git a/examples/loader-census/src/index.md b/examples/loader-census/src/index.md
index 60b57cea6..34e998eb2 100644
--- a/examples/loader-census/src/index.md
+++ b/examples/loader-census/src/index.md
@@ -13,7 +13,7 @@ Next, here’s a bash script, `ca.json.sh`:
 ```bash
 # Download the ZIP archive from the Census Bureau (if needed).
 if [ ! -f src/.observablehq/cache/cb_2023_06_cousub_500k.zip ]; then
-  curl -o src/.observablehq/cache/cb_2023_06_cousub_500k.zip 'https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_06_cousub_500k.zip'
+  curl -f -o src/.observablehq/cache/cb_2023_06_cousub_500k.zip 'https://www2.census.gov/geo/tiger/GENZ2023/shp/cb_2023_06_cousub_500k.zip'
 fi
 
 # Unzip the ZIP archive to extract the shapefile.
diff --git a/examples/loader-duckdb/src/educ_uoe_lang01.parquet.sh b/examples/loader-duckdb/src/educ_uoe_lang01.parquet.sh
index 51712808f..0ee625015 100644
--- a/examples/loader-duckdb/src/educ_uoe_lang01.parquet.sh
+++ b/examples/loader-duckdb/src/educ_uoe_lang01.parquet.sh
@@ -6,7 +6,7 @@ TMPDIR="src/.observablehq/cache/"
 
 # Download the data (if it’s not already in the cache).
 if [ ! -f "$TMPDIR/$CODE.csv" ]; then
-  curl "$URL" -o "$TMPDIR/$CODE.csv"
+  curl -f "$URL" -o "$TMPDIR/$CODE.csv"
 fi
 
 # Generate a Parquet file using DuckDB.
diff --git a/examples/loader-duckdb/src/index.md b/examples/loader-duckdb/src/index.md
index 3777f14fa..d97f0e9e5 100644
--- a/examples/loader-duckdb/src/index.md
+++ b/examples/loader-duckdb/src/index.md
@@ -11,7 +11,7 @@ TMPDIR="src/.observablehq/cache/"
 
 # Download the data (if it’s not already in the cache).
 if [ ! -f "$TMPDIR/$CODE.csv" ]; then
-  curl "$URL" -o "$TMPDIR/$CODE.csv"
+  curl -f "$URL" -o "$TMPDIR/$CODE.csv"
 fi
 
 # Generate a Parquet file using DuckDB.
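
For context on the behavior the flag changes: by default, curl treats an HTTP error response (such as a 404 page) as success, writing the server’s error body to standard output and exiting 0, so a data loader would “succeed” and snapshot the error page as data; with `-f` (`--fail`), curl suppresses the body on server errors and exits with a non-zero status (22), failing the loader instead. A minimal sketch of the difference, using a hypothetical URL assumed to return HTTP 404:

```sh
# Hypothetical URL that returns HTTP 404; substitute any failing endpoint.
URL="https://example.com/no-such-file.json"

# Without -f: the 404 error page goes to stdout and curl exits 0,
# so the error page would be written to the generated snapshot.
curl "$URL"; echo "exit status: $?"    # exit status: 0

# With -f: the body is suppressed on HTTP errors and curl exits 22,
# so the data loader fails and no bad snapshot is written.
curl -f "$URL"; echo "exit status: $?" # exit status: 22
```

Note that `--fail` only reports errors; for flaky endpoints it can be paired with curl’s `--retry` option to tolerate transient failures.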