From 713149fb780aff77cbf5b53994cc98d73c91ddcc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Peter=20S=CC=8Ctibrany=CC=81?=
Date: Fri, 11 Sep 2020 11:29:16 +0200
Subject: [PATCH 1/4] Document blocksconvert tools.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Peter Štibraný
---
 .../convert-stored-chunks-to-blocks.md        | 81 +++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 docs/blocks-storage/convert-stored-chunks-to-blocks.md

diff --git a/docs/blocks-storage/convert-stored-chunks-to-blocks.md b/docs/blocks-storage/convert-stored-chunks-to-blocks.md
new file mode 100644
index 00000000000..89396115083
--- /dev/null
+++ b/docs/blocks-storage/convert-stored-chunks-to-blocks.md
@@ -0,0 +1,81 @@
+---
+title: "Convert data from chunks to blocks"
+linkTitle: "Convert data from chunks to blocks"
+weight: 5
+slug: convert-data-from-chunks-to-blocks
+---
+
+If you have [configured your cluster to write new data to blocks](./migrate-from-chunks-to-blocks/), there is still a question about old data.
+Cortex can query both chunks and the blocks at the same time, but converting old chunks to blocks still has some benefits.
+This document presents a set of tools for doing the conversion.
+
+[Original design document](https://docs.google.com/document/d/1VI0cgaJmHD0pcrRb3UV04f8szXXGmFKQyqUJnFOcf6Q/edit?usp=sharing) for `blocksconvert is also available.
+
+## Tools
+
+Cortex now contains tool called `blocksconvert`, which is actually collection of three tools for doing conversion of chunks to blocks.
+
+The tools are:
+
+- *Scanner* scans the index database and produces so-called "plan files", each file being a set of series and chunks for each series. Plan files are uploaded to the same object store where blocks live.
+- *Scheduler* looks for plan files, and distributes them to builders. Scheduler has global view of overall conversion progress.
+- *Builder* asks scheduler for next plan file to work on, fetches chunks, puts them into TSDB block, and uploads the block to the object store. It repeats this process until there are no more plans.
+
+All tools start HTTP server (see `-server.http*` options) with `/metrics` endpoint.
+All tools also start a gRPC server (`-server.grpc*` options), but only Scheduler exposes services on it.
+
+### Scanner
+
+Scanner is started by running `blocksconvert -target=scanner`. Scanner requires configuration for accessing Cortex Index:
+
+- `-schema-config-file` – this is standard Cortex schema file
+- `-bigtable.instance`, `-bigtable.project` – options for BigTable access.
+- `-experimental.blocks-storage.backend` and corresponding `-experimental.blocks-storage.*` options for storing plan files
+- `-scanner.local-dir` – specifies local directory for writing plan files to. Finished plan files are deleted after upload to the bucket. List of scanned tables is also kept in this directory, to avoid scanning the same tables multiple times when Scanner is restarted.
+- `-scanner.allowed-users` – comma-separated list of Cortex tenants that should have plans generated. If empty, plans for all found users are generated.
+- `-scanner.ignore-user` - If plans for all users are generated (`-scanner.allowed-users` is not set), then users matching this non-empty regular expression will be skipped. 
+- `-scanner.tables-limit` – How many tables should be scanner? By default all tables are scanned, but when testing scanner it may be useful to start with small number of tables first.
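+
+For example, a first test run of Scanner, limited to a few most-recent tables, might look like this sketch (the project, instance and path values are placeholders, and the remaining `-experimental.blocks-storage.*` options depend on the chosen backend):
+
+```sh
+blocksconvert \
+  -target=scanner \
+  -schema-config-file=/etc/cortex/schema.yaml \
+  -bigtable.project=my-project \
+  -bigtable.instance=my-instance \
+  -experimental.blocks-storage.backend=gcs \
+  -scanner.local-dir=/data/blocksconvert-plans \
+  -scanner.tables-limit=10
+```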
+
+Scanner will read the Cortex schema file to discover Index tables, and then it will start scanning them from most-recent table first, going back.
+For each table, it will fully read the table and generate plan for each user and day stored in the table.
+Plan files are then uploaded to the configured blocks-storage bucket, and local copies are deleted.
+After that scanner continues with the next table, until it scans them all or tables-limit is reached.
+
+Note that even though `blocksconvert` has options for configuring different Index store backends, **it only supports BigTable at the moment.**
+
+It is expected that only a single Scanner process is running.
+Scanner does the scanning of multiple table subranges concurrently.
+
+Scanner exposes metrics with `cortex_blocksconvert_scanner_` prefix, e.g. total number of scanned index entries of different types, number of open files (scanner doesn't close currently open plan files until the entire table has been scanned), scanned BigTable rows and parsed index entries.
+
+**Scanner only supports schema version v9, v10 and v11. Earlier schema versions are currently not supported.** 
+
+### Scheduler
+
+Scheduler is started by running `blocksconvert -target=scheduler`. It only needs to be configured with options to access the object store with blocks:
+
+- `-experimental.blocks-storage.*` - Blocks storage object store configuration.
+- `-scheduler.scan-interval` – How often to scan for plan files and their status.
+- `-scheduler.allowed-users` – Comma-separated list of Cortex tenants. If set, only plans for these tenants will be offered to Builders.
+
+It is expected that only a single Scheduler process is running. Schedulers consume very little resources.
+
+Scheduler's metrics have `cortex_blocksconvert_scheduler` prefix (number of plans in different states, oldest/newest plan).
+Scheduler also has `/plans` page on HTTP server that shows currently queued plans, and all plans and their status for all users.
+
+### Builder
+
+Builder asks the scheduler for the next plan to work on, downloads the plan, builds the block and uploads the block to the blocks storage. It then repeats the process while there are still plans.
+
+Builder is started by `blocksconvert -target=builder`. It needs to be configured with the Scheduler endpoint, Cortex schema file, chunk-store specific options and blocks storage to upload blocks to.
+
+- `-builder.scheduler-endpoint` - where to find the scheduler, e.g. "scheduler:9095"
+- `-schema-config-file` - Cortex schema file, used to find out which chunks store to use for a given plan
+- `-gcs.bucketname` – when using GCS as chunks store
+- `-experimental.blocks-storage.*` - blocks storage configuration
+- `-builder.output-dir` - Local directory where Builder keeps the block while it is being built. Once the block is uploaded to blocks storage, it is deleted from the local directory.
+
+Multiple builders may run at the same time; each builder will receive a different plan to work on from the scheduler.
+Builders are CPU intensive (decoding and merging chunks), and require fast IO for writing chunks.
+
+Builders's metrics use `cortex_blocksconvert_builder` prefix, and include total number of fetched chunks and their size, read position of the current plan and plan size, total number of written series and samples, number of chunks that couldn't be downloaded.

From 63ee7cd07460a8edcd11ac2563b53d284ff60998 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Peter=20S=CC=8Ctibrany=CC=81?=
Date: Fri, 11 Sep 2020 11:40:10 +0200
Subject: [PATCH 2/4] Fix white space.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Peter Štibraný
---
 docs/blocks-storage/convert-stored-chunks-to-blocks.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/blocks-storage/convert-stored-chunks-to-blocks.md b/docs/blocks-storage/convert-stored-chunks-to-blocks.md
index 89396115083..784fe9e6e97 100644
--- a/docs/blocks-storage/convert-stored-chunks-to-blocks.md
+++ b/docs/blocks-storage/convert-stored-chunks-to-blocks.md
@@ -33,7 +33,7 @@ Scanner is started by running `blocksconvert -target=scanner`. Scanner requires
 - `-experimental.blocks-storage.backend` and corresponding `-experimental.blocks-storage.*` options for storing plan files
 - `-scanner.local-dir` – specifies local directory for writing plan files to. Finished plan files are deleted after upload to the bucket. List of scanned tables is also kept in this directory, to avoid scanning the same tables multiple times when Scanner is restarted.
 - `-scanner.allowed-users` – comma-separated list of Cortex tenants that should have plans generated. If empty, plans for all found users are generated.
-- `-scanner.ignore-user` - If plans for all users are generated (`-scanner.allowed-users` is not set), then users matching this non-empty regular expression will be skipped. 
+- `-scanner.ignore-user` - If plans for all users are generated (`-scanner.allowed-users` is not set), then users matching this non-empty regular expression will be skipped.
 - `-scanner.tables-limit` – How many tables should be scanner? By default all tables are scanned, but when testing scanner it may be useful to start with small number of tables first.
 
 Scanner will read the Cortex schema file to discover Index tables, and then it will start scanning them from most-recent table first, going back.
@@ -48,7 +48,7 @@ Scanner does the scanning of multiple table subranges concurrently.
 
 Scanner exposes metrics with `cortex_blocksconvert_scanner_` prefix, e.g. total number of scanned index entries of different types, number of open files (scanner doesn't close currently open plan files until the entire table has been scanned), scanned BigTable rows and parsed index entries.
 
-**Scanner only supports schema version v9, v10 and v11. Earlier schema versions are currently not supported.** 
+**Scanner only supports schema version v9, v10 and v11. Earlier schema versions are currently not supported.**
 
 ### Scheduler

From bdfd7aa2641150d56a36998973a9a73d4f9ccfdb Mon Sep 17 00:00:00 2001
From: Marco Pracucci
Date: Tue, 15 Sep 2020 15:07:58 +0200
Subject: [PATCH 3/4] Small fixes to chunks conversion doc

Signed-off-by: Marco Pracucci
---
 .../convert-stored-chunks-to-blocks.md        | 35 ++++++++++---------
 ...rate-storage-from-thanos-and-prometheus.md |  2 +-
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/docs/blocks-storage/convert-stored-chunks-to-blocks.md b/docs/blocks-storage/convert-stored-chunks-to-blocks.md
index 784fe9e6e97..1e0f8395dd9 100644
--- a/docs/blocks-storage/convert-stored-chunks-to-blocks.md
+++ b/docs/blocks-storage/convert-stored-chunks-to-blocks.md
@@ -1,36 +1,39 @@
 ---
-title: "Convert data from chunks to blocks"
-linkTitle: "Convert data from chunks to blocks"
-weight: 5
-slug: convert-data-from-chunks-to-blocks
+title: "Convert long-term storage from chunks to blocks"
+linkTitle: "Convert long-term storage from chunks to blocks"
+weight: 6
+slug: convert-long-term-storage-from-chunks-to-blocks
 ---
 
-If you have [configured your cluster to write new data to blocks](./migrate-from-chunks-to-blocks/), there is still a question about old data.
-Cortex can query both chunks and the blocks at the same time, but converting old chunks to blocks still has some benefits.
+If you have [configured your cluster to write new data to blocks](./migrate-from-chunks-to-blocks.md), there is still a question about old data.
+Cortex can query both chunks and blocks at the same time, but converting old chunks to blocks still has some benefits, like being able to decommission the chunks storage backend and save costs.
 This document presents a set of tools for doing the conversion.
 
-[Original design document](https://docs.google.com/document/d/1VI0cgaJmHD0pcrRb3UV04f8szXXGmFKQyqUJnFOcf6Q/edit?usp=sharing) for `blocksconvert is also available.
+_[Original design document](https://docs.google.com/document/d/1VI0cgaJmHD0pcrRb3UV04f8szXXGmFKQyqUJnFOcf6Q/edit?usp=sharing) for `blocksconvert` is also available._
 
 ## Tools
 
-Cortex now contains tool called `blocksconvert`, which is actually collection of three tools for doing conversion of chunks to blocks.
+Cortex provides a tool called `blocksconvert`, which is actually a collection of three tools for converting chunks to blocks.
 
 The tools are:
 
-- *Scanner* scans the index database and produces so-called "plan files", each file being a set of series and chunks for each series. Plan files are uploaded to the same object store where blocks live.
-- *Scheduler* looks for plan files, and distributes them to builders. Scheduler has global view of overall conversion progress.
-- *Builder* asks scheduler for next plan file to work on, fetches chunks, puts them into TSDB block, and uploads the block to the object store. It repeats this process until there are no more plans.
+- [**Scanner**](#scanner)
+  Scans the chunks index database and produces so-called "plan files", each file being a set of series and chunks for each series. Plan files are uploaded to the same object store bucket where blocks live.
+- [**Scheduler**](#scheduler)
+  Looks for plan files, and distributes them to builders. Scheduler has a global view of the overall conversion progress.
+- [**Builder**](#builder)
+  Asks the scheduler for the next plan file to work on, fetches chunks, puts them into a TSDB block, and uploads the block to the object store. It repeats this process until there are no more plans.
 
-All tools start HTTP server (see `-server.http*` options) with `/metrics` endpoint.
+All tools start an HTTP server (see `-server.http*` options) exposing the `/metrics` endpoint.
 All tools also start a gRPC server (`-server.grpc*` options), but only Scheduler exposes services on it.
 
 ### Scanner
 
 Scanner is started by running `blocksconvert -target=scanner`. Scanner requires configuration for accessing Cortex Index:
 
-- `-schema-config-file` – this is standard Cortex schema file
+- `-schema-config-file` – this is the standard Cortex schema file.
 - `-bigtable.instance`, `-bigtable.project` – options for BigTable access.
-- `-experimental.blocks-storage.backend` and corresponding `-experimental.blocks-storage.*` options for storing plan files
+- `-blocks-storage.backend` and corresponding `-blocks-storage.*` options for storing plan files.
 - `-scanner.local-dir` – specifies local directory for writing plan files to. Finished plan files are deleted after upload to the bucket. List of scanned tables is also kept in this directory, to avoid scanning the same tables multiple times when Scanner is restarted.
 - `-scanner.allowed-users` – comma-separated list of Cortex tenants that should have plans generated. If empty, plans for all found users are generated.
 - `-scanner.ignore-user` - If plans for all users are generated (`-scanner.allowed-users` is not set), then users matching this non-empty regular expression will be skipped.
@@ -54,7 +57,7 @@ Scanner exposes metrics with `cortex_blocksconvert_scanner_` prefix, e.g. total n
 Scheduler is started by running `blocksconvert -target=scheduler`. It only needs to be configured with options to access the object store with blocks:
 
-- `-experimental.blocks-storage.*` - Blocks storage object store configuration.
+- `-blocks-storage.*` - Blocks storage object store configuration.
 - `-scheduler.scan-interval` – How often to scan for plan files and their status.
 - `-scheduler.allowed-users` – Comma-separated list of Cortex tenants. If set, only plans for these tenants will be offered to Builders.
 
 It is expected that only a single Scheduler process is running. Schedulers consume very little resources.
 
 Scheduler's metrics have `cortex_blocksconvert_scheduler` prefix (number of plans in different states, oldest/newest plan).
 Scheduler also has `/plans` page on HTTP server that shows currently queued plans, and all plans and their status for all users.
 
 ### Builder
 
 Builder asks the scheduler for the next plan to work on, downloads the plan, builds the block and uploads the block to the blocks storage. It then repeats the process while there are still plans.
 
 Builder is started by `blocksconvert -target=builder`. It needs to be configured with the Scheduler endpoint, Cortex schema file, chunk-store specific options and blocks storage to upload blocks to.
 
 - `-builder.scheduler-endpoint` - where to find the scheduler, e.g. "scheduler:9095"
 - `-schema-config-file` - Cortex schema file, used to find out which chunks store to use for a given plan
 - `-gcs.bucketname` – when using GCS as chunks store
-- `-experimental.blocks-storage.*` - blocks storage configuration
+- `-blocks-storage.*` - blocks storage configuration
 - `-builder.output-dir` - Local directory where Builder keeps the block while it is being built. Once the block is uploaded to blocks storage, it is deleted from the local directory.
 
 Multiple builders may run at the same time; each builder will receive a different plan to work on from the scheduler.
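+
+For example, a Builder might be launched with a command like this sketch (the endpoint, bucket and path values are placeholders, and the remaining `-blocks-storage.*` options depend on the chosen backend):
+
+```sh
+blocksconvert \
+  -target=builder \
+  -builder.scheduler-endpoint=scheduler:9095 \
+  -schema-config-file=/etc/cortex/schema.yaml \
+  -gcs.bucketname=my-chunks-bucket \
+  -blocks-storage.backend=gcs \
+  -builder.output-dir=/data/blocksconvert
+```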
diff --git a/docs/blocks-storage/migrate-storage-from-thanos-and-prometheus.md b/docs/blocks-storage/migrate-storage-from-thanos-and-prometheus.md
index 0f6e2d27d68..65c7ecb12e2 100644
--- a/docs/blocks-storage/migrate-storage-from-thanos-and-prometheus.md
+++ b/docs/blocks-storage/migrate-storage-from-thanos-and-prometheus.md
@@ -1,7 +1,7 @@
 ---
 title: "Migrate the storage from Thanos and Prometheus"
 linkTitle: "Migrate the storage from Thanos and Prometheus"
-weight: 5
+weight: 7
 slug: migrate-storage-from-thanos-and-prometheus
 ---
 

From a6d291d197c91344b7f7728eb5e73e06bc70249b Mon Sep 17 00:00:00 2001
From: Marco Pracucci
Date: Tue, 15 Sep 2020 15:18:49 +0200
Subject: [PATCH 4/4] More small fixes to blocksconvert doc

Signed-off-by: Marco Pracucci
---
 .../convert-stored-chunks-to-blocks.md        | 27 ++++++++++++-------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/docs/blocks-storage/convert-stored-chunks-to-blocks.md b/docs/blocks-storage/convert-stored-chunks-to-blocks.md
index 1e0f8395dd9..88ab405420e 100644
--- a/docs/blocks-storage/convert-stored-chunks-to-blocks.md
+++ b/docs/blocks-storage/convert-stored-chunks-to-blocks.md
@@ -34,15 +34,15 @@ Scanner is started by running `blocksconvert -target=scanner`. Scanner requires
 - `-schema-config-file` – this is the standard Cortex schema file.
 - `-bigtable.instance`, `-bigtable.project` – options for BigTable access.
 - `-blocks-storage.backend` and corresponding `-blocks-storage.*` options for storing plan files.
-- `-scanner.local-dir` – specifies local directory for writing plan files to. Finished plan files are deleted after upload to the bucket. List of scanned tables is also kept in this directory, to avoid scanning the same tables multiple times when Scanner is restarted.
+- `-scanner.output-dir` – specifies a local directory for writing plan files to. Finished plan files are deleted after upload to the bucket. A list of scanned tables is also kept in this directory, to avoid scanning the same tables multiple times when Scanner is restarted.
 - `-scanner.allowed-users` – comma-separated list of Cortex tenants that should have plans generated. If empty, plans for all found users are generated.
-- `-scanner.ignore-user` - If plans for all users are generated (`-scanner.allowed-users` is not set), then users matching this non-empty regular expression will be skipped.
+- `-scanner.ignore-users-regex` - If plans for all users are generated (`-scanner.allowed-users` is not set), then users matching this non-empty regular expression will be skipped.
-- `-scanner.tables-limit` – How many tables should be scanner? By default all tables are scanned, but when testing scanner it may be useful to start with small number of tables first.
+- `-scanner.tables-limit` – How many tables should be scanned? By default all tables are scanned, but when testing scanner it may be useful to start with a small number of tables first.
 
 Scanner will read the Cortex schema file to discover Index tables, and then it will start scanning them from most-recent table first, going back.
-For each table, it will fully read the table and generate plan for each user and day stored in the table.
-Plan files are then uploaded to the configured blocks-storage bucket, and local copies are deleted.
-After that scanner continues with the next table, until it scans them all or tables-limit is reached.
+For each table, it will fully read the table and generate a plan for each user and day stored in the table.
+Plan files are then uploaded to the configured blocks-storage bucket (at the `-blocksconvert.bucket-prefix` location prefix), and local copies are deleted.
+After that, scanner continues with the next table until it scans them all or `-scanner.tables-limit` is reached.
 
 Note that even though `blocksconvert` has options for configuring different Index store backends, **it only supports BigTable at the moment.**
@@ -64,7 +64,7 @@ Scheduler is started by running `blocksconvert -target=scheduler`. It only needs
 It is expected that only a single Scheduler process is running. Schedulers consume very little resources.
 
 Scheduler's metrics have `cortex_blocksconvert_scheduler` prefix (number of plans in different states, oldest/newest plan).
-Scheduler also has `/plans` page on HTTP server that shows currently queued plans, and all plans and their status for all users.
+Scheduler HTTP server also exposes a `/plans` page that shows currently queued plans, and all plans and their status for all users.
 
 ### Builder
@@ -74,11 +74,18 @@ Builder is started by `blocksconvert -target=builder`. It needs to be configured
 - `-builder.scheduler-endpoint` - where to find the scheduler, e.g. "scheduler:9095"
 - `-schema-config-file` - Cortex schema file, used to find out which chunks store to use for a given plan
-- `-gcs.bucketname` – when using GCS as chunks store
+- `-gcs.bucketname` – when using GCS as chunks store (other chunks storage backends, like S3, are supported as well)
 - `-blocks-storage.*` - blocks storage configuration
 - `-builder.output-dir` - Local directory where Builder keeps the block while it is being built. Once the block is uploaded to blocks storage, it is deleted from the local directory.
 
 Multiple builders may run at the same time; each builder will receive a different plan to work on from the scheduler.
-Builders are CPU intensive (decoding and merging chunks), and require fast IO for writing chunks.
+Builders are CPU intensive (decoding and merging chunks), and require fast disk IO for writing blocks.
 
-Builders's metrics use `cortex_blocksconvert_builder` prefix, and include total number of fetched chunks and their size, read position of the current plan and plan size, total number of written series and samples, number of chunks that couldn't be downloaded.
+Builders' metrics have `cortex_blocksconvert_builder` prefix, and include total number of fetched chunks and their size, read position of the current plan and plan size, total number of written series and samples, and number of chunks that couldn't be downloaded.
+
+### Limitations
+
+The `blocksconvert` toolset currently has the following limitations:
+
+- Supports only BigTable for the chunks index backend
+- Supports only chunks schema versions v9, v10 and v11
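+
+Putting the pieces together: a typical conversion run keeps a single Scheduler alive next to the blocks storage bucket and scales Builders as needed, while Scanner keeps producing plan files. A minimal Scheduler sketch (the backend value is a placeholder, and the remaining `-blocks-storage.*` options depend on the chosen backend) could be:
+
+```sh
+# Run a single Scheduler. It scans the blocks bucket for plan files and
+# hands them out to Builders over gRPC; point each Builder's
+# -builder.scheduler-endpoint at this process, as in the "scheduler:9095"
+# example from the Builder section.
+blocksconvert -target=scheduler \
+  -blocks-storage.backend=gcs \
+  -scheduler.scan-interval=1m
+```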