Commit dcfb336

Merge branch 'master' into spread-flushes
2 parents 73f0a5c + 52fdbc6 commit dcfb336

File tree

1,775 files changed

+25226
-861952
lines changed

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+1. Start Cortex (SHA or version)
+2. Perform Operations(Read/Write/Others)
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Environment:**
+- Infrastructure: [e.g., Kubernetes, bare-metal, laptop]
+- Deployment tool: [e.g., helm, jsonnet]
+
+**Storage Engine**
+- [ ] Blocks
+- [ ] Chunks
+
+**Additional Context**
+<!-- Additional relevant info which can help us debug this issue easily like Logs, Configuration etc. -->
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -12,3 +12,9 @@ website/content/en/docs
 e2e_integration_test*
 active-query-tracker
 dist/
+
+# Binaries built from ./cmd
+blocksconvert
+cortex
+query-tee
+test-exporter

CHANGELOG.md

Lines changed: 58 additions & 2 deletions
@@ -2,16 +2,72 @@
 
 ## master / unreleased
 
+* [CHANGE] Improved shuffle sharding support in the write path. This work introduced some config changes: #3090
+  * Introduced `-distributor.sharding-strategy` CLI flag (and its respective `sharding_strategy` YAML config option) to explicitly specify which sharding strategy should be used in the write path
+  * `-experimental.distributor.user-subring-size` flag renamed to `-distributor.ingestion-tenant-shard-size`
+  * `user_subring_size` limit YAML config option renamed to `ingestion_tenant_shard_size`
+* [CHANGE] Dropped "blank Alertmanager configuration; using fallback" message from Info to Debug level. #3205
+* [CHANGE] Zone-awareness replication for time-series now should be explicitly enabled in the distributor via the `-distributor.zone-awareness-enabled` CLI flag (or its respective YAML config option). Before, zone-aware replication was implicitly enabled if a zone was set on ingesters. #3200
+* [CHANGE] Removed the deprecated CLI flag `-config-yaml`. You should use `-schema-config-file` instead. #3225
+* [CHANGE] Enforced the HTTP method required by some API endpoints which did (incorrectly) allow any method before that. #3228
+  - `GET /`
+  - `GET /config`
+  - `GET /debug/fgprof`
+  - `GET /distributor/all_user_stats`
+  - `GET /distributor/ha_tracker`
+  - `GET /all_user_stats`
+  - `GET /ha-tracker`
+  - `GET /api/v1/user_stats`
+  - `GET /api/v1/chunks`
+  - `GET <legacy-http-prefix>/user_stats`
+  - `GET <legacy-http-prefix>/chunks`
+  - `GET /services`
+  - `GET /multitenant_alertmanager/status`
+  - `GET /status` (alertmanager microservice)
+  - `GET|POST /ingester/ring`
+  - `GET|POST /ring`
+  - `GET|POST /store-gateway/ring`
+  - `GET|POST /compactor/ring`
+  - `GET|POST /ingester/flush`
+  - `GET|POST /ingester/shutdown`
+  - `GET|POST /flush`
+  - `GET|POST /shutdown`
+  - `GET|POST /ruler/ring`
+  - `POST /api/v1/push`
+  - `POST <legacy-http-prefix>/push`
+  - `POST /push`
+  - `POST /ingester/push`
+* [FEATURE] Added support for shuffle-sharding queriers in the query-frontend. When configured (`-frontend.max-queriers-per-user` globally, or using per-user limit `max_queriers_per_user`), each user's requests will be handled by different set of queriers. #3113
+* [FEATURE] Query-frontend: added `compression` config to support results cache with compression. #3217
+* [ENHANCEMENT] Added `cortex_query_frontend_connected_clients` metric to show the number of workers currently connected to the frontend. #3207
+* [ENHANCEMENT] Shuffle sharding: improved shuffle sharding in the write path. Shuffle sharding now should be explicitly enabled via `-distributor.sharding-strategy` CLI flag (or its respective YAML config option) and guarantees stability, consistency, shuffling and balanced zone-awareness properties. #3090 #3214
+* [ENHANCEMENT] Ingester: added new metric `cortex_ingester_active_series` to track active series more accurately. Also added options to control whether active series tracking is enabled (`-ingester.active-series-enabled`, defaults to false), and how often this metric is updated (`-ingester.active-series-update-period`) and max idle time for series to be considered inactive (`-ingester.active-series-idle-timeout`). #3153
+* [ENHANCEMENT] Blocksconvert – Builder: download plan file locally before processing it. #3209
+* [ENHANCEMENT] Store-gateway: added zone-aware replication support to blocks replication in the store-gateway. #3200
+* [ENHANCEMENT] Store-gateway: exported new metrics. #3231
+  - `cortex_bucket_store_cached_series_fetch_duration_seconds`
+  - `cortex_bucket_store_cached_postings_fetch_duration_seconds`
+  - `cortex_bucket_stores_gate_queries_max`
+* [ENHANCEMENT] Added `-version` flag to Cortex. #3233
 * [ENHANCEMENT] Smooth out spikes in rate of chunk flush operations. #3191
+* [BUGFIX] No-longer-needed ingester operations for queries triggered by queriers and rulers are now canceled. #3178
+* [BUGFIX] Ruler: directories in the configured `rules-path` will be removed on startup and shutdown in order to ensure they don't persist between runs. #3195
+* [BUGFIX] Handle hash-collisions in the query path. #3192
+* [BUGFIX] Check for postgres rows errors. #3197
+* [BUGFIX] Ruler Experimental API: Don't allow rule groups without names or empty rule groups. #3210
+* [BUGFIX] Experimental Alertmanager API: Do not allow empty Alertmanager configurations or bad template filenames to be submitted through the configuration API. #3185
 
 ## 1.4.0-rc.0 in progress
 
+* [CHANGE] TLS configuration for gRPC, HTTP and etcd clients is now marked as experimental. These features are not yet fully baked, and we expect possible small breaking changes in Cortex 1.5. #3198
 * [CHANGE] Cassandra backend support is now GA (stable). #3180
-* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
+* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180 #3201
   - `-experimental.blocks-storage.*` flags renamed to `-blocks-storage.*`
   - `-experimental.store-gateway.*` flags renamed to `-store-gateway.*`
   - `-experimental.querier.store-gateway-client.*` flags renamed to `-querier.store-gateway-client.*`
   - `-experimental.querier.store-gateway-addresses` flag renamed to `-querier.store-gateway-addresses`
+  - `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
+  - `-store-gateway.tokens-file-path` flag renamed to `-store-gateway.sharding-ring.tokens-file-path`
 * [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. Only if you are running `v1.0` or below, it is recommended to first upgrade to `v1.1`/`v1.2`/`v1.3` and run it for a day before upgrading to `v1.4` to avoid data loss. #3115
 * [CHANGE] Distributor API endpoints are no longer served unless target is set to `distributor` or `all`. #3112
 * [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
@@ -51,7 +107,7 @@
 * [ENHANCEMENT] Add "integration" as a label for `cortex_alertmanager_notifications_total` and `cortex_alertmanager_notifications_failed_total` metrics. #3056
 * [ENHANCEMENT] Add `cortex_ruler_config_last_reload_successful` and `cortex_ruler_config_last_reload_successful_seconds` to check status of users rule manager. #3056
 * [ENHANCEMENT] The configuration validation now fails if an empty YAML node has been set for a root YAML config property. #3080
-* [ENHANCEMENT] Memcached dial() calls now have an optional circuit-breaker to avoid hammering a broken cache #3051
+* [ENHANCEMENT] Memcached dial() calls now have a circuit-breaker to avoid hammering a broken cache. #3051, #3189
 * [ENHANCEMENT] `-ruler.evaluation-delay-duration` is now overridable as a per-tenant limit, `ruler_evaluation_delay_duration`. #3098
 * [ENHANCEMENT] Add TLS support to etcd client. #3102
 * [ENHANCEMENT] When a tenant accesses the Alertmanager UI or its API, if we have valid `-alertmanager.configs.fallback` we'll use that to start the manager and avoid failing the request. #3073

cmd/cortex/main.go

Lines changed: 12 additions & 8 deletions
@@ -61,6 +61,8 @@ func main() {
 		eventSampleRate      int
 		ballastBytes         int
 		mutexProfileFraction int
+		printVersion         bool
+		printModules         bool
 	)
 
 	configFile, expandENV := parseConfigFileParameter(os.Args[1:])
@@ -86,6 +88,8 @@ func main() {
 	flag.IntVar(&eventSampleRate, "event.sample-rate", 0, "How often to sample observability events (0 = never).")
 	flag.IntVar(&ballastBytes, "mem-ballast-size-bytes", 0, "Size of memory ballast to allocate.")
 	flag.IntVar(&mutexProfileFraction, "debug.mutex-profile-fraction", 0, "Fraction at which mutex profile vents will be reported, 0 to disable")
+	flag.BoolVar(&printVersion, "version", false, "Print Cortex version and exit.")
+	flag.BoolVar(&printModules, "modules", false, "List available values that can be used as target.")
 
 	usage := flag.CommandLine.Usage
 	flag.CommandLine.Usage = func() { /* don't do anything by default, we will print usage ourselves, but only when requested. */ }
@@ -106,6 +110,11 @@ func main() {
 		}
 	}
 
+	if printVersion {
+		fmt.Fprintln(os.Stdout, version.Print("Cortex"))
+		return
+	}
+
 	// Validate the config once both the config file has been loaded
 	// and CLI flags parsed.
 	err = cfg.Validate(util.Logger)
@@ -118,7 +127,7 @@ func main() {
 
 	// Continue on if -modules flag is given. Code handling the
 	// -modules flag will not start cortex.
-	if testMode && !cfg.ListModules {
+	if testMode && !printModules {
 		DumpYaml(&cfg)
 		return
 	}
@@ -151,7 +160,7 @@ func main() {
 	t, err := cortex.New(cfg)
 	util.CheckFatal("initializing cortex", err)
 
-	if t.Cfg.ListModules {
+	if printModules {
 		allDeps := t.ModuleManager.DependenciesForModule(cortex.All)
 
 		for _, m := range t.ModuleManager.UserVisibleModuleNames() {
@@ -167,12 +176,7 @@ func main() {
 
 		fmt.Fprintln(os.Stdout)
 		fmt.Fprintln(os.Stdout, "Modules marked with * are included in target All.")
-
-		// in test mode we cannot call os.Exit, it will stop to whole test process.
-		if testMode {
-			return
-		}
-		os.Exit(2)
+		return
 	}
 
 	level.Info(util.Logger).Log("msg", "Starting Cortex", "version", version.Info())
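
The pattern in the hunks above (register a boolean flag, print a message, and return early instead of calling `os.Exit`) can be sketched in isolation with the standard `flag` package. The names `handleFlags` and `buildVersion`, the program name `app`, and the output format are assumptions made for illustration; the real code prints `version.Print("Cortex")` from an external package.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// buildVersion is a placeholder value used only for this sketch.
var buildVersion = "dev"

// handleFlags is a hypothetical helper: it parses args and reports whether
// the program should print msg and exit before doing any real work.
func handleFlags(args []string) (msg string, exit bool, err error) {
	fs := flag.NewFlagSet("app", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr; the caller reports them

	printVersion := fs.Bool("version", false, "Print version and exit.")
	if err := fs.Parse(args); err != nil {
		return "", false, err
	}

	if *printVersion {
		return "app, version " + buildVersion, true, nil
	}
	return "", false, nil
}

func main() {
	msg, exit, err := handleFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if exit {
		// Returning from main (rather than calling os.Exit) keeps the
		// code path usable from tests that invoke main-like functions.
		fmt.Fprintln(os.Stdout, msg)
		return
	}
	fmt.Println("starting...")
}
```

Returning instead of exiting is the same design choice the diff makes for `-modules`: it removes the need for a special `testMode` branch around `os.Exit(2)`.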

cmd/cortex/main_test.go

Lines changed: 5 additions & 0 deletions
@@ -80,6 +80,11 @@ func TestFlagParsing(t *testing.T) {
 			stderrMessage: "the Querier configuration in YAML has been specified as an empty YAML node",
 		},
 
+		"version": {
+			arguments:     []string{"-version"},
+			stdoutMessage: "Cortex, version",
+		},
+
 		// we cannot test the happy path, as cortex would then fully start
 	} {
 		t.Run(name, func(t *testing.T) {

docs/blocks-storage/_index.md

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ The supported backends for the blocks storage are:
 * [Microsoft Azure Storage](https://azure.microsoft.com/en-us/services/storage/)
 * [Local Filesystem](https://thanos.io/storage.md/#filesystem) (single node only)
 
-_Internally, this storage engine is based on [Thanos](https://thanos.io), but no Thanos knowledge is required in order to run it._
+_Internally, some components are based on [Thanos](https://thanos.io), but no Thanos knowledge is required in order to run it._
 
 ## Architecture
 
@@ -30,7 +30,7 @@ The **[store-gateway](./store-gateway.md)** is responsible to query blocks and i
 
 The **[compactor](./compactor.md)** is responsible to merge and deduplicate smaller blocks into larger ones, in order to reduce the number of blocks stored in the long-term storage for a given tenant and query them more efficiently. The compactor is optional but highly recommended.
 
-Finally, the **table-manager** is not used by the blocks storage.
+Finally, the **table-manager** and the [**schema**](../configuration/schema-config-reference.md) configuration are **not used** by the blocks storage.
 
 ### The write path

docs/blocks-storage/convert-stored-chunks-to-blocks.md

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ Scanner is started by running `blocksconvert -target=scanner`. Scanner requires
 - `-scanner.allowed-users` – comma-separated list of Cortex tenants that should have plans generated. If empty, plans for all found users are generated.
 - `-scanner.ignore-users-regex` - If plans for all users are generated (`-scanner.allowed-users` is not set), then users matching this non-empty regular expression will be skipped.
 - `-scanner.tables-limit` – How many tables should be scanned? By default all tables are scanned, but when testing scanner it may be useful to start with small number of tables first.
+- `-scanner.tables` – Comma-separated list of tables to be scanned. Can be used to scan specific tables only. Note that schema is still used to find all tables first, and then this list is consulted to select only specified tables.
 
 Scanner will read the Cortex schema file to discover Index tables, and then it will start scanning them from most-recent table first, going back.
 For each table, it will fully read the table and generate a plan for each user and day stored in the table.

docs/blocks-storage/querier.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet
 
 Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.
 
-The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
+The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.sharding-ring.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
 
 If the query time range covers a period within `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.
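
The consistency check described in the querier documentation above can be sketched as a set difference between the block IDs the querier expected and the block IDs the store-gateways reported. This is a simplified illustration, not the Cortex implementation: `missingBlocks` is a hypothetical function, and block IDs are modeled as plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// missingBlocks is a hypothetical helper. It returns the expected block IDs
// that none of the store-gateway responses reported as actually queried.
func missingBlocks(expected []string, responses ...[]string) []string {
	seen := make(map[string]struct{})
	for _, queried := range responses {
		for _, id := range queried {
			seen[id] = struct{}{}
		}
	}

	var missing []string
	for _, id := range expected {
		if _, ok := seen[id]; !ok {
			missing = append(missing, id)
		}
	}
	sort.Strings(missing) // deterministic order for logging and retries
	return missing
}

func main() {
	expected := []string{"block-1", "block-2", "block-3"}

	// Two store-gateways answered, but neither of them queried block-2.
	missing := missingBlocks(expected, []string{"block-1"}, []string{"block-3"})

	// A non-empty result means the query is not yet consistent: these blocks
	// would be retried on other store-gateways, and the query would fail if
	// they are still missing after all retries.
	fmt.Println(missing)
}
```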

docs/blocks-storage/querier.template

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet
 
 Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.
 
-The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
+The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.sharding-ring.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
 
 If the query time range covers a period within `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.
