cortexproject
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
Lines changed: 30 additions & 0 deletions
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
Lines changed: 20 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 6 additions & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 58 additions & 2 deletions
diff --git a/cmd/cortex/main.go b/cmd/cortex/main.go
Lines changed: 12 additions & 8 deletions
diff --git a/cmd/cortex/main_test.go b/cmd/cortex/main_test.go
Lines changed: 5 additions & 0 deletions
diff --git a/docs/blocks-storage/_index.md b/docs/blocks-storage/_index.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/blocks-storage/convert-stored-chunks-to-blocks.md b/docs/blocks-storage/convert-stored-chunks-to-blocks.md
Lines changed: 1 addition & 0 deletions
diff --git a/docs/blocks-storage/querier.md b/docs/blocks-storage/querier.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/blocks-storage/querier.template b/docs/blocks-storage/querier.template
Lines changed: 1 addition & 1 deletion
@@ -0,0 +1,30 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+1. Start Cortex (SHA or version)
+2. Perform operations (read/write/others)
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Environment:**
+ - Infrastructure: [e.g., Kubernetes, bare-metal, laptop]
+ - Deployment tool: [e.g., helm, jsonnet]
+
+**Storage Engine**
+- [ ] Blocks
+- [ ] Chunks
+
+**Additional Context**
+<!-- Additional relevant information that can help us debug this issue, such as logs and configuration. -->
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.
@@ -12,3 +12,9 @@ website/content/en/docs
 e2e_integration_test*
 active-query-tracker
 dist/
+
+# Binaries built from ./cmd
+blocksconvert
+cortex
+query-tee
+test-exporter
@@ -2,16 +2,72 @@
 
 ## master / unreleased
 
+* [CHANGE] Improved shuffle sharding support in the write path. This work introduced some config changes: #3090
+  * Introduced `-distributor.sharding-strategy` CLI flag (and its respective `sharding_strategy` YAML config option) to explicitly specify which sharding strategy should be used in the write path
+  * `-experimental.distributor.user-subring-size` flag renamed to `-distributor.ingestion-tenant-shard-size`
+  * `user_subring_size` limit YAML config option renamed to `ingestion_tenant_shard_size`
+* [CHANGE] Dropped "blank Alertmanager configuration; using fallback" message from Info to Debug level. #3205
+* [CHANGE] Zone-awareness replication for time-series now should be explicitly enabled in the distributor via the `-distributor.zone-awareness-enabled` CLI flag (or its respective YAML config option). Before, zone-aware replication was implicitly enabled if a zone was set on ingesters. #3200
+* [CHANGE] Removed the deprecated CLI flag `-config-yaml`. You should use `-schema-config-file` instead. #3225
+* [CHANGE] Enforced the HTTP method required by some API endpoints which did (incorrectly) allow any method before that. #3228
+  - `GET /`
+  - `GET /config`
+  - `GET /debug/fgprof`
+  - `GET /distributor/all_user_stats`
+  - `GET /distributor/ha_tracker`
+  - `GET /all_user_stats`
+  - `GET /ha-tracker`
+  - `GET /api/v1/user_stats`
+  - `GET /api/v1/chunks`
+  - `GET <legacy-http-prefix>/user_stats`
+  - `GET <legacy-http-prefix>/chunks`
+  - `GET /services`
+  - `GET /multitenant_alertmanager/status`
+  - `GET /status` (alertmanager microservice)
+  - `GET|POST /ingester/ring`
+  - `GET|POST /ring`
+  - `GET|POST /store-gateway/ring`
+  - `GET|POST /compactor/ring`
+  - `GET|POST /ingester/flush`
+  - `GET|POST /ingester/shutdown`
+  - `GET|POST /flush`
+  - `GET|POST /shutdown`
+  - `GET|POST /ruler/ring`
+  - `POST /api/v1/push`
+  - `POST <legacy-http-prefix>/push`
+  - `POST /push`
+  - `POST /ingester/push`
+* [FEATURE] Added support for shuffle-sharding queriers in the query-frontend. When configured (`-frontend.max-queriers-per-user` globally, or using per-user limit `max_queriers_per_user`), each user's requests will be handled by a different set of queriers. #3113
+* [FEATURE] Query-frontend: added `compression` config to support results cache with compression. #3217
+* [ENHANCEMENT] Added `cortex_query_frontend_connected_clients` metric to show the number of workers currently connected to the frontend. #3207
+* [ENHANCEMENT] Shuffle sharding: improved shuffle sharding in the write path. Shuffle sharding now should be explicitly enabled via `-distributor.sharding-strategy` CLI flag (or its respective YAML config option) and guarantees stability, consistency, shuffling and balanced zone-awareness properties. #3090 #3214
+* [ENHANCEMENT] Ingester: added new metric `cortex_ingester_active_series` to track active series more accurately. Also added options to control whether active series tracking is enabled (`-ingester.active-series-enabled`, defaults to false), how often this metric is updated (`-ingester.active-series-update-period`), and the max idle time after which a series is considered inactive (`-ingester.active-series-idle-timeout`). #3153
+* [ENHANCEMENT] Blocksconvert – Builder: download plan file locally before processing it. #3209
+* [ENHANCEMENT] Store-gateway: added zone-aware replication support to blocks replication in the store-gateway. #3200
+* [ENHANCEMENT] Store-gateway: exported new metrics. #3231
+  - `cortex_bucket_store_cached_series_fetch_duration_seconds`
+  - `cortex_bucket_store_cached_postings_fetch_duration_seconds`
+  - `cortex_bucket_stores_gate_queries_max`
+* [ENHANCEMENT] Added `-version` flag to Cortex. #3233
 * [ENHANCEMENT] Smooth out spikes in rate of chunk flush operations. #3191
+* [BUGFIX] No-longer-needed ingester operations for queries triggered by queriers and rulers are now canceled. #3178
+* [BUGFIX] Ruler: directories in the configured `rules-path` will be removed on startup and shutdown in order to ensure they don't persist between runs. #3195
+* [BUGFIX] Handle hash-collisions in the query path. #3192
+* [BUGFIX] Check for postgres rows errors. #3197
+* [BUGFIX] Ruler Experimental API: Don't allow rule groups without names or empty rule groups. #3210
+* [BUGFIX] Experimental Alertmanager API: Do not allow empty Alertmanager configurations or bad template filenames to be submitted through the configuration API. #3185
 
 ## 1.4.0-rc.0 in progress
 
+* [CHANGE] TLS configuration for gRPC, HTTP and etcd clients is now marked as experimental. These features are not yet fully baked, and we expect possible small breaking changes in Cortex 1.5. #3198
 * [CHANGE] Cassandra backend support is now GA (stable). #3180
-* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
+* [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180 #3201
   - `-experimental.blocks-storage.*` flags renamed to `-blocks-storage.*`
   - `-experimental.store-gateway.*` flags renamed to `-store-gateway.*`
   - `-experimental.querier.store-gateway-client.*` flags renamed to `-querier.store-gateway-client.*`
   - `-experimental.querier.store-gateway-addresses` flag renamed to `-querier.store-gateway-addresses`
+  - `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
+  - `-store-gateway.tokens-file-path` flag renamed to `-store-gateway.sharding-ring.tokens-file-path`
 * [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. Only if you are running `v1.0` or below, it is recommended to first upgrade to `v1.1`/`v1.2`/`v1.3` and run it for a day before upgrading to `v1.4` to avoid data loss. #3115
 * [CHANGE] Distributor API endpoints are no longer served unless target is set to `distributor` or `all`. #3112
 * [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
@@ -51,7 +107,7 @@
 * [ENHANCEMENT] Add "integration" as a label for `cortex_alertmanager_notifications_total` and `cortex_alertmanager_notifications_failed_total` metrics. #3056
 * [ENHANCEMENT] Add `cortex_ruler_config_last_reload_successful` and `cortex_ruler_config_last_reload_successful_seconds` to check status of users rule manager. #3056
 * [ENHANCEMENT] The configuration validation now fails if an empty YAML node has been set for a root YAML config property. #3080
-* [ENHANCEMENT] Memcached dial() calls now have an optional circuit-breaker to avoid hammering a broken cache #3051
+* [ENHANCEMENT] Memcached dial() calls now have a circuit-breaker to avoid hammering a broken cache. #3051, #3189
 * [ENHANCEMENT] `-ruler.evaluation-delay-duration` is now overridable as a per-tenant limit, `ruler_evaluation_delay_duration`. #3098
 * [ENHANCEMENT] Add TLS support to etcd client. #3102
 * [ENHANCEMENT] When a tenant accesses the Alertmanager UI or its API, if we have valid `-alertmanager.configs.fallback` we'll use that to start the manager and avoid failing the request. #3073
 
@@ -61,6 +61,8 @@ func main() {
 		eventSampleRate      int
 		ballastBytes         int
 		mutexProfileFraction int
+		printVersion         bool
+		printModules         bool
 	)
 
 	configFile, expandENV := parseConfigFileParameter(os.Args[1:])
@@ -86,6 +88,8 @@ func main() {
 	flag.IntVar(&eventSampleRate, "event.sample-rate", 0, "How often to sample observability events (0 = never).")
 	flag.IntVar(&ballastBytes, "mem-ballast-size-bytes", 0, "Size of memory ballast to allocate.")
 	flag.IntVar(&mutexProfileFraction, "debug.mutex-profile-fraction", 0, "Fraction at which mutex profile vents will be reported, 0 to disable")
+	flag.BoolVar(&printVersion, "version", false, "Print Cortex version and exit.")
+	flag.BoolVar(&printModules, "modules", false, "List available values that can be used as target.")
 
 	usage := flag.CommandLine.Usage
 	flag.CommandLine.Usage = func() { /* don't do anything by default, we will print usage ourselves, but only when requested. */ }
@@ -106,6 +110,11 @@ func main() {
 		}
 	}
 
+	if printVersion {
+		fmt.Fprintln(os.Stdout, version.Print("Cortex"))
+		return
+	}
+
 	// Validate the config once both the config file has been loaded
 	// and CLI flags parsed.
 	err = cfg.Validate(util.Logger)
@@ -118,7 +127,7 @@ func main() {
 
 	// Continue on if -modules flag is given. Code handling the
 	// -modules flag will not start cortex.
-	if testMode && !cfg.ListModules {
+	if testMode && !printModules {
 		DumpYaml(&cfg)
 		return
 	}
@@ -151,7 +160,7 @@ func main() {
 	t, err := cortex.New(cfg)
 	util.CheckFatal("initializing cortex", err)
 
-	if t.Cfg.ListModules {
+	if printModules {
 		allDeps := t.ModuleManager.DependenciesForModule(cortex.All)
 
 		for _, m := range t.ModuleManager.UserVisibleModuleNames() {
@@ -167,12 +176,7 @@ func main() {
 
 		fmt.Fprintln(os.Stdout)
 		fmt.Fprintln(os.Stdout, "Modules marked with * are included in target All.")
-
-		// in test mode we cannot call os.Exit, it will stop to whole test process.
-		if testMode {
-			return
-		}
-		os.Exit(2)
+		return
 	}
 
 	level.Info(util.Logger).Log("msg", "Starting Cortex", "version", version.Info())
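
The hunks above add a `-version` flag and replace `os.Exit(2)` with a plain `return` in the `-modules` branch, so `main` can be exercised from a test. Here is a minimal, standard-library-only sketch of the same pattern. The `handleFlags` function and the `buildVersion` constant are hypothetical stand-ins, and the output string only imitates the `Cortex, version` prefix that the test in this PR checks for.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// buildVersion is a placeholder; a real binary would inject this at build time.
const buildVersion = "0.0.0-sketch"

// handleFlags parses args and reports what to print and whether the caller
// should stop. Keeping this out of main avoids os.Exit, so tests can call it.
func handleFlags(args []string) (out string, stop bool, err error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are returned instead of printed
	printVersion := fs.Bool("version", false, "Print version and exit.")
	if err := fs.Parse(args); err != nil {
		return "", true, err
	}
	if *printVersion {
		return "Cortex, version " + buildVersion, true, nil
	}
	return "", false, nil
}

func main() {
	out, stop, err := handleFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if stop {
		fmt.Fprintln(os.Stdout, out)
		return // early return, mirroring the diff above
	}
	fmt.Println("starting (sketch)")
}
```

Handling `-version` before any config validation means the flag works even when no valid configuration is available, which matches where the diff places the check.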
 
@@ -80,6 +80,11 @@ func TestFlagParsing(t *testing.T) {
 			stderrMessage: "the Querier configuration in YAML has been specified as an empty YAML node",
 		},
 
+		"version": {
+			arguments:     []string{"-version"},
+			stdoutMessage: "Cortex, version",
+		},
+
 		// we cannot test the happy path, as cortex would then fully start
 	} {
 		t.Run(name, func(t *testing.T) {
 
@@ -14,7 +14,7 @@ The supported backends for the blocks storage are:
 * [Microsoft Azure Storage](https://azure.microsoft.com/en-us/services/storage/)
 * [Local Filesystem](https://thanos.io/storage.md/#filesystem) (single node only)
 
-_Internally, this storage engine is based on [Thanos](https://thanos.io), but no Thanos knowledge is required in order to run it._
+_Internally, some components are based on [Thanos](https://thanos.io), but no Thanos knowledge is required in order to run the blocks storage._
 
 ## Architecture
 
@@ -30,7 +30,7 @@ The **[store-gateway](./store-gateway.md)** is responsible to query blocks and i
 
 The **[compactor](./compactor.md)** is responsible to merge and deduplicate smaller blocks into larger ones, in order to reduce the number of blocks stored in the long-term storage for a given tenant and query them more efficiently. The compactor is optional but highly recommended.
 
-Finally, the **table-manager** is not used by the blocks storage.
+Finally, the **table-manager** and the [**schema**](../configuration/schema-config-reference.md) configuration are **not used** by the blocks storage.
 
 ### The write path
 
 
@@ -38,6 +38,7 @@ Scanner is started by running `blocksconvert -target=scanner`. Scanner requires
 - `-scanner.allowed-users` – comma-separated list of Cortex tenants that should have plans generated. If empty, plans for all found users are generated.
 - `-scanner.ignore-users-regex` - If plans for all users are generated (`-scanner.allowed-users` is not set), then users matching this non-empty regular expression will be skipped.
 - `-scanner.tables-limit` – How many tables should be scanned? By default all tables are scanned, but when testing scanner it may be useful to start with small number of tables first.
+- `-scanner.tables` – Comma-separated list of tables to be scanned. Can be used to scan specific tables only. Note that the schema is still used to find all tables first, and then this list is consulted to select only the specified tables.
 
 Scanner will read the Cortex schema file to discover Index tables, and then it will start scanning them from most-recent table first, going back.
 For each table, it will fully read the table and generate a plan for each user and day stored in the table.
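
The `-scanner.tables` description above implies a two-step selection: discover all tables from the schema, then keep only those named in the list. This is a minimal sketch of that second step, not the blocksconvert implementation. The `filterTables` function and the table names are hypothetical, and treating an empty list as "scan everything" is an assumption based on the documented default.

```go
package main

import (
	"fmt"
	"strings"
)

// filterTables keeps, in their original order, only the discovered tables
// named in the comma-separated allowed list. An empty list keeps everything
// (assumed to match the "by default all tables are scanned" behavior).
func filterTables(discovered []string, allowed string) []string {
	if strings.TrimSpace(allowed) == "" {
		return discovered
	}
	want := map[string]struct{}{}
	for _, t := range strings.Split(allowed, ",") {
		if t = strings.TrimSpace(t); t != "" {
			want[t] = struct{}{}
		}
	}
	var out []string
	for _, t := range discovered {
		if _, ok := want[t]; ok {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Hypothetical table names, listed most recent first.
	discovered := []string{"index_2600", "index_2599", "index_2598"}
	fmt.Println(filterTables(discovered, "index_2598,index_2600"))
	// prints: [index_2600 index_2598]
}
```

Filtering the discovered list, rather than trusting the flag value directly, means a mistyped table name simply selects nothing instead of triggering a scan of a table the schema does not know about.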
 
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet
 
 Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.
 
-The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
+The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.sharding-ring.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
 
 If the query time range covers a period within `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.
 
 
@@ -30,7 +30,7 @@ When a querier receives a query range request, it contains the following paramet
 
 Given a query, the querier analyzes the `start` and `end` time range to compute a list of all known blocks containing at least 1 sample within this time range. Given the list of blocks, the querier then computes a list of store-gateway instances holding these blocks and sends a request to each matching store-gateway instance asking to fetch all the samples for the series matching the `query` within the `start` and `end` time range.
 
-The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
+The request sent to each store-gateway contains the list of block IDs that are expected to be queried, and the response sent back by the store-gateway to the querier contains the list of block IDs that were actually queried. This list may be a subset of the requested blocks, for example due to recent blocks resharding event (ie. last few seconds). The querier runs a consistency check on responses received from the store-gateways to ensure all expected blocks have been queried; if not, the querier retries to fetch samples from missing blocks from different store-gateways (if the `-store-gateway.sharding-ring.replication-factor` is greater than `1`) and if the consistency check fails after all retries, the query execution fails as well (correctness is always guaranteed).
 
 If the query time range covers a period within `-querier.query-ingesters-within` duration, the querier also sends the request to all ingesters, in order to fetch samples that have not been uploaded to the long-term storage yet.
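
The consistency check described in the querier documentation above can be sketched as a loop that tracks which expected block IDs are still unqueried and retries them against another set of store-gateways. This is an illustration only, not the Cortex querier code. The `fetchFunc` type, `queryBlocks`, and the block IDs are hypothetical; each `fetchFunc` stands in for one attempt against a different replica.

```go
package main

import "fmt"

// fetchFunc is a hypothetical stand-in for one attempt against store-gateways:
// given the block IDs requested, it returns the IDs that were actually queried.
type fetchFunc func(blockIDs []string) []string

// queryBlocks succeeds only if every expected block was queried by some
// attempt. Blocks missing after an attempt are retried on the next one.
func queryBlocks(expected []string, attempts []fetchFunc) error {
	remaining := expected
	for _, fetch := range attempts {
		if len(remaining) == 0 {
			break
		}
		queried := map[string]bool{}
		for _, id := range fetch(remaining) {
			queried[id] = true
		}
		var missing []string
		for _, id := range remaining {
			if !queried[id] {
				missing = append(missing, id)
			}
		}
		remaining = missing
	}
	if len(remaining) > 0 {
		return fmt.Errorf("consistency check failed: blocks not queried: %v", remaining)
	}
	return nil
}

func main() {
	// The first replica serves only the first requested block, for example
	// because of a recent resharding; the second serves whatever is asked.
	partial := func(ids []string) []string { return ids[:1] }
	full := func(ids []string) []string { return ids }

	blocks := []string{"block-1", "block-2"}
	fmt.Println(queryBlocks(blocks, []fetchFunc{partial, full}))
	// prints: <nil>
	fmt.Println(queryBlocks(blocks, []fetchFunc{partial}))
	// prints: consistency check failed: blocks not queried: [block-2]
}
```

With a single attempt available, which corresponds to a replication factor of `1`, there is nowhere to retry, so a partial response fails the query rather than returning incomplete results.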