Commit 1a9d546
delete series api and purger to purge data requested for deletion (#2103)
* delete series api and purger to purge data requested for deletion: delete_store manages delete requests and purge plan records in stores; purger builds delete plans (delete requests sharded by day) and executes them in parallel; only one request per user is in the execution phase at a time; delete requests get picked up for deletion once they are more than a day old
* moved delete store creation from initStore to initPurger, which is the only component that needs it
* implemented new methods in MockStorage for writing tests
* removed DeleteClient and using IndexClient in DeleteStore
* refactored some code, added some tests for chunk store and purger
* added some tests and fixed some issues found during tests
* changes suggested in PR review
* rebased and fixed conflicts
* updated route for delete handler to look the same as Prometheus
* added test for purger restarts and fixed some issues
* suggested changes from PR review and fixed linter, tests
* fixed panic in modules when stopping purger
* changes suggested from PR review
* config changes suggested in PR review
* updated config doc
* updated changelog
* some changes suggested from PR review
* made Init in Purger public to call it from modules to fail early

Signed-off-by: Sandeep Sukhani <[email protected]>
1 parent de30605 commit 1a9d546

16 files changed (+2924 −31 lines)

CHANGELOG.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -36,6 +36,7 @@
 * [FEATURE] Add /config HTTP endpoint which exposes the current Cortex configuration as YAML. #2165
 * [FEATURE] Allow Prometheus remote write directly to ingesters. #1491
 * [FEATURE] Add flag `-experimental.tsdb.stripe-size` to expose TSDB stripe size option. #2185
+* [FEATURE] Experimental Delete Series: Added support for Deleting Series with Prometheus style API. Needs to be enabled first by setting `--purger.enable` to `true`. Deletion only supported when using `boltdb` and `filesystem` as index and object store respectively. Support for other stores to follow in separate PRs #2103
 * [ENHANCEMENT] Alertmanager: Expose Per-tenant alertmanager metrics #2124
 * [ENHANCEMENT] Add `status` label to `cortex_alertmanager_configs` metric to gauge the number of valid and invalid configs. #2125
 * [ENHANCEMENT] Cassandra Authentication: added the `custom_authenticators` config option that allows users to authenticate with cassandra clusters using password authenticators that are not approved by default in [gocql](https://github.com/gocql/gocql/blob/81b8263d9fe526782a588ef94d3fa5c6148e5d67/conn.go#L27) #2093
```

docs/configuration/config-file-reference.md

Lines changed: 31 additions & 0 deletions

````diff
@@ -112,6 +112,9 @@ Where default_value is the value to use if the environment variable is undefined
 # storage.
 [compactor: <compactor_config>]
 
+# The purger_config configures the purger which takes care of delete requests
+[purger: <purger_config>]
+
 # The ruler_config configures the Cortex ruler.
 [ruler: <ruler_config>]
 
@@ -1523,6 +1526,15 @@ index_queries_cache_config:
 # The fifo_cache_config configures the local in-memory cache.
 # The CLI flags prefix for this block config is: store.index-cache-read
 [fifocache: <fifo_cache_config>]
+
+delete_store:
+  # Store for keeping delete request
+  # CLI flag: -deletes.store
+  [store: <string> | default = ""]
+
+  # Name of the table which stores delete requests
+  # CLI flag: -deletes.requests-table-name
+  [requests_table_name: <string> | default = "delete_requests"]
 ```
 
 ### `chunk_store_config`
@@ -2337,3 +2349,22 @@ sharding_ring:
 # CLI flag: -compactor.ring.heartbeat-timeout
 [heartbeat_timeout: <duration> | default = 1m0s]
 ```
+
+### `purger_config`
+
+The `purger_config` configures the purger which takes care of delete requests
+
+```yaml
+# Enable purger to allow deletion of series. Be aware that Delete series feature
+# is still experimental
+# CLI flag: -purger.enable
+[enable: <boolean> | default = false]
+
+# Number of workers executing delete plans in parallel
+# CLI flag: -purger.num-workers
+[num_workers: <int> | default = 2]
+
+# Name of the object store to use for storing delete plans
+# CLI flag: -purger.object-store-type
+[object_store_type: <string> | default = ""]
+```
````
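Putting the two new blocks together, a minimal configuration that turns the feature on might look like the following. This is an illustrative sketch, assuming the `boltdb`/`filesystem` setup that is the only combination supported by this PR:

```yaml
purger:
  enable: true                  # off by default; the feature is experimental
  num_workers: 2                # delete plans executed in parallel
  object_store_type: filesystem # where delete plans are stored

delete_store:
  store: boltdb
  requests_table_name: delete_requests
```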

docs/configuration/single-process-config.yaml

Lines changed: 6 additions & 0 deletions

```diff
@@ -66,3 +66,9 @@ storage:
 
   filesystem:
     directory: /tmp/cortex/chunks
+
+delete_store:
+  store: boltdb
+
+purger:
+  object_store_type: filesystem
```

pkg/chunk/aws/dynamodb_storage_client_test.go

Lines changed: 2 additions & 1 deletion

```diff
@@ -3,6 +3,7 @@ package aws
 import (
 	"context"
 	"testing"
+	"time"
 
 	"github.com/prometheus/common/model"
 
@@ -29,7 +30,7 @@ func TestChunksPartialError(t *testing.T) {
 	}
 	ctx := context.Background()
 	// Create more chunks than we can read in one batch
-	_, chunks, err := testutils.CreateChunks(0, dynamoDBMaxReadBatchSize+50, model.Now())
+	_, chunks, err := testutils.CreateChunks(0, dynamoDBMaxReadBatchSize+50, model.Now().Add(-time.Hour), model.Now())
 	require.NoError(t, err)
 	err = client.PutChunks(ctx, chunks)
 	require.NoError(t, err)
```

pkg/chunk/delete_requests_store.go

Lines changed: 283 additions & 0 deletions (new file; shown without diff markers)

```go
package chunk

import (
	"context"
	"encoding/binary"
	"encoding/hex"
	"errors"
	"flag"
	"fmt"
	"hash/fnv"
	"strconv"
	"strings"
	"time"

	"github.com/prometheus/common/model"
	"github.com/prometheus/prometheus/pkg/labels"
)

type DeleteRequestStatus string

const (
	StatusReceived     DeleteRequestStatus = "received"
	StatusBuildingPlan DeleteRequestStatus = "buildingPlan"
	StatusDeleting     DeleteRequestStatus = "deleting"
	StatusProcessed    DeleteRequestStatus = "processed"

	separator = "\000" // separator for series selectors in delete requests
)

var (
	pendingDeleteRequestStatuses = []DeleteRequestStatus{StatusReceived, StatusBuildingPlan, StatusDeleting}

	ErrDeleteRequestNotFound = errors.New("could not find matching delete request")
)

// DeleteRequest holds all the details about a delete request
type DeleteRequest struct {
	RequestID string              `json:"request_id"`
	UserID    string              `json:"-"`
	StartTime model.Time          `json:"start_time"`
	EndTime   model.Time          `json:"end_time"`
	Selectors []string            `json:"selectors"`
	Status    DeleteRequestStatus `json:"status"`
	Matchers  [][]*labels.Matcher `json:"-"`
	CreatedAt model.Time          `json:"created_at"`
}

// DeleteStore provides all the methods required to manage the lifecycle of delete requests and things related to them
type DeleteStore struct {
	cfg         DeleteStoreConfig
	indexClient IndexClient
}

// DeleteStoreConfig holds configuration for the delete store
type DeleteStoreConfig struct {
	Store             string `yaml:"store"`
	RequestsTableName string `yaml:"requests_table_name"`
}

// RegisterFlags adds the flags required to configure this flag set.
func (cfg *DeleteStoreConfig) RegisterFlags(f *flag.FlagSet) {
	f.StringVar(&cfg.Store, "deletes.store", "", "Store for keeping delete request")
	f.StringVar(&cfg.RequestsTableName, "deletes.requests-table-name", "delete_requests", "Name of the table which stores delete requests")
}

// NewDeleteStore creates a store for managing delete requests
func NewDeleteStore(cfg DeleteStoreConfig, indexClient IndexClient) (*DeleteStore, error) {
	ds := DeleteStore{
		cfg:         cfg,
		indexClient: indexClient,
	}

	return &ds, nil
}

// AddDeleteRequest creates entries for a new delete request
func (ds *DeleteStore) AddDeleteRequest(ctx context.Context, userID string, startTime, endTime model.Time, selectors []string) error {
	requestID := generateUniqueID(userID, selectors)

	for {
		_, err := ds.GetDeleteRequest(ctx, userID, string(requestID))
		if err != nil {
			if err == ErrDeleteRequestNotFound {
				break
			}
			return err
		}

		// we have a collision here, let's create a new requestID and check for collision again
		time.Sleep(time.Millisecond)
		requestID = generateUniqueID(userID, selectors)
	}

	// userID, requestID
	userIDAndRequestID := fmt.Sprintf("%s:%s", userID, requestID)

	// Add an entry with userID, requestID as range key and status as value to make it easy to manage and look up status.
	// We don't want to set anything in the hash key here since we would want to find delete requests by just status.
	writeBatch := ds.indexClient.NewWriteBatch()
	writeBatch.Add(ds.cfg.RequestsTableName, "", []byte(userIDAndRequestID), []byte(StatusReceived))

	// Add another entry with additional details like creation time, time range of delete request and selectors in value
	rangeValue := fmt.Sprintf("%x:%x:%x", int64(model.Now()), int64(startTime), int64(endTime))
	writeBatch.Add(ds.cfg.RequestsTableName, userIDAndRequestID, []byte(rangeValue), []byte(strings.Join(selectors, separator)))

	return ds.indexClient.BatchWrite(ctx, writeBatch)
}

// GetDeleteRequestsByStatus returns all delete requests for a given status
func (ds *DeleteStore) GetDeleteRequestsByStatus(ctx context.Context, status DeleteRequestStatus) ([]DeleteRequest, error) {
	return ds.queryDeleteRequests(ctx, []IndexQuery{{TableName: ds.cfg.RequestsTableName, ValueEqual: []byte(status)}})
}

// GetDeleteRequestsForUserByStatus returns all delete requests for a user with a given status
func (ds *DeleteStore) GetDeleteRequestsForUserByStatus(ctx context.Context, userID string, status DeleteRequestStatus) ([]DeleteRequest, error) {
	return ds.queryDeleteRequests(ctx, []IndexQuery{
		{TableName: ds.cfg.RequestsTableName, RangeValuePrefix: []byte(userID), ValueEqual: []byte(status)},
	})
}

// GetAllDeleteRequestsForUser returns all delete requests for a user
func (ds *DeleteStore) GetAllDeleteRequestsForUser(ctx context.Context, userID string) ([]DeleteRequest, error) {
	return ds.queryDeleteRequests(ctx, []IndexQuery{
		{TableName: ds.cfg.RequestsTableName, RangeValuePrefix: []byte(userID)},
	})
}

// UpdateStatus updates the status of a delete request
func (ds *DeleteStore) UpdateStatus(ctx context.Context, userID, requestID string, newStatus DeleteRequestStatus) error {
	userIDAndRequestID := fmt.Sprintf("%s:%s", userID, requestID)

	writeBatch := ds.indexClient.NewWriteBatch()
	writeBatch.Add(ds.cfg.RequestsTableName, "", []byte(userIDAndRequestID), []byte(newStatus))

	return ds.indexClient.BatchWrite(ctx, writeBatch)
}

// GetDeleteRequest returns the delete request with the given requestID
func (ds *DeleteStore) GetDeleteRequest(ctx context.Context, userID, requestID string) (*DeleteRequest, error) {
	userIDAndRequestID := fmt.Sprintf("%s:%s", userID, requestID)

	deleteRequests, err := ds.queryDeleteRequests(ctx, []IndexQuery{
		{TableName: ds.cfg.RequestsTableName, RangeValuePrefix: []byte(userIDAndRequestID)},
	})

	if err != nil {
		return nil, err
	}

	if len(deleteRequests) == 0 {
		return nil, ErrDeleteRequestNotFound
	}

	return &deleteRequests[0], nil
}

// GetPendingDeleteRequestsForUser returns all delete requests for a user which are not processed
func (ds *DeleteStore) GetPendingDeleteRequestsForUser(ctx context.Context, userID string) ([]DeleteRequest, error) {
	pendingDeleteRequests := []DeleteRequest{}
	for _, status := range pendingDeleteRequestStatuses {
		deleteRequests, err := ds.GetDeleteRequestsForUserByStatus(ctx, userID, status)
		if err != nil {
			return nil, err
		}

		pendingDeleteRequests = append(pendingDeleteRequests, deleteRequests...)
	}

	return pendingDeleteRequests, nil
}

func (ds *DeleteStore) queryDeleteRequests(ctx context.Context, deleteQuery []IndexQuery) ([]DeleteRequest, error) {
	deleteRequests := []DeleteRequest{}
	err := ds.indexClient.QueryPages(ctx, deleteQuery, func(query IndexQuery, batch ReadBatch) (shouldContinue bool) {
		itr := batch.Iterator()
		for itr.Next() {
			userID, requestID := splitUserIDAndRequestID(string(itr.RangeValue()))

			deleteRequests = append(deleteRequests, DeleteRequest{
				UserID:    userID,
				RequestID: requestID,
				Status:    DeleteRequestStatus(itr.Value()),
			})
		}
		return true
	})
	if err != nil {
		return nil, err
	}

	for i, deleteRequest := range deleteRequests {
		deleteRequestQuery := []IndexQuery{{TableName: ds.cfg.RequestsTableName, HashValue: fmt.Sprintf("%s:%s", deleteRequest.UserID, deleteRequest.RequestID)}}

		var parseError error
		err := ds.indexClient.QueryPages(ctx, deleteRequestQuery, func(query IndexQuery, batch ReadBatch) (shouldContinue bool) {
			itr := batch.Iterator()
			itr.Next()

			deleteRequest, err = parseDeleteRequestTimestamps(itr.RangeValue(), deleteRequest)
			if err != nil {
				parseError = err
				return false
			}

			deleteRequest.Selectors = strings.Split(string(itr.Value()), separator)
			deleteRequests[i] = deleteRequest

			return true
		})

		if err != nil {
			return nil, err
		}

		if parseError != nil {
			return nil, parseError
		}
	}

	return deleteRequests, nil
}

func parseDeleteRequestTimestamps(rangeValue []byte, deleteRequest DeleteRequest) (DeleteRequest, error) {
	hexParts := strings.Split(string(rangeValue), ":")
	if len(hexParts) != 3 {
		return deleteRequest, errors.New("invalid key in parsing delete request lookup response")
	}

	createdAt, err := strconv.ParseInt(hexParts[0], 16, 64)
	if err != nil {
		return deleteRequest, err
	}

	from, err := strconv.ParseInt(hexParts[1], 16, 64)
	if err != nil {
		return deleteRequest, err
	}

	through, err := strconv.ParseInt(hexParts[2], 16, 64)
	if err != nil {
		return deleteRequest, err
	}

	deleteRequest.CreatedAt = model.Time(createdAt)
	deleteRequest.StartTime = model.Time(from)
	deleteRequest.EndTime = model.Time(through)

	return deleteRequest, nil
}

// generateUniqueID builds an id from the org, current time and selectors, which is useful in managing delete requests
func generateUniqueID(orgID string, selectors []string) []byte {
	uniqueID := fnv.New32()
	_, _ = uniqueID.Write([]byte(orgID))

	timeNow := make([]byte, 8)
	binary.LittleEndian.PutUint64(timeNow, uint64(time.Now().UnixNano()))
	_, _ = uniqueID.Write(timeNow)

	for _, selector := range selectors {
		_, _ = uniqueID.Write([]byte(selector))
	}

	return encodeUniqueID(uniqueID.Sum32())
}

func encodeUniqueID(t uint32) []byte {
	throughBytes := make([]byte, 4)
	binary.BigEndian.PutUint32(throughBytes, t)
	encodedThroughBytes := make([]byte, 8)
	hex.Encode(encodedThroughBytes, throughBytes)
	return encodedThroughBytes
}

func splitUserIDAndRequestID(rangeValue string) (userID, requestID string) {
	lastIndex := strings.LastIndex(rangeValue, ":")

	userID = rangeValue[:lastIndex]
	requestID = rangeValue[lastIndex+1:]

	return
}
```
