Skip to content
This repository was archived by the owner on Feb 21, 2024. It is now read-only.

Commit daceee2

Browse files
CLoZengineerseebstravisturnerPranitha-malaepaddyjok
authored
merge: featurebase merge updates for 2022-10-28 (#2188)
* use t.Fatal(f) to abort tests, not panic * make perf_able run at all, make it debug a bit better switch perf-able to using same node type we use for other spot instances, because otherwise it never finds any available capacity. we switch the perf-able script to use the standard get_value function instead of direct jq calls. we try to grab server logs if the restore fails in the hopes of finding out why the restore very occasionally fails. * Fix some issues with running IDK tests in docker. (#2248) *Stop running TestKafkaSourceIntegration with t.Parallel() This test can't be run in parallel as it's currently written. Doing so allows for interleaving of messages to the same kafka topic between tests. I didn't attempt to modify the test so it could be run in parallel. That could be done, but left for someone more ambitious. * Remove idk/testenv/certs which got accidentally committed. also update .gitignore to include those. * changes to add bool support in idk (#2240) * initial changes to add bool support in idk * modifying some default parameters for testing, will revert them later * adding support for bool in making fragments function * boolean values implementation without supporting empty or null values at this point * Implement bool support in batch using a map (and a slice for nulls) (#2247) * Implement bool support in batch using a map (and a slice for nulls) * Keep the PackBools default for now But set it explicity in the ingest tests which rely on it. * Modify batch to construct bool update like mutex The code in API.ImportRoaringShard has a switch statement which causes bool fields to be handled like mutex fields. This means, that the viewUpdate.Clear value should only contain data in the first "row" of the fragment, which it will treat as records to clear for *all* rows. This makes more sense for mutex fields; for bool fields, there's only one other row to clear. But since the code is currently handling them the same, we need to construct viewUpdate.Clear such that it conforms to that pattern. This commit also adds a test which covers this logic. * Remove commented code; revert config for testing This commit also removes the DELETE_SENTINEL case for non-packed bools, since that isn't supported anyway. * Revert default setting * remove inconsistent type scope * correcting the logic of string converstion to bool * resolving an error in a test * adding tests to cover code related to bool support in batch.go file and interface.go files * modifying interfaces test * added one more test case Co-authored-by: Travis Turner <[email protected]> Co-authored-by: Travis Turner <[email protected]> * resolving bool null field ingestion error (#2254) * resolving bool null field ingestion error * testing issues * adding null support for bools * updating the null bool field ingestion * trying to resolve issue when ingesting null value for bool type * adding a clearing support for bool type * resolving issues with bool null value ingestion * updating the jwt go package version and removing changes made in docker compose file * reverting jwt go version * removing v4 of jwt * adding a comment in test file to see if sonar cloud accepts this file * don't obtain stack traces on rbf.Tx creation We thought stack traces were mildly expensive. We were very wrong. Due to a complicated issue in the Go runtime, simultaneous requests for stack traces end up contending on a lock even when they're not actually contending on any resources. I've filed a ticket in the Go issue tracker for this: golang/go#56400 In the mean time: Under some workloads, we were seeing 85% of all CPU time go into the stack backtraces, of which 81% went into the contention on those locks. But even if you take away the contention, that leaves us with 4/19 of all CPU time in our code going into building those stack backtraces. That's a lot of overhead for a feature we virtually never use. We might consider adding a backtrace functionality here, possibly using `runtime.Callers` which is much lower overhead, and allows us to generate a backtrace on demand (no argument values available, but then, we never read those because they're unformatted hex values), but I don't think it's actually very informative to know what the stack traces were of the Tx; they don't necessarily reflect the current state of any ongoing use of the Tx, so we can't necessarily correlate them to goroutine stack dumps, and so on. * fb-1729 Enriched Table Metadata (#2255) enriched metadata for tables added support for the concept of a table and field owners in metadata; mechanism to derive owner from http request metadata; metadata for table description * tightened up is/is not null filter expressions (FB-1741) (#2260) Covers tightening up handling filter expressions that contain is/is not null ops. These filters may have to be translated into PQL calls to be passed to the executor and even though sql3 language supports nullability for any data type, currently only BSI fields are nullable at the storage engine level (there is a ticket to add support for non-BSI field here FB-1689: IS SQL Argument returns incorrect error) so when these fields are used in filter conditions we need to handle BSI and non-BSI fields differently. * added a test to cover the keyword replace as being synonymous with insert (#2261) * update molecula references to featurebase (#2262) Co-authored-by: Seebs <[email protected]> Co-authored-by: Travis Turner <[email protected]> Co-authored-by: Pranitha-malae <[email protected]> Co-authored-by: Travis Turner <[email protected]> Co-authored-by: pokeeffe-molecula <[email protected]> Co-authored-by: Stephanie Yang <[email protected]>
1 parent fc1a5da commit daceee2

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

88 files changed

+2970
-1737
lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,9 @@ builds/
2424
*.tfstate.backup
2525
.vscode
2626

27+
idk/testdata/idk*.out
28+
idk/testenv/certs/*
29+
2730
# copy of .gitignore from archived idk repo
2831
# Compiled Object files, Static and Dynamic libs (Shared Objects)
2932
*.o

Dockerfile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ FROM golang:${GO_VERSION} as pilosa-builder
2222
ARG MAKE_FLAGS
2323
WORKDIR /pilosa
2424

25-
RUN go get github.com/rakyll/statik
25+
RUN go install github.com/rakyll/statik@v0.1.7
2626

2727
COPY . ./
2828
COPY --from=lattice-builder /lattice/build /lattice

Makefile

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -205,12 +205,8 @@ generate-pql: require-peg
205205

206206
generate-proto-grpc: require-protoc require-protoc-gen-go
207207
protoc -I proto proto/pilosa.proto --go_out=plugins=grpc:proto
208-
protoc -I proto proto/vdsm/vdsm.proto --go_out=plugins=grpc:proto
209-
# TODO: Modify above commands and remove the below mv if possible.
210-
# See https://go-review.googlesource.com/c/protobuf/+/219298/ for info on --go-opt
211-
# I couldn't get it to work during development - Cody
212-
cp -r proto/github.com/featurebasedb/featurebase/v3/proto/ proto/
213-
rm -rf proto/github.com
208+
# address re-generation here only if we need to
209+
# protoc -I proto proto/vdsm.proto --go_out=plugins=grpc:proto
214210

215211
# `go generate` all needed packages
216212
generate: generate-protoc generate-statik generate-stringer generate-pql

api.go

Lines changed: 24 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -224,14 +224,23 @@ func (api *API) CreateIndex(ctx context.Context, indexName string, options Index
224224
span, _ := tracing.StartSpanFromContext(ctx, "API.CreateIndex")
225225
defer span.Finish()
226226

227+
// get the requestUserID from the context -- assumes the http handler has populated this from
228+
// authN/Z info
229+
requestUserID, ok := ctx.Value(ContextRequestUserIdKey).(string)
230+
if !ok {
231+
requestUserID = ""
232+
}
233+
227234
if err := api.validate(apiCreateIndex); err != nil {
228235
return nil, errors.Wrap(err, "validating api method")
229236
}
230237

231238
// Populate the create index message.
239+
ts := timestamp()
232240
cim := &CreateIndexMessage{
233241
Index: indexName,
234-
CreatedAt: timestamp(),
242+
CreatedAt: ts,
243+
Owner: requestUserID,
235244
Meta: options,
236245
}
237246

@@ -307,6 +316,13 @@ func (api *API) CreateField(ctx context.Context, indexName string, fieldName str
307316
return nil, errors.Wrap(err, "validating api method")
308317
}
309318

319+
// get the requestUserID from the context -- assumes the http handler has populated this from
320+
// authN/Z info
321+
requestUserID, ok := ctx.Value(ContextRequestUserIdKey).(string)
322+
if !ok {
323+
requestUserID = ""
324+
}
325+
310326
// Apply and validate functional options.
311327
fo, err := newFieldOptions(opts...)
312328
if err != nil {
@@ -324,11 +340,12 @@ func (api *API) CreateField(ctx context.Context, indexName string, fieldName str
324340
Index: indexName,
325341
Field: fieldName,
326342
CreatedAt: timestamp(),
343+
Owner: requestUserID,
327344
Meta: fo,
328345
}
329346

330347
// Create field.
331-
field, err := index.CreateField(fieldName, opts...)
348+
field, err := index.CreateField(fieldName, requestUserID, opts...)
332349
if err != nil {
333350
return nil, errors.Wrap(err, "creating field")
334351
}
@@ -358,7 +375,11 @@ func (api *API) UpdateField(ctx context.Context, indexName, fieldName string, up
358375
return newNotFoundError(ErrIndexNotFound, indexName)
359376
}
360377

361-
cfm, err := index.UpdateField(ctx, fieldName, update)
378+
// get the requestUserID from the context -- assumes the http handler has populated this from
379+
// authN/Z info
380+
requestUserID, _ := ctx.Value(ContextRequestUserIdKey).(string)
381+
382+
cfm, err := index.UpdateField(ctx, fieldName, requestUserID, update)
362383
if err != nil {
363384
return errors.Wrap(err, "updating field")
364385
}

api_test.go

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1388,11 +1388,11 @@ func TestVariousApiTranslateCalls(t *testing.T) {
13881388
// this should never actually get used because we're testing for errors here
13891389
r := strings.NewReader("")
13901390
// test index
1391-
idx, err := api.Holder().CreateIndex(c.Idx(), pilosa.IndexOptions{})
1391+
idx, err := api.Holder().CreateIndex(c.Idx(), "", pilosa.IndexOptions{})
13921392
if err != nil {
13931393
t.Fatalf("%v: could not create test index", err)
13941394
}
1395-
if _, err = idx.CreateFieldIfNotExistsWithOptions("field", &pilosa.FieldOptions{Keys: false}); err != nil {
1395+
if _, err = idx.CreateFieldIfNotExistsWithOptions("field", "", &pilosa.FieldOptions{Keys: false}); err != nil {
13961396
t.Fatalf("creating field: %v", err)
13971397
}
13981398
t.Run("translateIndexDbOnNilIndex",

client/batch.go

Lines changed: 119 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,7 @@ type agedTranslation struct {
8787
// | uint64 | set | any |
8888
// | int64 | int | any |
8989
// | float64| decimal | scale |
90+
// | bool | bool | any |
9091
// | nil | any | |
9192
//
9293
// nil values are ignored.
@@ -124,6 +125,15 @@ type Batch struct {
124125
// values holds the values for each record of an int field
125126
values map[string][]int64
126127

128+
// boolValues is a map[fieldName][idsIndex]bool, which holds the values for
129+
// each record of a bool field. It is a map of maps in order to accomodate
130+
// nil values (they just aren't recorded in the map[int]).
131+
boolValues map[string]map[int]bool
132+
133+
// boolNulls holds a slice of indices into b.ids for each bool field which
134+
// has nil values.
135+
boolNulls map[string][]uint64
136+
127137
// times holds a time for each record. (if any of the fields are time fields)
128138
times []QuantizedTime
129139

@@ -233,6 +243,8 @@ func NewBatch(client *Client, size int, index *Index, fields []*Field, opts ...B
233243
headerMap := make(map[string]*Field, len(fields))
234244
rowIDs := make(map[int][]uint64, len(fields))
235245
values := make(map[string][]int64)
246+
boolValues := make(map[string]map[int]bool)
247+
boolNulls := make(map[string][]uint64)
236248
tt := make(map[int]map[string][]int, len(fields))
237249
ttSets := make(map[string]map[string][]int)
238250
hasTime := false
@@ -257,6 +269,8 @@ func NewBatch(client *Client, size int, index *Index, fields []*Field, opts ...B
257269
tt[i] = make(map[string][]int)
258270
}
259271
rowIDs[i] = make([]uint64, 0, size)
272+
case FieldTypeBool:
273+
boolValues[field.Name()] = make(map[int]bool)
260274
default:
261275
return nil, errors.Errorf("field type '%s' is not currently supported through Batch", typ)
262276
}
@@ -273,6 +287,8 @@ func NewBatch(client *Client, size int, index *Index, fields []*Field, opts ...B
273287
clearRowIDs: make(map[int]map[int]uint64),
274288
rowIDSets: make(map[string][][]uint64),
275289
values: values,
290+
boolValues: boolValues,
291+
boolNulls: boolNulls,
276292
nullIndices: make(map[string][]uint64),
277293
toTranslate: tt,
278294
toTranslateClear: make(map[int]map[string][]int),
@@ -480,28 +496,8 @@ func (b *Batch) Add(rec Row) error {
480496
field := b.header[i]
481497
switch val := rec.Values[i].(type) {
482498
case string:
483-
if field.Opts().Type() != FieldTypeInt {
484-
// nil-extend
485-
for len(b.rowIDs[i]) < curPos {
486-
b.rowIDs[i] = append(b.rowIDs[i], nilSentinel)
487-
}
488-
rowIDs := b.rowIDs[i]
489-
// empty string is not a valid value at this point (Pilosa refuses to translate it)
490-
if val == "" { //
491-
b.rowIDs[i] = append(rowIDs, nilSentinel)
492-
493-
} else if rowID, ok := b.getRowTranslation(field.Name(), val); ok {
494-
b.rowIDs[i] = append(rowIDs, rowID)
495-
} else {
496-
ints, ok := b.toTranslate[i][val]
497-
if !ok {
498-
ints = make([]int, 0)
499-
}
500-
ints = append(ints, curPos)
501-
b.toTranslate[i][val] = ints
502-
b.rowIDs[i] = append(rowIDs, 0)
503-
}
504-
} else if field.Opts().Type() == FieldTypeInt {
499+
switch field.Opts().Type() {
500+
case FieldTypeInt:
505501
if val == "" {
506502
// copied from the `case nil:` section for ints and decimals
507503
b.values[field.Name()] = append(b.values[field.Name()], 0)
@@ -522,6 +518,30 @@ func (b *Batch) Add(rec Row) error {
522518
b.toTranslate[i][val] = ints
523519
b.values[field.Name()] = append(b.values[field.Name()], 0)
524520
}
521+
case FieldTypeBool:
522+
// If we want to support bools as string values, we would do
523+
// that here.
524+
default:
525+
// nil-extend
526+
for len(b.rowIDs[i]) < curPos {
527+
b.rowIDs[i] = append(b.rowIDs[i], nilSentinel)
528+
}
529+
rowIDs := b.rowIDs[i]
530+
// empty string is not a valid value at this point (Pilosa refuses to translate it)
531+
if val == "" { //
532+
b.rowIDs[i] = append(rowIDs, nilSentinel)
533+
534+
} else if rowID, ok := b.getRowTranslation(field.Name(), val); ok {
535+
b.rowIDs[i] = append(rowIDs, rowID)
536+
} else {
537+
ints, ok := b.toTranslate[i][val]
538+
if !ok {
539+
ints = make([]int, 0)
540+
}
541+
ints = append(ints, curPos)
542+
b.toTranslate[i][val] = ints
543+
b.rowIDs[i] = append(rowIDs, 0)
544+
}
525545
}
526546
case uint64:
527547
// nil-extend
@@ -579,8 +599,8 @@ func (b *Batch) Add(rec Row) error {
579599
}
580600
b.rowIDSets[field.Name()] = append(rowIDSets, val)
581601
case nil:
582-
t := field.Opts().Type()
583-
if t == FieldTypeInt || t == FieldTypeDecimal || t == FieldTypeTimestamp {
602+
switch field.Opts().Type() {
603+
case FieldTypeInt, FieldTypeDecimal, FieldTypeTimestamp:
584604
b.values[field.Name()] = append(b.values[field.Name()], 0)
585605
nullIndices, ok := b.nullIndices[field.Name()]
586606
if !ok {
@@ -589,7 +609,15 @@ func (b *Batch) Add(rec Row) error {
589609
nullIndices = append(nullIndices, uint64(curPos))
590610
b.nullIndices[field.Name()] = nullIndices
591611

592-
} else {
612+
case FieldTypeBool:
613+
boolNulls, ok := b.boolNulls[field.Name()]
614+
if !ok {
615+
boolNulls = make([]uint64, 0)
616+
}
617+
boolNulls = append(boolNulls, uint64(curPos))
618+
b.boolNulls[field.Name()] = boolNulls
619+
620+
default:
593621
// only append nil to rowIDs if this field already has
594622
// rowIDs. Otherwise, this could be a []string or
595623
// []uint64 field where we've only seen nil values so
@@ -600,6 +628,10 @@ func (b *Batch) Add(rec Row) error {
600628
b.rowIDs[i] = append(rowIDs, nilSentinel)
601629
}
602630
}
631+
632+
case bool:
633+
b.boolValues[field.Name()][curPos] = val
634+
603635
default:
604636
return errors.Errorf("Val %v Type %[1]T is not currently supported. Use string, uint64 (row id), or int64 (integer value)", val)
605637
}
@@ -637,7 +669,6 @@ func (b *Batch) Add(rec Row) error {
637669
}
638670
b.rowIDs[i][len(b.rowIDs[i])-1] = clearSentinel
639671
}
640-
641672
default:
642673
return errors.Errorf("Clearing a value '%v' Type %[1]T is not currently supported (field '%s')", val, field.Name())
643674
}
@@ -712,7 +743,6 @@ func (b *Batch) Import() error {
712743
return errors.Wrap(err, "making fragments (flush)")
713744
}
714745
if b.useShardTransactionalEndpoint {
715-
// TODO handle bool?
716746
frags, clearFrags, err = b.makeSingleValFragments(frags, clearFrags)
717747
if err != nil {
718748
return errors.Wrap(err, "making single val fragments")
@@ -1478,6 +1508,60 @@ func (b *Batch) makeSingleValFragments(frags, clearFrags fragments) (fragments,
14781508
}
14791509
}
14801510
}
1511+
// -------------------------
1512+
// Boolean fields
1513+
// -------------------------
1514+
falseRowOffset := 0 * shardWidth // fragment row 0
1515+
trueRowOffset := 1 * shardWidth // fragment row 1
1516+
1517+
// For bools which have been set to null, clear both the true and false
1518+
// values for the record. Because this ends up going through the
1519+
// API.ImportRoaringShard() method (which handles `bool` fields the same as
1520+
// `mutex` fields), we don't actually set the true and false rows of the
1521+
// boolean fragment; rather, we just set the first row to indicate which
1522+
// records (for all rows) to clear.
1523+
for fieldname, boolNulls := range b.boolNulls {
1524+
field := b.headerMap[fieldname]
1525+
if field.Opts().Type() != featurebase.FieldTypeBool {
1526+
continue
1527+
}
1528+
for _, pos := range boolNulls {
1529+
recID := b.ids[pos]
1530+
shard := recID / shardWidth
1531+
clearBM := clearFrags.GetOrCreate(shard, field.Name(), "standard")
1532+
1533+
fragmentColumn := recID % shardWidth
1534+
clearBM.Add(fragmentColumn)
1535+
}
1536+
}
1537+
1538+
// For bools which have been set to a non-nil value, set the appropriate
1539+
// value for the record, and unset the opposing values. For example, if the
1540+
// bool is set to `false`, then set the bit in the "false" row, and clear
1541+
// the bit in the "true" row.
1542+
for fieldname, boolMap := range b.boolValues {
1543+
field := b.headerMap[fieldname]
1544+
if field.Opts().Type() != featurebase.FieldTypeBool {
1545+
continue
1546+
}
1547+
1548+
for pos, boolVal := range boolMap {
1549+
recID := b.ids[pos]
1550+
1551+
shard := recID / shardWidth
1552+
bitmap := frags.GetOrCreate(shard, field.Name(), "standard")
1553+
clearBM := clearFrags.GetOrCreate(shard, field.Name(), "standard")
1554+
1555+
fragmentColumn := recID % shardWidth
1556+
clearBM.Add(fragmentColumn)
1557+
1558+
if boolVal {
1559+
bitmap.Add(trueRowOffset + fragmentColumn)
1560+
} else {
1561+
bitmap.Add(falseRowOffset + fragmentColumn)
1562+
}
1563+
}
1564+
}
14811565

14821566
return frags, clearFrags, nil
14831567
}
@@ -1700,6 +1784,14 @@ func (b *Batch) reset() {
17001784
delete(clearMap, k)
17011785
}
17021786
}
1787+
for _, boolsMap := range b.boolValues {
1788+
for k := range boolsMap {
1789+
delete(boolsMap, k)
1790+
}
1791+
}
1792+
for k := range b.boolNulls {
1793+
delete(b.boolNulls, k) // TODO pool these slices
1794+
}
17031795
for i := range b.toTranslateID {
17041796
b.toTranslateID[i] = ""
17051797
}

0 commit comments

Comments
 (0)