80 commits
6f4d011
Add dense_vector field type support
carlosdelest Apr 24, 2025
3c4c401
Knn function minimal support
carlosdelest Apr 24, 2025
8317911
Add scoring
carlosdelest Apr 24, 2025
e7736be
[CI] Auto commit changes from spotless
Apr 24, 2025
9588048
Add options
carlosdelest Apr 24, 2025
0f77374
Make Knn a FullTextFunction
carlosdelest Apr 28, 2025
891f4fc
make knn query not pushable
carlosdelest Apr 28, 2025
fb2a3c7
Add CSV tests and necessary infra for dense_vector field type
carlosdelest Apr 8, 2025
8e9b280
Make CSV test loader to use numbers when there are multivalued numeri…
carlosdelest Apr 8, 2025
0f58f24
Implicit casting
carlosdelest Apr 28, 2025
e92c92b
Format changes
carlosdelest May 6, 2025
1b7f02f
Merge remote-tracking branch 'carlosdelest/feature/esql-knn-function-…
carlosdelest May 6, 2025
e44745e
Add testing, fix LuceneQueryEvaluator to pick docs.getPositionCount i…
carlosdelest May 23, 2025
bf92cf4
Merge remote-tracking branch 'origin/main' into feature/esql-knn-func…
carlosdelest May 28, 2025
e1aecf0
Fix merge
carlosdelest May 28, 2025
239cf1e
[CI] Auto commit changes from spotless
May 28, 2025
1837242
Spotless
carlosdelest May 29, 2025
c203766
Merge remote-tracking branch 'carlosdelest/feature/esql-knn-function-…
carlosdelest May 29, 2025
7ae9909
Add test coverage for knn options
carlosdelest May 29, 2025
204efda
Initial CSV tests
carlosdelest May 29, 2025
1dd6008
Add boosting support
carlosdelest May 29, 2025
e4f31fc
Add CSV tests
carlosdelest May 29, 2025
0d5a66a
[CI] Auto commit changes from spotless
May 29, 2025
ad34463
Add CSV tests
carlosdelest May 29, 2025
aa97b8a
Merge remote-tracking branch 'carlosdelest/feature/esql-knn-function-…
carlosdelest May 29, 2025
26f48e7
Add CSV tests
carlosdelest May 29, 2025
e7452dd
Add CSV tests
carlosdelest May 29, 2025
22efe27
Add Knn doc annotations
carlosdelest May 29, 2025
7f5ddde
Add first version of KnnTests and generated docs
carlosdelest May 30, 2025
b352673
Merge remote-tracking branch 'origin/main' into feature/esql-knn-func…
carlosdelest May 30, 2025
66f8496
Add verifier tests
carlosdelest May 30, 2025
f756e85
Spotless
carlosdelest May 30, 2025
34968ad
Add verifier tests
carlosdelest May 30, 2025
77011c1
Add verifier tests, revert some changes to mappings
carlosdelest May 30, 2025
d60c8e5
Refactor verifier tests
carlosdelest May 30, 2025
eacb9a0
Refactor verifier tests
carlosdelest May 30, 2025
03a329a
Fix some tests for multiple shards
carlosdelest Jun 2, 2025
fbe8b6c
Simplify tests
carlosdelest Jun 2, 2025
6ea4995
Simplify tests
carlosdelest Jun 2, 2025
d28f2ea
Simplify tests
carlosdelest Jun 2, 2025
9caed86
Simplify tests
carlosdelest Jun 3, 2025
e8d8c25
Add capabilities checks
carlosdelest Jun 3, 2025
958dfba
Merge branch 'main' into feature/esql-knn-function-minimal-support
carlosdelest Jun 3, 2025
f2975a3
[CI] Auto commit changes from spotless
Jun 3, 2025
7a18aec
Remove unnecessary changes
carlosdelest Jun 3, 2025
fccf9a5
Add new full text functions data set and modify VerifierTests
carlosdelest Jun 3, 2025
3d70558
Don't use data with the CSV data loader, just the mapping
carlosdelest Jun 3, 2025
19548fa
Spotless
carlosdelest Jun 3, 2025
22a4c26
Fix tests with the same result scoring
carlosdelest Jun 3, 2025
ce64ba9
Merge remote-tracking branch 'origin/main' into feature/esql-knn-func…
carlosdelest Jun 3, 2025
e114453
Merge remote-tracking branch 'origin/main' into feature/esql-knn-func…
carlosdelest Jun 3, 2025
a34fc89
Fix merge
carlosdelest Jun 3, 2025
6fe7b2a
Spotless
carlosdelest Jun 3, 2025
a258820
Add test capability
carlosdelest Jun 4, 2025
9227dc7
Merge remote-tracking branch 'origin/main' into feature/esql-knn-func…
carlosdelest Jun 4, 2025
d3262a4
Fix loading dense_vectors when no data has been indexed (no dims spec…
carlosdelest Jun 4, 2025
cc81380
Merge branch 'non-issue/esql-fix-dense-vector-no-dims' into feature/e…
carlosdelest Jun 4, 2025
be1e578
Fix constructor visibility
carlosdelest Jun 4, 2025
524c93c
Change base functions for FTFs
carlosdelest Jun 4, 2025
65b3256
Add capability check for tests
carlosdelest Jun 4, 2025
853e096
Fixed tests via large k and limit
carlosdelest Jun 4, 2025
dc71549
Fix test for serverless / multi cluster
carlosdelest Jun 4, 2025
97b6c63
Replacing scores with round to avoid rounding errors
carlosdelest Jun 4, 2025
c3388de
Merge remote-tracking branch 'origin/main' into feature/esql-knn-func…
carlosdelest Jun 4, 2025
d824faa
Replacing scores with round to avoid rounding errors
carlosdelest Jun 5, 2025
78aa6d0
Remove quantization for less brittle tests
carlosdelest Jun 5, 2025
47b91f5
Add LIMIT to avoid multi cluster test failures
carlosdelest Jun 5, 2025
52f057b
I give up on testing scores. You win, multiple shards on serverless.
carlosdelest Jun 5, 2025
68ec878
Some tests make no sense as we're not deduplicating
carlosdelest Jun 5, 2025
12fb39c
Fixing test for serverless. Again.
carlosdelest Jun 6, 2025
36c26a5
Add test for null dimensions
carlosdelest Jun 6, 2025
b86f2fc
Merge branch 'main' into feature/esql-knn-function-minimal-support
carlosdelest Jun 6, 2025
c5b1292
Remove colors that have duplicate names to help with matching and sco…
carlosdelest Jun 6, 2025
bea423f
Merge remote-tracking branch 'carlosdelest/feature/esql-knn-function-…
carlosdelest Jun 6, 2025
da4d5bc
Merge remote-tracking branch 'origin/main' into feature/esql-knn-func…
carlosdelest Jun 6, 2025
49addf3
More test fixing
carlosdelest Jun 6, 2025
d1cd92c
Add check for knn availability
carlosdelest Jun 9, 2025
1842ab4
Merge branch 'main' into feature/esql-knn-function-minimal-support
carlosdelest Jun 9, 2025
a03ec92
Merge branch 'main' into feature/esql-knn-function-minimal-support
carlosdelest Jun 10, 2025
aaf8684
[CI] Auto commit changes from spotless
Jun 10, 2025
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 10 additions & 0 deletions docs/reference/query-languages/esql/kibana/docs/functions/knn.md

@@ -2589,6 +2589,11 @@ public BlockLoader blockLoader(MappedFieldType.BlockLoaderContext blContext) {
return null;
}

if (dims == null) {
Member Author

Adding a dense_vector field mapping to mapping-all-fields uncovered a bug: if no dimensions are set, an NPE was thrown when creating the BlockLoaders.

I can create a separate PR for this, but it seemed unnecessary since the fix is just this change.

Contributor

Just wondering if a test could be added for this change?

Member Author

Makes sense, added in 36c26a5

// No data has been indexed yet
return BlockLoader.CONSTANT_NULLS;
}

if (indexed) {
return new BlockDocValuesReader.DenseVectorBlockLoader(name(), dims);
}
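
The guard above can be illustrated in isolation. This is a minimal sketch, not the mapper's actual code: `DimsGuardSketch` and `chooseLoader` are hypothetical names, loaders are represented as plain strings, and it assumes the NPE came from unboxing a null `Integer` dimension count, which the PR does not state explicitly.

```java
final class DimsGuardSketch {

    /**
     * Hypothetical stand-in for the loader selection in the hunk above.
     * dims is a nullable Integer: null means no vector has been indexed yet,
     * so the dimension count is still unknown.
     */
    static String chooseLoader(Integer dims, boolean indexed) {
        if (dims == null) {
            // Return an all-nulls loader instead of touching the null value.
            return "CONSTANT_NULLS";
        }
        // Unboxing is safe here; without the guard, this line would throw
        // a NullPointerException when dims is null.
        int dimensionCount = dims;
        return indexed
            ? "DenseVectorBlockLoader(dims=" + dimensionCount + ")"
            : "non-indexed loader";
    }

    public static void main(String[] args) {
        System.out.println(chooseLoader(null, true));
        System.out.println(chooseLoader(3, true));
    }
}
```

The early return keeps the null case away from every later branch, so no branch needs its own null check.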
@@ -112,7 +112,7 @@ private Vector evalSingleSegmentNonDecreasing(DocVector docs) throws IOException
int min = docs.docs().getInt(0);
int max = docs.docs().getInt(docs.getPositionCount() - 1);
int length = max - min + 1;
- try (T scoreBuilder = createVectorBuilder(blockFactory, length)) {
+ try (T scoreBuilder = createVectorBuilder(blockFactory, docs.getPositionCount())) {
Member Author

When testing with random indexing and deletions, it became apparent that the builder must be sized with getPositionCount() rather than length, because length (max - min + 1) can be greater than the position count.

if (length == docs.getPositionCount() && length > 1) {
return segmentState.scoreDense(scoreBuilder, min, max);
}
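
The difference between the two sizes is easy to see with plain arrays. The sketch below is an illustration only: `PositionCountSketch` and its methods are hypothetical, doc IDs are a sorted `int[]` rather than a `DocVector`, and IDs are assumed to be strictly increasing.

```java
final class PositionCountSketch {

    /** Width of the doc ID range, i.e. max - min + 1, for a sorted non-empty array. */
    static int span(int[] sortedDocIds) {
        int min = sortedDocIds[0];
        int max = sortedDocIds[sortedDocIds.length - 1];
        return max - min + 1;
    }

    /** True when every doc ID between min and max is present (assumes no duplicates). */
    static boolean isDense(int[] sortedDocIds) {
        return span(sortedDocIds) == sortedDocIds.length;
    }

    /** One score slot per position, regardless of gaps in the ID range. */
    static float[] newScoreBuffer(int[] sortedDocIds) {
        return new float[sortedDocIds.length];
    }

    public static void main(String[] args) {
        int[] withGaps = {3, 4, 7};    // IDs 5 and 6 are missing, e.g. deleted or filtered out
        int[] contiguous = {3, 4, 5};
        System.out.println(span(withGaps) + " vs " + withGaps.length);
        System.out.println(isDense(withGaps) + " / " + isDense(contiguous));
    }
}
```

For `{3, 4, 7}` the range width is 5 but only 3 positions exist, so a buffer sized by the range width would not match the number of scores actually produced.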
@@ -1022,7 +1022,7 @@ public void testMultipleBatchesWithLookupJoin() throws IOException {
var query = requestObjectBuilder().query(format(null, "from * | lookup join {} on integer {}", testIndexName(), sort));
Map<String, Object> result = runEsql(query);
var columns = as(result.get("columns"), List.class);
- assertEquals(21, columns.size());
+ assertEquals(22, columns.size());
Member Author

We added dense_vector to mappings-all-types, so the result now has one more column.

var values = as(result.get("values"), List.class);
assertEquals(10, values.size());
}
@@ -148,6 +148,7 @@ public class CsvTestsDataLoader {
private static final TestDataset LOGS = new TestDataset("logs");
private static final TestDataset MV_TEXT = new TestDataset("mv_text");
private static final TestDataset DENSE_VECTOR = new TestDataset("dense_vector");
private static final TestDataset COLORS = new TestDataset("colors");
Member Author

I think the colors dataset is very intuitive for vector similarity tests: looking for colors with similar RGB values reads better than looking for random vectors, IMO.

Contributor

Smart!!!


public static final Map<String, TestDataset> CSV_DATASET_MAP = Map.ofEntries(
Map.entry(EMPLOYEES.indexName, EMPLOYEES),
@@ -210,7 +211,8 @@ public class CsvTestsDataLoader {
Map.entry(SEMANTIC_TEXT.indexName, SEMANTIC_TEXT),
Map.entry(LOGS.indexName, LOGS),
Map.entry(MV_TEXT.indexName, MV_TEXT),
- Map.entry(DENSE_VECTOR.indexName, DENSE_VECTOR)
+ Map.entry(DENSE_VECTOR.indexName, DENSE_VECTOR),
+ Map.entry(COLORS.indexName, COLORS)
);

private static final EnrichConfig LANGUAGES_ENRICH = new EnrichConfig("languages_policy", "enrich-policy-languages.json");
@@ -0,0 +1,60 @@
color:text,hex_code:keyword,rgb_vector:dense_vector,primary:boolean
maroon, #800000, [128,0,0], false
brown, #A52A2A, [165,42,42], false
firebrick, #B22222, [178,34,34], false
crimson, #DC143C, [220,20,60], false
red, #FF0000, [255,0,0], true
tomato, #FF6347, [255,99,71], false
coral, #FF7F50, [255,127,80], false
salmon, #FA8072, [250,128,114], false
orange, #FFA500, [255,165,0], false
gold, #FFD700, [255,215,0], false
golden rod, #DAA520, [218,165,32], false
khaki, #F0E68C, [240,230,140], false
olive, #808000, [128,128,0], false
yellow, #FFFF00, [255,255,0], true
chartreuse, #7FFF00, [127,255,0], false
green, #008000, [0,128,0], true
lime, #00FF00, [0,255,0], false
teal, #008080, [0,128,128], false
cyan, #00FFFF, [0,255,255], true
turquoise, #40E0D0, [64,224,208], false
aqua marine, #7FFFD4, [127,255,212], false
navy, #000080, [0,0,128], false
blue, #0000FF, [0,0,255], true
indigo, #4B0082, [75,0,130], false
purple, #800080, [128,0,128], false
thistle, #D8BFD8, [216,191,216], false
plum, #DDA0DD, [221,160,221], false
violet, #EE82EE, [238,130,238], false
magenta, #FF00FF, [255,0,255], true
orchid, #DA70D6, [218,112,214], false
pink, #FFC0CB, [255,192,203], false
beige, #F5F5DC, [245,245,220], false
bisque, #FFE4C4, [255,228,196], false
wheat, #F5DEB3, [245,222,179], false
corn silk, #FFF8DC, [255,248,220], false
lemon chiffon, #FFFACD, [255,250,205], false
sienna, #A0522D, [160,82,45], false
chocolate, #D2691E, [210,105,30], false
peru, #CD853F, [205,133,63], false
burly wood, #DEB887, [222,184,135], false
tan, #D2B48C, [210,180,140], false
moccasin, #FFE4B5, [255,228,181], false
peach puff, #FFDAB9, [255,218,185], false
misty rose, #FFE4E1, [255,228,225], false
linen, #FAF0E6, [250,240,230], false
old lace, #FDF5E6, [253,245,230], false
papaya whip, #FFEFD5, [255,239,213], false
sea shell, #FFF5EE, [255,245,238], false
mint cream, #F5FFFA, [245,255,250], false
lavender, #E6E6FA, [230,230,250], false
honeydew, #F0FFF0, [240,255,240], false
ivory, #FFFFF0, [255,255,240], false
azure, #F0FFFF, [240,255,255], false
snow, #FFFAFA, [255,250,250], false
black, #000000, [0,0,0], true
gray, #808080, [128,128,128], true
silver, #C0C0C0, [192,192,192], false
gainsboro, #DCDCDC, [220,220,220], false
white, #FFFFFF, [255,255,255], true
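
To show why RGB vectors make nearest-neighbor results easy to check by eye, here is a rough brute-force sketch over a few rows copied from the dataset above. It is not how the `knn` function is implemented: `ColorKnnSketch` and its methods are hypothetical, the search is exhaustive rather than approximate, and squared Euclidean distance is an assumption, since the similarity metric configured for `rgb_vector` is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ColorKnnSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static long squaredDistance(int[] a, int[] b) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            long diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Names of the k colors closest to the query vector, nearest first. */
    static List<String> nearest(Map<String, int[]> colors, int[] query, int k) {
        List<String> names = new ArrayList<>(colors.keySet());
        names.sort(Comparator.comparingLong((String name) -> squaredDistance(colors.get(name), query)));
        return names.subList(0, Math.min(k, names.size()));
    }

    /** A handful of rows taken from the colors CSV (color name to rgb_vector). */
    static Map<String, int[]> sampleColors() {
        Map<String, int[]> colors = new LinkedHashMap<>();
        colors.put("navy", new int[] {0, 0, 128});
        colors.put("blue", new int[] {0, 0, 255});
        colors.put("tomato", new int[] {255, 99, 71});
        colors.put("black", new int[] {0, 0, 0});
        colors.put("crimson", new int[] {220, 20, 60});
        colors.put("red", new int[] {255, 0, 0});
        return colors;
    }

    public static void main(String[] args) {
        // Query with pure red; prints [red, crimson, tomato]
        System.out.println(nearest(sampleColors(), new int[] {255, 0, 0}, 3));
    }
}
```

The three nearest neighbors of pure red are all reddish colors, which is the kind of result a reviewer can sanity-check without computing anything.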