Adding bloom command metadata, bloom group and bloom data type documentation #233

zackcam · 2025-02-20T00:26:56Z

This is one of three PRs that will be opened to add information about the bloom module to the Valkey website:
Bloom repo json command files: valkey-io/valkey-bloom#47
valkey-io.github.io: valkey-io/valkey-io.github.io#212

This PR has three main changes:

1. Adding the bloom command group
2. Adding bloom command metadata files (Example for bf.add below)
3. Adding bloom data type documents

zuiderkwast

Very interesting!

I skimmed through it very quickly. The documentation itself looks great AFAICT. I can do a more detailed review later.

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install. I think we should mention it on the bloom filters topic page with a link to the GitHub repo. The BF command pages should link to that topic page, so the pages are all linked together.

To build man pages, the scripts in this repo need to be able to take multiple command JSON files. This needs to be added to the Makefile, the README and maybe the python scripts too. Please try to build the man pages as described in the README of this repo.

zuiderkwast · 2025-02-20T08:33:23Z

Many of the spellcheck errors can be fixed simply by writing the command names in backticks. Stuff in backticks is excluded from spellcheck IIRC.
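As a rough illustration of why backticks help, here is a minimal Python sketch of a spellcheck pre-filter that drops inline code spans before collecting words. The function name and regular expressions are assumptions made for this example; the spellcheck tooling actually used by this repo may work differently.

```python
import re

# Matches a single-backtick inline code span on one line (illustrative only).
INLINE_CODE = re.compile(r"`[^`\n]*`")


def words_to_spellcheck(markdown_line: str) -> list[str]:
    """Return the words a spellchecker would see after inline code is removed."""
    without_code = INLINE_CODE.sub(" ", markdown_line)
    return re.findall(r"[A-Za-z']+", without_code)


print(words_to_spellcheck("Use `BF.MADD` or `BF.INSERT` commands"))
```

With this kind of filter, command names such as `BF.MADD` never reach the dictionary lookup, so they do not need to be added to a wordlist.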

zackcam · 2025-02-20T18:53:34Z

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install

I think we can make it more explicit on the data type page as well by making a modules section. i.e

Does this seem like something that would be wanted?

zuiderkwast · 2025-02-20T19:07:11Z

I think we can make it more explicit on the data type page as well by making a modules section. i.e

Yes, something like that would be good. In your screenshot it looks like the "Extensions" sub-heading is part of "Module Data Types" though, because of the levels of the headings used. If we do this, then "Module Data Types" should be a level-2 heading and "Bloom Filter" a level-3 heading under it.

How about just mentioning the module within the description? Something like this?

 ## Bloom Filter
 
 [Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.
+Bloom filters are provided by the module `valkey-bloom`.
 For more information, see:

 * [Overview of Bloom Filters](bloomfilters.md)
 * [Bloom filter command reference](../commands/#bloom)
+* [The valkey-bloom module on GitHub](https://github.com/valkey-io/valkey-bloom/)

madolson · 2025-02-20T19:54:25Z

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. In the current structure they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them in a separate folder structure and clarify which module they are a part of.

zuiderkwast · 2025-02-20T20:37:24Z

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. In the current structure they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them in a separate folder structure and clarify which module they are a part of.

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can generate it in some way from an optional key in the command JSON file or something like that.

madolson · 2025-02-20T20:40:45Z

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

I don't have a strong preference one way or the other about flat/nested, so sticking with flat is OK for me.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can generate it in some way from an optional key in the command JSON file or something like that.

Yeah, I guess immediately let's make sure there is something in the JSON file. Maybe Module Required: <link to Bloom>.
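A minimal Python sketch of how such a line could be generated from an optional key. The key names `module` and `module_url` and the function `module_line` are hypothetical, chosen only for this example; the thread has not settled on a JSON layout.

```python
from typing import Optional


def module_line(command_meta: dict) -> Optional[str]:
    """Build a 'Module required' line from hypothetical optional keys.

    Returns None for core commands, which would simply omit the keys.
    """
    module = command_meta.get("module")  # hypothetical optional key
    if not module:
        return None
    url = command_meta.get("module_url")  # hypothetical optional key
    name = f"[{module}]({url})" if url else module
    return f"Module required: {name}"


print(module_line({
    "module": "valkey-bloom",
    "module_url": "https://github.com/valkey-io/valkey-bloom/",
}))
```

Because the key is optional, existing core command files would not need to change.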

madolson

Not a super deep review. I think we should indicate more clearly that the commands are from a module and not part of the core. That can maybe come from the JSON docs only, though.

madolson · 2025-02-20T20:43:09Z

commands/bf.add.md

+* key (required) - A Valkey key of Bloom data type
+* item (required) - Item to add


Suggested change

* key (required) - A Valkey key of Bloom data type

* item (required) - Item to add

We typically omit this, since the usage would be included at the top which will indicate if something is required.

Yeah, makes sense. I removed all of these from the bloom commands and, where I think the arguments needed explaining, updated the heading name.

madolson · 2025-02-20T20:44:18Z

commands/bf.add.md

@@ -0,0 +1,12 @@
+Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.


Suggested change

Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.

Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.

If you want to create a bloom filter with non-standard options, use the `BF.INSERT` command.

Updated and made it less wordy as well by removing 'specified' from the description

madolson · 2025-02-20T20:45:03Z

commands/bf.exists.md

@@ -0,0 +1,16 @@
+Determines if a specified item has been added to the specified bloom filter.
+Syntax


Suggested change

Syntax

madolson · 2025-02-20T20:46:16Z

commands/bf.info.md

@@ -0,0 +1,35 @@
+Returns information about a bloomfilter
+
+## Arguments


These need to be kept because they include the info data, but I would change this to be about info fields or something.

madolson · 2025-02-20T20:47:53Z

commands/bf.info.md

+## Arguments
+* key (required) - A valkey key of bloom data type
+* CAPACITY (optional) - Returns the number of unique items that would need to be added before scaling would happen
+* SIZE (optional) - Returns the memory size which is the number of bytes allocated


Suggested change

* SIZE (optional) - Returns the memory size which is the number of bytes allocated

* SIZE (optional) - Returns the number of bytes allocated

Why waste time say lot word when few word do trick?

madolson · 2025-02-20T20:57:57Z

topics/data-types.md


+## Bloom Filter
+
+[Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.


I would translate this to English with an example.

I tried to make this more understandable, but I think using what I have in the exists and mexists commands could also work if the new version still isn't great.

madolson · 2025-02-20T20:59:37Z

topics/bloomfilters.md

+
+Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.
+
+## Bloom commands


Our other examples include the "basic commands" up front, and then the more sophisticated commands later. I think we should do the same.

madolson · 2025-02-20T21:00:41Z

topics/bloomfilters.md

+
+**Financial fraud detection**
+
+Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.


Is this a real use case? The false positive here is not ideal, since it might make it seem like a transaction is legitimate when it is not.

Updated this use case to be more about card fraud instead of location based checking

madolson · 2025-02-20T21:01:14Z

topics/bloomfilters.md

+
+Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.
+
+For the above each user would have a Bloom filter which is then checked for every transaction.


Might just merge this into the previous paragraph.

madolson · 2025-02-20T21:28:53Z

topics/bloomfilters.md

+
+**Check if URL's are malicious**
+
+Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter. 


Suggested change

Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.

Bloom filters can answer the question "is a URL malicious?". Any URL inputted would be checked against a malicious URL bloom filter.

zuiderkwast

Not a complete review.

We need to think about what we want regarding:

* How to show which module a command belongs to and how to store this in the JSON file(s).
* What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

madolson · 2025-02-20T23:39:42Z

What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

I think for now we should show the independent modules version number, since we got alignment on that. Internally at AWS we are planning on reviving valkey-io/valkey#408 and posting some suggestions. Once that has alignment, we can maybe add more information about where it's available (i.e. Valkey core since 10.0, valkey-bloom since 1.0)

zackcam · 2025-02-21T07:53:40Z

List of changes other than word choice / document wording:
The version change isn't done in this repo but was discussed on this PR, so adding a screenshot:

Still looking at how best to determine whether a command is from a specific module, so that it is easy to expand on for future modules as well. (The io PR has not been updated yet to include this module version change; I will push that once I find out how to distinguish between modules.)

Man page generation for modules, example for bf.add

For future modules, there are only a few places they will need to add to in the Makefile.
The main change they need to make is called out below; the others should be clear:
Line 187: $(eval VALKEY_ROOTS := $(VALKEY_ROOT) $(VALKEY_BLOOM_ROOT) $(FUTURE_MODULE))

zackcam · 2025-02-21T20:17:37Z

New command page example with hyperlink to module repo:

KarthikSubbarao · 2025-03-05T17:36:31Z

commands/bf.add.md

@@ -0,0 +1,12 @@
+Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.


Suggested change

Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.

Adds a single item to a bloom filter. If the specified bloom filter does not exist, a bloom filter is created with the provided name with default properties.

KarthikSubbarao · 2025-03-05T17:38:31Z

commands/bf.add.md

@@ -0,0 +1,12 @@
+Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
+
+If you want to create a bloom filter with non-standard options, use the `BF.INSERT` or `BF.RESERVE` command.


By non-standard options, you mean the non default properties. Right?

Suggested change

If you want to create a bloom filter with non-standard options, use the `BF.INSERT` or `BF.RESERVE` command.

To add multiple items to a bloom filter, you can use the BF.MADD or BF.INSERT commands.

If you want to create a bloom filter with non-default properties, use the `BF.INSERT` or `BF.RESERVE` command.

Yeah non standard meant non default, but agree makes more sense to say non default and that keeps it consistent

KarthikSubbarao · 2025-03-05T17:39:34Z

commands/bf.card.md

@@ -0,0 +1,12 @@
+Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter. 


Suggested change

Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter.

Returns the cardinality of a Bloom filter which is the number of items that have been successfully added to it.

KarthikSubbarao · 2025-03-05T17:40:38Z

commands/bf.card.md

+1
+127.0.0.1:6379> BF.CARD key
+1
+127.0.0.1:6379> BF.CARD missing


Suggested change

127.0.0.1:6379> BF.CARD missing

127.0.0.1:6379> BF.CARD nonexistentkey

KarthikSubbarao · 2025-03-05T17:43:42Z

commands/bf.exists.md

@@ -0,0 +1,19 @@
+Determines if an item has been added to the bloom filter. 


Suggested change

Determines if an item has been added to the bloom filter.

Determines if an item has been added to the bloom filter previously.

KarthikSubbarao · 2025-03-05T19:06:39Z

I only reviewed the Command Documentation.

I will need to review the remaining sections next

zackcam · 2025-03-31T16:04:59Z

commands/bf.add.md

@@ -0,0 +1,14 @@
+Adds a single item to a bloom filter. If the specified bloom filter does not exist, a bloom filter is created with the provided name with default properties.
+
+To add multiple items to a bloom filter, you can use the `BF.MADD` or `BF.INSERT` commands.


I think it is more grammatically correct to have the s, as it is showing there are multiple commands that can do this, not just one.

KarthikSubbarao · 2025-03-29T06:47:49Z

topics/bloomfilters.md

+
+We have implemented `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.
+
+As seen below, when trying to create a bloom filter with a capacity that cannot be achieved through scale outs (given the memory limits), the command is rejected. However, if the capacity can be achieved through scale out (even with the limits) then the creation of the bloom filter will succeed.


Suggested change

As seen below, when trying to create a bloom filter with a capacity that cannot be achieved through scale outs (given the memory limits), the command is rejected. However, if the capacity can be achieved through scale out (even with the limits) then the creation of the bloom filter will succeed.

As seen below, when trying to create a bloom filter with a capacity that cannot be achieved through scale outs (given the memory limits), the command is rejected. However, if the capacity can be achieved through scale out (even with the limits), the creation of the bloom filter will succeed.

KarthikSubbarao · 2025-03-29T06:48:23Z

topics/bloomfilters.md

+
+## Performance
+
+The bloom commands which involve adding items or checking the existence of items have a time complexity of O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only operate on one item.


Suggested change

The bloom commands which involve adding items or checking the existence of items have a time complexity of O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only operate on one item.

The bloom commands which involve adding items or checking the existence of items have a time complexity of O(N * K) where N is the number of hash functions used by the bloom filter and K is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(N) as they only operate on one item.
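To make the O(N * K) statement concrete, here is a minimal Python sketch of a toy, non-scaling bloom filter that counts hash evaluations. `TinyBloom`, its SHA-256-based hashing and the `hash_calls` counter are assumptions made for this illustration and are not how valkey-bloom is implemented.

```python
import hashlib


class TinyBloom:
    """Toy bloom filter: N hash functions over a fixed bit array."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # N in the O(N * K) notation
        self.bits = bytearray((num_bits + 7) // 8)
        self.hash_calls = 0  # counts hash evaluations for the demo

    def _positions(self, item: str):
        # Derive N bit positions by salting one hash with the function index.
        for i in range(self.num_hashes):
            self.hash_calls += 1
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def madd(self, items) -> None:
        # K items cost N hash evaluations each: O(N * K) overall.
        for item in items:
            self.add(item)

    def exists(self, item: str) -> bool:
        # Stops at the first unset bit, so at most N evaluations.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


bf = TinyBloom(num_bits=1024, num_hashes=7)
bf.add("a")                   # 7 hash evaluations
bf.madd(["b", "c", "d"])      # 3 * 7 more
print(bf.hash_calls)
```

Adding one item costs N evaluations and adding K items costs N * K, which matches the complexity described above. Every added item is always reported as present, since its bits are never cleared.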

KarthikSubbarao · 2025-03-29T19:33:42Z

commands/bf.insert.md

+* EXPANSION *expansion* - This option will specify the bloom filter as scaling and controls the size of the sub filter that will be created upon scale out / expansion of the bloom filter.
+* NOCREATE  - Will not create the bloom filter and add items if the filter does not exist already.
+* TIGHTENING *tightening_ratio* - The tightening ratio for the bloom filter.
+* SEED *seed* - The seed the hash functions will use.


Suggested change

* SEED *seed* - The seed the hash functions will use.

* SEED *seed* - The 32 byte seed the bloom filter's hash functions will use.

KarthikSubbarao · 2025-03-29T19:36:59Z

topics/bloomfilters.md

+[]
+```
+
+We can use the `BF.INFO` command's `MAXSCALEDCAPACITY` field to find out the maximum capacity that the scalable bloom filter can expand to hold.


Suggested change

We can use the `BF.INFO` command's `MAXSCALEDCAPACITY` field to find out the maximum capacity that the scalable bloom filter can expand to hold.

The `BF.INFO` command's `MAXSCALEDCAPACITY` field can be used to find out the maximum capacity that the scalable bloom filter can expand to hold.

KarthikSubbarao · 2025-03-29T20:56:59Z

topics/bloomfilters.md

+
+Bloom filters can be used to answer the question, "Has this card been flagged as stolen?". To do this, use a bloom filter that contains cards reported as stolen. When a card is used, check whether it is present in the bloom filter. If the card is not found, it means it is not marked as stolen. If the card is present in the filter, a check can be made against the main database, or the purchase can be denied.
+
+### Ad placement / Deduplication


Renaming usecase

Suggested change

### Ad placement / Deduplication

### Advertisement / Campaign placement and deduplication

Also, let's make this the first section in the list of use cases. Fraud detection can be second.

KarthikSubbarao · 2025-03-29T21:09:00Z

topics/bloomfilters.md

+Bloom filters can help advertisers answer the following questions:
+* Has the user already seen this ad?
+* Has the user already purchased this product?
+
+For each user, use a Bloom filter to store all the products they have purchased. The recommendation engine can then suggest a new product and check if it is present in the user's Bloom filter.
+
+* If the product is not in the filter, the ad is shown to the user, and the product is added to the filter.
+* If the product is already in the filter, it means the ad has already been shown to the user and the recommendation engine finds a different ad to show.


Suggested change

Bloom filters can help advertisers answer the following questions:

* Has the user already seen this ad?

* Has the user already purchased this product?

For each user, use a Bloom filter to store all the products they have purchased. The recommendation engine can then suggest a new product and check if it is present in the user's Bloom filter.

* If the product is not in the filter, the ad is shown to the user, and the product is added to the filter.

* If the product is already in the filter, it means the ad has already been shown to the user and the recommendation engine finds a different ad to show.

Bloom filters can help e-commerce sites, streaming services, advertising networks, or marketing platforms answer the following questions:

* Has an advertisement already been shown to a user?

* Has a promotional email or notification already been sent to a user?

* Has a product already been purchased by a user?

Example: For each user, use a Bloom filter to store all the products they have purchased. The recommendation engine can then suggest a new product and check if it is present in the user's Bloom filter.

* If the product is not in the filter, the ad is shown to the user, and the product is added to the filter.

* If the product is already in the filter, it means the ad has already been shown to the user and the recommendation engine finds a different ad to show.
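The flow in this example can be sketched in Python as follows. `SeenFilter`, `SetBackedFilter` and `next_ad` are hypothetical names for this illustration; the set-backed class is an exact stand-in used only so the sketch runs without a server. In a real deployment the two methods would presumably map to `BF.EXISTS` and `BF.ADD` on a per-user key.

```python
from typing import Iterable, Optional, Protocol


class SeenFilter(Protocol):
    """Anything that answers 'maybe seen' and can record an item."""

    def exists(self, item: str) -> bool: ...
    def add(self, item: str) -> None: ...


class SetBackedFilter:
    """Exact stand-in for a per-user bloom filter (demo only)."""

    def __init__(self) -> None:
        self._items: set[str] = set()

    def exists(self, item: str) -> bool:
        return item in self._items

    def add(self, item: str) -> None:
        self._items.add(item)


def next_ad(candidates: Iterable[str], seen: SeenFilter) -> Optional[str]:
    """Return the first candidate not in the filter and record it."""
    for product in candidates:
        if not seen.exists(product):
            seen.add(product)  # remember that this ad has now been shown
            return product
    return None  # every candidate was (possibly) shown already


user_filter = SetBackedFilter()
print(next_ad(["p1", "p2"], user_filter))
print(next_ad(["p1", "p2"], user_filter))
```

With a real bloom filter, a false positive only means an ad is occasionally skipped, while the absence of false negatives means an ad recorded in the filter is never shown twice.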

KarthikSubbarao · 2025-03-29T21:13:57Z

topics/bloomfilters.md

+* If the product is not in the filter, the ad is shown to the user, and the product is added to the filter.
+* If the product is already in the filter, it means the ad has already been shown to the user and the recommendation engine finds a different ad to show.
+
+### Check if URL's are malicious


Above this, we could add a use case for "Filtering out Spam/Harmful Content". Or you can combine both by making this title generic and listing both usecases below

KarthikSubbarao · 2025-03-31T03:30:17Z

topics/bloomfilters.md

+
+The bloom commands which involve adding items or checking the existence of items have a time complexity of O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only operate on one item.
+
+Since performance relies on the number of hash functions, choosing the correct capacity and expansion rate can be important. In case of scalable bloom filters, with every scale out, we increase the number of checks (using hash functions of each sub filter) performed during any add / exists operation. For this reason, it is recommended that users choose a capacity after evaluating the use case / workload to avoid several scale outs and reduce the number of checks.


Suggested change

Since performance relies on the number of hash functions, choosing the correct capacity and expansion rate can be important. In case of scalable bloom filters, with every scale out, we increase the number of checks (using hash functions of each sub filter) performed during any add / exists operation. For this reason, it is recommended that users choose a capacity after evaluating the use case / workload to avoid several scale outs and reduce the number of checks.

In case of scalable bloom filters, with every scale out, we increase the number of checks (using hash functions of each sub filter) performed during any add / exists operation. For this reason, it is recommended that users choose a capacity and expansion rate after evaluating the use case / workload to avoid several scale outs and reduce the number of checks.

KarthikSubbarao

Thank you for the changes and details @zackcam .

Approved

hpatro

nit picks.

hpatro · 2025-04-01T00:23:44Z

I don't have write permissions to this repository, @zuiderkwast or @madolson could one of you help review and close this out? Once this is in, we could get the website PR closed and verify the changes and do the same activity for the JSON changes. Thanks.

madolson · 2025-04-01T15:59:15Z

commands/bf.card.md

+
+```
+127.0.0.1:6379> BF.ADD key val
+1


Is this a string or an integer response? This implies simple string, which seems odd.

It's an integer. I'm copying over the response I got now.

madolson · 2025-04-01T15:59:29Z

commands/bf.card.md

+1
+127.0.0.1:6379> BF.CARD nonexistentkey
+0
+```


Add a trailing new line

madolson · 2025-04-01T16:04:02Z

commands/bf.insert.md

+```
+```
+127.0.0.1:6379> BF.INSERT key NOCREATE ITEMS item1 item2
+(error) ERR not found


This is not a very good error, is it too late to make it better?

This error message comes from the existing API and from existing client libraries that support bloom filters.
We followed the existing error messages to stay API compatible with the bloom filter commands of those client libraries.

madolson · 2025-04-01T16:06:52Z

topics/bloomfilters.md

+
+These are the default bloom properties along with the commands and configs which allow customizing.
+
+<table width="100%" border="1" style="border-collapse: collapse; border: 1px solid black" cellpadding="8">


I think this will be wonky whenever we end up putting it in a man page. I think we should keep it as vanilla markdown.

madolson · 2025-04-01T16:09:56Z

topics/bloomfilters.md

+
+* `bf_bloom_defrag_misses`: Total number of defrag misses that have occurred on bloom filters.
+
+## Limits


These are more of configs as opposed to limits.

In this document, since we already do list the configs in a section above, how about naming this section as "Large Bloom Filters"? Or "Handling Large Bloom Filters"?

madolson · 2025-04-01T16:10:22Z

topics/bloomfilters.md

+    When a bloom filter scales out, a new sub filter is added. The limit on the number of sub filters depends on the false positive rate and tightening ratio. Each sub filter has a stricter false positive, and this is controlled by the tightening ratio. If a command attempting a scale out results in the sub filter reaching a false positive of 0, the command is rejected. 
+
+
+We have implemented `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.


Suggested change

We have implemented `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.

You can use `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.
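A rough Python sketch of the kind of check described here, under stated assumptions: sub filter i has capacity `capacity * expansion**i` and false positive rate `fp_rate * tightening**i`, and each sub filter is sized with the textbook formula for an optimal bloom filter. The function names and the sizing formula are assumptions for illustration; the module's real accounting may differ.

```python
import math


def bloom_bytes(capacity: int, fp_rate: float) -> int:
    """Textbook size of an optimal bloom filter, in bytes."""
    bits = -capacity * math.log(fp_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


def can_scale_to(target: int, capacity: int, fp_rate: float,
                 expansion: int, tightening: float,
                 memory_limit_bytes: int) -> bool:
    """Simulate scale outs until `target` items fit or a limit is hit.

    Assumes capacity >= 1 and expansion >= 1 so the loop makes progress.
    """
    total_capacity = 0
    total_bytes = 0
    sub_capacity, sub_fp = capacity, fp_rate
    while total_capacity < target:
        if sub_fp <= 0.0:  # false positive rate tightened down to zero
            return False
        total_bytes += bloom_bytes(sub_capacity, sub_fp)
        if total_bytes > memory_limit_bytes:  # memory usage limit exceeded
            return False
        total_capacity += sub_capacity
        sub_capacity *= expansion  # next sub filter is larger
        sub_fp *= tightening       # and has a stricter false positive rate
    return True


limit = 128 * 1024 * 1024  # the 128 MB default mentioned in this thread
print(can_scale_to(1_000, 100, 0.01, 2, 0.5, limit))
print(can_scale_to(10**12, 100, 0.01, 2, 0.5, limit))
```

The first call succeeds after a handful of small sub filters, while the second is rejected because the simulated memory use passes the limit long before the target capacity is reached.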

KarthikSubbarao · 2025-04-01T17:03:56Z

topics/bloomfilters.md

+
+## Handling Large Bloom Filters
+
+There are two limits a bloom filter faces.


Suggested change

There are two limits a bloom filter faces.

There are two notable validations bloom filters face.

KarthikSubbarao · 2025-04-01T17:05:35Z

topics/bloomfilters.md

+
+1. Memory Usage Limit:
+
+    The memory usage limit per bloom filter by default is defined by the `BF.BLOOM-MEMORY-USAGE-LIMIT` module configuration which has a default value of 128 MB. If a command results in a creation / scale out causing the overall memory usage to exceed this limit, the command is rejected.


Suggested change

The memory usage limit per bloom filter by default is defined by the `BF.BLOOM-MEMORY-USAGE-LIMIT` module configuration which has a default value of 128 MB. If a command results in a creation / scale out causing the overall memory usage to exceed this limit, the command is rejected.

The memory usage limit per bloom filter by default is defined by the `BF.BLOOM-MEMORY-USAGE-LIMIT` module configuration which has a default value of 128 MB. If a command results in a creation / scale out causing the overall memory usage to exceed this limit, the command is rejected. This config is modifiable and can be increased as needed.

KarthikSubbarao · 2025-04-01T17:08:50Z

topics/bloomfilters.md

+
+There are two limits a bloom filter faces.
+
+1. Memory Usage Limit:


Suggested change

1. Memory Usage Limit:

1. Memory Usage:

…yed on the Valkey website (#212)

Related PRs:
Bloom repo json command files: valkey-io/valkey-bloom#47
Valkey-doc repo: valkey-io/valkey-doc#233

### Description

This PR sets up the framework so that modules can have their commands displayed on the Valkey website, by adding the bloom module commands in a way that can be easily expanded on. I have tried to make this future-proof by using a for loop on the `commands.html` page, which can be extended by adding any new folders we want to pull commands from. For `command-page.html`, I have used an array to hold the data from the multiple command folders and then take the first occurrence that isn't empty (i.e. the command belongs to that folder). This keeps the same fallback if the command doesn't exist.

Updated `init-commands.sh` to create a link for the bloom commands as well and to take in the bloom repository. I have updated the README as well to include the new repo that will be needed for the commands, and the related information now that commands are expected from the bloom repo. Lastly, updated the GitHub workflow to also build and take in the bloom repo.

**For screenshots of the new documentation, the two PRs above (valkey-io/valkey-doc#233 and valkey-io/valkey-bloom#47) have screenshots of all sections being added**

### Check List

- [x] Commits are signed per the DCO using `--signoff`

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

Signed-off-by: zackcam <[email protected]>
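The "first occurrence that isn't empty" lookup described for `command-page.html` can be sketched in Python as follows. This is only an illustration of the logic: the website uses its own templates, and `resolve_command` and the in-memory dictionaries are hypothetical stand-ins for the command folders.

```python
from typing import Mapping, Optional, Sequence


def resolve_command(name: str,
                    sources: Sequence[Mapping[str, dict]]) -> Optional[dict]:
    """Look a command up in every source and return the first non-empty hit."""
    lookups = [source.get(name) for source in sources]  # one slot per folder
    return next((data for data in lookups if data), None)  # None is the fallback


core = {"get": {"summary": "Get"}}       # stand-in for the core commands folder
bloom = {"bf.add": {"summary": "Add"}}   # stand-in for the bloom commands folder
print(resolve_command("bf.add", [core, bloom]))
print(resolve_command("missing", [core, bloom]))
```

Supporting another module would then only mean appending one more source to the list, which mirrors the "adding any new folders" idea in the description.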

zuiderkwast · 2025-04-02T20:58:17Z

commands/bf.insert.md

+* TIGHTENING *tightening_ratio* - The tightening ratio for the bloom filter.
+* SEED *seed* - The 32 byte seed the bloom filter's hash functions will use.
+* NONSCALING - This option will configure the bloom filter as non scaling; it cannot expand / scale beyond its specified capacity.
+* VALIDATESCALETO *validatescaleto* - Validates if the filter can scale out and reach to this capacity based on limits and if not, return an error without creating the bloom filter.


This is why you had to add validatescaleto to the spellcheck wordlist.

The idea is that keywords like this should be put within backticks. Then it doesn't need to be in the wordlist.

It's possible to combine italics + backticks if needed, for example:

`VALIDATESCALETO` *`validatescaleto`*

Rendered as

VALIDATESCALETO validatescaleto

Adding bloom command meta data and bloom group, adding bloom data type

6f84713

as well Signed-off-by: zackcam <[email protected]>

madolson requested review from madolson and zuiderkwast February 20, 2025 00:37

zackcam mentioned this pull request Feb 20, 2025

Adding functionality for the bloom module to have its commands displayed on the Valkey website valkey-io/valkey-io.github.io#212

Merged

zuiderkwast reviewed Feb 20, 2025

madolson reviewed Feb 20, 2025

zuiderkwast reviewed Feb 20, 2025

zackcam force-pushed the main branch from 2121cad to cd80826 Compare February 21, 2025 08:11

First round of rewording and changes to documentation. Added ability …

1f892b9

…to generate bloom man pages Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from cd80826 to 1f892b9 Compare February 21, 2025 20:16

KarthikSubbarao reviewed Mar 5, 2025

KarthikSubbarao reviewed Mar 6, 2025

zackcam force-pushed the main branch from 9182b75 to b4e71e4 Compare March 7, 2025 19:27

roshkhatri reviewed Mar 8, 2025

Changes based on feedback for bloom commands and documentation

f062a8a

Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from b4e71e4 to f062a8a Compare March 11, 2025 19:24

KarthikSubbarao reviewed Mar 12, 2025

Adding aditional field that can be returned by BF.INFO, added monitoring section to bloomfilter topic, cleaned up other bloomfilter topic sections

48d8cd0

zackcam force-pushed the main branch from a8cbac0 to 48d8cd0 Compare March 12, 2025 23:29

zuiderkwast reviewed Mar 17, 2025

Makefile (outdated, resolved)

Makefile (resolved)

KarthikSubbarao reviewed Mar 29, 2025

topics/data-types.md (outdated, resolved)

topics/bloomfilters.md (outdated, resolved)

topics/bloomfilters.md (outdated, resolved)

Apply suggestions from code review

8e1f364

Making changes based on review comments Co-authored-by: KarthikSubbarao <[email protected]> Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from d571758 to 8e1f364 Compare March 29, 2025 06:22

KarthikSubbarao reviewed Mar 29, 2025

KarthikSubbarao reviewed Mar 31, 2025

zackcam force-pushed the main branch 2 times, most recently from 432ed3d to 4be2099 Compare March 31, 2025 18:35

KarthikSubbarao approved these changes Mar 31, 2025

Updating for review comments

61f03c0

Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from 4be2099 to 61f03c0 Compare March 31, 2025 18:48

KarthikSubbarao approved these changes Mar 31, 2025

hpatro approved these changes Mar 31, 2025

commands/bf.load.md (outdated, resolved)

commands/bf.exists.md (outdated, resolved)

commands/bf.card.md (outdated, resolved)

Aligning capitalization across bloom commands

613ed72

Co-authored-by: Harkrishn Patro <[email protected]> Signed-off-by: zackcam <[email protected]>

madolson reviewed Apr 1, 2025

zackcam force-pushed the main branch from 9821f14 to e571fb8 Compare April 1, 2025 16:46

KarthikSubbarao reviewed Apr 1, 2025

zackcam force-pushed the main branch from e571fb8 to 8c5bc80 Compare April 1, 2025 17:11

Updating command responses and making table in markdown not HTML

8719dad

Signed-off-by: zackcam <[email protected]>

zackcam force-pushed the main branch from 8c5bc80 to 8719dad Compare April 1, 2025 17:19

Adding words that are causing spellcheck to fail

11c88c2

Signed-off-by: zackcam <[email protected]>

madolson approved these changes Apr 2, 2025

madolson merged commit 9c13637 into valkey-io:main Apr 2, 2025
2 checks passed

zuiderkwast reviewed Apr 2, 2025

yairgott mentioned this pull request May 8, 2025

Add ValkeySearch section under Valkey public documentation valkey-io/valkey-search#115

Closed

		* key (required) - A Valkey key of Bloom data type
		* item (required) - Item to add

		@@ -0,0 +1,12 @@
		Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.

-Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
+Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
+If you want to create a bloom filter with non-standard options, use the `BF.INSERT` command.

		@@ -0,0 +1,16 @@
		Determines if a specified item has been added to the specified bloom filter.
		Syntax

		@@ -0,0 +1,35 @@
		Returns information about a bloomfilter

		## Arguments

-* SIZE (optional) - Returns the memory size which is the number of bytes allocated
+* SIZE (optional) - Returns the number of bytes allocated


		## Bloom Filter

		[Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.


		Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.

		## Bloom commands


		Financial fraud detection

		Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.


		Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.

		For the above each user would have a Bloom filter which is then checked for every transaction.


		Check if URL's are malicious

		Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.

-Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.
+Bloom filters can answer the question "is a URL malicious?". Any URL inputted would be checked against a malicious URL bloom filter.

-Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
+Adds a single item to a bloom filter. If the specified bloom filter does not exist, a bloom filter is created with the provided name with default properties.

-If you want to create a bloom filter with non-standard options, use the `BF.INSERT` or `BF.RESERVE` command.
+To add multiple items to a bloom filter, you can use the BF.MADD or BF.INSERT commands.
+If you want to create a bloom filter with non-default properties, use the `BF.INSERT` or `BF.RESERVE` command.

		@@ -0,0 +1,12 @@
		Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter.

-Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter.
+Returns the cardinality of a Bloom filter which is the number of items that have been successfully added to it.

-127.0.0.1:6379> BF.CARD missing
+127.0.0.1:6379> BF.CARD nonexistentkey

		@@ -0,0 +1,19 @@
		Determines if an item has been added to the bloom filter.

		@@ -0,0 +1,14 @@
		Adds a single item to a bloom filter. If the specified bloom filter does not exist, a bloom filter is created with the provided name with default properties.

		To add multiple items to a bloom filter, you can use the `BF.MADD` or `BF.INSERT` commands.


		We have implemented `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.

		As seen below, when trying to create a bloom filter with a capacity that cannot be achieved through scale outs (given the memory limits), the command is rejected. However, if the capacity can be achieved through scale out (even with the limits) then the creation of the bloom filter will succeed.
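A minimal sketch of how such a request might be assembled, in Python. The helper name `build_bf_insert` is hypothetical, the key and numbers are made up, and the option order (`CAPACITY`, `EXPANSION`, `VALIDATESCALETO`, then `ITEMS`) is an assumption about the `BF.INSERT` syntax that should be checked against the command page. The sketch only builds the argument list; it does not show any server reply.

```python
from typing import List, Optional


def build_bf_insert(key: str, items: List[str],
                    capacity: Optional[int] = None,
                    expansion: Optional[int] = None,
                    validate_scale_to: Optional[int] = None) -> List[str]:
    """Build the argument list for a BF.INSERT call (hypothetical helper)."""
    if not items:
        # Simplification for this sketch: always send at least one item.
        raise ValueError("at least one item is required")
    args = ["BF.INSERT", key]
    if capacity is not None:
        args += ["CAPACITY", str(capacity)]
    if expansion is not None:
        args += ["EXPANSION", str(expansion)]
    if validate_scale_to is not None:
        # Ask the server to reject creation if the filter cannot
        # scale out to this capacity within its limits.
        args += ["VALIDATESCALETO", str(validate_scale_to)]
    args += ["ITEMS", *items]
    return args


command = build_bf_insert("myfilter", ["item1"], capacity=1000,
                          expansion=2, validate_scale_to=1_000_000)
print(" ".join(command))
```

The resulting list can be passed to any client that accepts raw command arguments. Whether the server accepts or rejects it depends on the configured limits.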


		## Performance

		The bloom commands which involve adding items or checking the existence of items have a time complexity of O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only operate on one item.

-* SEED seed - The seed the hash functions will use.
+* SEED seed - The 32 byte seed the bloom filter's hash functions will use.

-We can use the `BF.INFO` command's `MAXSCALEDCAPACITY` field to find out the maximum capacity that the scalable bloom filter can expand to hold.
+The `BF.INFO` command's `MAXSCALEDCAPACITY` field can be used to find out the maximum capacity that the scalable bloom filter can expand to hold.


		Bloom filters can be used to answer the question, "Has this card been flagged as stolen?". To do this, use a bloom filter that contains cards reported as stolen. When a card is used, check whether it is present in the bloom filter. If the card is not found, it means it is not marked as stolen. If the card is present in the filter, a check can be made against the main database, or the purchase can be denied.
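The flow described above could look roughly like the following Python sketch. `FakeBloomClient` is a hypothetical in-memory stand-in so the example runs without a server; it assumes a real client exposes an `execute_command`-style method. The key name `stolen_cards`, the card numbers, and the `is_stolen_in_database` callback are all illustrative.

```python
class FakeBloomClient:
    """In-memory stand-in for a Valkey client (hypothetical, exact sets, no false positives)."""

    def __init__(self):
        self._filters = {}

    def execute_command(self, name, key, item):
        members = self._filters.setdefault(key, set())
        if name == "BF.ADD":
            is_new = item not in members
            members.add(item)
            return 1 if is_new else 0
        if name == "BF.EXISTS":
            return 1 if item in members else 0
        raise ValueError(f"unsupported command: {name}")


def may_authorize(client, card_number, is_stolen_in_database):
    """Return True if the purchase may proceed."""
    if not client.execute_command("BF.EXISTS", "stolen_cards", card_number):
        # Bloom filters have no false negatives: the card was never reported.
        return True
    # A hit may be a false positive, so confirm against the main database.
    return not is_stolen_in_database(card_number)


client = FakeBloomClient()
client.execute_command("BF.ADD", "stolen_cards", "4111-0000")
stolen_db = {"4111-0000"}

print(may_authorize(client, "4111-0000", lambda c: c in stolen_db))  # False
print(may_authorize(client, "5500-1234", lambda c: c in stolen_db))  # True
```

Denying the purchase outright on a filter hit, without the database check, is the other option the text mentions; it trades occasional false denials for one fewer lookup.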

		### Ad placement / Deduplication

-### Ad placement / Deduplication
+### Advertisement / Campaign placement and deduplication


		The bloom commands which involve adding items or checking the existence of items have a time complexity of O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only operate on one item.

		Since performance relies on the number of hash functions, choosing the correct capacity and expansion rate can be important. In case of scalable bloom filters, with every scale out, we increase the number of checks (using hash functions of each sub filter) performed during any add / exists operation. For this reason, it is recommended that users choose a capacity after evaluating the use case / workload to avoid several scale outs and reduce the number of checks.


		These are the default bloom properties along with the commands and configs which allow customizing.

		<table width="100%" border="1" style="border-collapse: collapse; border: 1px solid black" cellpadding="8">


		* `bf_bloom_defrag_misses`: Total number of defrag misses that have occurred on bloom filters.

		## Limits

		When a bloom filter scales out, a new sub filter is added. The limit on the number of sub filters depends on the false positive rate and tightening ratio. Each sub filter has a stricter false positive, and this is controlled by the tightening ratio. If a command attempting a scale out results in the sub filter reaching a false positive of 0, the command is rejected.
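As a rough illustration of why this limit exists, the Python sketch below assumes that each new sub filter's false positive rate is the previous rate multiplied by the tightening ratio (the usual scheme for scalable Bloom filters; the module's exact formula is not shown here). The function names are hypothetical, and the inputs 0.01 and 0.5 are example values rather than confirmed defaults.

```python
from typing import List


def sub_filter_fp_rates(fp_rate: float, tightening_ratio: float, count: int) -> List[float]:
    """False positive rate of the first `count` sub filters under geometric tightening."""
    return [fp_rate * tightening_ratio ** i for i in range(count)]


def max_sub_filters(fp_rate: float, tightening_ratio: float) -> int:
    """Count sub filters until the rate underflows to 0.0 in double precision."""
    if not (0.0 < fp_rate < 1.0 and 0.0 < tightening_ratio < 1.0):
        raise ValueError("fp_rate and tightening_ratio must be in (0, 1)")
    count = 0
    rate = fp_rate
    while rate > 0.0:
        count += 1
        rate *= tightening_ratio
    return count


print(sub_filter_fp_rates(0.01, 0.5, 4))
print(max_sub_filters(0.01, 0.5))
```

Under this assumption, a smaller tightening ratio drives the rate to zero in fewer scale outs, so it lowers the maximum number of sub filters.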


		We have implemented `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.

-We have implemented `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.
+You can use `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.