@zackcam
Contributor

@zackcam zackcam commented Feb 20, 2025

This is one of three PRs for adding information about the bloom module to the Valkey website:
Bloom repo json command files: valkey-io/valkey-bloom#47
valkey-io.github.io: valkey-io/valkey-io.github.io#212

This PR has three main changes:

  1. Adding the bloom command group
    (screenshot)

  2. Adding bloom command metadata files (example for bf.add below)
    (screenshot)

  3. Adding bloom data type documents
    (screenshots)

Contributor

@zuiderkwast zuiderkwast left a comment

Very interesting!

I skimmed through it very quickly. The documentation itself looks great AFAICT. I can do a more detailed review later.

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install. I think we should mention it on the bloom filters topic page with a link to the GitHub repo. The BF command pages should link to that topic page, so the pages are all linked together.

To build man pages, the scripts in this repo need to be able to take multiple command JSON files. This needs to be added to the Makefile, the README and maybe the python scripts too. Please try to build the man pages as described in the README of this repo.

@zuiderkwast
Contributor

Many of the spellcheck errors can be fixed simply by writing the command names in backticks. Stuff in backticks is excluded from spellcheck IIRC.
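
To illustrate why backticks help, here is a minimal Python sketch of a spellcheck pre-filter that drops inline code spans before collecting words. It is not the spellcheck tooling used by this repo; the function name `words_to_spellcheck` and the regular expressions are assumptions made for the example.

```python
import re

# Matches inline code spans delimited by single backticks, e.g. `BF.MADD`.
INLINE_CODE = re.compile(r"`[^`\n]+`")

# A "word" here is a run of letters, optionally joined by dots (so BF.MADD is one token).
WORD = re.compile(r"[A-Za-z]+(?:\.[A-Za-z]+)*")


def words_to_spellcheck(markdown_line: str) -> list[str]:
    """Return the words a spellchecker would see once code spans are dropped."""
    without_code = INLINE_CODE.sub(" ", markdown_line)
    return WORD.findall(without_code)


# Without backticks the command name reaches the spellchecker.
print(words_to_spellcheck("Use BF.MADD to add many items."))
# With backticks it is filtered out before spellchecking.
print(words_to_spellcheck("Use `BF.MADD` to add many items."))
```

With a filter like this, a command name only needs a wordlist entry when it is written as plain text.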

@zackcam
Contributor Author

zackcam commented Feb 20, 2025

The commands look very much like built-in commands. It's not mentioned anywhere that it's a separate module that users need to install

I think we can make it more explicit on the data type page as well by making a modules section. i.e
image
Does this seem like something that would be wanted?

@zuiderkwast
Contributor

I think we can make it more explicit on the data type page as well by making a modules section. i.e

Yes, something like that would be good. In your screenshot it looks like the "Extensions" sub-heading is part of "Module Data Types" though, because of the levels of the headings used. If we do this, then "Module Data Types" should be a level-2 heading and "Bloom Filter" a level-3 heading under it.

How about just mentioning the module within the description? Something like this?

 ## Bloom Filter
 
 [Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.
+Bloom filters are provided by the module `valkey-bloom`.
 For more information, see:

 * [Overview of Bloom Filters](bloomfilters.md)
 * [Bloom filter command reference](../commands/#bloom)
+* [The valkey-bloom module on GitHub](https://github.com/valkey-io/valkey-bloom/)

@madolson
Member

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. The current structure is that they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them into a separate folder structure and clarify which module they are a part of.

@zuiderkwast
Contributor

@zuiderkwast I also wanted to get your input about how we should structure the modules to make it clear they aren't part of the core. The current structure is that they are intermingled. I don't really have an opinion yet, but one alternative would be to at least separate them into a separate folder structure and clarify which module they are a part of.

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can generate it in some way from an optional key in the command JSON file or something like that.

@madolson
Member

Are you talking about the URLs of the commands? I like that it's a flat structure, just like the commands are in a global flat namespace. The BF. prefix is enough.

I don't have a strong preference one way or the other about flat/nested, so sticking with flat is OK for me.

But we should definitely show it in some way. A line somewhere on each command page would be good. I hope we can generate it in some way from an optional key in the command JSON file or something like that.

Yeah, I guess as an immediate step let's make sure there is something in the JSON file. Maybe `Module Required: <link to Bloom>`.
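
A rough Python sketch of what generating that line from an optional key could look like. The flat JSON layout and the `module` key with `name` and `url` fields are hypothetical, invented for this example; they are not the schema of the actual command JSON files.

```python
import json


def module_line(command_json: str) -> str:
    """Build the 'Module Required' line for a command page, or '' for core commands."""
    meta = json.loads(command_json)
    module = meta.get("module")  # hypothetical optional key; absent for core commands
    if not module:
        return ""
    return f"Module Required: [{module['name']}]({module['url']})"


# Hypothetical metadata for a module command and for a core command.
bf_add = json.dumps({
    "summary": "Adds an item to a bloom filter",
    "module": {
        "name": "valkey-bloom",
        "url": "https://github.com/valkey-io/valkey-bloom/",
    },
})
core_command = json.dumps({"summary": "A core command with no module key"})

print(module_line(bf_add))
print(repr(module_line(core_command)))
```

Because the key is optional, existing core command files would need no changes, and a future module would only add its own name and link.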

Member

@madolson madolson left a comment

Not a super deep review. I think we should indicate more clearly that the commands are from a module and not part of the core. That can maybe come from the JSON docs only, though.

Comment on lines 3 to 4
* key (required) - A Valkey key of Bloom data type
* item (required) - Item to add
Member

Suggested change
* key (required) - A Valkey key of Bloom data type
* item (required) - Item to add

We typically omit this, since the usage would be included at the top which will indicate if something is required.

Contributor Author

Yeah, makes sense. I removed all of these from the bloom commands and, where I thought the arguments needed explaining, updated the heading name.

@@ -0,0 +1,12 @@
Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
Member

Suggested change
Adds an item to a bloom filter, if the specified filter does not exist creates a default bloom filter with that name.
Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
If you want to create a bloom filter with non-standard options, use the `BF.INSERT` command.

Contributor Author

Updated and made it less wordy as well by removing 'specified' from the description

@@ -0,0 +1,16 @@
Determines if a specified item has been added to the specified bloom filter.
Syntax
Member

Suggested change
Syntax

@@ -0,0 +1,35 @@
Returns information about a bloomfilter

## Arguments
Member

These need to be kept because they include the info data, but I would change this to be about info fields or something.

## Arguments
* key (required) - A valkey key of bloom data type
* CAPACITY (optional) - Returns the number of unique items that would need to be added before scaling would happen
* SIZE (optional) - Returns the memory size which is the number of bytes allocated
Member

Suggested change
* SIZE (optional) - Returns the memory size which is the number of bytes allocated
* SIZE (optional) - Returns the number of bytes allocated

Why waste time say lot word when few word do trick?


## Bloom Filter

[Bloom filters](bloomfilters.md) provides a space efficient probabilistic data structure that allows checking if an element is a member of a set. False positives are possible, but it guarantees no false negatives.
Member

I would translate this to English with an example.

Contributor Author

I tried to make this more understandable, but if the new version still isn't great, what I use in the exists and mexists commands could potentially work as well.


Bloom filters are a space efficient probabilistic data structure that allows checking whether an element is member of a set. False positives are possible, but it guarantees no false negatives.
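
As a concrete illustration of that guarantee, here is a minimal Python sketch of a fixed-size Bloom filter. It is a teaching example only and does not reflect how valkey-bloom is implemented; the class name, the bit array size, and the use of salted SHA-256 for the hash functions are all assumptions.

```python
import hashlib


class ToyBloomFilter:
    """Teaching sketch of a fixed-size Bloom filter; not valkey-bloom's implementation."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> list[int]:
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


seen = ToyBloomFilter()
added = ["a.example", "b.example", "c.example"]
for url in added:
    seen.add(url)

# Every added item is reported as present: there are no false negatives.
print(all(seen.might_contain(url) for url in added))
```

An item that was never added usually returns `False`, but it can return `True` if other items happened to set all of its bit positions. That is the false positive case.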

## Bloom commands
Member

Our other examples include the "basic commands" up front, and then the more sophisticated commands later. I think we should do the same.


**Financial fraud detection**

Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.
Member

Is this a real use case? The false positive here is not ideal, since it might make it seem like a transaction is legitimate when it is not.

Contributor Author

Updated this use case to be more about card fraud instead of location based checking


Bloom filters can help answer the question "Has the user paid from this location before?", which can then give insights if there has been suspicious activity in shopping habits.

For the above each user would have a Bloom filter which is then checked for every transaction.
Member

Might just merge this into the previous paragraph.


**Check if URL's are malicious**

Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.
Member

Suggested change
Bloom filters can answer the question is a URL malicious. Any URL inputted would be checked against a malicious URL bloom filter.
Bloom filters can answer the question "is a URL malicious?". Any URL inputted would be checked against a malicious URL bloom filter.

Contributor

@zuiderkwast zuiderkwast left a comment

Not a complete review.

We need to think about what we want regarding

  1. How to show which module a command belongs to and how to store this in the JSON file(s).
  2. What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

@madolson
Member

What to show in the Since fields. If we'll release some valkey-with-modules bundle, then the version number should probably follow valkey's versioning(?).

I think for now we should show the independent modules version number, since we got alignment on that. Internally at AWS we are planning on reviving valkey-io/valkey#408 and posting some suggestions. Once that has alignment, we can maybe add more information about where it's available (i.e. Valkey core since 10.0, valkey-bloom since 1.0)

@zackcam
Contributor Author

zackcam commented Feb 21, 2025

List of changes other than word choice / document wording:
The change to the version isn't done in this repo but was discussed on this PR, so adding a screenshot:
(screenshot)
Still looking at how best to determine whether a command is from a specific module, so that it is easy to extend for future modules as well. (The io PR has not been updated yet to include this module version change; I will push that once I find out how to distinguish between modules.)

Man page generation for modules, example for bf.add:
(screenshot)
Future modules will only need to add to a few places in the Makefile.
The main change they need to make is called out below; the others should be clear:
Line 187: $(eval VALKEY_ROOTS := $(VALKEY_ROOT) $(VALKEY_BLOOM_ROOT) $(FUTURE_MODULE))
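
For readers wondering what "multiple roots" means on the script side, here is a hedged Python sketch of gathering command JSON files from several repository roots. The function name, the `commands` subdirectory, and the "first root wins" rule are assumptions for illustration; the actual scripts and directory layout in this repo may differ.

```python
import tempfile
from pathlib import Path


def collect_command_files(roots: list[Path], subdir: str = "commands") -> dict[str, Path]:
    """Map command name (file stem) to its JSON file across several roots.

    Earlier roots win if the same command name appears more than once.
    """
    found: dict[str, Path] = {}
    for root in roots:
        command_dir = root / subdir
        if not command_dir.is_dir():
            continue  # tolerate a root that has no command files
        for path in sorted(command_dir.glob("*.json")):
            found.setdefault(path.stem, path)
    return found


# Demo with throwaway directories standing in for the core and bloom checkouts.
with tempfile.TemporaryDirectory() as tmp:
    core_root, bloom_root = Path(tmp, "valkey"), Path(tmp, "valkey-bloom")
    for root, name in [(core_root, "get"), (bloom_root, "bf.add")]:
        (root / "commands").mkdir(parents=True)
        (root / "commands" / f"{name}.json").write_text("{}")
    print(sorted(collect_command_files([core_root, bloom_root])))
```

Adding a future module then amounts to appending one more root to the list, which mirrors the single-line Makefile change called out above.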

@zackcam
Contributor Author

zackcam commented Feb 21, 2025

New command page example with hyperlink to module repo:
(screenshot)

@@ -0,0 +1,12 @@
Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
Member

Suggested change
Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.
Adds a single item to a bloom filter. If the specified bloom filter does not exist, a bloom filter is created with the provided name with default properties.

@@ -0,0 +1,12 @@
Adds an item to a bloom filter, if the specified bloom filter does not exist creates a bloom filter with default configurations with that name.

If you want to create a bloom filter with non-standard options, use the `BF.INSERT` or `BF.RESERVE` command.
Member

By non-standard options, you mean the non default properties. Right?

Suggested change
If you want to create a bloom filter with non-standard options, use the `BF.INSERT` or `BF.RESERVE` command.
To add multiple items to a bloom filter, you can use the BF.MADD or BF.INSERT commands.
If you want to create a bloom filter with non-default properties, use the `BF.INSERT` or `BF.RESERVE` command.

Contributor Author

Yeah, non-standard meant non-default, but I agree it makes more sense to say non-default, and that keeps it consistent.

@@ -0,0 +1,12 @@
Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter.
Member

Suggested change
Gets the cardinality of a Bloom filter - number of items that have been successfully added to a Bloom filter.
Returns the cardinality of a Bloom filter which is the number of items that have been successfully added to it.

1
127.0.0.1:6379> BF.CARD key
1
127.0.0.1:6379> BF.CARD missing
Member

Suggested change
127.0.0.1:6379> BF.CARD missing
127.0.0.1:6379> BF.CARD nonexistentkey

@@ -0,0 +1,19 @@
Determines if an item has been added to the bloom filter.
Member

Suggested change
Determines if an item has been added to the bloom filter.
Determines if an item has been added to the bloom filter previously.

@KarthikSubbarao
Member

I only reviewed the Command Documentation.

I will need to review the remaining sections next

section to bloomfilter topic, cleaned up other bloomfilter topic
sections
Making changes based on review comments

Co-authored-by: KarthikSubbarao <[email protected]>
Signed-off-by: zackcam <[email protected]>
@@ -0,0 +1,14 @@
Adds a single item to a bloom filter. If the specified bloom filter does not exist, a bloom filter is created with the provided name with default properties.

To add multiple items to a bloom filter, you can use the `BF.MADD` or `BF.INSERT` commands.

This comment was marked as resolved.

Contributor Author

I think it is more grammatically correct to have the s, as it is showing there are multiple commands that can do this, not just one.


We have implemented `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.

As seen below, when trying to create a bloom filter with a capacity that cannot be achieved through scale outs (given the memory limits), the command is rejected. However, if the capacity can be achieved through scale out (even with the limits) then the creation of the bloom filter will succeed.
Member

Suggested change
As seen below, when trying to create a bloom filter with a capacity that cannot be achieved through scale outs (given the memory limits), the command is rejected. However, if the capacity can be achieved through scale out (even with the limits) then the creation of the bloom filter will succeed.
As seen below, when trying to create a bloom filter with a capacity that cannot be achieved through scale outs (given the memory limits), the command is rejected. However, if the capacity can be achieved through scale out (even with the limits), the creation of the bloom filter will succeed.


## Performance

The bloom commands which involve adding items or checking the existence of items have a time complexity of O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only operate on one item.
Member

Suggested change
The bloom commands which involve adding items or checking the existence of items have a time complexity of O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only operate on one item.
The bloom commands which involve adding items or checking the existence of items have a time complexity of O(N * K) where N is the number of hash functions used by the bloom filter and K is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(N) as they only operate on one item.
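
The O(N * K) claim can be checked with a small Python sketch that counts hash evaluations while adding items to a plain bit array. This is an illustrative cost model, not valkey-bloom code; the function name and the salted SHA-256 hashing are assumptions.

```python
import hashlib


def add_items(bits: bytearray, items: list[str], num_hashes: int) -> int:
    """Set the bits for every item and return how many hash evaluations were done."""
    evaluations = 0
    for item in items:                  # K items
        for i in range(num_hashes):     # N hash functions per item
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % len(bits)] = 1
            evaluations += 1
    return evaluations


bits = bytearray(256)
# One item with N = 7 hash functions: 7 evaluations, i.e. O(N).
print(add_items(bits, ["item1"], num_hashes=7))
# Three items with N = 7 hash functions: 21 evaluations, i.e. O(N * K).
print(add_items(bits, ["item1", "item2", "item3"], num_hashes=7))
```

The single-item case corresponds to `BF.ADD` and `BF.EXISTS`, and the multi-item case to commands that take several items.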

* EXPANSION *expansion* - This option will specify the bloom filter as scaling and controls the size of the sub filter that will be created upon scale out / expansion of the bloom filter.
* NOCREATE - Will not create the bloom filter and add items if the filter does not exist already.
* TIGHTENING *tightening_ratio* - The tightening ratio for the bloom filter.
* SEED *seed* - The seed the hash functions will use.
Member

Suggested change
* SEED *seed* - The seed the hash functions will use.
* SEED *seed* - The 32 byte seed the bloom filter's hash functions will use.

[]
```

We can use the `BF.INFO` command's `MAXSCALEDCAPACITY` field to find out the maximum capacity that the scalable bloom filter can expand to hold.
Member

Suggested change
We can use the `BF.INFO` command's `MAXSCALEDCAPACITY` field to find out the maximum capacity that the scalable bloom filter can expand to hold.
The `BF.INFO` command's `MAXSCALEDCAPACITY` field can be used to find out the maximum capacity that the scalable bloom filter can expand to hold.


Bloom filters can be used to answer the question, "Has this card been flagged as stolen?". To do this, use a bloom filter that contains cards reported as stolen. When a card is used, check whether it is present in the bloom filter. If the card is not found, it means it is not marked as stolen. If the card is present in the filter, a check can be made against the main database, or the purchase can be denied.

### Ad placement / Deduplication
Member

Renaming usecase

Suggested change
### Ad placement / Deduplication
### Advertisement / Campaign placement and deduplication

Also, let's make this the first section in the list of use cases. Fraud detection can be second.

Comment on lines 28 to 35
Bloom filters can help advertisers answer the following questions:
* Has the user already seen this ad?
* Has the user already purchased this product?

For each user, use a Bloom filter to store all the products they have purchased. The recommendation engine can then suggest a new product and check if it is present in the user's Bloom filter.

* If the product is not in the filter, the ad is shown to the user, and the product is added to the filter.
* If the product is already in the filter, it means the ad has already been shown to the user and the recommendation engine finds a different ad to show.
Member

Suggested change
Bloom filters can help advertisers answer the following questions:
* Has the user already seen this ad?
* Has the user already purchased this product?
For each user, use a Bloom filter to store all the products they have purchased. The recommendation engine can then suggest a new product and check if it is present in the user's Bloom filter.
* If the product is not in the filter, the ad is shown to the user, and the product is added to the filter.
* If the product is already in the filter, it means the ad has already been shown to the user and the recommendation engine finds a different ad to show.
Bloom filters can help e-commerce sites, streaming services, advertising networks, or marketing platforms answer the following questions:
* Has an advertisement already been shown to a user?
* Has a promotional email or notification already been sent to a user?
* Has a product already been purchased by a user?
Example: For each user, use a Bloom filter to store all the products they have purchased. The recommendation engine can then suggest a new product and check if it is present in the user's Bloom filter.
* If the product is not in the filter, the ad is shown to the user, and the product is added to the filter.
* If the product is already in the filter, it means the ad has already been shown to the user and the recommendation engine finds a different ad to show.

* If the product is not in the filter, the ad is shown to the user, and the product is added to the filter.
* If the product is already in the filter, it means the ad has already been shown to the user and the recommendation engine finds a different ad to show.

### Check if URL's are malicious
Member

@KarthikSubbarao KarthikSubbarao Mar 29, 2025

Above this, we could add a use case for "Filtering out Spam/Harmful Content". Or you can combine both by making this title generic and listing both usecases below


The bloom commands which involve adding items or checking the existence of items have a time complexity of O(n * k) where n is the number of hash functions used by the bloom filter and k is the number of elements being inserted. This means that both BF.ADD and BF.EXISTS are both O(n) as they only operate on one item.

Since performance relies on the number of hash functions, choosing the correct capacity and expansion rate can be important. In case of scalable bloom filters, with every scale out, we increase the number of checks (using hash functions of each sub filter) performed during any add / exists operation. For this reason, it is recommended that users choose a capacity after evaluating the use case / workload to avoid several scale outs and reduce the number of checks.
Member

Suggested change
Since performance relies on the number of hash functions, choosing the correct capacity and expansion rate can be important. In case of scalable bloom filters, with every scale out, we increase the number of checks (using hash functions of each sub filter) performed during any add / exists operation. For this reason, it is recommended that users choose a capacity after evaluating the use case / workload to avoid several scale outs and reduce the number of checks.
In case of scalable bloom filters, with every scale out, we increase the number of checks (using hash functions of each sub filter) performed during any add / exists operation. For this reason, it is recommended that users choose a capacity and expansion rate after evaluating the use case / workload to avoid several scale outs and reduce the number of checks.

@zackcam zackcam force-pushed the main branch 2 times, most recently from 432ed3d to 4be2099 Compare March 31, 2025 18:35
Member

@KarthikSubbarao KarthikSubbarao left a comment

Thank you for the changes and details @zackcam .

Approved

Collaborator

@hpatro hpatro left a comment

nit picks.

Co-authored-by: Harkrishn Patro <[email protected]>
Signed-off-by: zackcam <[email protected]>
@hpatro
Collaborator

hpatro commented Apr 1, 2025

I don't have write permissions to this repository, @zuiderkwast or @madolson could one of you help review and close this out? Once this is in, we could get the website PR closed and verify the changes and do the same activity for the JSON changes. Thanks.


```
127.0.0.1:6379> BF.ADD key val
1
Member

Is this a string or an integer response? This implies simple string, which seems odd.

Contributor Author

It's an integer. Copying over the response I got now.

1
127.0.0.1:6379> BF.CARD nonexistentkey
0
``` No newline at end of file
Member

Add a trailing new line

```
```
127.0.0.1:6379> BF.INSERT key NOCREATE ITEMS item1 item2
(error) ERR not found
Member

This is not a very good error, is it too late to make it better?

Contributor Author

This error message comes from the existing API and the error messages of existing client libraries that support bloom filters.
We followed the existing error messages to be API compatible with the bloom filter commands of existing client libraries.


These are the default bloom properties along with the commands and configs which allow customizing.

<table width="100%" border="1" style="border-collapse: collapse; border: 1px solid black" cellpadding="8">
Member

I think this will be wonky whenever we end up putting it in a man page. I think we should keep it as vanilla markdown.


* `bf_bloom_defrag_misses`: Total number of defrag misses that have occurred on bloom filters.

## Limits
Member

These are more of configs as opposed to limits.

Member

@KarthikSubbarao KarthikSubbarao Apr 1, 2025

In this document, since we already do list the configs in a section above, how about naming this section as "Large Bloom Filters"? Or "Handling Large Bloom Filters"?

When a bloom filter scales out, a new sub filter is added. The limit on the number of sub filters depends on the false positive rate and tightening ratio. Each sub filter has a stricter false positive, and this is controlled by the tightening ratio. If a command attempting a scale out results in the sub filter reaching a false positive of 0, the command is rejected.


We have implemented `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.
Member

Suggested change
We have implemented `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.
You can use `VALIDATESCALETO` as an optional arg of `BF.INSERT` to help determine whether the bloom filter can scale out to the reach the specified capacity without hitting either limits mentioned above. It will reject the command otherwise.


## Handling Large Bloom Filters

There are two limits a bloom filter faces.
Member

@KarthikSubbarao KarthikSubbarao Apr 1, 2025

Suggested change
There are two limits a bloom filter faces.
There are two notable validations bloom filters face.


1. Memory Usage Limit:

The memory usage limit per bloom filter by default is defined by the `BF.BLOOM-MEMORY-USAGE-LIMIT` module configuration which has a default value of 128 MB. If a command results in a creation / scale out causing the overall memory usage to exceed this limit, the command is rejected.
Member

Suggested change
The memory usage limit per bloom filter by default is defined by the `BF.BLOOM-MEMORY-USAGE-LIMIT` module configuration which has a default value of 128 MB. If a command results in a creation / scale out causing the overall memory usage to exceed this limit, the command is rejected.
The memory usage limit per bloom filter by default is defined by the `BF.BLOOM-MEMORY-USAGE-LIMIT` module configuration which has a default value of 128 MB. If a command results in a creation / scale out causing the overall memory usage to exceed this limit, the command is rejected. This config is modifiable and can be increased as needed.


There are two limits a bloom filter faces.

1. Memory Usage Limit:
Member

Suggested change
1. Memory Usage Limit:
1. Memory Usage:

madolson pushed a commit to valkey-io/valkey-io.github.io that referenced this pull request Apr 2, 2025
…yed on the Valkey website (#212)

Related PRs
Bloom repo json command files:
valkey-io/valkey-bloom#47
Valkey-doc repo: valkey-io/valkey-doc#233

### Description
This PR sets up the framework so that modules can have their
commands displayed on the valkey website (By adding the bloom module
commands in a way that can be easily expanded on). I have tried to make
this future proof by using a for loop on the `commands.html` page which
can be expanded by just adding any new folders we want to pull commands
from. For the `command-page.html` I have used an array to hold the data
from the multiple folders with commands and then get the first
occurrence that isn't empty (i.e the command belongs to that folder).
This keeps the existing behavior, so that if the command doesn't exist we still
have the same fallback.

Updated the `init-commands.sh` to create a link for the bloom commands
as well and take in the bloom repository.

I have updated the README as well to include the new repo that will be
needed for the commands and the information change associated with now
expecting commands from the bloom repo.

Lastly updated the github workflow as well to also now build and take in
the bloom repo

**For screenshots of the new documentation the two pr's above
(valkey-io/valkey-doc#233 and
valkey-io/valkey-bloom#47) have screenshots of
all sections being added**

### Check List
- [x] Commits are signed per the DCO using `--signoff`

By submitting this pull request, I confirm that my contribution is made
under the terms of the BSD-3-Clause License.

Signed-off-by: zackcam <[email protected]>
@madolson madolson merged commit 9c13637 into valkey-io:main Apr 2, 2025
2 checks passed
* TIGHTENING *tightening_ratio* - The tightening ratio for the bloom filter.
* SEED *seed* - The 32 byte seed the bloom filter's hash functions will use.
* NONSCALING - This option will configure the bloom filter as non scaling; it cannot expand / scale beyond its specified capacity.
* VALIDATESCALETO *validatescaleto* - Validates if the filter can scale out and reach to this capacity based on limits and if not, return an error without creating the bloom filter.
Contributor

This is why you had to add validatescaleto to the spellcheck wordlist.

The idea is that keywords like this should be put within backticks. Then it doesn't need to be in the wordlist.

It's possible to combine italics + backticks if needed, for example:

`VALIDATESCALETO` *`validatescaleto`*

Rendered as

VALIDATESCALETO validatescaleto

6 participants