-
Notifications
You must be signed in to change notification settings - Fork 2
feat: Add SanitizationStrategy #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Created SanitizationStrategy ABC with three implementations: - NoOpSanitizationStrategy: pass-through (no sanitization) - HashLongKeysSanitizationStrategy: SHA256 hashing with H_ prefix - CharacterSanitizationStrategy: character replacement with S_ prefix - Implemented SanitizationMigrationWrapper for gradual strategy migration: - LRU cache to avoid repeated fallback lookups - Collision detection (skips migration if strategies produce same key) - Lazy migration with optional delete after migration - Configurable cache size (default 1000 entries) - Added InvalidKeyError for validation failures - Comprehensive test coverage: - 28 tests for sanitization strategies - 18 tests for migration wrapper - All edge cases covered Addresses user concerns about: 1. Key collision prevention via reserved prefixes (H_ and S_) 2. Performance impact via LRU caching of lookup results 3. Handling identical keys from both strategies (no unnecessary migration) Resolves #207 Co-authored-by: William Easton <[email protected]>
📝 WalkthroughWalkthroughAdds a sanitization framework and InvalidKeyError; exposes per-key and per-collection sanitization strategies in async and sync BaseStore constructors; wires strategies into multiple concrete stores (Elasticsearch, MongoDB, Keyring, Memcached, Windows Registry); refactors Elasticsearch serializer usage; updates tests, wrapper imports, removes a simple-store constant, and adds a pyright config flag. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant BaseStore
participant CollSan as CollectionSanitizer
participant KeySan as KeySanitizer
participant StoreImpl
participant Backend
Note over Client,BaseStore: strategies injected at init (default PassthroughStrategy)
Client->>BaseStore: create(..., key_sanitization_strategy?, collection_sanitization_strategy?)
BaseStore->>BaseStore: store strategies
Note over Client,BaseStore: Put operation (high level)
Client->>BaseStore: put(collection, key, value)
BaseStore->>CollSan: _sanitize_collection(collection)
CollSan-->>BaseStore: sanitized_collection
BaseStore->>KeySan: _sanitize_key(key)
KeySan-->>BaseStore: sanitized_key
BaseStore->>StoreImpl: _put_managed_entry(sanitized_collection, sanitized_key, value)
StoreImpl->>Backend: persist(serialized_value)
Backend-->>StoreImpl: ack
StoreImpl-->>BaseStore: success
BaseStore-->>Client: success
Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (4)
🧰 Additional context used🧬 Code graph analysis (4)key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py (1)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py (2)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (3)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
🔇 Additional comments (9)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (13)
key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/__init__.py(1 hunks)key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/wrapper.py(1 hunks)key-value/key-value-aio/tests/wrappers/sanitization_migration/__init__.py(1 hunks)key-value/key-value-aio/tests/wrappers/sanitization_migration/test_wrapper.py(1 hunks)key-value/key-value-shared/src/key_value/shared/errors/__init__.py(2 hunks)key-value/key-value-shared/src/key_value/shared/errors/key_value.py(1 hunks)key-value/key-value-shared/src/key_value/shared/utils/sanitization_strategy.py(1 hunks)key-value/key-value-shared/tests/utils/test_sanitization_strategy.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/__init__.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/wrapper.py(1 hunks)key-value/key-value-sync/src/key_value/sync/wrappers/sanitization_migration/__init__.py(1 hunks)key-value/key-value-sync/tests/code_gen/wrappers/sanitization_migration/__init__.py(1 hunks)key-value/key-value-sync/tests/code_gen/wrappers/sanitization_migration/test_wrapper.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/__init__.py (2)
key-value/key-value-aio/tests/wrappers/sanitization_migration/test_wrapper.py (1)
wrapper(23-30)key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/wrapper.py (1)
SanitizationMigrationWrapper(22-474)
key-value/key-value-sync/src/key_value/sync/wrappers/sanitization_migration/__init__.py (1)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/wrapper.py (1)
SanitizationMigrationWrapper(25-472)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/__init__.py (3)
key-value/key-value-aio/tests/wrappers/sanitization_migration/test_wrapper.py (1)
wrapper(23-30)key-value/key-value-sync/tests/code_gen/wrappers/sanitization_migration/test_wrapper.py (1)
wrapper(26-28)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/wrapper.py (1)
SanitizationMigrationWrapper(25-472)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/wrapper.py (2)
key-value/key-value-sync/src/key_value/sync/code_gen/protocols/key_value.py (1)
KeyValue(178-183)key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/wrapper.py (12)
SanitizationMigrationWrapper(22-474)_get_cache_key(115-117)_cache_get(119-123)get(131-183)_cache_put(125-128)ttl(272-313)put(396-409)delete(435-452)get_many(186-269)ttl_many(316-393)put_many(412-432)delete_many(455-474)
key-value/key-value-shared/tests/utils/test_sanitization_strategy.py (2)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)key-value/key-value-shared/src/key_value/shared/utils/sanitization_strategy.py (14)
CharacterSanitizationStrategy(150-246)HashFragmentMode(19-29)HashLongKeysSanitizationStrategy(99-147)NoOpSanitizationStrategy(80-96)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-233)validate(55-63)validate(91-92)validate(136-147)validate(235-246)try_unsanitize(65-77)try_unsanitize(94-96)
key-value/key-value-aio/tests/wrappers/sanitization_migration/test_wrapper.py (1)
key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/wrapper.py (10)
SanitizationMigrationWrapper(22-474)put(396-409)get(131-183)ttl(272-313)_cache_get(119-123)delete(435-452)get_many(186-269)put_many(412-432)delete_many(455-474)ttl_many(316-393)
key-value/key-value-shared/src/key_value/shared/errors/__init__.py (1)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)
key-value/key-value-shared/src/key_value/shared/utils/sanitization_strategy.py (1)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)
key-value/key-value-sync/tests/code_gen/wrappers/sanitization_migration/test_wrapper.py (1)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/wrapper.py (10)
SanitizationMigrationWrapper(25-472)put(399-412)get(134-186)ttl(275-316)_cache_get(122-126)delete(433-450)get_many(189-272)put_many(415-430)delete_many(453-472)ttl_many(319-396)
key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/wrapper.py (2)
key-value/key-value-aio/src/key_value/aio/protocols/key_value.py (1)
AsyncKeyValue(185-190)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/wrapper.py (12)
SanitizationMigrationWrapper(25-472)_get_cache_key(118-120)_cache_get(122-126)get(134-186)_cache_put(128-131)ttl(275-316)put(399-412)delete(433-450)get_many(189-272)ttl_many(319-396)put_many(415-430)delete_many(453-472)
🪛 GitHub Actions: Run Tests
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/wrapper.py
[error] 208-208: pyright: Type of 'append' is partially unknown. Type is '(object: Unknown, /) -> None' (reportUnknownMemberType).
key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/wrapper.py
[error] 205-205: Type of "append" is partially unknown. Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType).
🪛 GitHub Check: static_analysis (key-value/key-value-aio)
key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/wrapper.py
[failure] 225-225:
Argument type is partially unknown
Argument corresponds to parameter "iterable" in function "new"
Argument type is "list[Unknown]" (reportUnknownArgumentType)
[failure] 225-225:
Type of "key" is unknown (reportUnknownVariableType)
[failure] 224-224:
Argument type is partially unknown
Argument corresponds to parameter "keys" in function "get_many"
Argument type is "list[Unknown]" (reportUnknownArgumentType)
[failure] 219-219:
Argument type is partially unknown
Argument corresponds to parameter "iterable" in function "new"
Argument type is "list[Unknown]" (reportUnknownArgumentType)
[failure] 219-219:
Type of "key" is unknown (reportUnknownVariableType)
[failure] 218-218:
Argument type is partially unknown
Argument corresponds to parameter "keys" in function "get_many"
Argument type is "list[Unknown]" (reportUnknownArgumentType)
[failure] 211-211:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
[failure] 209-209:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
[failure] 207-207:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
[failure] 205-205:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
🪛 GitHub Check: static_analysis (key-value/key-value-sync)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/wrapper.py
[failure] 228-228:
Argument type is partially unknown
Argument corresponds to parameter "iterable" in function "new"
Argument type is "list[Unknown]" (reportUnknownArgumentType)
[failure] 228-228:
Type of "key" is unknown (reportUnknownVariableType)
[failure] 227-227:
Argument type is partially unknown
Argument corresponds to parameter "keys" in function "get_many"
Argument type is "list[Unknown]" (reportUnknownArgumentType)
[failure] 222-222:
Argument type is partially unknown
Argument corresponds to parameter "iterable" in function "new"
Argument type is "list[Unknown]" (reportUnknownArgumentType)
[failure] 222-222:
Type of "key" is unknown (reportUnknownVariableType)
[failure] 221-221:
Argument type is partially unknown
Argument corresponds to parameter "keys" in function "get_many"
Argument type is "list[Unknown]" (reportUnknownArgumentType)
[failure] 214-214:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
[failure] 212-212:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
[failure] 210-210:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
[failure] 208-208:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
🪛 Gitleaks (8.28.0)
key-value/key-value-shared/tests/utils/test_sanitization_strategy.py
[high] 254-254: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🔇 Additional comments (11)
key-value/key-value-shared/src/key_value/shared/errors/__init__.py (1)
1-21: LGTM! Clean addition to error exports.The new
InvalidKeyErrorexport follows the existing pattern and is properly alphabetized in both the import statement and__all__list.key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
37-39: LGTM! Well-designed exception class.The
InvalidKeyErrorfollows the established pattern and provides clear documentation about its use case for reserved prefix validation.key-value/key-value-aio/tests/wrappers/sanitization_migration/__init__.py (1)
1-1: LGTM! Standard test package initializer.key-value/key-value-sync/tests/code_gen/wrappers/sanitization_migration/__init__.py (1)
1-4: LGTM! Auto-generated test package initializer.Properly marked with codegen warnings.
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/__init__.py (1)
1-12: LGTM! Well-structured package initializer.The auto-generated wrapper properly re-exports
SanitizationMigrationWrapperwith clear documentation about its role in gradual strategy migration.key-value/key-value-shared/tests/utils/test_sanitization_strategy.py (4)
14-36: LGTM! Thorough testing of NoOp strategy.Tests correctly verify that
NoOpSanitizationStrategyacts as a pass-through, including acceptance of reserved prefixes (which is appropriate for a no-op strategy).
38-106: LGTM! Comprehensive hash strategy testing.Excellent coverage of
HashLongKeysSanitizationStrategyincluding deterministic behavior, collision avoidance, reserved prefix validation, and custom parameters.
108-233: LGTM! Excellent character sanitization test coverage.Comprehensive testing of
CharacterSanitizationStrategyincluding allHashFragmentModevariants, custom parameters, truncation logic, and deterministic behavior. Well-structured tests.
235-268: LGTM! Good edge case coverage, and static analysis warning is a false positive.These tests properly validate collision prevention (reserved prefixes) and empty string handling across all strategies.
The Gitleaks warning on line 254 flagging
"S_my_key-12345678"as a "Generic API Key" is a false positive — this is a test string intentionally crafted to verify that user-provided keys starting with the reservedS_prefix are correctly rejected.key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/__init__.py (1)
1-9: LGTM! Clean package initializer for async wrapper.Properly re-exports
SanitizationMigrationWrapperwith clear documentation. Structure is consistent with the sync variant.key-value/key-value-sync/src/key_value/sync/wrappers/sanitization_migration/__init__.py (1)
1-12: LGTM! Auto-generated sync wrapper initializer.Properly structured to re-export the code-generated
SanitizationMigrationWrapperfrom the sync path. Consistent with the overall codegen pattern.
| current_keys = [] | ||
| legacy_keys = [] | ||
| missing_keys = [] | ||
| unknown_keys = [] | ||
|
|
||
| for key in keys: | ||
| cached_location = self._cache_get(key, collection) | ||
| if cached_location == "current": | ||
| current_keys.append(key) | ||
| elif cached_location == "legacy": | ||
| legacy_keys.append(key) | ||
| elif cached_location == "missing": | ||
| missing_keys.append(key) | ||
| else: | ||
| unknown_keys.append(key) | ||
|
|
||
| # Start with all None | ||
| key_to_value: dict[str, dict[str, Any] | None] = dict.fromkeys(keys, None) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Annotate key-partition lists to unblock pyright.
Pyright is failing in CI (reportUnknownMemberType on append) because the partition lists (current_keys, legacy_keys, etc.) are inferred as list[Unknown]. Please give them concrete element types (and update unknown_keys/missing_keys similarly, plus the dict comprehension) so callers like get_many/ttl_many continue to type-check. Example fix:
- current_keys = []
- legacy_keys = []
- missing_keys = []
- unknown_keys = []
+ current_keys: list[str] = []
+ legacy_keys: list[str] = []
+ missing_keys: list[str] = []
+ unknown_keys: list[str] = []
- key_to_value: dict[str, dict[str, Any] | None] = dict.fromkeys(keys, None)
+ key_to_value: dict[str, dict[str, Any] | None] = {key: None for key in keys}and likewise in ttl_many. This will clear the failing static-analysis job.
Also applies to: 332-344
🏁 Script executed:
#!/bin/bash
# Locate the wrapper.py file
fd -t f -p "wrapper.py" | grep -i sanitization_migrationLength of output: 346
🏁 Script executed:
#!/bin/bash
# If found, inspect the specific lines mentioned in the review
fd -t f -p "wrapper.py" | grep -i sanitization_migration | head -1 | xargs -I {} sh -c 'echo "=== Lines 197-215 ===" && sed -n "197,215p" {} && echo -e "\n=== Lines 332-344 ===" && sed -n "332,344p" {}'Length of output: 1210
🏁 Script executed:
#!/bin/bash
# Get broader context around the functions to understand parameter types
fd -t f -p "wrapper.py" | grep -i sanitization_migration | head -1 | xargs -I {} sh -c 'echo "=== Function containing lines 197-215 ===" && sed -n "180,230p" {} && echo -e "\n=== Function containing lines 332-344 ===" && sed -n "315,360p" {}'Length of output: 3983
🏁 Script executed:
#!/bin/bash
# Check if sync version has the same pattern (at different lines possibly)
echo "=== Checking sync wrapper.py for same pattern ===" && fd -t f "wrapper.py" | grep "key-value-sync.*sanitization_migration" | head -1 | xargs -I {} sh -c 'grep -n "current_keys = \[\]" {} && echo "Found pattern in sync version"'Length of output: 139
🏁 Script executed:
#!/bin/bash
# Check for any pyrightconfig.json or pyproject.toml with type checking settings
fd -t f \( -name "pyrightconfig.json" -o -name "pyproject.toml" \) | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && cat {}'Length of output: 230
🏁 Script executed:
#!/bin/bash
# Look for CI workflow files that might show Pyright errors
fd -t f -path "*/.github/workflows/*" -name "*.yml" -o -name "*.yaml" | xargs grep -l "pyright\|type.*check" 2>/dev/null | head -3Length of output: 168
Annotate key-partition lists to unblock pyright.
Pyright fails in CI (reportUnknownMemberType on append) because the partition lists (current_keys, legacy_keys, missing_keys, unknown_keys) are inferred as list[Unknown]. Add concrete list[str] annotations to all four lists and replace dict.fromkeys(keys, None) with a dict comprehension for cleaner type inference:
- current_keys = []
- legacy_keys = []
- missing_keys = []
- unknown_keys = []
+ current_keys: list[str] = []
+ legacy_keys: list[str] = []
+ missing_keys: list[str] = []
+ unknown_keys: list[str] = []
- key_to_value: dict[str, dict[str, Any] | None] = dict.fromkeys(keys, None)
+ key_to_value: dict[str, dict[str, Any] | None] = {key: None for key in keys}Apply the same fix to ttl_many at lines 332–344 (same list partition pattern, plus replace the existing dict.fromkeys(keys, (None, None)) with the comprehension equivalent).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| current_keys = [] | |
| legacy_keys = [] | |
| missing_keys = [] | |
| unknown_keys = [] | |
| for key in keys: | |
| cached_location = self._cache_get(key, collection) | |
| if cached_location == "current": | |
| current_keys.append(key) | |
| elif cached_location == "legacy": | |
| legacy_keys.append(key) | |
| elif cached_location == "missing": | |
| missing_keys.append(key) | |
| else: | |
| unknown_keys.append(key) | |
| # Start with all None | |
| key_to_value: dict[str, dict[str, Any] | None] = dict.fromkeys(keys, None) | |
| current_keys: list[str] = [] | |
| legacy_keys: list[str] = [] | |
| missing_keys: list[str] = [] | |
| unknown_keys: list[str] = [] | |
| for key in keys: | |
| cached_location = self._cache_get(key, collection) | |
| if cached_location == "current": | |
| current_keys.append(key) | |
| elif cached_location == "legacy": | |
| legacy_keys.append(key) | |
| elif cached_location == "missing": | |
| missing_keys.append(key) | |
| else: | |
| unknown_keys.append(key) | |
| # Start with all None | |
| key_to_value: dict[str, dict[str, Any] | None] = {key: None for key in keys} |
🧰 Tools
🪛 GitHub Actions: Run Tests
[error] 205-205: Type of "append" is partially unknown. Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType).
🪛 GitHub Check: static_analysis (key-value/key-value-aio)
[failure] 211-211:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
[failure] 209-209:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
[failure] 207-207:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
[failure] 205-205:
Type of "append" is partially unknown
Type of "append" is "(object: Unknown, /) -> None" (reportUnknownMemberType)
🤖 Prompt for AI Agents
In
key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/wrapper.py
around lines 197–215 and 332–344, the partition lists (current_keys,
legacy_keys, missing_keys, unknown_keys) are inferred as list[Unknown] causing
pyright failures; explicitly type them as list[str] (e.g., current_keys:
list[str] = []) and replace usages of dict.fromkeys(keys, None) and
dict.fromkeys(keys, (None, None)) with equivalent dict comprehensions (e.g., {k:
None for k in keys} and {k: (None, None) for k in keys}) to give concrete types
for key_to_value and ttl maps so pyright can infer correct member types.
| if add_hash: | ||
| # Generate hash fragment | ||
| hash_fragment = hashlib.sha256(value.encode()).hexdigest()[: self.hash_fragment_length] | ||
|
|
||
| # Calculate space needed for 'S_' prefix + '-' separator + hash | ||
| overhead = 2 + 1 + self.hash_fragment_length # 'S_' + '-' + hash | ||
|
|
||
| # Truncate sanitized value to fit within max_length | ||
| max_value_length = self.max_length - overhead | ||
| if len(sanitized) > max_value_length: | ||
| sanitized = sanitized[:max_value_length] | ||
|
|
||
| # Add prefix, hash fragment, and return | ||
| return f"S_{sanitized}-{hash_fragment}" | ||
|
|
||
| # No hash needed - return sanitized value (or original if unchanged) | ||
| return sanitized | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix max-length & prefix guarantees when skipping the hash.
When add_hash resolves to False (e.g. HashFragmentMode.NEVER or an already-compliant string), we currently fall straight through to return sanitized. Two correctness gaps result:
- Keys longer than
max_lengthare never truncated, so the sanitized output can still violate the backend’s length constraint. - If sanitization actually modified the string (invalid characters replaced or length trimmed) we skip the reserved
S_prefix, reintroducing collisions with user keys (the validation layer only blocks user input starting withH_/S_).
Please enforce both guarantees before returning, e.g.:
- # No hash needed - return sanitized value (or original if unchanged)
- return sanitized
+ needs_truncation = len(sanitized) > self.max_length
+ if needs_truncation:
+ sanitized = sanitized[: self.max_length]
+ changed = True
+
+ if changed:
+ prefixed = f"S_{sanitized}"
+ sanitized = prefixed[: self.max_length] if len(prefixed) > self.max_length else prefixed
+
+ return sanitizedThis keeps every output within max_length and preserves the reserved-prefix guarantee whenever we diverge from the user-provided key.
🤖 Prompt for AI Agents
In
key-value/key-value-shared/src/key_value/shared/utils/sanitization_strategy.py
around lines 217-234, when add_hash is False we currently return sanitized
directly which can violate max_length and can drop the reserved "S_" prefix when
sanitization changed the input; fix by enforcing truncation and prefix
guarantees: compute max_value_length = self.max_length (or self.max_length - 2
when you must add 'S_'), if len(sanitized) > max_value_length truncate sanitized
to that length, and if sanitized != original value ensure the result begins with
'S_' (avoid double-adding if it already starts with 'S_') — if adding the 'S_'
would overflow, trim sanitized to self.max_length - 2 then prepend 'S_'; finally
return the possibly-truncated, prefixed sanitized string so outputs always fit
max_length and preserve the reserved prefix when modified.
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/migration/wrapper.py
Outdated
Show resolved
Hide resolved
|
@claude i wonder if we should actually just implement a generic |
|
Claude finished @strawgate's task —— View job Analysis & Fixes Complete ✅I've analyzed the PR and addressed all the CodeRabbit feedback. Here's what was done: Analysis: Generic MigrateStoreWrapperI analyzed whether to create a generic Recommendation: Keep it specific for now The codebase has a Rationale:
Future path: If a second migration use case emerges (e.g., Fixes Applied✅ 1. Fixed type annotation issues
✅ 2. Fixed max-length/prefix logic in
✅ 3. Regenerated sync code
✅ 4. Lint & format passed
✅ 5. Type checking
✅ 6. Tests
✅ 7. Committed & pushed
SummaryAll CodeRabbit feedback has been addressed:
The PR is now ready for re-review! | |
- Add explicit list[str] type annotations to fix pyright errors - Fix max-length/prefix enforcement in CharacterSanitizationStrategy - Add truncation when add_hash is False - Ensure S_ prefix when sanitization changes input - Use local variables to narrow types for None checks - Add noqa for "too many statements" warning Co-authored-by: William Easton <[email protected]>
|
@claude static tests are failing and i was just saying the migration wrapper is probably already generic and could actually just be our MigrateStoreWrapper? |
|
Claude finished @strawgate's task —— View job Static Tests Fixed ✅SummaryI've analyzed the PR and fixed the failing static tests. Here's what was done: Issue DiagnosedThe CI was failing because test files accessed protected members ( Fixes Applied✅ Added pyright ignore comments to 5 test lines that access protected members
This follows the standard pattern used throughout the codebase for testing internal behavior (see examples in ✅ Regenerated sync code via ✅ Verified with typecheck and lint
Architectural Analysis: Should This Be Generic?You asked whether The wrapper IS already generic! It doesn't actually depend on sanitization strategies at all - it's a general-purpose dual-store migration tool that works with any two My recommendation: Keep the current name because:
If another migration use case emerges (e.g., compression migration, encryption migration), then refactor to extract a The key difference from
Changes CommittedCommit: Files changed:
Pushed to branch: Next StepsThe PR should now pass static analysis! The CI will run again and verify that all type checking passes (except for pre-existing Redis store issues that are unrelated to this PR). --- | |
Tests need to access protected members (_cache_get, _cache_enabled) to verify internal cache behavior. Added standard pyright ignore comments to suppress reportPrivateUsage warnings in test files. Co-authored-by: William Easton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
key-value/key-value-aio/tests/wrappers/sanitization_migration/test_wrapper.py(1 hunks)key-value/key-value-sync/tests/code_gen/wrappers/sanitization_migration/test_wrapper.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
key-value/key-value-aio/tests/wrappers/sanitization_migration/test_wrapper.py (1)
key-value/key-value-aio/src/key_value/aio/wrappers/sanitization_migration/wrapper.py (10)
SanitizationMigrationWrapper(22-476)put(398-411)get(131-183)ttl(274-315)_cache_get(119-123)delete(437-454)get_many(186-271)put_many(414-434)delete_many(457-476)ttl_many(318-395)
key-value/key-value-sync/tests/code_gen/wrappers/sanitization_migration/test_wrapper.py (1)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/sanitization_migration/wrapper.py (10)
SanitizationMigrationWrapper(25-474)put(401-414)get(134-186)ttl(277-318)_cache_get(122-126)delete(435-452)get_many(189-274)put_many(417-432)delete_many(455-474)ttl_many(321-398)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
| import pytest | ||
|
|
||
| from key_value.aio.stores.memory import MemoryStore | ||
| from key_value.aio.wrappers.sanitization_migration import SanitizationMigrationWrapper | ||
|
|
||
|
|
||
| class TestSanitizationMigrationWrapper: | ||
| """Tests for SanitizationMigrationWrapper.""" | ||
|
|
||
| @pytest.fixture | ||
| def current_store(self) -> MemoryStore: | ||
| """Create a current store (new strategy).""" | ||
| return MemoryStore() | ||
|
|
||
| @pytest.fixture | ||
| def legacy_store(self) -> MemoryStore: | ||
| """Create a legacy store (old strategy).""" | ||
| return MemoryStore() | ||
|
|
||
| @pytest.fixture | ||
| def wrapper(self, current_store: MemoryStore, legacy_store: MemoryStore) -> SanitizationMigrationWrapper: | ||
| """Create a migration wrapper.""" | ||
| return SanitizationMigrationWrapper( | ||
| current_store=current_store, | ||
| legacy_store=legacy_store, | ||
| migrate_on_read=False, | ||
| cache_size=100, | ||
| ) | ||
|
|
||
| @pytest.fixture | ||
| def migrating_wrapper(self, current_store: MemoryStore, legacy_store: MemoryStore) -> SanitizationMigrationWrapper: | ||
| """Create a migration wrapper with migrate_on_read=True.""" | ||
| return SanitizationMigrationWrapper( | ||
| current_store=current_store, | ||
| legacy_store=legacy_store, | ||
| migrate_on_read=True, | ||
| delete_after_migration=False, | ||
| cache_size=100, | ||
| ) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add pytest-asyncio marker so async tests actually run
pytest-asyncio now defaults to strict mode; async tests without @pytest.mark.asyncio (or a module-level pytestmark) won’t execute and will fail collection unless the suite explicitly opts into auto mode. Please mark this module so the new tests run everywhere.(pytest-asyncio.readthedocs.io)
import pytest
+pytestmark = pytest.mark.asyncio
+
from key_value.aio.stores.memory import MemoryStore📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| import pytest | |
| from key_value.aio.stores.memory import MemoryStore | |
| from key_value.aio.wrappers.sanitization_migration import SanitizationMigrationWrapper | |
| class TestSanitizationMigrationWrapper: | |
| """Tests for SanitizationMigrationWrapper.""" | |
| @pytest.fixture | |
| def current_store(self) -> MemoryStore: | |
| """Create a current store (new strategy).""" | |
| return MemoryStore() | |
| @pytest.fixture | |
| def legacy_store(self) -> MemoryStore: | |
| """Create a legacy store (old strategy).""" | |
| return MemoryStore() | |
| @pytest.fixture | |
| def wrapper(self, current_store: MemoryStore, legacy_store: MemoryStore) -> SanitizationMigrationWrapper: | |
| """Create a migration wrapper.""" | |
| return SanitizationMigrationWrapper( | |
| current_store=current_store, | |
| legacy_store=legacy_store, | |
| migrate_on_read=False, | |
| cache_size=100, | |
| ) | |
| @pytest.fixture | |
| def migrating_wrapper(self, current_store: MemoryStore, legacy_store: MemoryStore) -> SanitizationMigrationWrapper: | |
| """Create a migration wrapper with migrate_on_read=True.""" | |
| return SanitizationMigrationWrapper( | |
| current_store=current_store, | |
| legacy_store=legacy_store, | |
| migrate_on_read=True, | |
| delete_after_migration=False, | |
| cache_size=100, | |
| ) | |
| import pytest | |
| pytestmark = pytest.mark.asyncio | |
| from key_value.aio.stores.memory import MemoryStore | |
| from key_value.aio.wrappers.sanitization_migration import SanitizationMigrationWrapper | |
| class TestSanitizationMigrationWrapper: | |
| """Tests for SanitizationMigrationWrapper.""" | |
| @pytest.fixture | |
| def current_store(self) -> MemoryStore: | |
| """Create a current store (new strategy).""" | |
| return MemoryStore() | |
| @pytest.fixture | |
| def legacy_store(self) -> MemoryStore: | |
| """Create a legacy store (old strategy).""" | |
| return MemoryStore() | |
| @pytest.fixture | |
| def wrapper(self, current_store: MemoryStore, legacy_store: MemoryStore) -> SanitizationMigrationWrapper: | |
| """Create a migration wrapper.""" | |
| return SanitizationMigrationWrapper( | |
| current_store=current_store, | |
| legacy_store=legacy_store, | |
| migrate_on_read=False, | |
| cache_size=100, | |
| ) | |
| @pytest.fixture | |
| def migrating_wrapper(self, current_store: MemoryStore, legacy_store: MemoryStore) -> SanitizationMigrationWrapper: | |
| """Create a migration wrapper with migrate_on_read=True.""" | |
| return SanitizationMigrationWrapper( | |
| current_store=current_store, | |
| legacy_store=legacy_store, | |
| migrate_on_read=True, | |
| delete_after_migration=False, | |
| cache_size=100, | |
| ) |
🤖 Prompt for AI Agents
In key-value/key-value-aio/tests/wrappers/sanitization_migration/test_wrapper.py
around lines 3 to 42, the file contains async tests but lacks pytest-asyncio
markers so they won’t run in strict mode; add a module-level pytest-asyncio
marker or decorate the async test functions. Concretely, import pytest if not
already and add a module-level marker (e.g., set pytestmark =
pytest.mark.asyncio) at the top of the file, or apply @pytest.mark.asyncio to
each async test so pytest-asyncio runs them.
| # WARNING: this file is auto-generated by 'build_sync_library.py' | ||
| # from the original file 'test_wrapper.py' | ||
| # DO NOT CHANGE! Change the original file instead. | ||
| """Tests for SanitizationMigrationWrapper.""" | ||
|
|
||
| import pytest | ||
|
|
||
| from key_value.sync.code_gen.stores.memory import MemoryStore | ||
| from key_value.sync.code_gen.wrappers.sanitization_migration import SanitizationMigrationWrapper | ||
|
|
||
|
|
||
| class TestSanitizationMigrationWrapper: | ||
| """Tests for SanitizationMigrationWrapper.""" | ||
|
|
||
| @pytest.fixture | ||
| def current_store(self) -> MemoryStore: | ||
| """Create a current store (new strategy).""" | ||
| return MemoryStore() | ||
|
|
||
| @pytest.fixture | ||
| def legacy_store(self) -> MemoryStore: | ||
| """Create a legacy store (old strategy).""" | ||
| return MemoryStore() | ||
|
|
||
| @pytest.fixture | ||
| def wrapper(self, current_store: MemoryStore, legacy_store: MemoryStore) -> SanitizationMigrationWrapper: | ||
| """Create a migration wrapper.""" | ||
| return SanitizationMigrationWrapper(current_store=current_store, legacy_store=legacy_store, migrate_on_read=False, cache_size=100) | ||
|
|
||
| @pytest.fixture | ||
| def migrating_wrapper(self, current_store: MemoryStore, legacy_store: MemoryStore) -> SanitizationMigrationWrapper: | ||
| """Create a migration wrapper with migrate_on_read=True.""" | ||
| return SanitizationMigrationWrapper( | ||
| current_store=current_store, legacy_store=legacy_store, migrate_on_read=True, delete_after_migration=False, cache_size=100 | ||
| ) | ||
|
|
||
| def test_get_from_current_store(self, wrapper: SanitizationMigrationWrapper, current_store: MemoryStore) -> None: | ||
| """Test getting a value from the current store.""" | ||
| current_store.put(key="test_key", value={"data": "current"}, collection="default") | ||
|
|
||
| result = wrapper.get(key="test_key", collection="default") | ||
| assert result is not None | ||
| assert result["data"] == "current" | ||
|
|
||
| def test_get_from_legacy_store( | ||
| self, wrapper: SanitizationMigrationWrapper, legacy_store: MemoryStore, current_store: MemoryStore | ||
| ) -> None: | ||
| """Test getting a value from the legacy store when not in current.""" | ||
| legacy_store.put(key="test_key", value={"data": "legacy"}, collection="default") | ||
|
|
||
| result = wrapper.get(key="test_key", collection="default") | ||
| assert result is not None | ||
| assert result["data"] == "legacy" | ||
|
|
||
| # Should NOT have been migrated (migrate_on_read=False) | ||
| current_result = current_store.get(key="test_key", collection="default") | ||
| assert current_result is None | ||
|
|
||
| def test_get_missing_key(self, wrapper: SanitizationMigrationWrapper) -> None: | ||
| """Test getting a missing key returns None.""" | ||
| result = wrapper.get(key="missing_key", collection="default") | ||
| assert result is None | ||
|
|
||
| def test_migrate_on_read( | ||
| self, migrating_wrapper: SanitizationMigrationWrapper, legacy_store: MemoryStore, current_store: MemoryStore | ||
| ) -> None: | ||
| """Test that migrate_on_read copies data from legacy to current.""" | ||
| legacy_store.put(key="test_key", value={"data": "legacy"}, collection="default", ttl=3600) | ||
|
|
||
| result = migrating_wrapper.get(key="test_key", collection="default") | ||
| assert result is not None | ||
| assert result["data"] == "legacy" | ||
|
|
||
| # Should have been migrated | ||
| current_result = current_store.get(key="test_key", collection="default") | ||
| assert current_result is not None | ||
| assert current_result["data"] == "legacy" | ||
|
|
||
| # Legacy should still have it (delete_after_migration=False) | ||
| legacy_result = legacy_store.get(key="test_key", collection="default") | ||
| assert legacy_result is not None | ||
|
|
||
| def test_migrate_on_read_with_delete(self, legacy_store: MemoryStore, current_store: MemoryStore) -> None: | ||
| """Test that delete_after_migration removes from legacy.""" | ||
| wrapper = SanitizationMigrationWrapper( | ||
| current_store=current_store, legacy_store=legacy_store, migrate_on_read=True, delete_after_migration=True, cache_size=100 | ||
| ) | ||
|
|
||
| legacy_store.put(key="test_key", value={"data": "legacy"}, collection="default") | ||
|
|
||
| result = wrapper.get(key="test_key", collection="default") | ||
| assert result is not None | ||
|
|
||
| # Should have been deleted from legacy | ||
| legacy_result = legacy_store.get(key="test_key", collection="default") | ||
| assert legacy_result is None | ||
|
|
||
| def test_cache_avoids_repeated_lookups( | ||
| self, wrapper: SanitizationMigrationWrapper, current_store: MemoryStore, legacy_store: MemoryStore | ||
| ) -> None: | ||
| """Test that cache avoids repeated lookups.""" | ||
| current_store.put(key="test_key", value={"data": "current"}, collection="default") | ||
|
|
||
| # First get - should cache | ||
| result1 = wrapper.get(key="test_key", collection="default") | ||
| assert result1 is not None | ||
|
|
||
| # Check cache | ||
| cached_location = wrapper._cache_get(key="test_key", collection="default") # pyright: ignore[reportPrivateUsage] | ||
| assert cached_location == "current" | ||
|
|
||
| # Second get - should use cache (wouldn't check legacy even if we added to it) | ||
| legacy_store.put(key="test_key", value={"data": "legacy"}, collection="default") | ||
| result2 = wrapper.get(key="test_key", collection="default") | ||
| assert result2 is not None | ||
| assert result2["data"] == "current" # Still from current, not legacy | ||
|
|
||
| def test_cache_missing_keys(self, wrapper: SanitizationMigrationWrapper) -> None: | ||
| """Test that missing keys are cached.""" | ||
| # First get - should cache as missing | ||
| result1 = wrapper.get(key="missing_key", collection="default") | ||
| assert result1 is None | ||
|
|
||
| # Check cache | ||
| cached_location = wrapper._cache_get(key="missing_key", collection="default") # pyright: ignore[reportPrivateUsage] | ||
| assert cached_location == "missing" | ||
|
|
||
| # Second get - should use cache | ||
| result2 = wrapper.get(key="missing_key", collection="default") | ||
| assert result2 is None | ||
|
|
||
| def test_put_updates_cache(self, wrapper: SanitizationMigrationWrapper, current_store: MemoryStore, legacy_store: MemoryStore) -> None: | ||
| """Test that put updates the cache.""" | ||
| # Put initially in legacy | ||
| legacy_store.put(key="test_key", value={"data": "legacy"}, collection="default") | ||
|
|
||
| # Get - should cache as legacy | ||
| wrapper.get(key="test_key", collection="default") | ||
| assert wrapper._cache_get(key="test_key", collection="default") == "legacy" # pyright: ignore[reportPrivateUsage] | ||
|
|
||
| # Put - should update cache to current | ||
| wrapper.put(key="test_key", value={"data": "new"}, collection="default") | ||
| assert wrapper._cache_get(key="test_key", collection="default") == "current" # pyright: ignore[reportPrivateUsage] | ||
|
|
||
| # Get - should get from current, not legacy | ||
| result = wrapper.get(key="test_key", collection="default") | ||
| assert result is not None | ||
| assert result["data"] == "new" | ||
|
|
||
| def test_delete_from_both_stores( | ||
| self, wrapper: SanitizationMigrationWrapper, current_store: MemoryStore, legacy_store: MemoryStore | ||
| ) -> None: | ||
| """Test that delete removes from both stores.""" | ||
| current_store.put(key="key1", value={"data": "current"}, collection="default") | ||
| legacy_store.put(key="key2", value={"data": "legacy"}, collection="default") | ||
|
|
||
| # Delete should remove from both | ||
| deleted1 = wrapper.delete(key="key1", collection="default") | ||
| deleted2 = wrapper.delete(key="key2", collection="default") | ||
|
|
||
| assert deleted1 is True | ||
| assert deleted2 is True | ||
|
|
||
| # Verify deletion | ||
| assert current_store.get(key="key1", collection="default") is None | ||
| assert legacy_store.get(key="key2", collection="default") is None | ||
|
|
||
| def test_get_many(self, wrapper: SanitizationMigrationWrapper, current_store: MemoryStore, legacy_store: MemoryStore) -> None: | ||
| """Test get_many with keys in different stores.""" | ||
| current_store.put(key="key1", value={"data": "current1"}, collection="default") | ||
| legacy_store.put(key="key2", value={"data": "legacy2"}, collection="default") | ||
|
|
||
| results = wrapper.get_many(keys=["key1", "key2", "key3"], collection="default") | ||
|
|
||
| assert len(results) == 3 | ||
| assert results[0] is not None | ||
| assert results[0]["data"] == "current1" | ||
| assert results[1] is not None | ||
| assert results[1]["data"] == "legacy2" | ||
| assert results[2] is None | ||
|
|
||
| def test_get_many_with_migration( | ||
| self, migrating_wrapper: SanitizationMigrationWrapper, legacy_store: MemoryStore, current_store: MemoryStore | ||
| ) -> None: | ||
| """Test get_many migrates keys from legacy to current.""" | ||
| legacy_store.put(key="key1", value={"data": "legacy1"}, collection="default") | ||
| legacy_store.put(key="key2", value={"data": "legacy2"}, collection="default") | ||
|
|
||
| results = migrating_wrapper.get_many(keys=["key1", "key2"], collection="default") | ||
|
|
||
| assert len(results) == 2 | ||
| assert results[0] is not None | ||
| assert results[1] is not None | ||
|
|
||
| # Both should have been migrated | ||
| current_result1 = current_store.get(key="key1", collection="default") | ||
| current_result2 = current_store.get(key="key2", collection="default") | ||
| assert current_result1 is not None | ||
| assert current_result2 is not None | ||
|
|
||
| def test_ttl_from_current(self, wrapper: SanitizationMigrationWrapper, current_store: MemoryStore) -> None: | ||
| """Test ttl from current store.""" | ||
| current_store.put(key="test_key", value={"data": "current"}, collection="default", ttl=3600) | ||
|
|
||
| (value, ttl) = wrapper.ttl(key="test_key", collection="default") | ||
| assert value is not None | ||
| assert value["data"] == "current" | ||
| assert ttl is not None | ||
| assert ttl > 0 | ||
|
|
||
| def test_ttl_from_legacy(self, wrapper: SanitizationMigrationWrapper, legacy_store: MemoryStore, current_store: MemoryStore) -> None: | ||
| """Test ttl from legacy store.""" | ||
| legacy_store.put(key="test_key", value={"data": "legacy"}, collection="default", ttl=3600) | ||
|
|
||
| (value, ttl) = wrapper.ttl(key="test_key", collection="default") | ||
| assert value is not None | ||
| assert value["data"] == "legacy" | ||
| assert ttl is not None | ||
|
|
||
| # Should NOT have been migrated (migrate_on_read=False) | ||
| current_result = current_store.get(key="test_key", collection="default") | ||
| assert current_result is None | ||
|
|
||
| def test_ttl_with_migration( | ||
| self, migrating_wrapper: SanitizationMigrationWrapper, legacy_store: MemoryStore, current_store: MemoryStore | ||
| ) -> None: | ||
| """Test ttl migrates from legacy to current.""" | ||
| legacy_store.put(key="test_key", value={"data": "legacy"}, collection="default", ttl=3600) | ||
|
|
||
| (value, ttl) = migrating_wrapper.ttl(key="test_key", collection="default") | ||
| assert value is not None | ||
| assert ttl is not None | ||
|
|
||
| # Should have been migrated with TTL preserved | ||
| (current_value, current_ttl) = current_store.ttl(key="test_key", collection="default") | ||
| assert current_value is not None | ||
| assert current_ttl is not None | ||
|
|
||
| def test_put_many(self, wrapper: SanitizationMigrationWrapper, current_store: MemoryStore) -> None: | ||
| """Test put_many writes to current store.""" | ||
| wrapper.put_many(keys=["key1", "key2"], values=[{"data": "value1"}, {"data": "value2"}], collection="default") | ||
|
|
||
| # Verify in current store | ||
| result1 = current_store.get(key="key1", collection="default") | ||
| result2 = current_store.get(key="key2", collection="default") | ||
| assert result1 is not None | ||
| assert result2 is not None | ||
|
|
||
| def test_delete_many(self, wrapper: SanitizationMigrationWrapper, current_store: MemoryStore, legacy_store: MemoryStore) -> None: | ||
| """Test delete_many removes from both stores.""" | ||
| current_store.put(key="key1", value={"data": "current"}, collection="default") | ||
| legacy_store.put(key="key2", value={"data": "legacy"}, collection="default") | ||
|
|
||
| count = wrapper.delete_many(keys=["key1", "key2"], collection="default") | ||
| assert count == 2 | ||
|
|
||
| # Verify deletion | ||
| assert current_store.get(key="key1", collection="default") is None | ||
| assert legacy_store.get(key="key2", collection="default") is None | ||
|
|
||
| def test_cache_disabled(self, current_store: MemoryStore, legacy_store: MemoryStore) -> None: | ||
| """Test wrapper with caching disabled.""" | ||
| # Disable cache | ||
| wrapper = SanitizationMigrationWrapper(current_store=current_store, legacy_store=legacy_store, cache_size=0) | ||
|
|
||
| current_store.put(key="test_key", value={"data": "current"}, collection="default") | ||
|
|
||
| result = wrapper.get(key="test_key", collection="default") | ||
| assert result is not None | ||
|
|
||
| # Cache should be disabled | ||
| assert not wrapper._cache_enabled # pyright: ignore[reportPrivateUsage] | ||
|
|
||
| def test_ttl_many(self, wrapper: SanitizationMigrationWrapper, current_store: MemoryStore, legacy_store: MemoryStore) -> None: | ||
| """Test ttl_many with keys in different stores.""" | ||
| current_store.put(key="key1", value={"data": "current1"}, collection="default", ttl=3600) | ||
| legacy_store.put(key="key2", value={"data": "legacy2"}, collection="default", ttl=7200) | ||
|
|
||
| results = wrapper.ttl_many(keys=["key1", "key2", "key3"], collection="default") | ||
|
|
||
| assert len(results) == 3 | ||
| assert results[0][0] is not None # key1 value | ||
| assert results[0][1] is not None # key1 ttl | ||
| assert results[1][0] is not None # key2 value | ||
| assert results[1][1] is not None # key2 ttl | ||
| assert results[2][0] is None # key3 missing | ||
| assert results[2][1] is None # key3 no ttl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Excellent comprehensive test coverage for SanitizationMigrationWrapper.
The test suite thoroughly covers all major functionality:
- Current vs. legacy store reads with proper fallback behavior
- Migration semantics (migrate_on_read, delete_after_migration)
- Cache behavior (hits, misses, updates on put/delete)
- Batch operations (get_many, put_many, delete_many, ttl_many)
- TTL handling and preservation during migration
- Cache-disabled mode
All test assertions are correct and the test logic is sound.
Since this is auto-generated (lines 1-3), the following optional improvements should be made to the original async test file:
-
More precise TTL assertions: Tests like
test_ttl_from_current(line 209) andtest_ttl_with_migration(line 237) only verifyttl > 0. Consider asserting the TTL is approximately the expected value (e.g.,assert 3500 < ttl < 3600). -
Cache eviction test: Add a test that fills the cache beyond
cache_size=100and verifies LRU eviction behavior works correctly. -
Collection edge cases: All tests use
collection="default". Add test coverage forcollection=Noneand verify wrapper behavior across different collection values.
Example test for TTL precision:
def test_ttl_preserves_value(self, migrating_wrapper: SanitizationMigrationWrapper, legacy_store: MemoryStore, current_store: MemoryStore) -> None:
"""Test that TTL is preserved during migration."""
legacy_store.put(key="test_key", value={"data": "legacy"}, collection="default", ttl=3600)
(value, ttl) = migrating_wrapper.ttl(key="test_key", collection="default")
assert value is not None
assert ttl is not None
# TTL should be close to 3600, allowing for execution time
assert 3590 < ttl <= 3600🤖 Prompt for AI Agents
In
key-value/key-value-sync/tests/code_gen/wrappers/sanitization_migration/test_wrapper.py
lines 1-287, the review suggests three optional improvements to the original
async test file: tighten TTL assertions to assert TTL is approximately the
expected value (e.g., assert 3500 < ttl < 3600 or 3590 < ttl <= 3600) instead of
just ttl > 0; add a cache eviction test that constructs a wrapper with a small
cache_size, performs >cache_size distinct puts/gets to ensure LRU eviction
occurs and assert evicted entries are no longer cached; and add tests exercising
collection edge cases (e.g., collection=None and multiple collection names) to
verify behavior across collections; update the original async test file (not the
generated file) to implement these tests and assertions, preserving existing
fixtures and patterns, and ensure any timing-sensitive TTL checks allow a small
tolerance for execution time.
|
@claude please fix tests and convert to MigrateStoreWrapper |
|
Claude finished @strawgate's task —— View job Tests Fixed and Converted to MigrationWrapper ✅SummarySuccessfully refactored What ChangedRenamed wrapper to be more generic:
Fixed test issues:
Updated documentation:
Regenerated sync code:
Architecture RationaleThe wrapper doesn't depend on sanitization strategies at all - it just manages migration between any two
This makes it useful for many migration scenarios beyond just sanitization strategies. Verification
Commit: |
The wrapper is already generic and doesn't depend on sanitization strategies. It manages dual-store migration with caching and lazy migration, making it useful for various migration scenarios: - Sanitization strategy changes - Encryption configuration changes - Compression setting changes - Backend store migrations Changes: - Renamed SanitizationMigrationWrapper to MigrationWrapper - Renamed sanitization_migration directory to migration - Updated all imports and references - Added pytest-asyncio markers to test files - Updated documentation with generic examples - Regenerated sync code Co-authored-by: William Easton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
key-value/key-value-aio/src/key_value/aio/wrappers/migration/__init__.py(1 hunks)key-value/key-value-aio/src/key_value/aio/wrappers/migration/wrapper.py(1 hunks)key-value/key-value-aio/tests/wrappers/migration/__init__.py(1 hunks)key-value/key-value-aio/tests/wrappers/migration/test_wrapper.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/migration/__init__.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/migration/wrapper.py(1 hunks)key-value/key-value-sync/src/key_value/sync/wrappers/migration/__init__.py(1 hunks)key-value/key-value-sync/tests/code_gen/wrappers/migration/__init__.py(1 hunks)key-value/key-value-sync/tests/code_gen/wrappers/migration/test_wrapper.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
key-value/key-value-aio/tests/wrappers/migration/test_wrapper.py (1)
key-value/key-value-aio/src/key_value/aio/wrappers/migration/wrapper.py (10)
MigrationWrapper(22-481)put(403-416)get(136-188)ttl(279-320)_cache_get(124-128)delete(442-459)get_many(191-276)put_many(419-439)delete_many(462-481)ttl_many(323-400)
key-value/key-value-sync/src/key_value/sync/wrappers/migration/__init__.py (1)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/migration/wrapper.py (1)
MigrationWrapper(25-479)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/migration/wrapper.py (2)
key-value/key-value-sync/src/key_value/sync/code_gen/protocols/key_value.py (1)
KeyValue(178-183)key-value/key-value-aio/src/key_value/aio/wrappers/migration/wrapper.py (12)
MigrationWrapper(22-481)_get_cache_key(120-122)_cache_get(124-128)get(136-188)_cache_put(130-133)ttl(279-320)put(403-416)delete(442-459)get_many(191-276)ttl_many(323-400)put_many(419-439)delete_many(462-481)
key-value/key-value-aio/src/key_value/aio/wrappers/migration/__init__.py (2)
key-value/key-value-aio/tests/wrappers/migration/test_wrapper.py (1)
wrapper(25-32)key-value/key-value-aio/src/key_value/aio/wrappers/migration/wrapper.py (1)
MigrationWrapper(22-481)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/migration/__init__.py (1)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/migration/wrapper.py (1)
MigrationWrapper(25-479)
key-value/key-value-sync/tests/code_gen/wrappers/migration/test_wrapper.py (1)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/migration/wrapper.py (10)
MigrationWrapper(25-479)put(406-419)get(139-191)ttl(282-323)_cache_get(127-131)delete(440-457)get_many(194-279)put_many(422-437)delete_many(460-479)ttl_many(326-403)
key-value/key-value-aio/src/key_value/aio/wrappers/migration/wrapper.py (2)
key-value/key-value-aio/src/key_value/aio/protocols/key_value.py (1)
AsyncKeyValue(185-190)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/migration/wrapper.py (12)
MigrationWrapper(25-479)_get_cache_key(123-125)_cache_get(127-131)get(139-191)_cache_put(133-136)ttl(282-323)put(406-419)delete(440-457)get_many(194-279)ttl_many(326-403)put_many(422-437)delete_many(460-479)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
🔇 Additional comments (8)
key-value/key-value-sync/src/key_value/sync/wrappers/migration/__init__.py (2)
1-8: LGTM! Clean auto-generated re-export module.The auto-generation header is clear and appropriate. The module docstring accurately describes the migration wrapper's purpose and aligns with the PR objectives of enabling gradual store transitions without breaking data access.
10-12: Import and re-export are correct.The import path correctly references the code-generated
MigrationWrapperclass, and the__all__declaration properly exposes it as the public API. This successfully completes the conversion fromSanitizationMigrationWrapperto the genericMigrationWrapperpattern requested in the PR objectives.key-value/key-value-aio/src/key_value/aio/wrappers/migration/wrapper.py (4)
1-20: LGTM!Module docstring clearly describes the wrapper's functionality, and imports are appropriate for the async implementation.
120-133: LGTM!Cache helper methods correctly handle the enabled/disabled state and use appropriate cache key format.
135-320: LGTM!The
getandttlmethods correctly implement dual-store reads with cache-aware fallback and optional lazy migration. TTL propagation during migration is handled properly.
402-481: LGTM!Write operations correctly target the current store, and delete operations properly remove keys from both stores to ensure complete removal. The dual-store delete approach trades performance for correctness, which is the right choice for a migration wrapper.
key-value/key-value-aio/src/key_value/aio/wrappers/migration/__init__.py (1)
1-9: LGTM!Package initializer correctly re-exports
MigrationWrapperwith an appropriate module docstring.key-value/key-value-aio/tests/wrappers/migration/test_wrapper.py (1)
1-302: LGTM!Comprehensive test suite covering all major MigrationWrapper functionality: dual-store reads, lazy migration, cache behavior, TTL handling, batch operations, and dual-store deletions. The use of private attributes with suppression comments is acceptable for testing internal behavior.
| def __init__( | ||
| self, | ||
| current_store: AsyncKeyValue, | ||
| legacy_store: AsyncKeyValue, | ||
| migrate_on_read: bool = False, | ||
| delete_after_migration: bool = False, | ||
| cache_size: int = 1000, | ||
| ) -> None: | ||
| """Initialize the migration wrapper. | ||
| Args: | ||
| current_store: The target store (new configuration). | ||
| legacy_store: The source store (old configuration). | ||
| migrate_on_read: If True, copy keys from legacy to current on read. | ||
| delete_after_migration: If True, delete from legacy after migration. | ||
| cache_size: Maximum number of keys to cache. Set to 0 to disable. | ||
| """ | ||
| self.current_store = current_store | ||
| self.legacy_store = legacy_store | ||
| self.migrate_on_read = migrate_on_read | ||
| self.delete_after_migration = delete_after_migration | ||
|
|
||
| # LRU cache to track which keys are in current store | ||
| # Maps (collection, key) -> "current" | "legacy" | "missing" | ||
| self._cache: LRUCache[tuple[str | None, str], str] = LRUCache(maxsize=cache_size) if cache_size > 0 else LRUCache(maxsize=1) | ||
| self._cache_enabled = cache_size > 0 | ||
|
|
||
| super().__init__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Consider avoiding cache allocation when disabled.
When cache_size=0, the code creates an LRUCache(maxsize=1) that is never used (since _cache_enabled=False). While harmless, this allocates an unnecessary object.
Consider this alternative:
- self._cache: LRUCache[tuple[str | None, str], str] = LRUCache(maxsize=cache_size) if cache_size > 0 else LRUCache(maxsize=1)
+ self._cache: LRUCache[tuple[str | None, str], str] | None = LRUCache(maxsize=cache_size) if cache_size > 0 else None
self._cache_enabled = cache_size > 0Then update _cache_get and _cache_put to handle None:
def _cache_get(self, key: str, collection: str | None) -> str | None:
"""Get cached location for a key."""
- if not self._cache_enabled:
+ if not self._cache_enabled or self._cache is None:
return None
return self._cache.get(self._get_cache_key(key, collection))
def _cache_put(self, key: str, collection: str | None, location: str) -> None:
"""Cache the location of a key."""
- if self._cache_enabled:
+ if self._cache_enabled and self._cache is not None:
self._cache[self._get_cache_key(key, collection)] = locationCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/wrappers/migration/wrapper.py
around lines 91 to 118, avoid allocating an unused LRUCache when cache_size==0
by setting self._cache to None (use Optional type) instead of creating
LRUCache(maxsize=1) and keep self._cache_enabled = cache_size > 0; update the
helper methods _cache_get and _cache_put to check for self._cache being None (or
for _cache_enabled) and do nothing / return None when caching is disabled so no
unnecessary object is created or accessed.
| # Optionally migrate | ||
| if self.migrate_on_read and legacy_value is not None: | ||
| _, ttl = await self.legacy_store.ttl(key=key, collection=collection) | ||
| await self.current_store.put(key=key, value=legacy_value, collection=collection, ttl=ttl) | ||
| if self.delete_after_migration: | ||
| await self.legacy_store.delete(key=key, collection=collection) | ||
| self._cache_put(key, collection, "current") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Optimize migration in get_many to batch TTL lookups.
When migrating legacy keys, the code performs individual ttl() calls for each key (line 236). This is inefficient and can cause a performance bottleneck when migrating many keys.
Consider batching TTL lookups using ttl_many():
# Fetch known legacy keys
if legacy_keys:
- legacy_results = await self.legacy_store.get_many(keys=legacy_keys, collection=collection)
+ if self.migrate_on_read:
+ # Fetch with TTL for migration
+ legacy_results_with_ttl = await self.legacy_store.ttl_many(keys=legacy_keys, collection=collection)
+ legacy_results = [result[0] for result in legacy_results_with_ttl]
+ else:
+ legacy_results = await self.legacy_store.get_many(keys=legacy_keys, collection=collection)
+
for i, key in enumerate(legacy_keys):
legacy_value = legacy_results[i]
key_to_value[key] = legacy_value
# Optionally migrate
if self.migrate_on_read and legacy_value is not None:
- _, ttl = await self.legacy_store.ttl(key=key, collection=collection)
+ ttl = legacy_results_with_ttl[i][1]
await self.current_store.put(key=key, value=legacy_value, collection=collection, ttl=ttl)
if self.delete_after_migration:
await self.legacy_store.delete(key=key, collection=collection)
self._cache_put(key, collection, "current")Apply the same pattern at lines 263-270 for unknown keys that need migration.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/wrappers/migration/wrapper.py
around lines 234-240 and 263-270, the migration does per-key await
self.legacy_store.ttl(...) which is inefficient; instead collect the set/list of
legacy keys to migrate and call await self.legacy_store.ttl_many(keys=keys,
collection=collection) once, then map returned ttls back to each key and pass
the corresponding ttl when calling current_store.put; preserve the existing
delete_after_migration logic and cache update for each key, and ensure ttl
fallback to None if ttl_many returns missing entries.
| @@ -0,0 +1 @@ | |||
| """Tests for sanitization migration wrapper.""" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update docstring to reflect generic migration wrapper.
The docstring refers to "sanitization migration wrapper," but the actual implementation is a generic MigrationWrapper that works with any store configuration changes (sanitization, encryption, compression, etc.), not just sanitization.
Apply this diff:
-"""Tests for sanitization migration wrapper."""
+"""Tests for migration wrapper."""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| """Tests for sanitization migration wrapper.""" | |
| """Tests for migration wrapper.""" |
🤖 Prompt for AI Agents
In key-value/key-value-aio/tests/wrappers/migration/__init__.py around line 1,
the module docstring currently names this as a "sanitization migration wrapper"
but the tested code implements a generic MigrationWrapper for any store
configuration changes; update the docstring to describe it as the generic
migration wrapper used to test store configuration migrations (sanitization,
encryption, compression, etc.), i.e., change the single-line docstring to a
brief accurate description reflecting that it tests the generic MigrationWrapper
for various store migrations.
| if legacy_keys: | ||
| legacy_results = self.legacy_store.get_many(keys=legacy_keys, collection=collection) | ||
| for i, key in enumerate(legacy_keys): | ||
| legacy_value = legacy_results[i] | ||
| key_to_value[key] = legacy_value | ||
|
|
||
| # Optionally migrate | ||
| if self.migrate_on_read and legacy_value is not None: | ||
| (_, ttl) = self.legacy_store.ttl(key=key, collection=collection) | ||
| self.current_store.put(key=key, value=legacy_value, collection=collection, ttl=ttl) | ||
| if self.delete_after_migration: | ||
| self.legacy_store.delete(key=key, collection=collection) | ||
| self._cache_put(key, collection, "current") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix stale cache entries for legacy misses.
When a key previously cached as legacy now returns None from the legacy store, we leave its cache entry at "legacy". Subsequent reads will keep bypassing the current store and hammer the legacy store even though the key no longer exists. Please mark the cache as "missing" in that branch so we actually benefit from the LRU optimization. Remember to patch the async source (key_value/key-value-aio/src/key_value/aio/wrappers/migration/wrapper.py) and regenerate this sync artifact.
Apply this diff in the source file:
if legacy_keys:
legacy_results = self.legacy_store.get_many(keys=legacy_keys, collection=collection)
for i, key in enumerate(legacy_keys):
legacy_value = legacy_results[i]
key_to_value[key] = legacy_value
- # Optionally migrate
- if self.migrate_on_read and legacy_value is not None:
+ # Optionally migrate
+ if self.migrate_on_read and legacy_value is not None:
(_, ttl) = self.legacy_store.ttl(key=key, collection=collection)
self.current_store.put(key=key, value=legacy_value, collection=collection, ttl=ttl)
if self.delete_after_migration:
self.legacy_store.delete(key=key, collection=collection)
self._cache_put(key, collection, "current")
+ elif legacy_value is None:
+ self._cache_put(key, collection, "missing")
else:
self._cache_put(key, collection, "legacy")| if legacy_keys: | ||
| legacy_results = self.legacy_store.ttl_many(keys=legacy_keys, collection=collection) | ||
| for i, key in enumerate(legacy_keys): | ||
| key_to_value[key] = legacy_results[i] | ||
|
|
||
| if self.migrate_on_read and legacy_results[i][0] is not None: | ||
| (value, ttl) = legacy_results[i] | ||
| self.current_store.put(key=key, value=value, collection=collection, ttl=ttl) # type: ignore | ||
| if self.delete_after_migration: | ||
| self.legacy_store.delete(key=key, collection=collection) | ||
| self._cache_put(key, collection, "current") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mirror the cache fix in ttl_many.
ttl_many() has the same stale-cache problem: if a cached-legacy key comes back empty, we keep its cache slot at "legacy", forcing redundant legacy lookups later. Align this with get_many() by updating the cache to "missing" when legacy_results[i][0] is None. Patch the async source before regenerating.
Apply this diff in the source file:
if legacy_keys:
legacy_results = self.legacy_store.ttl_many(keys=legacy_keys, collection=collection)
for i, key in enumerate(legacy_keys):
key_to_value[key] = legacy_results[i]
- if self.migrate_on_read and legacy_results[i][0] is not None:
+ if self.migrate_on_read and legacy_results[i][0] is not None:
(value, ttl) = legacy_results[i]
self.current_store.put(key=key, value=value, collection=collection, ttl=ttl) # type: ignore
if self.delete_after_migration:
self.legacy_store.delete(key=key, collection=collection)
self._cache_put(key, collection, "current")
+ elif legacy_results[i][0] is None:
+ self._cache_put(key, collection, "missing")
else:
self._cache_put(key, collection, "legacy")Committable suggestion skipped: line range outside the PR's diff.
key-value/key-value-sync/tests/code_gen/wrappers/migration/test_wrapper.py
Outdated
Show resolved
Hide resolved
|
@claude let's remove the migration wrapper entirely for now |
|
Claude finished @strawgate's task —— View job Migration Wrapper Removed ✅Successfully removed the MigrationWrapper implementation from this PR. The PR now focuses solely on the SanitizationStrategy pattern. Tasks Completed
SummaryFiles Removed (9 files, 1,579 lines deleted):
Files Retained (Sanitization Strategy):
Verification:
The PR now contains only the SanitizationStrategy pattern implementation with its three concrete strategies:
--- | |
Remove MigrationWrapper implementation to simplify PR scope. This PR now focuses solely on the SanitizationStrategy pattern. Co-authored-by: William Easton <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
key-value/key-value-aio/src/key_value/aio/stores/memcached/store.py (1)
57-61: Remove legacy sanitization method.The store wires up
HashExcessLengthStrategyto the base class (lines 50-55), but this legacysanitize_keymethod is still being used throughout the file (lines 65, 81, 105, 125). This bypasses the new sanitization framework entirely.Apply this diff to remove the legacy method and use the base class sanitization:
- def sanitize_key(self, key: str) -> str: - if len(key) > MAX_KEY_LENGTH: - sha256_hash: str = hashlib.sha256(key.encode()).hexdigest() - return sha256_hash[:64] - return key - @override async def _get_managed_entry(self, *, key: str, collection: str) -> ManagedEntry | None: - combo_key: str = self.sanitize_key(compound_key(collection=collection, key=key)) + combo_key: str = self._sanitize_key(key=compound_key(collection=collection, key=key))Apply similar changes to lines 81, 105, and 125, replacing
self.sanitize_key(...)withself._sanitize_key(key=...).key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
197-216: Use the sanitized collection name everywhere.All downstream paths index
_collections_by_name[sanitized_collection], yet_setup_collectioncaches the handle under the rawcollectionstring and even callscreate_collection(name=collection). For an input like"foo::bar"the sanitizer yields"S_foo__bar-12345678", so the next read immediately throwsKeyError, and Mongo still rejectscreate_collection(name="foo::bar"). Please use the sanitized value when checking existence, creating the collection, and storing the handle.- sanitized_collection = self._sanitize_collection(collection=collection) - - collection_filter: dict[str, str] = {"name": collection} + sanitized_collection = self._sanitize_collection(collection=collection) + + collection_filter: dict[str, str] = {"name": sanitized_collection} matching_collections: list[str] = await self._db.list_collection_names(filter=collection_filter) if matching_collections: - self._collections_by_name[collection] = self._db[sanitized_collection] + self._collections_by_name[sanitized_collection] = self._db[sanitized_collection] return - new_collection: AsyncCollection[dict[str, Any]] = await self._db.create_collection(name=collection) + new_collection: AsyncCollection[dict[str, Any]] = await self._db.create_collection(name=sanitized_collection) ... - self._collections_by_name[collection] = new_collection + self._collections_by_name[sanitized_collection] = new_collectionkey-value/key-value-sync/src/key_value/sync/code_gen/stores/windows_registry/store.py (1)
33-72: Pass a regex toHybridSanitizationStrategy.
ALLOWED_KEY_COLLECTION_CHARACTERSis a raw character list. Because the strategy trimsallowed_characters[1:-1], you lose the boundary characters ("A"and"_"), so ordinary keys/collection names are rewritten and prefixed withS_, breaking existing registry paths. Provide a bracketed regex (e.g.r"[A-Za-z0-9_]") or rely on the default pattern.- sanitization_strategy = HybridSanitizationStrategy( - max_length=MAX_KEY_COLLECTION_LENGTH, allowed_characters=ALLOWED_KEY_COLLECTION_CHARACTERS - ) + sanitization_strategy = HybridSanitizationStrategy( + max_length=MAX_KEY_COLLECTION_LENGTH, + allowed_characters=r"[A-Za-z0-9_]", + )key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (1)
204-224: Key_collections_by_nameby the sanitized string and create with it.Downstream code always uses
self._collections_by_name[sanitized_collection], yet_setup_collectioncaches under the raw name and still invokescreate_collection(name=collection). For"foo::bar"the sanitizer emits"S_foo__bar-12345678", so later we look up_collections_by_name["S_foo__bar-12345678"]and crash, and Mongo still rejectscreate_collection("foo::bar"). Please switch the existence check, cache key, and creation call to the sanitized value.- sanitized_collection = self._sanitize_collection(collection=collection) - - collection_filter: dict[str, str] = {"name": collection} + sanitized_collection = self._sanitize_collection(collection=collection) + + collection_filter: dict[str, str] = {"name": sanitized_collection} matching_collections: list[str] = self._db.list_collection_names(filter=collection_filter) if matching_collections: - self._collections_by_name[collection] = self._db[sanitized_collection] + self._collections_by_name[sanitized_collection] = self._db[sanitized_collection] return - new_collection: Collection[dict[str, Any]] = self._db.create_collection(name=collection) + new_collection: Collection[dict[str, Any]] = self._db.create_collection(name=sanitized_collection) ... - self._collections_by_name[collection] = new_collection + self._collections_by_name[sanitized_collection] = new_collection
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (28)
key-value/key-value-aio/src/key_value/aio/stores/base.py(5 hunks)key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py(12 hunks)key-value/key-value-aio/src/key_value/aio/stores/keyring/store.py(5 hunks)key-value/key-value-aio/src/key_value/aio/stores/memcached/store.py(2 hunks)key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py(10 hunks)key-value/key-value-aio/src/key_value/aio/stores/simple/store.py(0 hunks)key-value/key-value-aio/src/key_value/aio/stores/windows_registry/store.py(3 hunks)key-value/key-value-aio/src/key_value/aio/wrappers/logging/wrapper.py(1 hunks)key-value/key-value-aio/src/key_value/aio/wrappers/prefix_collections/wrapper.py(1 hunks)key-value/key-value-aio/src/key_value/aio/wrappers/single_collection/wrapper.py(1 hunks)key-value/key-value-aio/src/key_value/aio/wrappers/statistics/wrapper.py(1 hunks)key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py(4 hunks)key-value/key-value-aio/tests/stores/mongodb/test_mongodb.py(4 hunks)key-value/key-value-shared/src/key_value/shared/utils/sanitization.py(1 hunks)key-value/key-value-shared/tests/utils/test_sanitization.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py(5 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py(12 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/keyring/store.py(5 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py(10 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/simple/store.py(0 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/windows_registry/store.py(3 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/logging/wrapper.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/prefix_collections/wrapper.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/single_collection/wrapper.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/statistics/wrapper.py(1 hunks)key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py(4 hunks)key-value/key-value-sync/tests/code_gen/stores/mongodb/test_mongodb.py(4 hunks)pyproject.toml(1 hunks)
💤 Files with no reviewable changes (2)
- key-value/key-value-sync/src/key_value/sync/code_gen/stores/simple/store.py
- key-value/key-value-aio/src/key_value/aio/stores/simple/store.py
🧰 Additional context used
🧬 Code graph analysis (17)
key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py (1)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (4)
sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)
key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py (1)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (4)
sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (5)
HybridSanitizationStrategy(150-260)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
_sanitize_collection(132-133)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (1)
_sanitize_collection(138-139)
key-value/key-value-aio/src/key_value/aio/stores/memcached/store.py (1)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (1)
HashExcessLengthStrategy(99-147)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (4)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (5)
HybridSanitizationStrategy(150-260)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-shared/src/key_value/shared/utils/serialization.py (3)
SerializationAdapter(38-139)load_dict(67-106)dump_dict(114-132)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (1)
ElasticsearchSerializationAdapter(69-112)key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/utils.py (1)
new_bulk_action(119-120)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (5)
HybridSanitizationStrategy(150-260)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-shared/src/key_value/shared/utils/serialization.py (3)
SerializationAdapter(38-139)load_dict(67-106)dump_dict(114-132)key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (3)
ElasticsearchSerializationAdapter(89-132)_get_index_name(239-240)_get_document_id(242-243)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (6)
PassthroughStrategy(80-96)SanitizationStrategy(32-77)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-aio/src/key_value/aio/stores/base.py (3)
_sanitize_collection_and_key(129-130)_sanitize_collection(132-133)_sanitize_key(135-136)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (1)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (5)
HybridSanitizationStrategy(150-260)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
_sanitize_collection(132-133)
key-value/key-value-aio/tests/stores/mongodb/test_mongodb.py (1)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
_sanitize_collection(132-133)
key-value/key-value-aio/src/key_value/aio/stores/windows_registry/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (5)
HybridSanitizationStrategy(150-260)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
_sanitize_collection(132-133)
key-value/key-value-sync/tests/code_gen/stores/mongodb/test_mongodb.py (2)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
_sanitize_collection(132-133)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (1)
_sanitize_collection(138-139)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/keyring/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (5)
HybridSanitizationStrategy(150-260)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (1)
_sanitize_collection(138-139)
key-value/key-value-aio/src/key_value/aio/stores/base.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (6)
PassthroughStrategy(80-96)SanitizationStrategy(32-77)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (3)
_sanitize_collection_and_key(135-136)_sanitize_collection(138-139)_sanitize_key(141-142)
key-value/key-value-shared/tests/utils/test_sanitization.py (2)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (14)
HashExcessLengthStrategy(99-147)HashFragmentMode(19-29)HybridSanitizationStrategy(150-260)PassthroughStrategy(80-96)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)validate(55-63)validate(91-92)validate(136-147)validate(249-260)try_unsanitize(65-77)try_unsanitize(94-96)
key-value/key-value-aio/src/key_value/aio/stores/keyring/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (5)
HybridSanitizationStrategy(150-260)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
_sanitize_collection(132-133)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/windows_registry/store.py (4)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (5)
HybridSanitizationStrategy(150-260)sanitize(44-52)sanitize(87-89)sanitize(118-134)sanitize(189-247)key-value/key-value-aio/src/key_value/aio/stores/windows_registry/store.py (1)
_get_registry_path(65-68)key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
_sanitize_collection(132-133)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (1)
_sanitize_collection(138-139)
🪛 Gitleaks (8.28.0)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/keyring/store.py
[high] 49-49: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
key-value/key-value-aio/src/key_value/aio/stores/keyring/store.py
[high] 51-51: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
🔇 Additional comments (16)
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/logging/wrapper.py (1)
9-9: LGTM: Import centralization improves maintainability.The import source update for
DEFAULT_COLLECTION_NAMEto the shared constants module is a good refactoring practice. This change, along with similar updates across all wrappers, centralizes constant definitions and improves maintainability.key-value/key-value-aio/src/key_value/aio/wrappers/single_collection/wrapper.py (1)
4-4: LGTM: Consistent with centralization effort.Import source correctly updated to use the shared constants module.
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/single_collection/wrapper.py (1)
7-7: LGTM: Import centralization.Consistent with the broader refactoring to centralize shared constants.
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/statistics/wrapper.py (1)
8-8: LGTM: Proper constant centralization.Import correctly updated to the shared constants module.
key-value/key-value-aio/src/key_value/aio/wrappers/logging/wrapper.py (1)
6-6: LGTM: Centralized constant usage.Import source correctly updated to align with the shared constants module.
key-value/key-value-sync/src/key_value/sync/code_gen/wrappers/prefix_collections/wrapper.py (1)
7-7: LGTM: Import path updated correctly.Consistent with the centralization of shared constants across the codebase.
key-value/key-value-aio/src/key_value/aio/wrappers/prefix_collections/wrapper.py (1)
4-4: LGTM: Consistent import centralization.Import correctly updated to the shared constants module, aligning with other wrappers.
key-value/key-value-aio/src/key_value/aio/wrappers/statistics/wrapper.py (1)
5-5: LGTM: Import centralization complete.Import source correctly updated to the shared constants module, completing the consistent refactoring across all wrapper files.
key-value/key-value-aio/src/key_value/aio/stores/windows_registry/store.py (2)
7-8: LGTM! Imports are correct.The imports for
HybridSanitizationStrategyandALPHANUMERIC_CHARACTERSare properly used in the sanitization strategy configuration.
65-68: LGTM! Consistent use of shared sanitization.The transition from
_sanitize_collection_nameto_sanitize_collectioncorrectly aligns with the new shared sanitization strategy pattern. The method now delegates to the base class's_sanitize_collection, which applies the configuredHybridSanitizationStrategy.key-value/key-value-sync/src/key_value/sync/code_gen/stores/keyring/store.py (1)
48-56: LGTM!The sanitization strategy is correctly initialized and wired into the base class. The store properly delegates key and collection sanitization to the base class methods.
key-value/key-value-aio/src/key_value/aio/stores/keyring/store.py (1)
50-58: LGTM!The async keyring store correctly implements the sanitization strategy pattern, matching the sync version. Strategy initialization and base class integration are properly done.
key-value/key-value-aio/tests/stores/mongodb/test_mongodb.py (1)
149-149: LGTM!Test correctly updated to use the new
_sanitize_collectionmethod from the base class, replacing the old_sanitize_collection_name. The changes are consistent across all test cases.Also applies to: 165-165, 198-198, 214-214
key-value/key-value-sync/tests/code_gen/stores/mongodb/test_mongodb.py (1)
150-150: LGTM!The sync version correctly mirrors the aio test updates, using the new
_sanitize_collectionmethod consistently.Also applies to: 166-166, 194-194, 210-210
key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (1)
85-142: LGTM!The sanitization strategy integration is well-implemented in the sync base store. The default
PassthroughStrategymaintains backward compatibility, and the helper methods provide a clean API for subclasses to use.key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py (1)
159-160: LGTM!Tests correctly updated to use the strategy objects directly for sanitization. This allows proper verification of the sanitized index names and document IDs used in Elasticsearch storage.
Also applies to: 180-181, 205-206, 226-227
| def _sanitize_collection_and_key(self, collection: str, key: str) -> tuple[str, str]: | ||
| return self._collection_sanitization_strategy.sanitize(value=collection), self._key_sanitization_strategy.sanitize(value=key) | ||
|
|
||
| def _sanitize_collection(self, collection: str) -> str: | ||
| return self._collection_sanitization_strategy.sanitize(value=collection) | ||
|
|
||
| def _sanitize_key(self, key: str) -> str: | ||
| return self._key_sanitization_strategy.sanitize(value=key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enforce reserved-prefix validation before sanitizing
SanitizationStrategy.validate never runs here, so callers can pass keys like H_deadbeef/S_existing that collide with hashed outputs, undoing the reserved-prefix safety the new strategies provide. Please validate before sanitizing to stop those collisions.
def _sanitize_collection_and_key(self, collection: str, key: str) -> tuple[str, str]:
- return self._collection_sanitization_strategy.sanitize(value=collection), self._key_sanitization_strategy.sanitize(value=key)
+ return self._sanitize_collection(collection), self._sanitize_key(key)
def _sanitize_collection(self, collection: str) -> str:
- return self._collection_sanitization_strategy.sanitize(value=collection)
+ self._collection_sanitization_strategy.validate(value=collection)
+ return self._collection_sanitization_strategy.sanitize(value=collection)
def _sanitize_key(self, key: str) -> str:
- return self._key_sanitization_strategy.sanitize(value=key)
+ self._key_sanitization_strategy.validate(value=key)
+ return self._key_sanitization_strategy.sanitize(value=key)🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/base.py around lines 129 to
136, the sanitization helpers call .sanitize(...) directly which skips the
sanitization strategy’s .validate(...) check and allows inputs with reserved
prefixes (e.g., "H_...", "S_...") to pass through; before calling .sanitize for
both collection and key, invoke the corresponding strategy.validate(value=...)
and raise or propagate its validation error when validation fails, then call
.sanitize and return the sanitized value(s) — apply this to
_sanitize_collection_and_key, _sanitize_collection, and _sanitize_key so
reserved-prefix collisions are prevented.
| super().__init__( | ||
| default_collection=default_collection, | ||
| collection_sanitization_strategy=HybridSanitizationStrategy( | ||
| replacement_character="_", max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide a regex for allowed_characters.
HybridSanitizationStrategy slices allowed_characters[1:-1] before building the negated class. Feeding it COLLECTION_ALLOWED_CHARACTERS (a literal character set) drops the first and last symbols ("A" and "_") from the whitelist, so perfectly valid names suddenly get rewritten/prefixed with S_, breaking lookups for existing collections. Pass a bracketed regex (e.g. r"[a-zA-Z0-9_]") or omit the argument.
- collection_sanitization_strategy=HybridSanitizationStrategy(
- replacement_character="_", max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS
- ),
+ collection_sanitization_strategy=HybridSanitizationStrategy(
+ replacement_character="_",
+ max_length=MAX_COLLECTION_LENGTH,
+ allowed_characters=r"[a-zA-Z0-9_]",
+ ),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| super().__init__( | |
| default_collection=default_collection, | |
| collection_sanitization_strategy=HybridSanitizationStrategy( | |
| replacement_character="_", max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS | |
| ), | |
| ) | |
| super().__init__( | |
| default_collection=default_collection, | |
| collection_sanitization_strategy=HybridSanitizationStrategy( | |
| replacement_character="_", | |
| max_length=MAX_COLLECTION_LENGTH, | |
| allowed_characters=r"[a-zA-Z0-9_]", | |
| ), | |
| ) |
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py around
lines 178 to 183, the call to HybridSanitizationStrategy passes
COLLECTION_ALLOWED_CHARACTERS (a literal character set) but the strategy slices
allowed_characters[1:-1], dropping the first and last characters and corrupting
the whitelist; fix by passing a bracketed regex for allowed_characters (for
example r"[a-zA-Z0-9_]" ) or simply omit the allowed_characters argument so the
strategy uses its default behavior, ensuring valid collection names are not
truncated or altered.
| MAX_KEY_COLLECTION_LENGTH = 96 | ||
| ALLOWED_KEY_COLLECTION_CHARACTERS: str = ALPHANUMERIC_CHARACTERS | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Reconsider the 96-character limit.
MAX_KEY_COLLECTION_LENGTH = 96 appears arbitrarily restrictive. Windows Registry supports key names up to 255 characters. With the HybridSanitizationStrategy overhead ('S_' prefix + potential 9-char hash fragment), the effective space for original content drops to ~85 characters when sanitization occurs.
A shorter limit increases collision likelihood and forces more aggressive truncation/hashing. Consider raising this to at least 240 (the default in HybridSanitizationStrategy) to better utilize Windows Registry's capabilities and reduce unnecessary hash fragment additions.
Would you like me to verify the Windows Registry character limits or help adjust this constant?
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/windows_registry/store.py
around lines 24-26, MAX_KEY_COLLECTION_LENGTH is set to 96 which is too
restrictive for Windows Registry and conflicts with HybridSanitizationStrategy
overhead; change the constant to at least 240 (or 255) to match Windows limits
and the HybridSanitizationStrategy default, update any comments to cite the
Windows 255-char key name limit and the sanitization prefix/hash overhead, and
adjust/update any tests or code paths that assume 96 so they use the new
constant.
key-value/key-value-aio/src/key_value/aio/stores/windows_registry/store.py
Show resolved
Hide resolved
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py
Outdated
Show resolved
Hide resolved
key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py
Show resolved
Hide resolved
| super().__init__( | ||
| default_collection=default_collection, | ||
| collection_sanitization_strategy=HybridSanitizationStrategy( | ||
| replacement_character="_", max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide a regex for allowed_characters.
HybridSanitizationStrategy assumes a bracketed regex and slices allowed_characters[1:-1]. The literal COLLECTION_ALLOWED_CHARACTERS string therefore drops its first/last characters ("A" and "_"), forcing legitimate collection names through the sanitizer and changing them. Use a regex pattern (e.g. r"[a-zA-Z0-9_]") instead.
- collection_sanitization_strategy=HybridSanitizationStrategy(
- replacement_character="_", max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS
- ),
+ collection_sanitization_strategy=HybridSanitizationStrategy(
+ replacement_character="_",
+ max_length=MAX_COLLECTION_LENGTH,
+ allowed_characters=r"[a-zA-Z0-9_]",
+ ),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| super().__init__( | |
| default_collection=default_collection, | |
| collection_sanitization_strategy=HybridSanitizationStrategy( | |
| replacement_character="_", max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS | |
| ), | |
| ) | |
| super().__init__( | |
| default_collection=default_collection, | |
| collection_sanitization_strategy=HybridSanitizationStrategy( | |
| replacement_character="_", | |
| max_length=MAX_COLLECTION_LENGTH, | |
| allowed_characters=r"[a-zA-Z0-9_]", | |
| ), | |
| ) |
| typeCheckingMode = "strict" | ||
| reportExplicitAny = false | ||
| reportMissingTypeStubs = false | ||
| reportPrivateUsage = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Consider using inline ignores instead of disabling globally.
Disabling reportPrivateUsage globally weakens type safety across the entire codebase. Private member access in tests could be handled with targeted inline comments like # pyright: ignore[reportPrivateUsage] at the specific call sites.
Apply this diff to revert the global setting:
-reportPrivateUsage = falseThen add inline ignores in test files where needed:
# In test files where private access is required:
sanitized_collection = store._sanitize_collection(collection="test") # pyright: ignore[reportPrivateUsage]🤖 Prompt for AI Agents
In pyproject.toml around line 107, the project currently sets reportPrivateUsage
= false which disables pyright private-member checks globally; revert that
setting to true (or remove the global override) and instead add targeted inline
ignores in test files at the specific private-member call sites that require it
(e.g., append a pyright: ignore[reportPrivateUsage] comment on the line invoking
the private member) so type checking remains strict everywhere except those
explicit test exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
129-130: CRITICAL: Validation bypass allows reserved-prefix collision.
_sanitize_collection_and_keycalls.sanitize()directly without first calling.validate(), allowing users to supply keys/collections starting withH_orS_that collide with sanitized outputs. This undermines the reserved-prefix protection that the strategies are designed to enforce.Apply this diff to enforce validation before sanitization:
def _sanitize_collection_and_key(self, collection: str, key: str) -> tuple[str, str]: - return self._collection_sanitization_strategy.sanitize(value=collection), self._key_sanitization_strategy.sanitize(value=key) + return self._sanitize_collection(collection), self._sanitize_key(key)This delegates to the individual methods that properly validate before sanitizing.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
key-value/key-value-aio/src/key_value/aio/stores/base.py(5 hunks)key-value/key-value-shared/src/key_value/shared/utils/sanitization.py(1 hunks)key-value/key-value-shared/tests/utils/test_sanitization.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
key-value/key-value-shared/tests/utils/test_sanitization.py (2)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (14)
HashExcessLengthStrategy(100-148)HashFragmentMode(20-30)HybridSanitizationStrategy(151-262)PassthroughStrategy(81-97)sanitize(45-53)sanitize(88-90)sanitize(119-135)sanitize(190-249)validate(56-64)validate(92-93)validate(137-148)validate(251-262)try_unsanitize(66-78)try_unsanitize(95-97)
key-value/key-value-aio/src/key_value/aio/stores/base.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (10)
PassthroughStrategy(81-97)SanitizationStrategy(33-78)sanitize(45-53)sanitize(88-90)sanitize(119-135)sanitize(190-249)validate(56-64)validate(92-93)validate(137-148)validate(251-262)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (3)
_sanitize_collection_and_key(135-136)_sanitize_collection(138-140)_sanitize_key(142-144)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (2)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)key-value/key-value-shared/src/key_value/shared/utils/sanitize.py (1)
sanitize_characters_in_string(34-61)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (10)
PassthroughStrategy(81-97)SanitizationStrategy(33-78)sanitize(45-53)sanitize(88-90)sanitize(119-135)sanitize(190-249)validate(56-64)validate(92-93)validate(137-148)validate(251-262)key-value/key-value-aio/src/key_value/aio/stores/base.py (3)
_sanitize_collection_and_key(129-130)_sanitize_collection(132-134)_sanitize_key(136-138)
🪛 Gitleaks (8.28.0)
key-value/key-value-shared/tests/utils/test_sanitization.py
[high] 258-258: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-shared)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
🔇 Additional comments (10)
key-value/key-value-shared/tests/utils/test_sanitization.py (1)
1-272: Comprehensive test coverage for sanitization strategies.The test suite is well-structured and covers all key scenarios including pass-through behavior, hash-based sanitization, hybrid character replacement, different hash fragment modes, validation of reserved prefixes, and edge cases. The use of snapshot testing for deterministic outputs is appropriate.
Note: The static analysis tool flagged line 258 as a potential API key, but this is a false positive—the string
"S_my_key-12345678"is test data demonstrating the reserved prefix format, not an actual secret.key-value/key-value-aio/src/key_value/aio/stores/base.py (3)
17-17: LGTM!The import correctly brings in the sanitization strategy components needed for the base store.
83-84: LGTM!Constructor parameters are properly typed and documented. Defaulting to
Nonewith fallback toPassthroughStrategyin the initializer is a good pattern.Also applies to: 92-93
112-113: LGTM!Strategy fields are correctly initialized with the PassthroughStrategy fallback.
key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (1)
20-20: LGTM!The sanitization strategy integration follows the same correct pattern as the async version—proper imports, parameter typing, documentation, and initialization with PassthroughStrategy fallback.
Also applies to: 89-90, 98-99, 118-119
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (5)
1-79: Well-designed sanitization framework.The module structure is clean with clear documentation. The
SanitizationStrategyABC defines a focused interface withsanitize,validate, and optionaltry_unsanitize. TheHashFragmentModeenum provides clear configuration options for hash fragment behavior.
81-98: LGTM!
PassthroughStrategycorrectly implements a no-op path for stores without character or length restrictions. All three methods properly handle the pass-through semantics.
100-149: LGTM!
HashExcessLengthStrategycorrectly hashes keys exceedingmax_lengthusing SHA256 with theH_prefix. The truncation logic properly accounts for the prefix (max_length - 2), and validation correctly rejects reserved prefixes to prevent collisions.
251-262: LGTM!Validation correctly rejects both reserved prefixes
H_andS_to prevent collisions with sanitized keys.
207-209: Past review comment is outdated.The previous review flagged regex construction issues with
allowed_characters[1:-1]slicing. However, the current implementation usessanitize_characters_in_string(), which iterates through characters and checks membership withif char in allowed_characters. This approach doesn't use regex and works correctly with plain character strings—no slicing or escaping issues.
| def __init__( | ||
| self, | ||
| max_length: int = 240, | ||
| allowed_characters: str | None = None, | ||
| replacement_character: str = "_", | ||
| hash_fragment_mode: HashFragmentMode = HashFragmentMode.ONLY_IF_CHANGED, | ||
| hash_fragment_length: int = 8, | ||
| ) -> None: | ||
| """Initialize the character sanitization strategy. | ||
| Args: | ||
| max_length: Maximum length after sanitization. | ||
| allowed_characters: list of allowed characters. Defaults to None (all characters allowed). | ||
| replacement_character: Character to use for invalid characters. | ||
| hash_fragment_mode: When to add hash fragments. | ||
| hash_fragment_length: Length of hash fragment. | ||
| """ | ||
| self.max_length = max_length | ||
| self.allowed_characters: str | None = allowed_characters | ||
| self.replacement_character = replacement_character | ||
| self.hash_fragment_mode = hash_fragment_mode | ||
| self.hash_fragment_length = hash_fragment_length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Consider validating minimum max_length.
The HybridSanitizationStrategy assumes max_length is large enough to accommodate the S_ prefix (at least 2 characters) and potentially the hash fragment overhead (at least 11 characters with defaults). Very small values like max_length=1 would cause unexpected behavior in the truncation logic (lines 244-246).
Consider adding validation or documentation:
def __init__(
self,
max_length: int = 240,
allowed_characters: str | None = None,
replacement_character: str = "_",
hash_fragment_mode: HashFragmentMode = HashFragmentMode.ONLY_IF_CHANGED,
hash_fragment_length: int = 8,
) -> None:
"""Initialize the character sanitization strategy.
Args:
max_length: Maximum length after sanitization.
+ Must be at least 2 to accommodate the 'S_' prefix when sanitization occurs.
allowed_characters: String of allowed characters. Defaults to None (all characters allowed).
replacement_character: Character to use for invalid characters.
hash_fragment_mode: When to add hash fragments.
hash_fragment_length: Length of hash fragment.
"""
+ if max_length < 2:
+ msg = "max_length must be at least 2 to accommodate the 'S_' prefix"
+ raise ValueError(msg)
self.max_length = max_lengthAlternatively, document that this strategy is not suitable for stores with extreme length restrictions (< 2 characters).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def __init__( | |
| self, | |
| max_length: int = 240, | |
| allowed_characters: str | None = None, | |
| replacement_character: str = "_", | |
| hash_fragment_mode: HashFragmentMode = HashFragmentMode.ONLY_IF_CHANGED, | |
| hash_fragment_length: int = 8, | |
| ) -> None: | |
| """Initialize the character sanitization strategy. | |
| Args: | |
| max_length: Maximum length after sanitization. | |
| allowed_characters: list of allowed characters. Defaults to None (all characters allowed). | |
| replacement_character: Character to use for invalid characters. | |
| hash_fragment_mode: When to add hash fragments. | |
| hash_fragment_length: Length of hash fragment. | |
| """ | |
| self.max_length = max_length | |
| self.allowed_characters: str | None = allowed_characters | |
| self.replacement_character = replacement_character | |
| self.hash_fragment_mode = hash_fragment_mode | |
| self.hash_fragment_length = hash_fragment_length | |
| def __init__( | |
| self, | |
| max_length: int = 240, | |
| allowed_characters: str | None = None, | |
| replacement_character: str = "_", | |
| hash_fragment_mode: HashFragmentMode = HashFragmentMode.ONLY_IF_CHANGED, | |
| hash_fragment_length: int = 8, | |
| ) -> None: | |
| """Initialize the character sanitization strategy. | |
| Args: | |
| max_length: Maximum length after sanitization. | |
| Must be at least 2 to accommodate the 'S_' prefix when sanitization occurs. | |
| allowed_characters: String of allowed characters. Defaults to None (all characters allowed). | |
| replacement_character: Character to use for invalid characters. | |
| hash_fragment_mode: When to add hash fragments. | |
| hash_fragment_length: Length of hash fragment. | |
| """ | |
| if max_length < 2: | |
| msg = "max_length must be at least 2 to accommodate the 'S_' prefix" | |
| raise ValueError(msg) | |
| self.max_length = max_length | |
| self.allowed_characters: str | None = allowed_characters | |
| self.replacement_character = replacement_character | |
| self.hash_fragment_mode = hash_fragment_mode | |
| self.hash_fragment_length = hash_fragment_length |
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py
Outdated
Show resolved
Hide resolved
| def _sanitize_collection_and_key(self, collection: str, key: str) -> tuple[str, str]: | ||
| return (self._collection_sanitization_strategy.sanitize(value=collection), self._key_sanitization_strategy.sanitize(value=key)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CRITICAL: Validation bypass allows reserved-prefix collision (code-gen file).
_sanitize_collection_and_key skips .validate() before calling .sanitize(), creating the same security vulnerability as in the async version. Users can supply keys/collections with reserved prefixes H_ or S_ that collide with sanitized keys.
Since this is auto-generated, the fix must be applied to the source file, then regenerated.
Expected fix pattern (to be applied to the source):
def _sanitize_collection_and_key(self, collection: str, key: str) -> tuple[str, str]:
- return (self._collection_sanitization_strategy.sanitize(value=collection), self._key_sanitization_strategy.sanitize(value=key))
+ return (self._sanitize_collection(collection), self._sanitize_key(key))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _sanitize_collection_and_key(self, collection: str, key: str) -> tuple[str, str]: | |
| return (self._collection_sanitization_strategy.sanitize(value=collection), self._key_sanitization_strategy.sanitize(value=key)) | |
| def _sanitize_collection_and_key(self, collection: str, key: str) -> tuple[str, str]: | |
| return (self._sanitize_collection(collection), self._sanitize_key(key)) |
🤖 Prompt for AI Agents
In key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py around
lines 135 to 136, the _sanitize_collection_and_key method currently calls
sanitize() directly and bypasses validate(), allowing inputs with reserved
prefixes (e.g., "H_", "S_") to collide after sanitization; update the method to
first call each strategy's validate(value=...) and handle/raise on validation
failure, then call sanitize(value=...) for both collection and key, apply this
change in the generator source and regenerate the file so the fix is present in
the auto-generated output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (2)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (1)
237-241: Validate before sanitizing to preserve reserved-prefix guarantees.By calling
sanitize()withoutvalidate(), user-supplied collections starting withH_/S_slide through unchanged, recreating the collision vector we introduced the prefixes to block. Please runvalidate()on the raw input before sanitizing (and keep the lowercase handling afterwards) so misuse is rejected consistently, just like in the sync store.def _get_index_name(self, collection: str) -> str: - return self._index_prefix + "-" + self._collection_sanitization.sanitize(value=collection).lower() + self._collection_sanitization.validate(value=collection) + sanitized = self._collection_sanitization.sanitize(value=collection) + return self._index_prefix + "-" + sanitized.lower() def _get_document_id(self, key: str) -> str: - return self._key_sanitization.sanitize(value=key) + self._key_sanitization.validate(value=key) + return self._key_sanitization.sanitize(value=key)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (1)
206-210: Run validate() before sanitize() for collections/keys.The sync store still skips
validate(), so reserved-prefix inputs (H_*/S_*) bypass the guard and collide with hashed/sanitized IDs. Mirror the async fix: validate first, then sanitize (keeping the lowercase conversion). This restores the integrity guarantee across both stores.def _get_index_name(self, collection: str) -> str: - return self._index_prefix + "-" + self._collection_sanitization.sanitize(value=collection).lower() + self._collection_sanitization.validate(value=collection) + sanitized = self._collection_sanitization.sanitize(value=collection) + return self._index_prefix + "-" + sanitized.lower() def _get_document_id(self, key: str) -> str: - return self._key_sanitization.sanitize(value=key) + self._key_sanitization.validate(value=key) + return self._key_sanitization.sanitize(value=key)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
key-value/key-value-aio/src/key_value/aio/stores/base.py(5 hunks)key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py(12 hunks)key-value/key-value-shared/src/key_value/shared/utils/sanitization.py(1 hunks)key-value/key-value-shared/tests/utils/test_sanitization.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py(5 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py(12 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
key-value/key-value-shared/tests/utils/test_sanitization.py (2)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (18)
AlwaysHashStrategy(104-128)HashExcessLengthStrategy(131-179)HashFragmentMode(20-30)HybridSanitizationStrategy(182-293)PassthroughStrategy(81-97)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)validate(56-64)validate(92-93)validate(123-124)validate(168-179)validate(282-293)try_unsanitize(66-78)try_unsanitize(95-97)try_unsanitize(126-128)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (7)
AlwaysHashStrategy(104-128)HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (1)
ElasticsearchSerializationAdapter(69-112)key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/utils.py (1)
new_bulk_action(119-120)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (2)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)key-value/key-value-shared/src/key_value/shared/utils/sanitize.py (1)
sanitize_characters_in_string(34-61)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (7)
AlwaysHashStrategy(104-128)HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-shared/src/key_value/shared/utils/serialization.py (3)
SerializationAdapter(38-139)load_dict(67-106)dump_dict(114-132)key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (3)
ElasticsearchSerializationAdapter(89-132)_get_index_name(237-238)_get_document_id(240-241)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (12)
PassthroughStrategy(81-97)SanitizationStrategy(33-78)validate(56-64)validate(92-93)validate(123-124)validate(168-179)validate(282-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-aio/src/key_value/aio/stores/base.py (3)
_sanitize_collection_and_key(129-130)_sanitize_collection(132-134)_sanitize_key(136-138)
key-value/key-value-aio/src/key_value/aio/stores/base.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (12)
PassthroughStrategy(81-97)SanitizationStrategy(33-78)validate(56-64)validate(92-93)validate(123-124)validate(168-179)validate(282-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (3)
_sanitize_collection_and_key(135-136)_sanitize_collection(138-140)_sanitize_key(142-144)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
🔇 Additional comments (1)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
129-138: LGTM — Reserved-prefix validation now enforced.The critical issue flagged in the previous review has been resolved. Both
_sanitize_collectionand_sanitize_keynow call.validate()before.sanitize(), preventing users from passing keys or collections with reserved prefixes (H_,S_) that would collide with sanitized outputs.
| """Initialize the collection (called once before first use of the collection).""" | ||
|
|
||
| def _sanitize_collection_and_key(self, collection: str, key: str) -> tuple[str, str]: | ||
| return self._sanitize_collection(collection=collection), self._sanitize_key(key=key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Prefer explicit tuple parentheses for consistency.
The tuple return on line 130 omits explicit parentheses, while the sync version (from key-value-sync/src/key_value/sync/code_gen/stores/base.py) uses explicit parentheses. For consistency across async and sync implementations, consider adding them.
Apply this diff:
- return self._sanitize_collection(collection=collection), self._sanitize_key(key=key)
+ return (self._sanitize_collection(collection=collection), self._sanitize_key(key=key))🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/base.py around line 130, the
return statement returns a tuple without explicit parentheses; change it to
return the sanitized collection and key wrapped in explicit parentheses (e.g.,
return (self._sanitize_collection(collection=collection),
self._sanitize_key(key=key))) to match the sync implementation and maintain
consistency across modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
130-131: Prefer explicit tuple parentheses for consistency.The tuple return on line 131 omits explicit parentheses, while the sync version uses
return (self._sanitize_collection(collection=collection), self._sanitize_key(key=key)). For consistency across async and sync implementations, consider adding them.Apply this diff:
- return self._sanitize_collection(collection=collection), self._sanitize_key(key=key) + return (self._sanitize_collection(collection=collection), self._sanitize_key(key=key))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
key-value/key-value-aio/src/key_value/aio/stores/base.py(5 hunks)key-value/key-value-shared/tests/utils/test_sanitization.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
key-value/key-value-shared/tests/utils/test_sanitization.py (2)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (18)
AlwaysHashStrategy(104-128)HashExcessLengthStrategy(131-179)HashFragmentMode(20-30)HybridSanitizationStrategy(182-293)PassthroughStrategy(81-97)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)validate(56-64)validate(92-93)validate(123-124)validate(168-179)validate(282-293)try_unsanitize(66-78)try_unsanitize(95-97)try_unsanitize(126-128)
key-value/key-value-aio/src/key_value/aio/stores/base.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (12)
PassthroughStrategy(81-97)SanitizationStrategy(33-78)validate(56-64)validate(92-93)validate(123-124)validate(168-179)validate(282-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (3)
_sanitize_collection_and_key(136-137)_sanitize_collection(139-141)_sanitize_key(143-145)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (12)
PassthroughStrategy(81-97)SanitizationStrategy(33-78)validate(56-64)validate(92-93)validate(123-124)validate(168-179)validate(282-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-aio/src/key_value/aio/stores/base.py (3)
_sanitize_collection_and_key(130-131)_sanitize_collection(133-135)_sanitize_key(137-139)
🪛 Gitleaks (8.28.0)
key-value/key-value-shared/tests/utils/test_sanitization.py
[high] 288-288: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
🔇 Additional comments (13)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (3)
20-20: LGTM! Sanitization strategy infrastructure correctly imported and declared.The import and field declarations for
SanitizationStrategyalign with the aio base implementation and establish the foundation for pluggable sanitization.Also applies to: 79-80
86-120: LGTM! Constructor properly wired with sanitization strategies.The
__init__signature correctly accepts optional sanitization strategies for keys and collections, with appropriate defaults toPassthroughStrategy()when not provided. This mirrors the aio implementation and maintains backward compatibility.
136-145: ✅ CRITICAL ISSUE RESOLVED: Validation now enforced before sanitization.The previous critical validation bypass has been fixed. The implementation now correctly:
- Calls
validate()beforesanitize()in both_sanitize_collection()(line 140) and_sanitize_key()(line 144)- Routes through helper methods in
_sanitize_collection_and_key()(line 137) instead of direct sanitization- Matches the async version exactly (aio/stores/base.py lines 129-138)
This prevents users from supplying keys or collections with reserved prefixes (e.g.,
H_,S_) that would collide with sanitized keys. The fix has been properly applied in the generator source and regenerated.key-value/key-value-aio/src/key_value/aio/stores/base.py (5)
17-17: LGTM!The import correctly brings in the sanitization strategy types needed for the new functionality.
73-74: Previous issue resolved!The field annotations now correctly declare both
_key_sanitization_strategyand_collection_sanitization_strategy, matching the attributes initialized in__init__. The previous incorrect_sanitization_strategyannotation has been fixed.
84-85: LGTM!The constructor parameters and documentation for the sanitization strategies are well-defined. The keyword-only parameters with
Nonedefaults and fallback toPassthroughStrategy()provide a clean opt-in API.Also applies to: 93-94
113-114: LGTM!The field initialization correctly defaults to
PassthroughStrategy()when no strategy is provided, maintaining backward compatibility while enabling opt-in sanitization.
133-139: Previous validation issue resolved!The sanitization helpers now correctly call
validate()beforesanitize()for both collections and keys (lines 134, 138). This enforces reserved-prefix validation and prevents collisions with sanitized keys.key-value/key-value-shared/tests/utils/test_sanitization.py (5)
17-39: LGTM! PassthroughStrategy tests are comprehensive.The tests correctly verify that PassthroughStrategy performs no sanitization, accepts any value including reserved prefixes, and supports reversible unsanitization.
41-68: LGTM! AlwaysHashStrategy tests properly validate boundaries.The tests correctly verify hash_length constraints (must be >8 and ≤64) and deterministic hashing behavior. The past review concern about hash_length=99 has been addressed.
70-138: LGTM! HashExcessLengthStrategy tests are thorough.The tests comprehensively cover short vs. long key handling, deterministic hashing, collision prevention via reserved prefix validation, and custom max_length configuration.
269-302: LGTM! Edge case tests are well-designed.The tests properly verify:
- Collision prevention through reserved prefix validation (H_ and S_)
- Empty string handling across all strategy types
Note: The static analysis tool flagged line 288 as a potential API key, but this is a false positive—it's just a test string demonstrating the reserved prefix format.
258-267: Snapshot is correct; review comment contains flawed analysis.The implementation uses
sanitize_characters_in_string(), which only replaces the first of multiple consecutive disallowed characters, removing the rest. For input "Test123" with LOWERCASE_ALPHABET allowed:
- 'T' (invalid, first) → replaced with '_'
- 'e', 's', 't' (valid) → kept as-is
- '1', '2', '3' (invalid, consecutive) → first replaced with '_', rest removed
This produces "est", which with the "S_" prefix yields "S__est_" — exactly matching the snapshot. The review comment's manual calculation incorrectly assumed each invalid character becomes an underscore.
| def test_sanitize_deterministic(self) -> None: | ||
| """Test that sanitization is deterministic.""" | ||
| strategy = HybridSanitizationStrategy() | ||
| result1 = strategy.sanitize("test::key") | ||
| result2 = strategy.sanitize("test::key") | ||
|
|
||
| assert result1 == result2 | ||
|
|
||
| def test_sanitize_different_values_different_hashes(self) -> None: | ||
| """Test that different values produce different hashes.""" | ||
| strategy = HybridSanitizationStrategy() | ||
| result1 = strategy.sanitize("test::key1") | ||
| result2 = strategy.sanitize("test::key2") | ||
|
|
||
| assert result1 != result2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Clarify test intent: tests don't actually trigger sanitization.
Both test_sanitize_deterministic and test_sanitize_different_values_different_hashes instantiate HybridSanitizationStrategy() with no allowed_characters specified, which defaults to None (no character restrictions). This means "test::key" is returned unchanged, so these tests verify determinism and uniqueness of pass-through behavior rather than sanitization behavior.
Consider either:
- Adding
allowed_characters=ALPHANUMERIC_CHARACTERSto actually test sanitization - Renaming tests to clarify they're testing the no-restrictions case
- Adding assertions that verify no sanitization occurred (e.g.,
assert "::" in result)
🤖 Prompt for AI Agents
In key-value/key-value-shared/tests/utils/test_sanitization.py around lines 168
to 182, the two tests instantiate HybridSanitizationStrategy() with
allowed_characters=None so they never exercise sanitization; update them to
either pass allowed_characters=ALPHANUMERIC_CHARACTERS to force sanitization and
assert sanitized output differs or rename the tests to indicate they verify
pass-through behavior and add explicit assertions that no sanitization occurred
(e.g., assert delimiter still present) so the test intent matches the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py(4 hunks)key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py (1)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
_get_index_name(237-238)_get_document_id(240-241)
key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py (2)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
_get_index_name(237-238)_get_document_id(240-241)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (2)
_get_index_name(206-207)_get_document_id(209-210)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
🔇 Additional comments (4)
key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py (1)
166-167: LGTM!The method name updates correctly align with the refactored ElasticsearchStore API. The tests appropriately access private methods to verify raw Elasticsearch document structure, and the pyright ignore comments properly acknowledge this pattern.
Also applies to: 223-224, 247-248
key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py (3)
159-160: LGTM: Method name updates align with new sanitization architecture.The updated method names (
_get_index_name,_get_document_id) better reflect their role as accessors now that sanitization is delegated to strategies. The private access is appropriately justified for white-box testing of storage format.
205-206: LGTM: Consistent method updates.Correct usage of the new accessor methods with appropriate pyright ignore comments.
226-227: LGTM: Proper private API handling.Correct method calls with appropriate pyright ignore comments for private API access.
| index_name = store._get_index_name(collection="test") | ||
| doc_id = store._get_document_id(key="legacy_key") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Add pyright ignore comments for consistency.
These private method calls are missing the # pyright: ignore[reportPrivateUsage] comments that are present on lines 159-160, 205-206, and 226-227. Add them for consistency and to suppress type checker warnings.
Apply this diff:
- index_name = store._get_index_name(collection="test")
- doc_id = store._get_document_id(key="legacy_key")
+ index_name = store._get_index_name(collection="test") # pyright: ignore[reportPrivateUsage]
+ doc_id = store._get_document_id(key="legacy_key") # pyright: ignore[reportPrivateUsage]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| index_name = store._get_index_name(collection="test") | |
| doc_id = store._get_document_id(key="legacy_key") | |
| index_name = store._get_index_name(collection="test") # pyright: ignore[reportPrivateUsage] | |
| doc_id = store._get_document_id(key="legacy_key") # pyright: ignore[reportPrivateUsage] |
🤖 Prompt for AI Agents
In
key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py
around lines 180 to 181, the two private method calls index_name =
store._get_index_name(collection="test") and doc_id =
store._get_document_id(key="legacy_key") are missing the pyright ignore
comments; append " # pyright: ignore[reportPrivateUsage]" to the end of each of
those lines to match the comments on lines 159-160, 205-206, and 226-227 and
suppress the type checker warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (1)
319-323: Use unsanitized collection name as dictionary key.Similar to the issue in
_delete_managed_entries, line 322 checks_collections_by_nameusingsanitized_collection, but the dictionary is keyed by unsanitized names (line 212). This will fail to find and delete the cached entry.Apply this diff:
sanitized_collection = self._sanitize_collection(collection=collection) _ = self._db.drop_collection(name_or_collection=sanitized_collection) - if sanitized_collection in self._collections_by_name: - del self._collections_by_name[sanitized_collection] + if collection in self._collections_by_name: + del self._collections_by_name[collection] return Truekey-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (1)
66-66: Provide a regex pattern forallowed_characters.
HybridSanitizationStrategyexpectsallowed_charactersto be a bracketed regex pattern (e.g.,r"[a-z0-9_.-]"), not a plain string. The current plain string will be incorrectly processed, potentially dropping valid characters.Apply this diff:
-ALLOWED_INDEX_CHARACTERS: str = LOWERCASE_ALPHABET + NUMBERS + "_" + "-" + "." +ALLOWED_INDEX_CHARACTERS: str = r"[a-z0-9_.\-]"key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
334-340: Fix dictionary key inconsistency in cache cleanup.Lines 338-339 use
sanitized_collectionto check and delete from_collections_by_name, but this dictionary is keyed by unsanitized collection names. The check will fail to find the entry, leaving stale references in the cache. If the check somehow passes, the deletion would cause aKeyError.Apply this diff:
sanitized_collection = self._sanitize_collection(collection=collection) _ = await self._db.drop_collection(name_or_collection=sanitized_collection) - if sanitized_collection in self._collections_by_name: - del self._collections_by_name[sanitized_collection] + if collection in self._collections_by_name: + del self._collections_by_name[collection] return True
199-216: Let me check what the sanitization strategy actually does to collection names:Use sanitized collection name in the filter to match how collections are created in MongoDB.
The filter on line 201 searches for collections using the unsanitized collection name, but collections are created on line 208 using the sanitized name. If the sanitization strategy modifies collection names to comply with MongoDB naming rules, the filter will never find existing collections, causing unnecessary re-creation attempts.
Change:
collection_filter: dict[str, str] = {"name": collection}to:
collection_filter: dict[str, str] = {"name": sanitized_collection}This applies to both:
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py(line 201)key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py(line 208)
♻️ Duplicate comments (4)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (1)
185-190: This is a duplicate of a previous critical review comment. Theallowed_charactersparameter should be a regex pattern (e.g.,r"[a-zA-Z0-9_]"), not a plain string.HybridSanitizationStrategy.sanitize()expects a bracketed regex and will slice off the first and last characters ofCOLLECTION_ALLOWED_CHARACTERS, corrupting legitimate collection names.key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (1)
210-214: This is a duplicate of a previous critical review comment. Without callingvalidate()beforesanitize(), users can submit keys or collections starting with reserved prefixes (H_orS_) that collide with sanitized outputs, defeating the collision protection. Please validate before sanitizing in both methods.key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
178-183: The previously flagged issue withallowed_charactershas not been addressed.The past review correctly identified that passing
COLLECTION_ALLOWED_CHARACTERS(a literal character string) toHybridSanitizationStrategywill cause the first and last characters to be dropped when the strategy slicesallowed_characters[1:-1]. Valid collection names starting with"A"or ending with"_"will be incorrectly sanitized, breaking lookups for existing collections.Apply the suggested fix from the previous review:
- collection_sanitization_strategy=HybridSanitizationStrategy( - replacement_character="_", max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS - ), + collection_sanitization_strategy=HybridSanitizationStrategy( + replacement_character="_", + max_length=MAX_COLLECTION_LENGTH, + allowed_characters=r"[a-zA-Z0-9_]", + ),key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (1)
243-247: Call validate() before sanitizing.The validation methods must be invoked before sanitization to prevent user input starting with reserved prefixes from bypassing collision guards. Without validation, inputs beginning with
H_orS_can create namespace conflicts with sanitized identifiers.Apply this diff:
def _get_index_name(self, collection: str) -> str: + self._collection_sanitization.validate(value=collection) return self._index_prefix + "-" + self._collection_sanitization.sanitize(value=collection).lower() def _get_document_id(self, key: str) -> str: + self._key_sanitization.validate(value=key) return self._key_sanitization.sanitize(value=key)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py(12 hunks)key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py(8 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py(12 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py(8 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (7)
AlwaysHashStrategy(104-128)HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-shared/src/key_value/shared/utils/serialization.py (3)
SerializationAdapter(38-139)load_dict(67-106)dump_dict(114-132)key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (4)
ElasticsearchSerializationAdapter(89-132)_get_index_name(243-244)_get_document_id(246-247)_get_destination(249-253)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (6)
HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (4)
_sanitize_collection(139-141)keys(386-392)_delete_managed_entry(336-338)_delete_managed_entries(340-349)key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
_delete_managed_entry(309-311)_delete_managed_entries(314-323)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (6)
HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-aio/src/key_value/aio/stores/base.py (2)
_sanitize_collection(133-135)keys(414-420)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (7)
AlwaysHashStrategy(104-128)HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-shared/src/key_value/shared/utils/serialization.py (3)
SerializationAdapter(38-139)load_dict(67-106)dump_dict(114-132)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (4)
ElasticsearchSerializationAdapter(69-112)_get_index_name(210-211)_get_document_id(213-214)_get_destination(216-220)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
| allowed_characters=ALLOWED_INDEX_CHARACTERS, | ||
| ) | ||
| def _get_index_name(self, collection: str) -> str: | ||
| return self._index_prefix + "-" + self._collection_sanitization.sanitize(value=collection).lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .lower() call undermines reserved prefix protection.
The design has a flaw: HybridSanitizationStrategy adds the S_ prefix (uppercase) and validates against uppercase H_/S_, but the .lower() call converts S_ to s_. Since ALLOWED_INDEX_CHARACTERS permits lowercase letters and underscores, user input like "s_mycollection" passes through unmodified and is stored with an s_ prefix—the same prefix used by sanitized values after lowercasing. This creates a collision risk where user-provided names can conflict with system-sanitized names.
Consider one of these solutions:
- Move
.lower()before sanitization so the sanitizer operates on lowercase input and can properly detect/prevents_prefixes - Update validation to reject both uppercase and lowercase reserved prefixes (
H_,h_,S_,s_) - Remove
.lower()and ensure the sanitizer produces lowercase output directly
Option 1 (recommended): Apply this diff to lowercase before sanitization:
def _get_index_name(self, collection: str) -> str:
self._collection_sanitization.validate(value=collection)
- return self._index_prefix + "-" + self._collection_sanitization.sanitize(value=collection).lower()
+ return self._index_prefix + "-" + self._collection_sanitization.sanitize(value=collection.lower())🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py
around line 244, the code appends .lower() after sanitization which converts the
sanitizer's reserved "S_" prefix into "s_" and allows user inputs like
"s_mycollection" to collide with sanitized names; fix by lowercasing the
collection name before passing it to the sanitization strategy (move .lower() to
operate on the raw collection string), or alternatively implement one of the
other suggested fixes (validate both uppercase and lowercase reserved prefixes
or remove lowercasing and have the sanitizer produce lowercase itself);
recommended: apply the first option—call collection = collection.lower() before
sanitize(value=collection) so the sanitizer can detect and reject reserved
prefixes correctly.
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py
Outdated
Show resolved
Hide resolved
key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py
Outdated
Show resolved
Hide resolved
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (1)
316-321: Drop the unsanitized mapping entry.
_collections_by_nameis keyed by the original collection string. Deleting with the sanitized name leaves the stale entry in place, so our internal cache never reflects the drop. Remove the unsanitized key instead.- if sanitized_collection in self._collections_by_name: - del self._collections_by_name[sanitized_collection] + if collection in self._collections_by_name: + del self._collections_by_name[collection]key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
332-337: Remove the raw collection entry after dropping.Our cache uses the unsanitized key; deleting with
sanitized_collectionleaves a stale handle behind. Clear the mapping with the original collection string.- if sanitized_collection in self._collections_by_name: - del self._collections_by_name[sanitized_collection] + if collection in self._collections_by_name: + del self._collections_by_name[collection]
♻️ Duplicate comments (3)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (1)
187-190: Provide a regex forallowed_characters.
HybridSanitizationStrategyslicesallowed_characters[1:-1]when building its negated class. Passing the raw literal drops the first and last symbols ("A"and"_"), which forces valid names through sanitization and breaks lookups for existing collections. Please supply a bracketed character-class regex instead.- collection_sanitization_strategy=HybridSanitizationStrategy( - replacement_character="_", max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS - ), + collection_sanitization_strategy=HybridSanitizationStrategy( + replacement_character="_", + max_length=MAX_COLLECTION_LENGTH, + allowed_characters=r"[A-Za-z0-9_]", + ),key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
178-183: Provide a regex forallowed_characters.
HybridSanitizationStrategyexpects a bracketed regex; giving it the raw literal trims the first/last symbols, so valid collections get rewritten and become unreachable. Please pass a proper character class.- super().__init__( - default_collection=default_collection, - collection_sanitization_strategy=HybridSanitizationStrategy( - replacement_character="_", max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS - ), - ) + super().__init__( + default_collection=default_collection, + collection_sanitization_strategy=HybridSanitizationStrategy( + replacement_character="_", + max_length=MAX_COLLECTION_LENGTH, + allowed_characters=r"[A-Za-z0-9_]", + ), + )key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (1)
248-250: Lowercase before sanitizing so uppercase letters aren’t lost.With the current order, uppercase characters are treated as invalid and replaced with
_before we lowercase the result, so"OrdersEU"becomess__rders_eu-…. Lowercase the collection name first (while still validating the original input) and then sanitize, finally lowercasing the combined index for Elasticsearch.- return self._index_prefix + "-" + self._sanitize_collection(collection=collection).lower() + self._collection_sanitization_strategy.validate(value=collection) + sanitized = self._collection_sanitization_strategy.sanitize(value=collection.lower()) + return f"{self._index_prefix}-{sanitized}".lower()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py(12 hunks)key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py(8 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py(12 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py(8 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (6)
HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-aio/src/key_value/aio/stores/base.py (2)
_sanitize_collection(133-135)keys(414-420)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (8)
AlwaysHashStrategy(104-128)HashFragmentMode(20-30)HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (1)
ElasticsearchSerializationAdapter(89-132)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (3)
_sanitize_collection(139-141)_sanitize_key(143-145)keys(386-392)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (6)
HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (5)
_sanitize_collection(139-141)keys(386-392)_delete_managed_entry(336-338)_delete_managed_entries(340-349)delete_many(361-366)key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
_delete_managed_entry(309-311)_delete_managed_entries(314-320)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (3)
AlwaysHashStrategy(104-128)HashFragmentMode(20-30)HybridSanitizationStrategy(182-293)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (1)
ElasticsearchSerializationAdapter(69-112)key-value/key-value-aio/src/key_value/aio/stores/base.py (3)
_sanitize_collection(133-135)_sanitize_key(137-139)keys(414-420)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-aio)
| self._serializer = ElasticsearchSerializationAdapter(native_storage=native_storage) | ||
| collection_sanitization = HybridSanitizationStrategy( | ||
| replacement_character="_", | ||
| max_length=max_index_length, | ||
| allowed_characters=ALLOWED_INDEX_CHARACTERS, | ||
| hash_fragment_mode=HashFragmentMode.ALWAYS, | ||
| ) | ||
| key_sanitization = AlwaysHashStrategy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a regex character class for the hybrid strategy.
Providing the literal string causes the strategy to drop its first/last symbols when building the negated set, so valid index names get rewritten. Feed a bracketed regex instead.
- collection_sanitization = HybridSanitizationStrategy(
- replacement_character="_",
- max_length=max_index_length,
- allowed_characters=ALLOWED_INDEX_CHARACTERS,
- hash_fragment_mode=HashFragmentMode.ALWAYS,
- )
+ collection_sanitization = HybridSanitizationStrategy(
+ replacement_character="_",
+ max_length=max_index_length,
+ allowed_characters=r"[a-z0-9_.-]",
+ hash_fragment_mode=HashFragmentMode.ALWAYS,
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| self._serializer = ElasticsearchSerializationAdapter(native_storage=native_storage) | |
| collection_sanitization = HybridSanitizationStrategy( | |
| replacement_character="_", | |
| max_length=max_index_length, | |
| allowed_characters=ALLOWED_INDEX_CHARACTERS, | |
| hash_fragment_mode=HashFragmentMode.ALWAYS, | |
| ) | |
| key_sanitization = AlwaysHashStrategy() | |
| self._serializer = ElasticsearchSerializationAdapter(native_storage=native_storage) | |
| collection_sanitization = HybridSanitizationStrategy( | |
| replacement_character="_", | |
| max_length=max_index_length, | |
| allowed_characters=r"[a-z0-9_.-]", | |
| hash_fragment_mode=HashFragmentMode.ALWAYS, | |
| ) | |
| key_sanitization = AlwaysHashStrategy() |
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py
around lines 218 to 225, the HybridSanitizationStrategy is being passed a
literal string for allowed_characters which causes the negated character set to
be built incorrectly and strips valid index name characters; replace the literal
string with a bracketed regex character class (e.g. a string that starts with
'[' and ends with ']') for allowed_characters so the strategy treats it as a
character class, ensure any special characters like '-' or ']' are properly
escaped inside the class, and keep the rest of the HybridSanitizationStrategy
arguments unchanged.
| sanitized_collection = self._sanitize_collection(collection=collection) | ||
|
|
||
| collection_filter: dict[str, str] = {"name": collection} | ||
| matching_collections: list[str] = await self._db.list_collection_names(filter=collection_filter) | ||
|
|
||
| if matching_collections: | ||
| self._collections_by_name[collection] = self._db[collection] | ||
| self._collections_by_name[collection] = self._db[sanitized_collection] | ||
| return | ||
|
|
||
| new_collection: AsyncCollection[dict[str, Any]] = await self._db.create_collection(name=collection) | ||
| new_collection: AsyncCollection[dict[str, Any]] = await self._db.create_collection(name=sanitized_collection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for existing collections using the sanitized name.
When a collection needs sanitization, list_collection_names with the raw name never finds it and we try to recreate the collection, which will fail once it already exists. Filter by the sanitized name instead.
- collection_filter: dict[str, str] = {"name": collection}
+ collection_filter: dict[str, str] = {"name": sanitized_collection}
matching_collections: list[str] = await self._db.list_collection_names(filter=collection_filter)🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py around
lines 199 to 208, the code checks for existing collections using the unsanitized
collection name which will miss collections that were created with the sanitized
name; change the filter passed to list_collection_names to use the
sanitized_collection variable (i.e., filter by {"name": sanitized_collection})
so the existence check matches the actual stored collection name, then proceed
to reuse or create the collection as before.
| self._serializer = ElasticsearchSerializationAdapter(native_storage=native_storage) | ||
| collection_sanitization = HybridSanitizationStrategy( | ||
| replacement_character="_", | ||
| max_length=max_index_length, | ||
| allowed_characters=ALLOWED_INDEX_CHARACTERS, | ||
| hash_fragment_mode=HashFragmentMode.ALWAYS, | ||
| ) | ||
| key_sanitization = AlwaysHashStrategy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass a character-class regex to the hybrid strategy.
Because the strategy slices allowed_characters[1:-1], handing it the literal string drops the first ("a") and last (".") characters from the whitelist. That causes harmless names to be sanitized, changing existing index names. Supply a bracketed regex instead.
- collection_sanitization = HybridSanitizationStrategy(
- replacement_character="_",
- max_length=max_index_length,
- allowed_characters=ALLOWED_INDEX_CHARACTERS,
- hash_fragment_mode=HashFragmentMode.ALWAYS,
- )
+ collection_sanitization = HybridSanitizationStrategy(
+ replacement_character="_",
+ max_length=max_index_length,
+ allowed_characters=r"[a-z0-9_.-]",
+ hash_fragment_mode=HashFragmentMode.ALWAYS,
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| self._serializer = ElasticsearchSerializationAdapter(native_storage=native_storage) | |
| collection_sanitization = HybridSanitizationStrategy( | |
| replacement_character="_", | |
| max_length=max_index_length, | |
| allowed_characters=ALLOWED_INDEX_CHARACTERS, | |
| hash_fragment_mode=HashFragmentMode.ALWAYS, | |
| ) | |
| key_sanitization = AlwaysHashStrategy() | |
| self._serializer = ElasticsearchSerializationAdapter(native_storage=native_storage) | |
| collection_sanitization = HybridSanitizationStrategy( | |
| replacement_character="_", | |
| max_length=max_index_length, | |
| allowed_characters=r"[a-z0-9_.-]", | |
| hash_fragment_mode=HashFragmentMode.ALWAYS, | |
| ) | |
| key_sanitization = AlwaysHashStrategy() |
🤖 Prompt for AI Agents
In
key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py
around lines 187-194, the HybridSanitizationStrategy is being passed
ALLOWED_INDEX_CHARACTERS as a literal string which gets sliced
(allowed_characters[1:-1]) and drops the first and last characters; change the
argument to a bracketed character-class regex (e.g. wrap the allowed characters
in square brackets so the strategy receives a regex like "[...]" or otherwise
ensure the first and last characters are preserved) so the sanitization uses the
intended whitelist and existing index names are not altered.
| def _get_index_name(self, collection: str) -> str: | ||
| return self._index_prefix + "-" + self._sanitize_collection(collection=collection).lower() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lowercase before sanitizing to keep letters instead of replacing them.
Currently we sanitize first and lowercase afterwards. If the input contains uppercase letters (e.g. "OrdersEU"), the hybrid strategy replaces each one with "_" because uppercase characters aren’t in ALLOWED_INDEX_CHARACTERS, so the index name becomes s__rders_eu-…. Lowercasing before sanitization preserves the letters while still producing a valid all-lowercase index name.
- return self._index_prefix + "-" + self._sanitize_collection(collection=collection).lower()
+ self._collection_sanitization_strategy.validate(value=collection)
+ sanitized = self._collection_sanitization_strategy.sanitize(value=collection.lower())
+ return f"{self._index_prefix}-{sanitized}".lower()🤖 Prompt for AI Agents
In
key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py
around lines 217 to 219, the current code sanitizes the collection name before
lowercasing which causes uppercase letters to be replaced by underscores; change
the order so the collection string is lowercased first and then passed to
_sanitize_collection (i.e., call
self._sanitize_collection(collection=collection.lower())), ensuring the
resulting index name is valid and preserves letters while remaining
all-lowercase.
| sanitized_collection = self._sanitize_collection(collection=collection) | ||
|
|
||
| collection_filter: dict[str, str] = {"name": collection} | ||
| matching_collections: list[str] = self._db.list_collection_names(filter=collection_filter) | ||
|
|
||
| if matching_collections: | ||
| self._collections_by_name[collection] = self._db[collection] | ||
| self._collections_by_name[collection] = self._db[sanitized_collection] | ||
| return | ||
|
|
||
| new_collection: Collection[dict[str, Any]] = self._db.create_collection(name=collection) | ||
| new_collection: Collection[dict[str, Any]] = self._db.create_collection(name=sanitized_collection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the sanitized name when checking for existing collections.
list_collection_names must look for the stored (sanitized) collection name. With the current filter, any collection that requires sanitization is never detected and we attempt to recreate it, triggering CollectionInvalid once it already exists.
- collection_filter: dict[str, str] = {"name": collection}
+ collection_filter: dict[str, str] = {"name": sanitized_collection}
matching_collections: list[str] = self._db.list_collection_names(filter=collection_filter)🤖 Prompt for AI Agents
In key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py
around lines 206 to 215, the code checks for existing collections using the
unsanitized collection name which causes collections that were stored under a
sanitized name to be missed and recreated; change the filter to use the
sanitized_collection variable (i.e., construct collection_filter with the
sanitized name) so list_collection_names looks up the actual stored name, then
proceed to use that sanitized_collection for both the _collections_by_name
assignment and create_collection call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (3)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (1)
168-179: Reject lowercase reserved prefixes as well.Stores such as the async Elasticsearch backend lowercase sanitized names after
sanitize(). With the currentstartswith(("H_", "S_"))check, a user can passs_.../h_...and bypass the guard, colliding with lowered sanitized keys. Normalize the first two characters before comparison (and make the same change inHybridSanitizationStrategy.validate).- if value.startswith(("H_", "S_")): + if value[:2].upper() in {"H_", "S_"}: msg = f"Keys cannot start with reserved prefixes 'H_' or 'S_': {value}" raise InvalidKeyError(msg)Apply the identical update in
HybridSanitizationStrategy.validate.Also applies to: 291-293
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (1)
203-210: Fix collection existence check to use sanitized name.When a collection needs sanitization,
list_collection_names(filter={"name": collection})never finds it. The second time_setup_collectionruns (e.g., new process hitting an existing DB), we try to recreate the sanitized namespace and MongoDB raisesCollectionInvalid. Use the sanitized name in the filter so we detect the already-created collection.- collection_filter: dict[str, str] = {"name": collection} + collection_filter: dict[str, str] = {"name": sanitized_collection} matching_collections: list[str] = self._db.list_collection_names(filter=collection_filter) if matching_collections: - self._collections_by_name[collection] = self._db[sanitized_collection] + self._collections_by_name[collection] = self._db[sanitized_collection] returnkey-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
201-206: Use the sanitized collection name when checking for existence.
HybridSanitizationStrategywithHashFragmentMode.ALWAYSguarantees that the stored collection is named with the sanitized form (e.g.S_…). Queryinglist_collection_nameswith the raw name always returns an empty list, so every call drops intocreate_collection, which then raises because the sanitized collection already exists. Please reusesanitized_collectionin the filter before deciding to create the collection.- collection_filter: dict[str, str] = {"name": collection} + collection_filter: dict[str, str] = {"name": sanitized_collection}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py(12 hunks)key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py(8 hunks)key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py(5 hunks)key-value/key-value-aio/tests/stores/mongodb/test_mongodb.py(5 hunks)key-value/key-value-shared/src/key_value/shared/utils/sanitization.py(1 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py(12 hunks)key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py(8 hunks)key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py(5 hunks)key-value/key-value-sync/tests/code_gen/stores/mongodb/test_mongodb.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py (1)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
_get_index_name(252-253)_get_document_id(255-256)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (2)
key-value/key-value-shared/src/key_value/shared/errors/key_value.py (1)
InvalidKeyError(38-39)key-value/key-value-shared/src/key_value/shared/utils/sanitize.py (1)
sanitize_characters_in_string(34-61)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (6)
HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (5)
BaseStore(60-376)_sanitize_collection(139-141)keys(386-392)_delete_managed_entry(336-338)_delete_managed_entries(340-349)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (6)
HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-aio/src/key_value/aio/stores/base.py (3)
BaseStore(54-404)_sanitize_collection(133-135)keys(414-420)
key-value/key-value-sync/tests/code_gen/stores/elasticsearch/test_elasticsearch.py (2)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
_get_index_name(252-253)_get_document_id(255-256)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (2)
_get_index_name(220-221)_get_document_id(223-224)
key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (4)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (8)
AlwaysHashStrategy(104-128)HashFragmentMode(20-30)HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-shared/src/key_value/shared/utils/serialization.py (3)
SerializationAdapter(38-139)load_dict(67-106)dump_dict(114-132)key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (4)
ElasticsearchSerializationAdapter(90-133)_get_index_name(252-253)_get_document_id(255-256)_get_destination(258-262)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (3)
_sanitize_collection(139-141)_sanitize_key(143-145)keys(386-392)
key-value/key-value-aio/tests/stores/mongodb/test_mongodb.py (1)
key-value/key-value-aio/src/key_value/aio/stores/base.py (2)
collections(455-459)_sanitize_collection(133-135)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (4)
key-value/key-value-shared/src/key_value/shared/utils/sanitization.py (8)
AlwaysHashStrategy(104-128)HashFragmentMode(20-30)HybridSanitizationStrategy(182-293)sanitize(45-53)sanitize(88-90)sanitize(119-121)sanitize(150-166)sanitize(221-280)key-value/key-value-shared/src/key_value/shared/utils/serialization.py (3)
SerializationAdapter(38-139)load_dict(67-106)dump_dict(114-132)key-value/key-value-sync/src/key_value/sync/code_gen/stores/elasticsearch/store.py (4)
ElasticsearchSerializationAdapter(69-112)_get_index_name(220-221)_get_document_id(223-224)_get_destination(226-230)key-value/key-value-aio/src/key_value/aio/stores/base.py (3)
_sanitize_collection(133-135)_sanitize_key(137-139)keys(414-420)
key-value/key-value-sync/tests/code_gen/stores/mongodb/test_mongodb.py (2)
key-value/key-value-aio/src/key_value/aio/stores/base.py (2)
collections(455-459)_sanitize_collection(133-135)key-value/key-value-sync/src/key_value/sync/code_gen/stores/base.py (2)
collections(425-429)_sanitize_collection(139-141)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-14, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, windows-2022, key-value/key-value-aio)
- GitHub Check: test_quick (3.10, ubuntu-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, macos-latest, key-value/key-value-sync)
- GitHub Check: test_quick (3.10, ubuntu-22.04, key-value/key-value-aio)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py
Outdated
Show resolved
Hide resolved
key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py
Outdated
Show resolved
Hide resolved
| index_name = store._get_index_name(collection="test") | ||
| doc_id = store._get_document_id(key="legacy_key") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add pyright suppression for private helpers.
Every other direct call to these private helpers carries # pyright: ignore[reportPrivateUsage] so static checks stay green. Please add the same inline suppression here to keep pyright happy.
🤖 Prompt for AI Agents
In key-value/key-value-aio/tests/stores/elasticsearch/test_elasticsearch.py
around lines 238 to 239, the test calls private helper methods
store._get_index_name(...) and store._get_document_id(...) without the inline
pyright suppression; add the same inline comment used elsewhere — append " #
pyright: ignore[reportPrivateUsage]" to each of those two lines so static checks
ignore the private usage.
key-value/key-value-sync/src/key_value/sync/code_gen/stores/mongodb/store.py
Show resolved
Hide resolved
|


Summary
Implements the
SanitizationStrategyadapter pattern andSanitizationMigrationWrapperto enable gradual migration between different key sanitization strategies without breaking access to existing data.Changes
Created
SanitizationStrategyABC with 3 implementations:NoOpSanitizationStrategy: pass-through (no sanitization)HashLongKeysSanitizationStrategy: SHA256 hashing with H_ prefixCharacterSanitizationStrategy: character replacement with S_ prefixImplemented
SanitizationMigrationWrapperfor gradual strategy migration:Added
InvalidKeyErrorfor validation failuresComprehensive test coverage:
Key Features
Related Issue
Resolves #207
Test Plan
🤖 Generated with Claude Code) | View branch | [View job run](https://github.com/strawgate/py-key-value/actions/runs/19053651545
Summary by CodeRabbit
New Features
Improvements
Tests