feat: Switch MongoDB and Elasticsearch to native document storage #104
Conversation
MongoDB changes:
- Store values as BSON documents instead of JSON strings
- Add TTL index for automatic expiration
- Store dates as datetime objects instead of ISO strings

Elasticsearch changes:
- Use flattened field type instead of disabled keyword
- Store values as objects instead of JSON strings

BREAKING CHANGE: This changes the storage format for both MongoDB and Elasticsearch stores. Existing data will need to be migrated.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: William Easton <[email protected]>
📝 Walkthrough
Sequence Diagram(s)

sequenceDiagram
autonumber
participant Client
participant Store as KeyValueStore
participant Backend as DB (Elasticsearch / MongoDB)
Note over Store,Backend #F3F8FF: Behavior toggles on `native_storage` flag
Client->>Store: put(key, ManagedEntry)
alt native_storage == true
Store->>Backend: persist value as native object (flattened dict / BSON) + created_at/updated_at/expires_at
Backend-->>Store: ack
else native_storage == false
Store->>Backend: persist value as JSON string + top-level timestamps
Backend-->>Store: ack
end
Store-->>Client: put result
Client->>Store: get(key)
Store->>Backend: fetch document by key
alt native_storage == true
Backend-->>Store: doc with native value (dict/BSON) and datetime fields
Store-->>Client: ManagedEntry(value=dict, datetime fields)
else native_storage == false
Backend-->>Store: doc with JSON string value and timestamp fields
Store->>Store: parse JSON -> ManagedEntry
Store-->>Client: ManagedEntry(parsed value, timestamps)
end
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (1)
204-207: Inconsistency: Store dates as datetime objects instead of ISO strings.

The code stores `created_at` and `expires_at` as ISO strings, while the MongoDB store (in this same PR) stores them as native datetime objects. Elasticsearch supports native date types (as evidenced by the mapping at lines 45-49), so you should store datetime objects directly for consistency. Apply this diff to align with MongoDB's approach:

```diff
-        if managed_entry.created_at:
-            document["created_at"] = managed_entry.created_at.isoformat()
-        if managed_entry.expires_at:
-            document["expires_at"] = managed_entry.expires_at.isoformat()
+        if managed_entry.created_at:
+            document["created_at"] = managed_entry.created_at
+        if managed_entry.expires_at:
+            document["expires_at"] = managed_entry.expires_at
```

Note: If you make this change, you'll also need to update `_get_managed_entry` (lines 175-176) to read datetime objects directly instead of parsing ISO strings:

```diff
-        created_at: datetime | None = try_parse_datetime_str(value=source.get("created_at"))
-        expires_at: datetime | None = try_parse_datetime_str(value=source.get("expires_at"))
+        created_at: datetime | None = source.get("created_at")
+        expires_at: datetime | None = source.get("expires_at")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (4 hunks)
- key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (1)
- ManagedEntry (13-83)
key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- try_parse_datetime_str (36-43)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (1)
- ManagedEntry (13-83)
key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- now (21-23)
🔇 Additional comments (4)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
57-59: LGTM! Appropriate use of flattened field type.

The switch to `flattened` type is appropriate for storing arbitrary JSON objects in a key-value store. This allows Elasticsearch to index the entire object while treating subfields as keywords.
171-182: LGTM! Correctly handles flattened object retrieval.

The logic properly reads the value as a dict from the flattened field and validates its type before constructing the ManagedEntry.
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (2)
137-138: LGTM! TTL index correctly configured for automatic expiration.

The TTL index with `expireAfterSeconds=0` will cause MongoDB to automatically delete documents at the time specified in `expires_at`. Note that MongoDB's TTL background thread runs approximately every 60 seconds, so expired documents may persist briefly after their expiration time.
177-194: LGTM! Proper use of native BSON types.

The implementation correctly:
- Stores `value` as a BSON document (dict) instead of a JSON string
- Stores `created_at` and `expires_at` as native datetime objects
- Adds an `updated_at` field for tracking modifications

This aligns well with the PR objectives to use native document storage.
@claude I think we'll need a mode on these where the default is to store as string, but storing as a native object is an option of the store constructor. We'll need a warning that switching cannot be done transparently (it may require resetting the store). The reason to do this is to make objects queryable in the relevant store, but it carries the risk that store-specific issues may prevent storage of some values.
Claude finished @strawgate's task — View job

Making native storage optional ✓

I've successfully updated the implementation to make native storage an optional feature with backward compatibility.

Changes Made
- MongoDB Store (
- Elasticsearch Store (
Key Points

✅ Backward Compatible: Both stores default to JSON string storage (existing behavior)

Usage Examples

```python
# Default: JSON string storage (backward compatible)
mongo_store = MongoDBStore(url="mongodb://localhost:27017")

# Opt-in to native BSON storage
mongo_store_native = MongoDBStore(url="mongodb://localhost:27017", native_storage=True)

# Default: JSON string storage (backward compatible)
es_store = ElasticsearchStore(url="https://localhost:9200", api_key="key", index_prefix="kv")

# Opt-in to flattened object storage
es_store_native = ElasticsearchStore(url="https://localhost:9200", api_key="key", index_prefix="kv", native_storage=True)
```

Next Steps

The changes have been committed and pushed to the branch. You may want to:
Add `native_storage` constructor parameter to both stores:
- Default (False): Use JSON string storage for backward compatibility
- Enabled (True): Use native BSON/flattened storage

This allows users to opt-in to native storage while keeping existing behavior as the default. Both modes are supported on the same store instance, but switching between modes requires data migration.

MongoDB:
- When native_storage=False: Store values as JSON strings
- When native_storage=True: Store as BSON documents with TTL index

Elasticsearch:
- When native_storage=False: Use keyword mapping with JSON strings
- When native_storage=True: Use flattened field type with objects

Co-authored-by: William Easton <[email protected]>
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (1)
43-61: Remove or update the unused DEFAULT_MAPPING constant.

The `DEFAULT_MAPPING` constant is never used in the code. The actual mapping is created dynamically in `_setup_collection` (lines 147-166) based on the `_native_storage` flag. This constant is misleading and could confuse developers. Consider removing it entirely:

```diff
-DEFAULT_MAPPING = {
-    "properties": {
-        "created_at": {
-            "type": "date",
-        },
-        "expires_at": {
-            "type": "date",
-        },
-        "collection": {
-            "type": "keyword",
-        },
-        "key": {
-            "type": "keyword",
-        },
-        "value": {
-            "type": "flattened",
-        },
-    },
-}
-
```
♻️ Duplicate comments (1)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
167-182: Type validation for datetime fields still missing.

The code assumes `created_at` and `expires_at` are datetime objects without validation. This matches a previous review comment that suggested defensive type checks to handle unexpected data formats (e.g., if data was written in JSON mode or corrupted). Consider adding type validation:

```diff
     # Parse datetime objects directly
     created_at: datetime | None = doc.get("created_at")
     expires_at: datetime | None = doc.get("expires_at")
+
+    # Validate datetime types
+    if created_at is not None and not isinstance(created_at, datetime):
+        return None
+    if expires_at is not None and not isinstance(expires_at, datetime):
+        return None
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (9 hunks)
- key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (3)
- ManagedEntry (13-83)
- from_json (61-83)
- to_json (46-58)
key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- try_parse_datetime_str (36-43)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (3)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
- get (105-131)
key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (3)
- ManagedEntry (13-83)
- from_json (61-83)
- to_json (46-58)
key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- now (21-23)
🔇 Additional comments (4)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (3)
84-84: LGTM!

The `_native_storage` attribute is properly declared with a correct type annotation.
86-131: LGTM!

The constructor properly accepts the `native_storage` parameter across all overloads with sensible defaults and clear documentation about the migration warning.
146-168: LGTM!

The dynamic mapping creation correctly handles both storage modes: `flattened` type for native objects (queryable) and `keyword` with `index: False` for JSON strings (non-indexed).

key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
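A minimal sketch of that mapping choice, assuming a hypothetical helper (`value_mapping` is not part of the store's API) that picks the field mapping for the `value` field per mode:

```python
def value_mapping(native_storage: bool) -> dict:
    # Mirrors the mapping choice described above (helper name is hypothetical).
    if native_storage:
        # `flattened` indexes the whole object; leaf values are treated as keywords.
        return {"type": "flattened"}
    # JSON-string mode: keep the blob as an un-indexed keyword field.
    return {"type": "keyword", "index": False}

print(value_mapping(True))  # {'type': 'flattened'}
```

The `index: False` setting is what makes JSON-string mode non-queryable: Elasticsearch stores the blob but builds no inverted index for it.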
152-154: TTL index behavior is correctly implemented for both storage modes.

Web verification confirms that MongoDB TTL indexes work only on BSON Date fields, not on values stored inside JSON strings. Your implementation correctly reflects this:
- Native storage: Creates TTL index on `expires_at` (BSON Date) → MongoDB automatically deletes expired documents.
- JSON storage: Does not create TTL index (correct, since `expires_at` is inside a JSON string where TTL cannot work) → Application filters expired entries at read time.

Ensure the documentation clearly explains these different expiration semantics to users, particularly that JSON storage mode will accumulate expired entries requiring manual cleanup or application-level garbage collection.
@claude can you see the PR feedback? Can you incorporate it? Also, do the stores check if the index is set up wrong for native storage and throw a helpful error?
Claude finished @strawgate's task — View job

Addressed CodeRabbit feedback and added index validation ✓

All requested fixes have been implemented and pushed to the branch.

Changes Made
- Elasticsearch Store (
Elasticsearch:
- Remove unused DEFAULT_MAPPING constant
- Fix culling in JSON mode by storing timestamps at top level
- Add storage mode mismatch detection with helpful error messages
- Add index mapping validation on collection setup

MongoDB:
- Add datetime type validation when reading in native mode
- Add updated_at field to JSON mode for consistency
- Remove redundant collection and key fields from documents
- Use $setOnInsert for immutable created_at field
- Add TTL index validation for native storage mode

Both stores now validate that their configuration matches the existing index/collection setup and raise helpful errors when mismatches occur.

Co-authored-by: William Easton <[email protected]>
Actionable comments posted: 8
♻️ Duplicate comments (2)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
235-251: Mode-mismatch detection on read: resolved — good safeguards.

Raising a clear ValueError when the stored format doesn't match the configured mode prevents silent data loss. Nice.
286-304: JSON-mode culling fix: confirmed.

Top-level created_at/expires_at are now written in JSON mode, so _cull's range query works regardless of storage mode. Good improvement.
Please reindex or validate that existing indices have expires_at mapped as date; otherwise range queries could be skipped by ES. A quick smoke test against a test index after migration is recommended.
Also applies to: 400-412
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (8 hunks)
- key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (3)
- ManagedEntry (13-83)
- from_json (61-83)
- to_json (46-58)
key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- try_parse_datetime_str (36-43)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (3)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
- get (105-131)
key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (3)
- ManagedEntry (13-83)
- from_json (61-83)
- to_json (46-58)
key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- now (21-23)
🔇 Additional comments (5)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (1)
66-71: Native storage flag + warning: LGTM

Constructor overloads and the docstring warning are clear; setting self._native_storage is correct.
Also applies to: 80-92, 109-109
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (4)
49-120: LGTM: Constructor changes are well-documented and backward compatible.

The addition of the `native_storage` parameter with a default of `False` ensures backward compatibility. The documentation clearly warns about the breaking nature of switching modes, which addresses the PR objectives.
146-157: LGTM: Index setup correctly handles native storage mode.

The conditional TTL index creation and validation for existing collections properly implements the requirement to ensure indexes match the storage mode.
206-218: Excellent type validation for datetime fields.

The datetime type validation with clear error messages effectively detects storage mode mismatches, addressing the previous review feedback about defensive type checking. Based on past review feedback.
244-280: Good: Previous review feedback properly addressed.

The implementation correctly addresses all previous concerns:
- Both storage modes now track `updated_at`
- Redundant `collection` and `key` fields removed from `$set`
- Proper use of `$set` for mutable fields and `$setOnInsert` for immutable `created_at`
```python
if self._native_storage:
    # Native storage mode: Store value as flattened object
    document: dict[str, Any] = {
        "collection": collection,
        "key": key,
        "value": managed_entry.value,  # Store as flattened object
    }

    if managed_entry.created_at:
        document["created_at"] = managed_entry.created_at.isoformat()
    if managed_entry.expires_at:
        document["expires_at"] = managed_entry.expires_at.isoformat()

    _ = await self._client.index(
        index=self._sanitize_index_name(collection=collection),
        id=self._sanitize_document_id(key=combo_key),
        body=document,
        refresh=self._should_refresh_on_put,
    )
else:
    # JSON string mode: Store value as JSON string
    document: dict[str, Any] = {
        "collection": collection,
        "key": key,
        "value": managed_entry.to_json(),  # Store as JSON string
    }

    # Store timestamps at top level for culling to work
    if managed_entry.created_at:
        document["created_at"] = managed_entry.created_at.isoformat()
    if managed_entry.expires_at:
        document["expires_at"] = managed_entry.expires_at.isoformat()

    _ = await self._client.index(
        index=self._sanitize_index_name(collection=collection),
        id=self._sanitize_document_id(key=combo_key),
        body=document,
        refresh=self._should_refresh_on_put,
    )
```
🧹 Nitpick | 🔵 Trivial
Optional: write a small format marker to ease future migrations
Adding a lightweight marker (e.g., "format_version": 1 or "storage_mode": "native"|"json") would make migrations and mismatch detection explicit and cheaper than heuristics.
Example:

```diff
 document: dict[str, Any] = {
     "collection": collection,
     "key": key,
     "value": managed_entry.value,  # Store as flattened object
+    "storage_mode": "native",
 }
```

And for JSON mode:

```diff
 document: dict[str, Any] = {
     "collection": collection,
     "key": key,
     "value": managed_entry.to_json(),  # Store as JSON string
+    "storage_mode": "json",
 }
```

Then assert this in reads to give even clearer errors.
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py
around lines 266 to 304, the code stores documents in two modes (native vs JSON)
but lacks an explicit format marker; add a lightweight marker field to each
indexed document (e.g., "format_version": 1 or "storage_mode": "native"/"json")
inside both branches so future migrations/mismatch detection are trivial, and
update read logic to assert/validate this marker and raise a clear error when it
doesn't match the expected marker.
```python
if self._native_storage:
    # Native storage mode: Store value as flattened object
    document: dict[str, Any] = {
        "collection": collection,
        "key": key,
        "value": managed_entry.value,  # Store as flattened object
    }

    if managed_entry.created_at:
        document["created_at"] = managed_entry.created_at.isoformat()
    if managed_entry.expires_at:
        document["expires_at"] = managed_entry.expires_at.isoformat()

    _ = await self._client.index(
        index=self._sanitize_index_name(collection=collection),
        id=self._sanitize_document_id(key=combo_key),
        body=document,
        refresh=self._should_refresh_on_put,
    )
else:
```
🧹 Nitpick | 🔵 Trivial
Timestamp serialization: consider normalizing to milliseconds
Elasticsearch date fields accept epoch_millis or ISO 8601. Using isoformat() may include microseconds; parsers generally accept it, but normalizing to milliseconds avoids surprises and reduces payload size.
Option:

```diff
-        if managed_entry.created_at:
-            document["created_at"] = managed_entry.created_at.isoformat()
-        if managed_entry.expires_at:
-            document["expires_at"] = managed_entry.expires_at.isoformat()
+        if managed_entry.created_at:
+            document["created_at"] = managed_entry.created_at.isoformat(timespec="milliseconds")
+        if managed_entry.expires_at:
+            document["expires_at"] = managed_entry.expires_at.isoformat(timespec="milliseconds")
```

If your supported Python versions lack timespec, keep ISO or switch to epoch millis consistently in both write and cull paths.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Suggested change:

```python
if self._native_storage:
    # Native storage mode: Store value as flattened object
    document: dict[str, Any] = {
        "collection": collection,
        "key": key,
        "value": managed_entry.value,  # Store as flattened object
    }
    if managed_entry.created_at:
        document["created_at"] = managed_entry.created_at.isoformat(timespec="milliseconds")
    if managed_entry.expires_at:
        document["expires_at"] = managed_entry.expires_at.isoformat(timespec="milliseconds")
    _ = await self._client.index(
        index=self._sanitize_index_name(collection=collection),
        id=self._sanitize_document_id(key=combo_key),
        body=document,
        refresh=self._should_refresh_on_put,
    )
else:
```
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py
around lines 266 to 285, the code serializes managed_entry.created_at and
managed_entry.expires_at with isoformat(), which can include microseconds and
vary across runtimes; normalize timestamps to epoch milliseconds instead to be
consistent, smaller and Elasticsearch-friendly: replace created_at.isoformat()
and expires_at.isoformat() with
int(created_at.astimezone(datetime.timezone.utc).timestamp() * 1000) (handle
naive datetimes by assuming UTC or using .replace(tzinfo=timezone.utc)); ensure
the same epoch-millis serialization is applied in the non-native/cull paths as
well so both write and cull use consistent timestamp format.
@claude can you see the PR feedback? Can you address it?
Claude finished @strawgate's task — View job

Addressed all CodeRabbit review feedback ✓

I've successfully implemented fixes for all 8 actionable issues from the CodeRabbit review (2025-10-26T02:53:59Z).

Changes Made
- Elasticsearch Store (
- Elasticsearch: disable doc_values for JSON mode, enhance mapping validation, fix empty dict handling
- MongoDB: add inverse TTL validation, explicit type errors, fix expires_at=None bug
- Fix line length issues and apply linting suggestions

Co-authored-by: William Easton <[email protected]>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
36-40: Consider removing or updating the unused TypedDict.

The `MongoDBStoreDocument` TypedDict defines `value: dict[str, Any]`, but this only accurately represents native storage mode. In JSON string mode, the `value` field is stored as a `str`. Since this type is not used anywhere in the code, consider either removing it or splitting it into two variants (one for each storage mode) to avoid confusion. Apply this diff to remove the unused TypedDict:

```diff
-class MongoDBStoreDocument(TypedDict):
-    value: dict[str, Any]
-
-    created_at: datetime | None
-    expires_at: datetime | None
-
-
```

Alternatively, if you want to keep it for documentation, split it into mode-specific variants:

```diff
-class MongoDBStoreDocument(TypedDict):
+class MongoDBNativeDocument(TypedDict):
     value: dict[str, Any]
+    created_at: datetime | None
+    expires_at: datetime | None
+    updated_at: datetime
-    created_at: datetime | None
-    expires_at: datetime | None
-
+class MongoDBJSONDocument(TypedDict):
+    value: str  # JSON string containing value and metadata
+    updated_at: datetime
```

key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (1)
427-438: Optional: Consider normalizing timestamp format across write and query paths.

The culling logic works correctly because Elasticsearch's `date` field type handles both ISO strings (stored via `isoformat()` at lines 301, 303, 321, 323) and epoch milliseconds (queried at line 434). However, mixing formats makes the code less obvious. For consistency and clarity, consider normalizing to epoch milliseconds in both write and cull paths, as suggested in the previous review.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (8 hunks)
- key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (3)
- ManagedEntry (14-84)
- from_json (62-84)
- to_json (47-59)
key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- try_parse_datetime_str (36-43)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (3)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
- get (111-137)
key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (2)
- ManagedEntry (14-84)
- to_json (47-59)
key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- now (21-23)
🔇 Additional comments (12)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (6)
49-91: LGTM! Clear API design with appropriate warnings.

The overloads correctly define the constructor variants, and the docstrings clearly warn users about the breaking nature of switching storage modes. The `_native_storage` field properly tracks the runtime storage mode.
93-126: LGTM! Constructor properly initializes storage mode.

The implementation correctly stores the `native_storage` flag and maintains backward compatibility by defaulting to `False`. The docstring appropriately emphasizes the migration requirement when switching modes.
143-164: LGTM! Collection setup properly handles both modes.

The setup correctly:
- Validates indexes on existing collections to detect mismatches
- Creates TTL index only for native storage mode
- Uses `expireAfterSeconds=0` for immediate expiration when `expires_at` is reached
166-198: LGTM! Thorough bidirectional index validation with helpful error messages.

The validation correctly checks both scenarios:
- Missing TTL index when `native_storage=True`
- Present TTL index when `native_storage=False`

The error messages are actionable and guide users to fix the mismatch. The broad exception suppression (lines 196-198) is probably intentional for resilience (e.g., permission issues), but be aware it could hide other unexpected errors beyond connection issues. Optional: Consider logging suppressed exceptions for debugging:

```diff
     except Exception:  # noqa: S110  # Suppress other errors (e.g., connection issues) to allow store to work
-        pass
+        import logging
+        logging.debug(f"Index validation failed for collection '{collection}'", exc_info=True)
```
200-253: Excellent! Robust dual-path retrieval with explicit error handling.

The implementation correctly:
- Retrieves values as BSON dicts in native mode and JSON strings in JSON mode
- Validates types at each step with informative error messages
- Handles datetime fields appropriately for each mode
- Raises `TypeError` instead of silently returning `None` for type mismatches

This properly addresses previous feedback about explicit error handling. The error messages clearly indicate storage mode mismatches, which will help users identify configuration issues quickly.
255-306: Outstanding! Dual-path storage correctly implements all previous feedback.

The implementation properly addresses all previous review concerns:

✅ Metadata consistency: Both modes now store `updated_at` (lines 269, 299)

✅ Redundancy removed: No longer stores `collection` or a redundant `key` in the document

✅ Correct MongoDB operators:
- Uses `$set` for mutable fields (`value`, `updated_at`, `expires_at`)
- Uses `$setOnInsert` for immutable `created_at` (lines 275-281)
- Uses `$unset` to remove expiration when `expires_at=None` (lines 287-288)

✅ Clean upsert logic: The `key` appears only in the filter, letting MongoDB handle the upsert correctly

This is a well-crafted implementation that handles both storage modes cleanly and correctly.
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (6)
64-65: LGTM! Storage mode flag added.

The `_native_storage` attribute clearly tracks the storage mode for the instance.
66-86: LGTM! Constructor overloads properly updated.

Both constructor overloads correctly include the `native_storage` parameter with appropriate defaults.
87-126: LGTM! Constructor implementation is complete and well-documented.

The warning about switching storage modes requiring re-indexing is clear and helpful. The default of `False` maintains backward compatibility with existing deployments.
135-165: LGTM! Dynamic mapping with validation is well-implemented.

The mapping validation on existing indices (line 140) ensures that storage mode mismatches are caught early. The conditional `doc_values: False` for keyword fields (line 160) correctly avoids columnar storage bloat in JSON mode.
223-276: LGTM! Dual-mode retrieval with excellent mismatch detection.

The storage mode mismatch detection (lines 242-248, 265-271) provides clear, actionable error messages when data stored in one mode is accessed in another. The fix for empty dict handling (lines 237-239) correctly distinguishes between missing and falsy values.
283-330: LGTM! Dual-mode storage correctly implemented.

The critical fix of storing timestamps at the top level in both modes (lines 300-303, 319-323) ensures culling works correctly. The dual-mode structure is clear and maintainable.
@claude can you fix the tests and address any remaining feedback |
|
Claude finished @strawgate's task — View job

Fixed tests and addressed remaining feedback ✅

All tasks completed successfully!
Summary of Changes

Type Checking Fixes (12 errors → 0 errors)

Elasticsearch Store (
MongoDB Store (

CodeRabbit Feedback Addressed
Verification
Commit: |
- Fix type inference issues in Elasticsearch and MongoDB stores
- Add proper type annotations and casts for dict types
- Fix variable redeclarations in if/else branches
- Fix MongoDB list_indexes() await pattern
- Add exception logging to Elasticsearch validation warnings
- Remove unused MongoDBStoreDocument TypedDict
- All 12 type checking errors now resolved

Co-authored-by: William Easton <[email protected]>
|
@claude for elasticsearch and mongo should we just store them in different columns so that it doesn't matter which we get, we can deserialize it? and the setting just controls what happens on put? |
|
Claude finished @strawgate's task — View job

Analysis of Dual-Column Storage Approach

Great question! Let me analyze the trade-offs of storing values in separate columns/fields vs. the current approach.

Current Approach (Mode-Specific Single Field)

How it works:
Pros:
Cons:
Proposed Approach (Dual-Field Storage)

How it would work:
Pros:
Cons:
Recommendation: Stick with Current Approach ✅

The current implementation is the right choice because:

Alternative: Migration Tooling

Instead of dual-field storage, consider:

Bottom line: The current approach is architecturally sound. It's clean, efficient, and the mode validation prevents silent data corruption. The dual-field approach would introduce complexity, waste resources, and still require migration, without providing meaningful benefits.

Would you like me to implement migration helper methods instead? |
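If migration tooling is pursued, its core could look like this sketch. It assumes JSON-mode values parse to a plain dict; the real wrapper shape comes from `ManagedEntry.to_json()` and would need matching unwrapping:

```python
import json
from typing import Any


def to_native(stored_value: Any) -> dict:
    """One step of a hypothetical migration pass: coerce either stored format to a dict."""
    if isinstance(stored_value, dict):
        return stored_value  # already native
    if isinstance(stored_value, str):
        return json.loads(stored_value)  # legacy JSON string
    raise TypeError(f"unexpected stored type: {type(stored_value).__name__}")


# A migration pass could then stream only JSON-mode documents and rewrite them:
# async for doc in coll.find({"value": {"$type": "string"}}):
#     await coll.update_one({"_id": doc["_id"]}, {"$set": {"value": to_native(doc["value"])}})
```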
|
@claude please address the feedback on the PR and when we do a get we should always accept either format, the store setting just controls what we put |
|
Claude finished @strawgate's task — View job

Completed: Making reads format-agnostic ✅

Successfully implemented the requested changes to make read operations accept both storage formats.

Changes Made

Elasticsearch Store (
MongoDB Store (
Key Behavior Changes

Before:
After:
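The asymmetry (write format controlled by the flag, reads accepting both) can be illustrated with a stripped-down sketch; this is not the store's actual serialization:

```python
import json


def write_value(value: dict, native_storage: bool):
    # The native_storage flag controls ONLY what gets written
    return value if native_storage else json.dumps(value)


def read_value(stored):
    # Reads accept either shape, so mixed-mode collections keep working
    if isinstance(stored, dict):
        return stored
    if isinstance(stored, str):
        return json.loads(stored)
    raise TypeError(f"expected dict or str, got {type(stored).__name__}")
```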
Verification
Commit: |
- Elasticsearch and MongoDB stores now accept both native and JSON string formats on read
- The native_storage flag now only controls write format, not read format
- This enables gradual migration between storage modes
- Supports mixed-mode data in the same collection

Co-authored-by: William Easton <[email protected]>
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
132-134: Use the declared character set for collection names.
`_sanitize_collection_name` ignores `COLLECTION_ALLOWED_CHARACTERS` and disallows underscores.

```diff
- return sanitize_string(value=collection, max_length=MAX_COLLECTION_LENGTH, allowed_characters=ALPHANUMERIC_CHARACTERS)
+ return sanitize_string(value=collection, max_length=MAX_COLLECTION_LENGTH, allowed_characters=COLLECTION_ALLOWED_CHARACTERS)
```

key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (1)
386-404: Prefer dropping the index when deleting a collection.

You create a dedicated index per collection; deleting documents leaves empty indices behind.
Apply:
```diff
- async def _delete_collection(self, *, collection: str) -> bool:
-     result: ObjectApiResponse[Any] = await self._client.options(ignore_status=404).delete_by_query(
-         index=self._sanitize_index_name(collection=collection),
-         body={
-             "query": {
-                 "term": {
-                     "collection": collection,
-                 },
-             },
-         },
-     )
-
-     body: dict[str, Any] = get_body_from_response(response=result)
-
-     if not (deleted := body.get("deleted")) or not isinstance(deleted, int):
-         return False
-
-     return deleted > 0
+ async def _delete_collection(self, *, collection: str) -> bool:
+     index = self._sanitize_index_name(collection=collection)
+     resp = await self._client.options(ignore_status=404).indices.delete(index=index)
+     # Treat 200 and 404 (already gone) as success
+     return bool(resp and resp.get("acknowledged", True))
```
♻️ Duplicate comments (2)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
241-242: Don't silently swallow storage-mode mismatches for value type.

Returning None hides data problems; raise a clear error instead. (Reiterating prior feedback.)

```diff
- # Unexpected type or None
- return None
+ # Unexpected type or None
+ got = type(value).__name__
+ raise TypeError(
+     f"Data for key '{key}' has invalid value type: expected dict or str, got {got}."
+ )
```
261-262: Don't silently swallow storage-mode mismatches for value type.

Returning None hides data issues; raise a clear error. (Reiterating prior feedback.)

```diff
- # Unexpected type
- return None
+ # Unexpected type
+ got = type(value).__name__
+ raise TypeError(f"Invalid 'value' type for key '{key}': expected dict or str, got {got}.")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (8 hunks)
- key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (10 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (3)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
- `get` (111-137)

key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (3)
- `ManagedEntry` (14-84)
- `from_json` (62-84)
- `to_json` (47-59)

key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- `now` (19-21)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/compound.py (1)
- `compound_key` (32-34)

key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (3)
- `ManagedEntry` (14-84)
- `from_json` (62-84)
- `to_json` (47-59)

key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- `try_parse_datetime_str` (34-41)
```python
_ = await new_collection.create_index(keys="key")

# Create TTL index for automatic expiration (only when using native storage)
if self._native_storage:
    _ = await new_collection.create_index(keys="expires_at", expireAfterSeconds=0)
```
🧩 Analysis chain
Enforce uniqueness of keys; validate it for existing collections.
Without a unique index on "key", duplicate keys can be created, making find_one nondeterministic.
Apply:
```diff
- _ = await new_collection.create_index(keys="key")
+ _ = await new_collection.create_index(keys="key", unique=True)
```

And validate on existing collections:
@@
- indexes: list[dict[str, Any]] = await coll.list_indexes().to_list(length=None) # type: ignore[attr-defined]
+ indexes: list[dict[str, Any]] = await coll.list_indexes().to_list(length=None) # type: ignore[attr-defined]
+ has_unique_key: bool = any(
+ cast("dict[str, Any]", idx.get("key", {})).get("key") is not None and idx.get("unique") is True # type: ignore[union-attr]
+ for idx in indexes # type: ignore[misc]
+ )
+ if not has_unique_key:
+ raise ValueError(f"Collection '{collection}' is missing a unique index on 'key'.")If legacy duplicates exist, index creation will fail; consider a one-time migration to dedupe before enabling uniqueness.
Also applies to: 159-165
I'll verify the review comment by examining the codebase and validating the claims made.
🏁 Script executed:
#!/bin/bash
# View the file and surrounding context for lines 151-165
cat "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py" | head -200 | tail -80

Length of output: 4052
🏁 Script executed:
#!/bin/bash
# Check the full function context starting from a broader range
sed -n '100,180p' "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py"

Length of output: 3788
🏁 Script executed:
#!/bin/bash
# Search for how MongoDB indexes are used elsewhere in the file
rg -n "create_index|list_indexes" "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py" -B2 -A2

Length of output: 812
🏁 Script executed:
#!/bin/bash
# Check dependencies and MongoDB driver version
fd -e "requirements.txt" -o -e "pyproject.toml" -o -e "setup.py" | head -5 | xargs cat

Length of output: 116
🏁 Script executed:
#!/bin/bash
# Look for how collections are initialized and validated
rg -n "find_one|duplicate|unique" "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py" -B1 -A1

Length of output: 351
🏁 Script executed:
#!/bin/bash
# Get function signature and docstring for context
sed -n '120,160p' "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py"

Length of output: 1856
🏁 Script executed:
#!/bin/bash
# Check imports at the top of the file to understand the MongoDB driver
head -50 "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py"

Length of output: 1836
🏁 Script executed:
#!/bin/bash
# Look for validation logic on existing collections
rg -n "list_indexes\|existing\|migrate" "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py" -B3 -A5

Length of output: 0
🏁 Script executed:
#!/bin/bash
# Search for any existing duplicate key handling
rg -n "find_one" "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py" -B2 -A2

Length of output: 227
🏁 Script executed:
#!/bin/bash
# Check if there's any initialization or setup code that handles collections
rg -n "__init__\|connect\|setup" "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py" -B1 -A10

Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look at the full file structure
wc -l "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py"

Length of output: 69
🏁 Script executed:
#!/bin/bash
# Get a broader view of the entire file
cat "key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py"

Length of output: 13268
Add unique constraint to key index and validate on existing collections.
The code uses find_one(filter={"key": key}) to retrieve entries, which becomes nondeterministic if duplicate keys exist. Without a unique index, concurrent operations could create duplicates, violating the key-value store contract.
Apply to line 151:
- _ = await new_collection.create_index(keys="key")
+ _ = await new_collection.create_index(keys="key", unique=True)

Add validation in `_validate_collection_indexes()` after line 164 to check existing collections:
indexes: list[dict[str, Any]] = await coll.list_indexes().to_list(length=None) # type: ignore[attr-defined]
+ # Check for unique index on key
+ has_unique_key: bool = any(
+ cast("dict[str, Any]", idx.get("key", {})).get("key") is not None and idx.get("unique") is True # type: ignore[union-attr]
+ for idx in indexes # type: ignore[misc]
+ )
+ if not has_unique_key:
+ raise ValueError(f"Collection '{collection}' is missing a unique index on 'key'.")
+
# Check for TTL index on expires_atIf legacy collections have duplicate keys, index creation will fail. Plan a one-time migration to deduplicate before deployment.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py around
lines 151 to 156, the "key" index is created without a uniqueness constraint
which allows duplicate keys and breaks deterministic lookups; change the index
creation to create a unique index on "key" (unique=True) and ensure any existing
non-unique index is replaced or upgraded safely. In the same file after line 164
inside _validate_collection_indexes(), add a validation step that scans each
existing collection for duplicate "key" values (e.g., aggregate/group by key
counting >1) and, if duplicates are found, raise a clear error instructing a
one-time deduplication migration before proceeding; if no duplicates exist,
proceed to create the unique index or verify the index options match uniqueness
and raise if incompatible. Ensure the validation differentiates native vs
non-native storage and surfaces actionable messaging about performing the
migration when index creation would fail.
|
@claude please review the PR Feedback and incorporate if necessary and then do a line by line review of the PR to get it ready to merge |
|
Claude finished @strawgate's task — View job

PR Review Complete ✅

All CodeRabbit feedback has been addressed and I've performed a comprehensive line-by-line review of the changes.

Summary

This PR successfully implements optional native storage for MongoDB and Elasticsearch stores, enabling users to store values as native documents/objects instead of JSON strings. The implementation is solid and well-designed with proper validation, backward compatibility, and clear migration warnings.

✅ Strengths
Changes Made (Commit
|
- Elasticsearch: Add doc_values validation, timezone normalization, better error handling
- MongoDB: Add unique index constraint, improve TTL validation, timezone normalization, better error handling
- Fix collection name sanitization to use correct character set
- Raise TypeError instead of returning None for invalid value types

Co-authored-by: William Easton <[email protected]>
|
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (1)
86-119: Consider adding `tz_aware=True` to AsyncMongoClient initialization.

While the code correctly normalizes datetimes in the read path (lines 248-251), initializing the MongoDB client with `tz_aware=True` provides defense in depth and ensures consistency across all datetime operations.

Apply this diff:
```diff
  elif url:
-     self._client = AsyncMongoClient(url)
+     self._client = AsyncMongoClient(url, tz_aware=True)
  else:
      # Defaults to localhost
-     self._client = AsyncMongoClient()
+     self._client = AsyncMongoClient(tz_aware=True)
```
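For context, PyMongo returns naive datetimes (already in UTC) by default unless the client is constructed with `tz_aware=True`, so the read path needs a normalization step roughly like this sketch:

```python
from datetime import datetime, timezone
from typing import Optional


def ensure_utc(dt: Optional[datetime]) -> Optional[datetime]:
    """Sketch of the read-path normalization to UTC-aware datetimes."""
    if dt is None:
        return None
    if dt.tzinfo is None:
        # Naive values from the driver are already UTC; just attach the zone
        return dt.replace(tzinfo=timezone.utc)
    # Aware values from elsewhere get converted
    return dt.astimezone(timezone.utc)
```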
♻️ Duplicate comments (2)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
296-322: Optional: write a storage-mode marker to ease debugging/migrations.

Add a small `storage_mode` field to each document and assert it on reads later if desired. Repeats prior suggestion.

Apply:
```diff
@@
  document = {
      "collection": collection,
      "key": key,
      "value": managed_entry.value,  # Store as flattened object
+     "storage_mode": "native",
  }
@@
  document = {
      "collection": collection,
      "key": key,
      "value": managed_entry.to_json(),  # Store as JSON string
+     "storage_mode": "json",
  }
```
305-321: Optional: normalize timestamp serialization to milliseconds.

Use ISO with millisecond precision to reduce payload variance and keep ES-friendly dates; cull still uses epoch millis and remains compatible. Repeats a prior nit.
Apply:
```diff
- if managed_entry.created_at:
-     document["created_at"] = managed_entry.created_at.isoformat()
- if managed_entry.expires_at:
-     document["expires_at"] = managed_entry.expires_at.isoformat()
+ if managed_entry.created_at:
+     document["created_at"] = managed_entry.created_at.isoformat(timespec="milliseconds")
+ if managed_entry.expires_at:
+     document["expires_at"] = managed_entry.expires_at.isoformat(timespec="milliseconds")
@@
- if managed_entry.created_at:
-     document["created_at"] = managed_entry.created_at.isoformat()
- if managed_entry.expires_at:
-     document["expires_at"] = managed_entry.expires_at.isoformat()
+ if managed_entry.created_at:
+     document["created_at"] = managed_entry.created_at.isoformat(timespec="milliseconds")
+ if managed_entry.expires_at:
+     document["expires_at"] = managed_entry.expires_at.isoformat(timespec="milliseconds")
```
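As a quick check of the suggested `timespec` argument (standard-library behavior, not project code):

```python
from datetime import datetime, timezone

# timespec="milliseconds" pins the precision, so every stored timestamp has
# the same shape regardless of the microsecond content
dt = datetime(2024, 5, 1, 12, 30, 45, 123456, tzinfo=timezone.utc)
fixed = dt.isoformat(timespec="milliseconds")
# -> "2024-05-01T12:30:45.123+00:00"
```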
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (8 hunks)
- key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (11 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use Ruff for Python formatting and linting (configured in pyproject.toml)
Python line length is 140 characters
Use Basedpyright in strict mode for static type checking
Files:
- key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py
- key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py
key-value/key-value-aio/src/key_value/aio/stores/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
key-value/key-value-aio/src/key_value/aio/stores/**/*.py: Store implementations must persist ManagedEntry wrappers (not raw values) with value, creation timestamp, and TTL metadata
Async store implementations must implement the AsyncKeyValue protocol (key_value/aio/protocols/key_value.py)
Files:
- key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py
- key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py
🧬 Code graph analysis (2)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (3)
key-value/key-value-aio/src/key_value/aio/stores/base.py (1)
- `get` (111-137)

key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (3)
- `ManagedEntry` (14-84)
- `from_json` (62-84)
- `to_json` (47-59)

key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- `now` (19-21)
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (3)
key-value/key-value-shared/src/key_value/shared/utils/compound.py (1)
- `compound_key` (32-34)

key-value/key-value-shared/src/key_value/shared/utils/managed_entry.py (3)
- `ManagedEntry` (14-84)
- `from_json` (62-84)
- `to_json` (47-59)

key-value/key-value-shared/src/key_value/shared/utils/time_to_live.py (1)
- `try_parse_datetime_str` (34-41)
🔇 Additional comments (9)
key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py (7)
1-2: LGTM: Imports are correct.

The addition of `timezone` supports UTC normalization for datetime objects retrieved from MongoDB, addressing the tz-aware datetime requirement.

33-33: LGTM: Collection character constraint is appropriate.

The `COLLECTION_ALLOWED_CHARACTERS` constant properly restricts collection names to alphanumeric and underscore characters, aligning with MongoDB naming conventions.

42-42: LGTM: Native storage flag is properly defined.

The `_native_storage` boolean attribute correctly tracks the storage mode for the instance.
136-158: LGTM: Collection setup correctly implements index requirements.

The implementation properly:
- Creates a unique index on the `key` field to prevent duplicates
- Conditionally creates a TTL index only in native storage mode
- Validates existing collections for correct index configuration

160-211: LGTM: Index validation is comprehensive and correct.

The validation method properly:
- Checks for a unique index on the `key` field
- Validates TTL index with `expireAfterSeconds=0` in native mode
- Detects mismatches between storage mode and index configuration
- Provides actionable error messages with remediation steps
- Gracefully handles non-validation exceptions to allow store operation during transient issues
222-267: LGTM: Read path correctly handles both storage formats.

The implementation properly:
- Accepts both native BSON (dict) and JSON string formats on read
- Validates datetime types with informative error messages
- Normalizes datetimes to UTC-aware to prevent comparison errors
- Raises clear TypeErrors for unexpected value types

This addresses all past review comments about type validation, explicit errors, and timezone handling.

279-320: LGTM: Write path correctly implements dual storage modes.

The implementation properly:
- Stores native BSON documents with datetime objects in native mode
- Stores JSON strings in legacy mode
- Uses `$setOnInsert` for immutable fields (`created_at`)
- Uses `$set` for mutable fields (`value`, `updated_at`, `expires_at`)
- Uses `$unset` to remove expiration when `expires_at` is None
- Sets `updated_at` consistently in both modes

This addresses all past review comments about metadata consistency, field redundancy, and expiration removal.
key-value/key-value-aio/src/key_value/aio/stores/elasticsearch/store.py (2)
145-166: Mode-driven mapping looks good.

Creating flattened vs keyword and disabling doc_values in JSON mode is correct. No issues spotted.

246-281: Dual-format read path is solid.

Handles native dict and JSON-string correctly; explicit TypeError on unexpected types is good.
```python
async def _validate_index_mapping(self, *, index_name: str, collection: str) -> None:
    """Validate that the index mapping matches the configured storage mode."""
    try:
        mapping_response = await self._client.indices.get_mapping(index=index_name)
        mappings = mapping_response.get(index_name, {}).get("mappings", {})
        props = mappings.get("properties", {})
        value_field = props.get("value", {})
        value_field_type = value_field.get("type")
        created_type = props.get("created_at", {}).get("type")
        expires_type = props.get("expires_at", {}).get("type")

        expected_type = "flattened" if self._native_storage else "keyword"

        if value_field_type != expected_type:
            msg = (
                f"Index mapping mismatch for collection '{collection}': "
                f"index has 'value' field type '{value_field_type}', "
                f"but store is configured for '{expected_type}' (native_storage={self._native_storage}). "
                f"To fix this, either: 1) Use the correct storage mode when initializing the store, "
                f"or 2) Delete and recreate the index with the new mapping."
            )
            raise ValueError(msg)  # noqa: TRY301

        # In JSON mode ensure doc_values is disabled
        if not self._native_storage and value_field.get("doc_values", True) is not False:
            msg = (
                f"Index mapping mismatch for collection '{collection}': "
                f"'value.doc_values' should be False in JSON mode to avoid columnar storage bloat."
            )
            raise ValueError(msg)  # noqa: TRY301

        # Enforce date types for timestamps (both modes)
        for field_name, field_type in (("created_at", created_type), ("expires_at", expires_type)):
            if field_type not in ("date", None):  # None => not yet created; will be added on first write
                msg = (
                    f"Index mapping mismatch for collection '{collection}': "
                    f"'{field_name}' is mapped as '{field_type}', expected 'date'. "
                    f"Delete and recreate the index or fix the mapping."
                )
                raise ValueError(msg)  # noqa: TRY301
    except ValueError:
        raise
    except Exception:
        # Log a warning but do not fail hard (keep behavior)
        logging.getLogger(__name__).warning(
            "Failed to validate mapping for index '%s' (collection '%s')", index_name, collection, exc_info=True
        )
```
🧹 Nitpick | 🔵 Trivial
Harden mapping validation (also enforce value.index == False and key/collection types).
Add two checks:
- In JSON mode ensure `value.index` is False (currently only `doc_values` is validated).
- Ensure `collection` and `key` are `keyword`.
Apply:
@@
- value_field = props.get("value", {})
+ value_field = props.get("value", {})
value_field_type = value_field.get("type")
+ key_field_type = props.get("key", {}).get("type")
+ collection_field_type = props.get("collection", {}).get("type")
@@
if not self._native_storage and value_field.get("doc_values", True) is not False:
msg = (
f"Index mapping mismatch for collection '{collection}': "
f"'value.doc_values' should be False in JSON mode to avoid columnar storage bloat."
)
raise ValueError(msg) # noqa: TRY301
+
+ # In JSON mode ensure value is not indexed
+ if not self._native_storage and value_field.get("index", True) is not False:
+ raise ValueError(
+ f"Index mapping mismatch for collection '{collection}': 'value.index' must be False in JSON mode."
+ )
+
+ # Ensure routing fields are keyword-typed
+ if key_field_type not in ("keyword", None) or collection_field_type not in ("keyword", None):
+ raise ValueError(
+ f"Index mapping mismatch for collection '{collection}': 'key' and 'collection' must be 'keyword'."
```python
# Try to read as JSON string format
if isinstance(value, str):
    # JSON string format: parse the JSON string
    return ManagedEntry.from_json(json_str=value)
```
🧹 Nitpick | 🔵 Trivial
Optional: tolerate legacy JSON-without-metadata strings.
If old data stored raw JSON values (no wrapper), fall back to includes_metadata=False on parse failure.
Apply:
- if isinstance(value, str):
- # JSON string format: parse the JSON string
- return ManagedEntry.from_json(json_str=value)
+ if isinstance(value, str):
+ # JSON string format: prefer metadata wrapper; fall back to raw value if absent
+ try:
+ return ManagedEntry.from_json(json_str=value)
+ except Exception: # e.g., DeserializationError
+ return ManagedEntry.from_json(json_str=value, includes_metadata=False)If you prefer stricter control, catch DeserializationError explicitly and import it from key_value.shared.utils.managed_entry.
```python
if created_at is not None and type(created_at) is not datetime:
    msg = (
        f"Data for key '{key}' has invalid created_at type: expected datetime but got {type(created_at).__name__}. "
        f"This may indicate a storage mode mismatch."
    )
    raise TypeError(msg)
if expires_at is not None and type(expires_at) is not datetime:
    msg = (
        f"Data for key '{key}' has invalid expires_at type: expected datetime but got {type(expires_at).__name__}. "
        f"This may indicate a storage mode mismatch."
    )
    raise TypeError(msg)
```
🧹 Nitpick | 🔵 Trivial
Consider using isinstance for datetime type validation.
Lines 233 and 239 use `type(created_at) is not datetime`, which rejects datetime subclasses. While this is stricter, `isinstance` is more idiomatic and would accept valid datetime subclasses.
Apply this diff if datetime subclasses should be accepted:
- if created_at is not None and type(created_at) is not datetime:
+ if created_at is not None and not isinstance(created_at, datetime):
msg = (
f"Data for key '{key}' has invalid created_at type: expected datetime but got {type(created_at).__name__}. "
f"This may indicate a storage mode mismatch."
)
raise TypeError(msg)
- if expires_at is not None and type(expires_at) is not datetime:
+ if expires_at is not None and not isinstance(expires_at, datetime):
msg = (
f"Data for key '{key}' has invalid expires_at type: expected datetime but got {type(expires_at).__name__}. "
f"This may indicate a storage mode mismatch."
)
raise TypeError(msg)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if created_at is not None and type(created_at) is not datetime: | |
| msg = ( | |
| f"Data for key '{key}' has invalid created_at type: expected datetime but got {type(created_at).__name__}. " | |
| f"This may indicate a storage mode mismatch." | |
| ) | |
| raise TypeError(msg) | |
| if expires_at is not None and type(expires_at) is not datetime: | |
| msg = ( | |
| f"Data for key '{key}' has invalid expires_at type: expected datetime but got {type(expires_at).__name__}. " | |
| f"This may indicate a storage mode mismatch." | |
| ) | |
| raise TypeError(msg) | |
| if created_at is not None and not isinstance(created_at, datetime): | |
| msg = ( | |
| f"Data for key '{key}' has invalid created_at type: expected datetime but got {type(created_at).__name__}. " | |
| f"This may indicate a storage mode mismatch." | |
| ) | |
| raise TypeError(msg) | |
| if expires_at is not None and not isinstance(expires_at, datetime): | |
| msg = ( | |
| f"Data for key '{key}' has invalid expires_at type: expected datetime but got {type(expires_at).__name__}. " | |
| f"This may indicate a storage mode mismatch." | |
| ) | |
| raise TypeError(msg) |
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py around
lines 233 to 244, the code uses strict type checks "type(... ) is not datetime"
for created_at and expires_at which rejects datetime subclasses; replace those
checks with isinstance(created_at, datetime) and isinstance(expires_at,
datetime) (negated as needed) so subclasses are accepted, and keep the existing
error messages and raises intact.
```python
if managed_entry.created_at:
    set_on_insert_fields["created_at"] = managed_entry.created_at
```
🧹 Nitpick | 🔵 Trivial
Consider explicit None check for created_at.
Line 289 uses `if managed_entry.created_at:`, which would skip setting `created_at` if it's any falsy value. While this is unlikely to cause issues in practice, using an explicit `is not None` check would be more defensive and consistent with the `expires_at` handling on line 298.
Apply this diff for consistency:
- if managed_entry.created_at:
+ if managed_entry.created_at is not None:
    set_on_insert_fields["created_at"] = managed_entry.created_at

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
if managed_entry.created_at is not None:
    set_on_insert_fields["created_at"] = managed_entry.created_at
```
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/mongodb/store.py around
lines 289-290, change the conditional that currently uses truthiness for
managed_entry.created_at to an explicit None check; replace "if
managed_entry.created_at:" with "if managed_entry.created_at is not None:" so
falsy but valid timestamps (e.g., 0) are preserved and to match the expires_at
handling for consistency.
|
@claude we've made some pretty huge changes to the project since we started this PR can you bring this up to speed with our new way of doing things |
|
Claude finished @strawgate's task — View job

Analysis Complete: Branch Requires Fresh Start

After analyzing the situation, I've determined that this branch has diverged too significantly from main (34 commits with massive conflicts across almost every file). The best approach is to start fresh.

Analysis

The main branch has undergone major refactoring since this PR started:
This PR's native storage feature is still valuable, but needs to be reimplemented on top of the current main branch.

Recommendation

I recommend closing this PR and creating a new one with a fresh implementation that:
The implementation will be much simpler using the new helper methods - essentially just:
Next Steps

Would you like me to:
|



Summary
Implements the recommendations from #90 to use native document storage for MongoDB and Elasticsearch instead of JSON strings.
MongoDB Changes
Elasticsearch Changes
Breaking Change
Related
Closes #90
Generated with Claude Code
Summary by CodeRabbit
New Features
Chores