PYTHON-5215 Add an asyncio.Protocol implementation for KMS #2460
base: master
I'm debugging two failures:
I realized the benchmark test wasn't actually triggering the protocol, so I'm tweaking things locally.
pymongo/network_layer.py (outdated)
async def async_sendall(conn: PyMongoProtocol, buf: bytes) -> None:
bytes_needed = self._pending_reads.popleft()
What happens if we need more bytes than we have? We've already popped the waiter and set its result to data, which can only read up to self._bytes_ready bytes. Are we relying on the kms_context.bytes_needed loop to call the protocol read() method again and create a new waiter?
Right, we give the partial result back to the kms context, and let it ask for more.
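A minimal sketch of the behavior described here, assuming a plain asyncio.Protocol that buffers incoming bytes and resolves each pending read with whatever is available, even when that is fewer bytes than requested. KmsReadProtocol and its read() method are hypothetical illustrations; only the _pending_reads name comes from the diff, and this is not pymongo's actual implementation. Connection-loss handling is omitted for brevity.

```python
from __future__ import annotations

import asyncio
from collections import deque


class KmsReadProtocol(asyncio.Protocol):
    """Hypothetical protocol whose reads may return fewer bytes than requested."""

    def __init__(self) -> None:
        self._buffer = bytearray()
        # Each pending read() parks a (future, max_bytes) pair here.
        self._pending_reads: deque[tuple[asyncio.Future[bytes], int]] = deque()

    def data_received(self, data: bytes) -> None:
        self._buffer.extend(data)
        self._wake_readers()

    def _wake_readers(self) -> None:
        # Resolve waiters with whatever is buffered, even if it is less than
        # they asked for; the caller is expected to loop and ask again.
        while self._pending_reads and self._buffer:
            fut, max_bytes = self._pending_reads.popleft()
            if fut.done():  # e.g. the reader was cancelled
                continue
            chunk = bytes(self._buffer[:max_bytes])
            del self._buffer[:max_bytes]
            fut.set_result(chunk)

    async def read(self, max_bytes: int) -> bytes:
        fut = asyncio.get_running_loop().create_future()
        self._pending_reads.append((fut, max_bytes))
        self._wake_readers()
        return await fut
```

Calling read(10) when only 3 bytes have arrived resolves with those 3 bytes; noticing the shortfall and calling read() again is left entirely to the caller.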
Maybe we could get better performance by doing more of the looping inside the Protocol, but KMS requests won't be a significant part of runtime anyway, so it's not worth spending more time on it. Can you add a comment somewhere to the effect that we rely on the looping behavior for this to function correctly?
It's not really a question of performance; it's that the kms_request is blind until it knows the Content-Length, and we don't know what state it is in.
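To illustrate why the loop has to live on the caller's side, here is a sketch in which a hypothetical FakeKmsResponse plays the role of the KMS context: until it has parsed the headers it cannot know the Content-Length, so it only reports a provisional bytes_needed. The drive() helper, the 1024-byte provisional request, and the header parsing are all assumptions for illustration, not libmongocrypt's or pymongo's behavior. The read argument can be any coroutine that returns at most the requested number of bytes, such as a protocol read() that hands back partial data.

```python
import asyncio
from typing import Awaitable, Callable, Optional

PROVISIONAL_READ = 1024  # hypothetical request size before headers are parsed


class FakeKmsResponse:
    """Hypothetical stand-in for a KMS context parsing an HTTP response."""

    def __init__(self) -> None:
        self.data = bytearray()
        self._total: Optional[int] = None  # unknown until headers are parsed

    @property
    def bytes_needed(self) -> int:
        if self._total is None:
            return PROVISIONAL_READ
        return max(self._total - len(self.data), 0)

    def feed(self, chunk: bytes) -> None:
        self.data.extend(chunk)
        if self._total is not None:
            return
        head, sep, _ = bytes(self.data).partition(b"\r\n\r\n")
        if not sep:
            return  # still inside the headers
        length = 0
        for line in head.split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        self._total = len(head) + len(sep) + length


async def drive(ctx: FakeKmsResponse, read: Callable[[int], Awaitable[bytes]]) -> bytes:
    # read() may return fewer bytes than requested, so keep asking until the
    # context itself says nothing more is needed.
    while ctx.bytes_needed > 0:
        chunk = await read(ctx.bytes_needed)
        if not chunk:
            raise EOFError("connection closed before the response was complete")
        ctx.feed(chunk)
    return bytes(ctx.data)
```

Only the context can decide when the response is complete, which is why the reader hands back partial chunks instead of trying to satisfy a byte count it cannot know in advance.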
I added a comment.
test/asynchronous/test_collection.py (outdated)
@@ -335,6 +335,8 @@ async def test_create_index(self):
await db.test.create_index(["hello", ("world", DESCENDING)])
await db.test.create_index({"hello": 1}.items()) # type:ignore[arg-type]

# TODO: PYTHON-5491 - remove version max
This change should be in a separate PR.
Yeah, it was; I just updated this branch.
pymongo/network_layer.py (outdated)
# Reuse the active buffer if it has space.
if len(self._buffers):
    buffer = self._buffers[-1]
    if len(buffer.buffer) - buffer.end_index > sizehint:
If we're always setting sizehint to be at least 16384, is this check worth doing in the first place? I'd expect us to rarely reuse the active buffer, since we'll usually have a buffer of size 16384 and a sizehint of 16384.
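The arithmetic behind this question can be checked with a small simulation. It assumes every buffer is allocated with exactly 16384 bytes and that the reuse test is the strict comparison from the diff; BUFFER_SIZE, can_reuse() and reads_reusing_buffer() are hypothetical names, and 500 bytes is just an example read size. Under those assumptions, clamping sizehint to at least 16384 means the check can never pass, because the free space in a 16384-byte buffer is at most 16384.

```python
BUFFER_SIZE = 16384  # assumed allocation size for each buffer


def can_reuse(buffer_len: int, end_index: int, sizehint: int) -> bool:
    # Same shape as the check in the diff: reuse only if the free space
    # strictly exceeds the requested size.
    return buffer_len - end_index > sizehint


def reads_reusing_buffer(read_size: int, clamp_hint: bool) -> int:
    """Count consecutive reads of read_size bytes that an initially empty
    BUFFER_SIZE buffer would accept before the check fails."""
    if read_size <= 0:
        raise ValueError("read_size must be positive")
    sizehint = max(read_size, BUFFER_SIZE) if clamp_hint else read_size
    end_index = 0
    count = 0
    while can_reuse(BUFFER_SIZE, end_index, sizehint):
        end_index += read_size
        count += 1
    return count


print(reads_reusing_buffer(500, clamp_hint=False))  # → 32
print(reads_reusing_buffer(500, clamp_hint=True))   # → 0
```

With unclamped small reads the buffer is reused many times; with the clamp it never is, so the branch only adds a comparison.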
The actual sizehint in practice was on the order of the number of bytes being read from the buffer (typically fewer than 1000). Using the buffered protocol at all here is a bit of a mismatch IMHO.
How long would it take to refactor this to not use the buffered protocol? There's no reason to use the lower-level API if we don't need to.
It's actually dead simple; I did it along the way while debugging a race condition.
I'll push a commit in the morning for comparison; we can always revert.
I'm happy with the simplification. The tests are passing locally, so this is ready for another look.
Can you schedule a full Evergreen run? We should ensure there's no regressions introduced here by accident.
Are the benchmark results for KMS significantly different between the two Protocol implementations?
Full patch build: https://spruce.mongodb.com/version/689f5483e112170007b0ce9f/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC I updated the timings in the PR description; there's no significant change.
Okay, there is one legit bug in
Here's a new patch build; the failures are existing flakiness issues or are tracked in PYTHON-5502.
@@ -124,7 +124,89 @@ def _set_non_inheritable_non_atomic(fd: int) -> None: # noqa: ARG001
_IS_SYNC = True


class Connection:
class BaseConnection:
We don't utilize any of the BaseConnection abstractions for the sync API, correct? The change is purely for compatibility with the async API?
No, it is used by both sync and async KMS; it's what gets returned by the _connect_kms helper function.
I meant that we don't have a separate KMS networking interface for sync. Just wanted to confirm my understanding.
I had to fix merge conflicts.
See benchmark gist.
Benchmark Results:
Before: 4.93s, 5.26s
After: 4.93s, 5.05s
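For reference, the relative change implied by these numbers can be computed directly. This assumes each row lists the same two benchmark runs in the same order, which the results do not state explicitly; with only two samples per side, the figures say nothing about statistical significance.

```python
before = [4.93, 5.26]  # seconds, from the "Before" row
after = [4.93, 5.05]   # seconds, from the "After" row


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means faster)."""
    return (new - old) / old * 100


# Pairwise comparison, assuming the runs line up position by position.
for old, new in zip(before, after):
    print(f"{old:.2f}s -> {new:.2f}s: {pct_change(old, new):+.1f}%")

mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)
print(f"mean: {mean_before:.3f}s -> {mean_after:.3f}s: "
      f"{pct_change(mean_before, mean_after):+.1f}%")
```

The differences are within a few percent, consistent with the "no significant change" reading in the thread.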
Depends on mongodb-labs/drivers-evergreen-tools#679