-
Notifications
You must be signed in to change notification settings - Fork 0
feat: (Agent): Finalize Bidirectional Agent class #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
| await stop_bidirectional_connection(self._session) | ||
| self._session = None | ||
|
|
||
| def _validate_active_session(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we build this into Session? Then in the agent init, we can do something like:
self._session = Session()And in agent start, we can do:
self._session.start()From here, session itself can raise a "not started" error if we try to use a member method that requires activation (e.g., send_interrupt).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, would we be able to rename to something like self._loop_connection to avoid overloading session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this will work -- I am going to leave this comment open to do this change once we merge in the agent class, and the bidirectional event loop so that this trivial change can be made on top. Currently both of the PRs are open and the loops are in different state.
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/tests/optimized_example.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/tests/optimized_example.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/tests/optimized_example.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/adapters/audio_adapter.py
Outdated
Show resolved
Hide resolved
…ntation. Will be added later when implementation is added.
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
|
For interface. First, I'll assume we wont have audio config as part of the interface, as it's related to audio adapter. For methods, given that we have a class, do we need a method that returns callable (e.g. |
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/tests/test_bidi.py
Outdated
Show resolved
Hide resolved
| async with BidirectionalAgent(model=model, tools=[calculator]) as agent: | ||
| print("New BidirectionalAgent Experience") | ||
| print("Try asking: 'What is 25 times 8?' or 'Calculate the square root of 144'") | ||
| await agent.connect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good but it's difficult for me to really understand what consumers will really experience. Would be great to see more full-fledged examples to truly understand the devex more than a simple toy example:
- Integrated with a client UI
- One component as part of a larger system (e.g. in-vehicle assistant)
- Integrated with non-bidi agents
- Real-time interrupts
- Listening for activation phrases - "Hey Alexa"
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
- Remove adapter from constructor - Implement BidirectionlIO interface - Add adapter the run() method
| @@ -1,244 +1,136 @@ | |||
| """Bidirectional Agent for real-time streaming conversations. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally think Bidi is an acceptable shorthand and so I would suggest every package, class, method, and variable use that naming convention. Examples:
src/strands/experimental/bidi/agent/agent.pyclass BidiAgent
The benefit here is that customers have less to type. I also notice this is a convention used out in the wild. Examples:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side quest: Would you be able to come up with a distinguishing name for normal agents? I have been calling them converse stream agents so I can talk "bidi agent" vs "converse stream agent". It would be nice to have something shorter though. You can then share the name with the team so going forward it is easier for everyone to talk about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't like "bidi" shorthand in the repo for customer facing things. That said, I'd like to avoid bidirectional name in most of the stuff, unless it'll be used together with other agents/packages, and it requires some clarification. I think this only applies to bidi agent for now.
For the rest, they are under bidi namespace, and if customers want to have it more explanatory they can just use import aliases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is good to have the prefix since customers are going to be importing the components top-level:
from strands.experimental.bidi.models import GeminiBidiModel
# vs
from strands.experimental.bidirectional.models import GeminiModel- The second import is longer
- The second import has a less descriptive name that'll be used in the customer code.
- In Python, it is conventional for people to use shorthand names (e.g.,
import numpy as np). Here we just go ahead shorten on the customer's behalf. bidiis a commonly accepted shorthand for bidirectional.
For these reasons, I feel pretty strongly about utilizing the Bidi/bidi prefix.
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
| Unified method for sending text, audio, and image input to the model during | ||
| an active conversation session. | ||
| logger.debug("Conversation start - initializing connection") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would recommend having an LLM adjust the formatting of the logs to follow the guidelines outlined in https://github.com/strands-agents/sdk-python/blob/main/STYLE_GUIDE.md.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will apply these changes in a follow-up PR
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
| from ..event_loop.bidirectional_event_loop import start_bidirectional_connection, stop_bidirectional_connection | ||
| from ....types.tools import ToolResult, ToolUse, AgentTool | ||
|
|
||
| from ..event_loop.bidirectional_event_loop import BidirectionalAgentLoop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go ahead and make BidiAgentLoop private (i.e., call it _BidiAgentLoop).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 It might be nice to see what we have private vs public in all bidi code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do this in a follow-up commit in the PR here: #10
src/strands/experimental/bidirectional_streaming/agent/agent.py
Outdated
Show resolved
Hide resolved
src/strands/experimental/bidirectional_streaming/types/audio_io.py
Outdated
Show resolved
Hide resolved
| @@ -1,244 +1,136 @@ | |||
| """Bidirectional Agent for real-time streaming conversations. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't like "bidi" shorthand in the repo for customer facing things. That said, I'd like to avoid bidirectional name in most of the stuff, unless it'll be used together with other agents/packages, and it requires some clarification. I think this only applies to bidi agent for now.
For the rest, they are under bidi namespace, and if customers want to have it more explanatory they can just use import aliases
| from ..event_loop.bidirectional_event_loop import start_bidirectional_connection, stop_bidirectional_connection | ||
| from ....types.tools import ToolResult, ToolUse, AgentTool | ||
|
|
||
| from ..event_loop.bidirectional_event_loop import BidirectionalAgentLoop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 It might be nice to see what we have private vs public in all bidi code
| "InterruptionDetectedEvent", | ||
| "BidirectionalStreamEvent", | ||
| "VoiceActivityEvent", | ||
| "UsageMetricsEvent", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say let's not expose the Events top level. I think for now it is good to encourage users to import from the subpackage to avoid confusion. We have both stream events and hook events and so it'll be good to help users distinguish. Also, I think it will be good to be consistent with what we expose top level for the uni agent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this was a discussion I had with @zastrowm
I think we want to start exposing these events. There are a couple of reasons. First TS consistency, as they will also do that.
Also it allows us to expose more complicated events that can be used in different places. For example, error event was a problem, because if you want to make the event (they were dicts) serializable, then you cannot include exception object, but using the typed dicts, we can have our cake and eat it too :D
The class below is technically still a dict (so can be send to websockets, and used with json.dumps, etc), but it also gives customers the ability to access exceptions directly, if they want to raise a new exception from this one. Essentially exposing events allows us to do this.
Additionally, we can start to expose the typed dicts in agent and multi-agent as they'd be backwards compatible: they extend dicts, they are dicts
class ErrorEvent(TypedEvent):
"""Error occurred during the session.
Stores the full Exception object as an instance attribute for debugging while
keeping the event dict JSON-serializable. The exception can be accessed via
the `error` property for re-raising or type-based error handling.
Parameters:
error: The exception that occurred.
details: Optional additional error information.
"""
def __init__(
self,
error: Exception,
details: Optional[Dict[str, Any]] = None,
):
# Store serializable data in dict (for JSON serialization)
super().__init__(
{
"type": "bidirectional_error",
"message": str(error),
"code": type(error).__name__,
"details": details,
}
)
# Store exception as instance attribute (not serialized)
self._error = error
@property
def error(self) -> Exception:
"""The original exception that occurred.
Can be used for re-raising or type-based error handling.
"""
return self._error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things to take into consideration:
- In exposing typed-events, I would expect all events coming out of BiDi to be typed, not a subset. Is that the case?
- For consistency, UniDi events should also be typed, not just BiDi. Open question whether that's at the same time or as a follow-up
- If we start publishing typed events, the entire class + shape should be bar-raised, not just the dict emitted. In the past we haven't cared about the concrete members (or class naming, or init members) because they weren't public apis - if we're making them public they should be bar-raised
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In exposing typed-events, I would expect all events coming out of BiDi to be typed, not a subset. Is that the case?
It should be the case, I will double check
For consistency, UniDi events should also be typed, not just BiDi. Open question whether that's at the same time or as a follow-up
Yes, I agree. I'd prefer a followup, not to overburden ourselves right before re:invent. That said, we should also do a poc to make sure such change won't break existing customers. I'd expect so, but it's better to make sure
If we start publishing typed events, the entire class + shape should be bar-raised, not just the dict emitted. In the past we haven't cared about the concrete members (or class naming, or init members) because they weren't public apis - if we're making them public they should be bar-raised
That makes sense. I will work on a doc for it to show what are the events/types we have
| @@ -1,8 +1,11 @@ | |||
| """Bidirectional streaming package.""" | |||
|
|
|||
| # Main components - Primary user interface | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up: Per discussion, we are Rachit is going to apply the Bidi/bidi prefix in a follow PR.
| @@ -1,244 +1,136 @@ | |||
| """Bidirectional Agent for real-time streaming conversations. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is good to have the prefix since customers are going to be importing the components top-level:
from strands.experimental.bidi.models import GeminiBidiModel
# vs
from strands.experimental.bidirectional.models import GeminiModel- The second import is longer
- The second import has a less descriptive name that'll be used in the customer code.
- In Python, it is conventional for people to use shorthand names (e.g.,
import numpy as np). Here we just go ahead shorten on the customer's behalf. bidiis a commonly accepted shorthand for bidirectional.
For these reasons, I feel pretty strongly about utilizing the Bidi/bidi prefix.
| model: BidirectionalModel, | ||
| tools: list | None = None, | ||
| model: BidirectionalModel| str | None = None, | ||
| tools: list[str| AgentTool| ToolProvider]| None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: list[...]| vs list[...] |
| Args: | ||
| sender: Async callable that sends events to the client (e.g., websocket.send_json). | ||
| receiver: Async callable that receives events from the client (e.g., websocket.receive_json). | ||
| async def run(self, io_channels: list[BidiIO | tuple[Callable, Callable]]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up: Let's try to define a more specific Callable type for the tuple.
| if not io_channels: | ||
| raise ValueError("io_channels parameter cannot be empty. Provide either an IO channel or (sender, receiver) tuple.") | ||
|
|
||
| transport = io_channels[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do a for loop correct? Also, let's use io or io_channel every where in place of transport and adapter.
|
|
||
| async def _run_with_transport( | ||
| self, | ||
| transport: BidiIO | tuple[Callable, Callable], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's convert the tuple to a BidiIO so that we don't need to build conditions around the type in this method.
| - input_channels (int): Input channels (default: 1) | ||
| - output_channels (int): Output channels (default: 1) | ||
| """ | ||
| if pyaudio is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we don't need this check anymore since we import pyaudio top-level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right -- will remove
Summary
This PR introduces a clean agent-driven architecture for bidirectional streaming, transforming complex manual PyAudio coordination into simple, reusable patterns. The architecture separates core agent logic from hardware IO channels, with the agent as the primary interface managing IO channel lifecycle and enabling extensible multi-modal capabilities.
BidirectionalIO Interface:
Protocol:
Audio Implementation:
Purpose:
Pure hardware abstraction layer - bridges PyAudio (microphone/speakers) ↔ BidirectionalAgent
Before: (200+ lines)
After: (2-3 lines)
Primary Pattern: Agent-Driven with IO Channels (Recommended)
Context Manager Pattern: Guaranteed Cleanup
Custom Transport Pattern: Maximum Flexibility
Agent-Driven Design:
run()method, not constructorAPI Simplifications:
input_channel()andoutput_channel()methodsBidirectionalInputfor cleaner signatures with modern|syntaxFuture Roadmap
This agent-driven architecture with BidirectionalIO abstraction enables: