Skip to content

Conversation

@mehtarac
Copy link
Owner

@mehtarac mehtarac commented Oct 24, 2025

Summary

This PR introduces a clean agent-driven architecture for bidirectional streaming, transforming complex manual PyAudio coordination into simple, reusable patterns. The architecture separates core agent logic from hardware IO channels, with the agent as the primary interface managing IO channel lifecycle and enabling extensible multi-modal capabilities.

BidirectionalIO Interface:

Protocol:

class BidirectionalIO:
    async def input_channel(self) -> dict
        # Read input data from the IO channel source
    
    async def output_channel(self, event: dict) -> None
        # Process output event from the model through the IO channel
        
    def cleanup(self) -> None
        # Clean up IO channel resources

Audio Implementation:

class AudioIO(BidirectionalIO):
    def __init__(audio_config: Optional[dict] = None) -> None
        # Configuration via dict: sample_rates, chunk_size, devices, channels
    
    async def input_channel(self) -> dict
        # Read audio from microphone
    
    async def output_channel(self, event: dict) -> None
        # Handle audio events with direct stream writing
        
    def cleanup(self) -> None
        # Clean up audio resources

Purpose:

Pure hardware abstraction layer - bridges PyAudio (microphone/speakers) ↔ BidirectionalAgent

Before: (200+ lines)

# Required 4 separate async functions with shared context
async def play(context): # ~60 lines - PyAudio speaker + queue processing  
async def record(context): # ~30 lines - PyAudio microphone setup
async def receive(agent, context): # ~40 lines - Event handling
async def send(agent, context): # ~20 lines - Audio sending

# Complex manual coordination
context = {"active": True, "audio_in": Queue(), "audio_out": Queue(), "interrupted": False}
await asyncio.gather(play(context), record(context), receive(agent, context), send(agent, context))

After: (2-3 lines)

Primary Pattern: Agent-Driven with IO Channels (Recommended)

# IO channels passed to run method
audio = AudioIO(audio_config={"input_sample_rate": 16000})
agent = BidirectionalAgent(model=model, tools=[calculator])
await agent.run(io_channels=[audio])  # IO channels passed to run method

Context Manager Pattern: Guaranteed Cleanup

# Automatic resource management with context managers
audio = AudioIO(audio_config={"input_sample_rate": 16000})
async with BidirectionalAgent(model=model, tools=[calculator]) as agent:
    await agent.run(io_channels=[audio])

Custom Transport Pattern: Maximum Flexibility

# Direct transport control when needed
agent = BidirectionalAgent(model=model, tools=[calculator])
await agent.run(io_channels=[(sender_function, receiver_function)])

Agent-Driven Design:

  • BidirectionalAgent is the primary interface users interact with
  • AudioIO is pure hardware abstraction implementing BidirectionalIO protocol
  • IO channels passed to run() method, not constructor

API Simplifications:

  • BidirectionalIO Protocol - Clean interface with input_channel() and output_channel() methods
  • Type aliases - BidirectionalInput for cleaner signatures with modern | syntax
  • IO channel independence - AudioIO no longer depends on agent instance
  • Flexible run method - Accepts list of BidirectionalIO channels or transport tuples

Future Roadmap

This agent-driven architecture with BidirectionalIO abstraction enables:

  • VideoIO for real-time video processing
  • WebSocketIO for remote clients
  • SensorIO for IoT and sensor data
  • Custom transport layers via the standardized BidirectionalIO protocol

await stop_bidirectional_connection(self._session)
self._session = None

def _validate_active_session(self) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we build this into Session? Then in the agent init, we can do something like:

self._session = Session()

And in agent start, we can do:

self._session.start()

From here, session itself can raise a "not started" error if we try to use a member method that requires activation (e.g., send_interrupt).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would we be able to rename to something like self._loop_connection to avoid overloading session.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this will work -- I am going to leave this comment open to do this change once we merge in the agent class, and the bidirectional event loop so that this trivial change can be made on top. Currently both of the PRs are open and the loops are in different state.

@github-actions github-actions bot added size/l and removed size/l labels Nov 5, 2025
@github-actions github-actions bot added the size/l label Nov 6, 2025
@mkmeral
Copy link

mkmeral commented Nov 6, 2025

For interface. First, I'll assume we wont have audio config as part of the interface, as it's related to audio adapter.

For methods, given that we have a class, do we need a method that returns callable (e.g. create_input). Why don't we have just have send/receive methods as part of the interface

async with BidirectionalAgent(model=model, tools=[calculator]) as agent:
print("New BidirectionalAgent Experience")
print("Try asking: 'What is 25 times 8?' or 'Calculate the square root of 144'")
await agent.connect()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good but it's difficult for me to really understand what consumers will really experience. Would be great to see more full-fledged examples to truly understand the devex more than a simple toy example:

  • Integrated with a client UI
  • One component as part of a larger system (e.g. in-vehicle assistant)
  • Integrated with non-bidi agents
  • Real-time interrupts
  • Listening for activation phrases - "Hey Alexa"

- Remove adapter from constructor
- Implement BidirectionlIO interface
- Add adapter the run() method
@@ -1,244 +1,136 @@
"""Bidirectional Agent for real-time streaming conversations.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think Bidi is an acceptable shorthand and so I would suggest every package, class, method, and variable use that naming convention. Examples:

  • src/strands/experimental/bidi/agent/agent.py
  • class BidiAgent

The benefit here is that customers have less to type. I also notice this is a convention used out in the wild. Examples:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side quest: Would you be able to come up with a distinguishing name for normal agents? I have been calling them converse stream agents so I can talk "bidi agent" vs "converse stream agent". It would be nice to have something shorter though. You can then share the name with the team so going forward it is easier for everyone to talk about.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like "bidi" shorthand in the repo for customer facing things. That said, I'd like to avoid bidirectional name in most of the stuff, unless it'll be used together with other agents/packages, and it requires some clarification. I think this only applies to bidi agent for now.

For the rest, they are under bidi namespace, and if customers want to have it more explanatory they can just use import aliases

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is good to have the prefix since customers are going to be importing the components top-level:

from strands.experimental.bidi.models import GeminiBidiModel

# vs

from strands.experimental.bidirectional.models import GeminiModel
  • The second import is longer
  • The second import has a less descriptive name that'll be used in the customer code.
  • In Python, it is conventional for people to use shorthand names (e.g., import numpy as np). Here we just go ahead shorten on the customer's behalf.
  • bidi is a commonly accepted shorthand for bidirectional.

For these reasons, I feel pretty strongly about utilizing the Bidi/bidi prefix.

Unified method for sending text, audio, and image input to the model during
an active conversation session.
logger.debug("Conversation start - initializing connection")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would recommend having an LLM adjust the formatting of the logs to follow the guidelines outlined in https://github.com/strands-agents/sdk-python/blob/main/STYLE_GUIDE.md.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will apply these changes in a follow-up PR

from ..event_loop.bidirectional_event_loop import start_bidirectional_connection, stop_bidirectional_connection
from ....types.tools import ToolResult, ToolUse, AgentTool

from ..event_loop.bidirectional_event_loop import BidirectionalAgentLoop
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go ahead and make BidiAgentLoop private (i.e., call it _BidiAgentLoop).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 It might be nice to see what we have private vs public in all bidi code

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do this in a follow-up commit in the PR here: #10

@@ -1,244 +1,136 @@
"""Bidirectional Agent for real-time streaming conversations.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like "bidi" shorthand in the repo for customer facing things. That said, I'd like to avoid bidirectional name in most of the stuff, unless it'll be used together with other agents/packages, and it requires some clarification. I think this only applies to bidi agent for now.

For the rest, they are under bidi namespace, and if customers want to have it more explanatory they can just use import aliases

from ..event_loop.bidirectional_event_loop import start_bidirectional_connection, stop_bidirectional_connection
from ....types.tools import ToolResult, ToolUse, AgentTool

from ..event_loop.bidirectional_event_loop import BidirectionalAgentLoop
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 It might be nice to see what we have private vs public in all bidi code

"InterruptionDetectedEvent",
"BidirectionalStreamEvent",
"VoiceActivityEvent",
"UsageMetricsEvent",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say let's not expose the Events top level. I think for now it is good to encourage users to import from the subpackage to avoid confusion. We have both stream events and hook events and so it'll be good to help users distinguish. Also, I think it will be good to be consistent with what we expose top level for the uni agent

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was a discussion I had with @zastrowm

I think we want to start exposing these events. There are a couple of reasons. First TS consistency, as they will also do that.

Also it allows us to expose more complicated events that can be used in different places. For example, error event was a problem, because if you want to make the event (they were dicts) serializable, then you cannot include exception object, but using the typed dicts, we can have our cake and eat it too :D

The class below is technically still a dict (so can be send to websockets, and used with json.dumps, etc), but it also gives customers the ability to access exceptions directly, if they want to raise a new exception from this one. Essentially exposing events allows us to do this.

Additionally, we can start to expose the typed dicts in agent and multi-agent as they'd be backwards compatible: they extend dicts, they are dicts

class ErrorEvent(TypedEvent):
    """Error occurred during the session.

    Stores the full Exception object as an instance attribute for debugging while
    keeping the event dict JSON-serializable. The exception can be accessed via
    the `error` property for re-raising or type-based error handling.

    Parameters:
        error: The exception that occurred.
        details: Optional additional error information.
    """

    def __init__(
        self,
        error: Exception,
        details: Optional[Dict[str, Any]] = None,
    ):
        # Store serializable data in dict (for JSON serialization)
        super().__init__(
            {
                "type": "bidirectional_error",
                "message": str(error),
                "code": type(error).__name__,
                "details": details,
            }
        )
        # Store exception as instance attribute (not serialized)
        self._error = error

    @property
    def error(self) -> Exception:
        """The original exception that occurred.
        
        Can be used for re-raising or type-based error handling.
        """
        return self._error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things to take into consideration:

  • In exposing typed-events, I would expect all events coming out of BiDi to be typed, not a subset. Is that the case?
  • For consistency, UniDi events should also be typed, not just BiDi. Open question whether that's at the same time or as a follow-up
  • If we start publishing typed events, the entire class + shape should be bar-raised, not just the dict emitted. In the past we haven't cared about the concrete members (or class naming, or init members) because they weren't public apis - if we're making them public they should be bar-raised

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In exposing typed-events, I would expect all events coming out of BiDi to be typed, not a subset. Is that the case?

It should be the case, I will double check

For consistency, UniDi events should also be typed, not just BiDi. Open question whether that's at the same time or as a follow-up

Yes, I agree. I'd prefer a followup, not to overburden ourselves right before re:invent. That said, we should also do a poc to make sure such change won't break existing customers. I'd expect so, but it's better to make sure

If we start publishing typed events, the entire class + shape should be bar-raised, not just the dict emitted. In the past we haven't cared about the concrete members (or class naming, or init members) because they weren't public apis - if we're making them public they should be bar-raised

That makes sense. I will work on a doc for it to show what are the events/types we have

@@ -1,8 +1,11 @@
"""Bidirectional streaming package."""

# Main components - Primary user interface
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up: Per discussion, we are Rachit is going to apply the Bidi/bidi prefix in a follow PR.

@@ -1,244 +1,136 @@
"""Bidirectional Agent for real-time streaming conversations.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is good to have the prefix since customers are going to be importing the components top-level:

from strands.experimental.bidi.models import GeminiBidiModel

# vs

from strands.experimental.bidirectional.models import GeminiModel
  • The second import is longer
  • The second import has a less descriptive name that'll be used in the customer code.
  • In Python, it is conventional for people to use shorthand names (e.g., import numpy as np). Here we just go ahead shorten on the customer's behalf.
  • bidi is a commonly accepted shorthand for bidirectional.

For these reasons, I feel pretty strongly about utilizing the Bidi/bidi prefix.

model: BidirectionalModel,
tools: list | None = None,
model: BidirectionalModel| str | None = None,
tools: list[str| AgentTool| ToolProvider]| None = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: list[...]| vs list[...] |

Args:
sender: Async callable that sends events to the client (e.g., websocket.send_json).
receiver: Async callable that receives events from the client (e.g., websocket.receive_json).
async def run(self, io_channels: list[BidiIO | tuple[Callable, Callable]]) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow up: Let's try to define a more specific Callable type for the tuple.

if not io_channels:
raise ValueError("io_channels parameter cannot be empty. Provide either an IO channel or (sender, receiver) tuple.")

transport = io_channels[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do a for loop correct? Also, let's use io or io_channel every where in place of transport and adapter.


async def _run_with_transport(
self,
transport: BidiIO | tuple[Callable, Callable],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's convert the tuple to a BidiIO so that we don't need to build conditions around the type in this method.

- input_channels (int): Input channels (default: 1)
- output_channels (int): Output channels (default: 1)
"""
if pyaudio is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we don't need this check anymore since we import pyaudio top-level.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right -- will remove

@mehtarac mehtarac mentioned this pull request Nov 10, 2025
@mehtarac mehtarac merged commit fd11282 into main Nov 10, 2025
1 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants