feat(pipecat-sdk): add speech-to-speech model support (Gemini Live) #680

Prasanna721 · 2026-01-18T03:17:51Z

Added native speech-to-speech model support

Summary:

Speech-to-speech support - Auto-detect audio frames and inject memories into the system prompt for native audio models (Gemini Live, etc.)
Fix memory bloating - Replace memories each turn using XML tags instead of accumulating
Add temporal context - Show recency on search results ([2d ago], [15 Jan])
New inject_mode param - auto (default), system, or user
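
This description, together with the review below, indicates that `auto` targets the system prompt once audio frames have been seen and otherwise falls back to the user message, while `system` and `user` force a target. Here is a minimal sketch of that decision. The helper name `resolve_inject_target` and the type aliases are hypothetical and are not the SDK's actual API:

```python
from typing import Literal

# Hypothetical aliases; the SDK's own type names may differ.
InjectMode = Literal["auto", "system", "user"]
InjectTarget = Literal["system", "user"]


def resolve_inject_target(inject_mode: InjectMode, audio_frames_detected: bool) -> InjectTarget:
    """Decide where retrieved memories are injected for this turn.

    "system" and "user" are explicit overrides. "auto" picks the system
    prompt once audio input has been seen (native speech-to-speech models
    such as Gemini Live) and the user message otherwise.
    """
    if inject_mode == "system":
        return "system"
    if inject_mode == "user":
        return "user"
    if inject_mode == "auto":
        return "system" if audio_frames_detected else "user"
    raise ValueError(f"unknown inject_mode: {inject_mode!r}")
```

Keeping the decision in a pure function like this lets you unit-test all three modes independently of the frame processor.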

Docs update

Update the docs for native speech-to-speech models


cloudflare-workers-and-pages · 2026-01-18T03:17:58Z

Deploying with Cloudflare Workers

The latest updates on your project.

Status: ✅ Deployment successful
Name: supermemory-app
Latest commit: dcb95c2
Updated (UTC): Jan 18 2026, 03:22 AM

Prasanna721 · 2026-01-18T03:18:09Z

feat(pipecat-sdk): add speech-to-speech model support (Gemini Live) #680 👈
main

How to use the Graphite Merge Queue

Add the label Main to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue.

An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub, as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2026-01-18T03:19:23Z

PR Review: Speech-to-Speech Support for Pipecat SDK

Summary

This PR adds intelligent speech-to-speech model support (e.g., Gemini Live) to the Pipecat SDK with three key improvements:

Auto-detection of audio frames → injects memories into the system prompt for native audio models
Memory replacement strategy → uses XML tags to replace memories each turn (prevents bloating)
Temporal context → shows recency on search results ([2d ago], [15 Jan])
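
The replace-instead-of-accumulate strategy amounts to a regex upsert: if the prompt already contains a memory block, swap it out; otherwise append one. The sketch below illustrates the idea. The tag names, the `MEMORY_TAG_PATTERN` definition, and the `upsert_memory_block` helper are assumptions for illustration, not the SDK's code:

```python
import re

# Hypothetical tag names and pattern; the SDK defines its own constants.
MEMORY_TAG_START = "<user_memories>"
MEMORY_TAG_END = "</user_memories>"
MEMORY_TAG_PATTERN = re.compile(
    re.escape(MEMORY_TAG_START) + r".*?" + re.escape(MEMORY_TAG_END), re.DOTALL
)


def upsert_memory_block(system_prompt: str, memories_text: str) -> str:
    """Replace last turn's memory block with a fresh one, or append if absent."""
    block = f"{MEMORY_TAG_START}\n{memories_text}\n{MEMORY_TAG_END}"
    if MEMORY_TAG_PATTERN.search(system_prompt):
        # A callable replacement inserts `block` literally, so backslashes in
        # memory text are not treated as regex escapes.
        return MEMORY_TAG_PATTERN.sub(lambda _match: block, system_prompt, count=1)
    return f"{system_prompt}\n\n{block}" if system_prompt else block
```

Because the block is replaced in place, the prompt holds exactly one memory section however many turns have run.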

Code Quality ✅

Strengths:

Clean type annotations with proper Literal types for modes
Good separation of concerns (service.py, utils.py)
Proper async/await patterns with fire-and-forget storage
Comprehensive docstrings
Graceful error handling with try/except and logging

Minor Issues:

Python 3.10+ union syntax (line 7 in utils.py): str | None is used, which requires Python 3.10+. This is fine since pyproject.toml specifies requires-python = ">=3.10", but worth noting for compatibility.

Security Review ✅

No security issues found. Good patterns:

API keys properly handled via env vars with fallback
JSON serialization for user content (no injection risk)
Input validation via Pydantic models
No unsafe credential exposure

Performance Review ✅

Good performance patterns:

Fire-and-forget message storage (asyncio.create_task) prevents blocking
Smart caching with _last_query to avoid duplicate processing
Efficient regex replacement with MEMORY_TAG_PATTERN
Early returns when no memories found

Logic & Correctness Review

Critical Issue 🚨 - Potential Bug:

In service.py:273-278, the audio frame detection has a subtle issue:

```python
if isinstance(frame, InputAudioRawFrame):
    if not self._audio_frames_detected:
        self._audio_frames_detected = True
    await self.push_frame(frame, direction)
    return
```

Problem: Once _audio_frames_detected is set to True, it is never reset for the rest of the session unless reset_memory_tracking() is called explicitly. This means:

If a user switches from speech-to-speech to text mode mid-session, the SDK will keep injecting into the system prompt
The inject_mode="auto" detection becomes permanent after the first audio frame

Suggested Fix:
Consider resetting _audio_frames_detected when you detect a change in frame types, or track frame types over a sliding window to handle mode switches dynamically.
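
One way to act on the sliding-window suggestion is a time-based window: treat the session as speech-to-speech only while an audio frame has arrived recently. A time window suits this better than a frame count, because audio frames arrive continuously while typed input is sparse. The class name, the `window_s` default, and where `on_text_input` would be called are assumptions, not part of the PR:

```python
import time
from typing import Callable, Optional


class AudioActivityWindow:
    """Report speech-to-speech mode only while audio input is recent.

    Hypothetical replacement for a sticky `_audio_frames_detected` flag.
    """

    def __init__(self, window_s: float = 5.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._window_s = window_s  # assumed default; tune for the real transport
        self._clock = clock
        self._last_audio_at: Optional[float] = None

    def on_audio_frame(self) -> None:
        # Called for every InputAudioRawFrame; only the latest arrival time matters.
        self._last_audio_at = self._clock()

    def on_text_input(self) -> None:
        # Typed input is a strong text-mode signal, so forget earlier audio.
        self._last_audio_at = None

    @property
    def audio_active(self) -> bool:
        if self._last_audio_at is None:
            return False
        return self._clock() - self._last_audio_at <= self._window_s
```

In the `InputAudioRawFrame` branch, `on_audio_frame()` would replace setting `_audio_frames_detected = True`, and `audio_active` would be read wherever `_audio_frames_detected` is read today. If a transport streams microphone audio continuously, audio will stay active, so tune the window length against the real pipeline.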

Minor Issue - Edge Case:

In service.py:261-265, the user message memory removal logic:

```python
for i in range(len(messages) - 1, -1, -1):
    msg = messages[i]
    if msg.get("role") == "user" and MEMORY_TAG_START in msg.get("content", ""):
        messages.pop(i)
        break
```

Problem: If a user legitimately includes <user_memories> in their message content (unlikely but possible), it would be incorrectly removed.

Suggested Fix: Use the full MEMORY_TAG_PATTERN regex match instead of just checking for MEMORY_TAG_START.
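
Here is a sketch of a stricter check. It assumes the memories are injected as a standalone user message whose entire content is a single `<user_memories>...</user_memories>` block; the pattern and helper name are hypothetical. Calling `fullmatch` on the stripped content, rather than `search`, is what protects a user message that merely mentions the tag, even one that contains a complete tag pair alongside other text:

```python
import re
from typing import Any, Dict, List

# Hypothetical pattern: a single block whose body cannot contain a closing tag,
# so one match never spans two separate memory blocks.
MEMORY_TAG_PATTERN = re.compile(
    r"<user_memories>(?:(?!</user_memories>).)*</user_memories>", re.DOTALL
)


def remove_injected_memory_message(messages: List[Dict[str, Any]]) -> bool:
    """Pop the newest user message that consists solely of a memory block.

    Returns True if a message was removed, False otherwise.
    """
    for i in range(len(messages) - 1, -1, -1):
        msg = messages[i]
        content = msg.get("content")
        # Skip non-user messages and non-string content (e.g. lists of parts).
        if msg.get("role") != "user" or not isinstance(content, str):
            continue
        # fullmatch: the whole message must be the block, not merely contain it.
        if MEMORY_TAG_PATTERN.fullmatch(content.strip()):
            messages.pop(i)
            return True
    return False
```

The tempered token `(?:(?!</user_memories>).)*` keeps a single match from running past a closing tag. Non-string `content` is skipped rather than tested with `in`.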

Temporal Context Feature Review ✅

The format_relative_time function (utils.py:15-49) is well-implemented:

Handles timezone-aware timestamps correctly
Graceful fallback on parse errors
Good bucketing strategy for recency display

Platform compatibility note: The strftime('%-d') directive (lines 45-47) is a glibc/BSD extension that Windows does not support; there, strftime raises ValueError instead of formatting the date. Consider using dt.day instead:

```python
return f"{dt.day} {dt.strftime('%b')}"
```
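
The review quotes only the return line. The following reconstructs a plausible `format_relative_time` around it to show the portable formatting in context. Only the `2d ago` and `15 Jan` label forms come from the PR; the other buckets, their thresholds, and the UTC assumption for naive timestamps are illustrative:

```python
from datetime import datetime, timezone
from typing import Optional


def format_relative_time(timestamp: str, now: Optional[datetime] = None) -> str:
    """Return a short recency label such as '2d ago' or '15 Jan'; '' on bad input.

    `now`, if given, must be timezone-aware.
    """
    try:
        # fromisoformat() accepts a trailing 'Z' only from Python 3.11 on,
        # so normalize it for the package's Python 3.10 floor.
        dt = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    except (ValueError, AttributeError):
        return ""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    if now is None:
        now = datetime.now(timezone.utc)
    seconds = (now - dt).total_seconds()
    if seconds < 60:
        return "just now"
    if seconds < 3600:
        return f"{int(seconds // 60)}m ago"
    if seconds < 86400:
        return f"{int(seconds // 3600)}h ago"
    if seconds < 7 * 86400:
        return f"{int(seconds // 86400)}d ago"
    # dt.day replaces the non-portable '%-d'; '%b' works on every platform.
    return f"{dt.day} {dt.strftime('%b')}"
```

Note that `%b` is locale-dependent; under the default C locale it yields English month abbreviations such as `Jan`.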

Test Coverage 📋

According to the repository guidelines, this package requires pytest tests. However, the tests/ directory contains only __init__.py, with no actual test files.

Recommendation: Add tests for:

format_relative_time edge cases (timezone handling, invalid timestamps)
deduplicate_memories logic (priority ordering)
Memory injection modes (auto/system/user)
Audio frame detection behavior

Documentation Review ✅

The documentation updates in apps/docs/voice-realtime/pipecat.mdx are excellent:

Clear injection mode table
Good examples for Gemini Live
Proper import path fixes

Breaking Changes ⚠️

API Changes:

New inject_mode parameter added to InputParams (default: "auto")
This is backward compatible since it has a default value

Behavior Changes:

Search results structure changed from List[str] to List[Dict[str, Any]]
Internal deduplication logic now expects dict format
This could be a breaking change if consumers were directly accessing the raw search results

Summary Verdict

Overall: Good PR with one critical bug to fix

✅ Approve after fixing:

Critical: Audio frame detection persistence issue (could cause wrong injection mode)
Nice-to-have: Platform-specific strftime format
Nice-to-have: More robust memory tag removal logic

The core features are well-designed and the code quality is high. The temporal context and memory replacement strategy are excellent improvements. Just needs the audio detection logic refinement before merging.

Recommended Actions:

Fix the _audio_frames_detected persistence issue
Add basic pytest tests for the new features
Consider Windows strftime compatibility in format_relative_time

packages/pipecat-sdk-python/src/supermemory_pipecat/service.py

sentry · 2026-01-18T03:22:49Z

packages/pipecat-sdk-python/src/supermemory_pipecat/utils.py

```diff
-    """Format deduplicated memories into a text string for injection."""
+    """Format deduplicated memories into a text string for injection.
+
+    Search results include temporal context (e.g., '3d ago') from updatedAt.
+    """
     sections = []
 
     static = memories["static"]
```

Bug: The format_relative_time function uses the %-d strftime directive, which is not supported on Windows, causing date formatting to fail silently and omit temporal context.
_{Severity: HIGH}

Suggested Fix

Replace the platform-specific %-d directive with a cross-platform compatible method. One approach is to use dt.day to get the day as an integer and then format it into the string, for example: dt.strftime(f"{dt.day} %b").

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: packages/pipecat-sdk-python/src/supermemory_pipecat/utils.py#L44-L47

Potential issue: The `format_relative_time` function uses the `%-d` strftime directive to format dates. This directive is a POSIX-specific extension and is not supported on Windows, which uses `%#d` instead. When this code runs on a Windows machine, `strftime` will raise a `ValueError`. A broad `except Exception` block catches this error and returns an empty string, causing the temporal context (e.g., `[15 Jan]`) to be silently omitted from memory search results for Windows users. This degrades the feature without any warning.



mintlify bot deployed to staging - apps/docs January 18, 2026 03:18

Prasanna721 self-assigned this Jan 18, 2026

Prasanna721 requested a review from Dhravya January 18, 2026 03:18

Prasanna721 marked this pull request as ready for review January 18, 2026 03:19

sentry bot reviewed Jan 18, 2026

packages/pipecat-sdk-python/src/supermemory_pipecat/service.py (resolved)

2 participants
