Skip to content

Conversation

@Prasanna721
Copy link
Contributor

@Prasanna721 Prasanna721 commented Jan 18, 2026

Added native speech-to-speech model support

Summary:

  • Speech-to-speech support - Auto-detect audio frames and inject memories to system prompt for native audio models (Gemini Live, etc.)
  • Fix memory bloating - Replace memories each turn using XML tags instead of accumulating
  • Add temporal context - Show recency on search results ([2d ago], [15 Jan])
  • New inject_mode param - auto (default), system, or user

Docs update

  • Update the docs for native speech-2-speech models

Summary:
  - Speech-to-speech support - Auto-detect audio frames and inject memories to system prompt for native audio models (Gemini Live, etc.)
  - Fix memory bloating - Replace memories each turn using XML tags instead of accumulating
  - Add temporal context - Show recency on search results ([2d ago], [15 Jan])
  - New inject_mode param - auto (default), system, or user
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Jan 18, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
supermemory-app dcb95c2 Commit Preview URL

Branch Preview URL
Jan 18 2026, 03:22 AM

Copy link
Contributor Author


How to use the Graphite Merge Queue

Add the label Main to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Jan 18, 2026

PR Review: Speech-to-Speech Support for Pipecat SDK

Summary

This PR adds intelligent speech-to-speech model support (e.g., Gemini Live) to the Pipecat SDK with three key improvements:

  1. Auto-detection of audio frames → injects memories to system prompt for native audio models
  2. Memory replacement strategy → uses XML tags to replace memories each turn (prevents bloating)
  3. Temporal context → shows recency on search results ([2d ago], [15 Jan])

Code Quality ✅

Strengths:

  • Clean type annotations with proper Literal types for modes
  • Good separation of concerns (service.py, utils.py)
  • Proper async/await patterns with fire-and-forget storage
  • Comprehensive docstrings
  • Graceful error handling with try/except and logging

Minor Issues:

  1. Python 3.10+ union syntax (line 7 in utils.py): str | None is used, which requires Python 3.10+. This is fine since pyproject.toml specifies requires-python = ">=3.10", but worth noting for compatibility.

Security Review ✅

No security issues found. Good patterns:

  • API keys properly handled via env vars with fallback
  • JSON serialization for user content (no injection risk)
  • Input validation via Pydantic models
  • No unsafe credential exposure

Performance Review ✅

Good performance patterns:

  • Fire-and-forget message storage (asyncio.create_task) prevents blocking
  • Smart caching with _last_query to avoid duplicate processing
  • Efficient regex replacement with MEMORY_TAG_PATTERN
  • Early returns when no memories found

Logic & Correctness Review

Critical Issue 🚨 - Potential Bug:

In service.py:273-278, the audio frame detection has a subtle issue:

if isinstance(frame, InputAudioRawFrame):
    if not self._audio_frames_detected:
        self._audio_frames_detected = True
    await self.push_frame(frame, direction)
    return

Problem: Once _audio_frames_detected is set to True, it never gets reset during the same session except via explicit reset_memory_tracking() call. This means:

  • If a user switches from speech-to-speech to text mode mid-session, the SDK will still inject to system prompt
  • The inject_mode="auto" detection becomes permanent after first audio frame

Suggested Fix:
Consider resetting _audio_frames_detected when you detect a change in frame types, or track frame types over a sliding window to handle mode switches dynamically.


Minor Issue - Edge Case:

In service.py:261-265, the user message memory removal logic:

for i in range(len(messages) - 1, -1, -1):
    msg = messages[i]
    if msg.get("role") == "user" and MEMORY_TAG_START in msg.get("content", ""):
        messages.pop(i)
        break

Problem: If a user legitimately includes <user_memories> in their message content (unlikely but possible), it would be incorrectly removed.

Suggested Fix: Use the full MEMORY_TAG_PATTERN regex match instead of just checking for MEMORY_TAG_START.


Temporal Context Feature Review ✅

The format_relative_time function (utils.py:15-49) is well-implemented:

  • Handles timezone-aware timestamps correctly
  • Graceful fallback on parse errors
  • Good bucketing strategy for recency display

Platform compatibility note: The strftime('%-d') format (line 45-47) uses a Unix-specific format. On Windows, this will fail. Consider using .day instead:

return f"{dt.day} {dt.strftime('%b')}"

Test Coverage 📋

According to the instructions, this package requires pytest tests. However, the tests/ directory only contains __init__.py with no actual test files.

Recommendation: Add tests for:

  1. format_relative_time edge cases (timezone handling, invalid timestamps)
  2. deduplicate_memories logic (priority ordering)
  3. Memory injection modes (auto/system/user)
  4. Audio frame detection behavior

Documentation Review ✅

The documentation updates in apps/docs/voice-realtime/pipecat.mdx are excellent:

  • Clear injection mode table
  • Good examples for Gemini Live
  • Proper import path fixes

Breaking Changes ⚠️

API Changes:

  • New inject_mode parameter added to InputParams (default: "auto")
  • This is backward compatible since it has a default value

Behavior Changes:

  • Search results structure changed from List[str] to List[Dict[str, Any]]
  • Internal deduplication logic now expects dict format
  • This could be a breaking change if consumers were directly accessing the raw search results

Summary Verdict

Overall: Good PR with one critical bug to fix

Approve after fixing:

  1. Critical: Audio frame detection persistence issue (could cause wrong injection mode)
  2. Nice-to-have: Platform-specific strftime format
  3. Nice-to-have: More robust memory tag removal logic

The core features are well-designed and the code quality is high. The temporal context and memory replacement strategy are excellent improvements. Just needs the audio detection logic refinement before merging.


Recommended Actions:

  1. Fix the _audio_frames_detected persistence issue
  2. Add basic pytest tests for the new features
  3. Consider Windows strftime compatibility in format_relative_time

@Prasanna721 Prasanna721 marked this pull request as ready for review January 18, 2026 03:19
Comment on lines -44 to 47
"""Format deduplicated memories into a text string for injection."""
"""Format deduplicated memories into a text string for injection.
Search results include temporal context (e.g., '3d ago') from updatedAt.
"""
sections = []

static = memories["static"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The format_relative_time function uses the %-d strftime directive, which is not supported on Windows, causing date formatting to fail silently and omit temporal context.
Severity: HIGH

Suggested Fix

Replace the platform-specific %-d directive with a cross-platform compatible method. One approach is to use dt.day to get the day as an integer and then format it into the string, for example: dt.strftime(f"{dt.day} %b").

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: packages/pipecat-sdk-python/src/supermemory_pipecat/utils.py#L44-L47

Potential issue: The `format_relative_time` function uses the `%-d` strftime directive
to format dates. This directive is a POSIX-specific extension and is not supported on
Windows, which uses `%#d` instead. When this code runs on a Windows machine, `strftime`
will raise a `ValueError`. A broad `except Exception` block catches this error and
returns an empty string, causing the temporal context (e.g., `[15 Jan]`) to be silently
omitted from memory search results for Windows users. This degrades the feature without
any warning.

Did we get this right? 👍 / 👎 to inform future reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants