
Conversation

@Dhravya (Member) commented Jan 16, 2026

No description provided.

cloudflare-workers-and-pages bot commented Jan 16, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status | Name | Latest Commit | Preview URL | Updated (UTC)
✅ Deployment successful! (View logs) | supermemory-app | 9465f57 | Commit Preview URL | Jan 16 2026, 08:13 PM

@Dhravya (Member, Author) commented Jan 16, 2026


How to use the Graphite Merge Queue

Add the label Main to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@Dhravya Dhravya marked this pull request as ready for review January 16, 2026 20:09
@Dhravya Dhravya requested a review from Prasanna721 January 16, 2026 20:09
claude bot commented Jan 16, 2026

Code Review - Gemini Live Pipecat Integration

Summary

This PR adds support for speech-to-speech models (specifically Gemini Live) to the pipecat-sdk-python package by introducing transcription frame handling. The implementation looks solid overall, with a few considerations below.


✅ Positives

  1. Good separation of concerns: New _handle_transcription and _inject_memories_for_speech_to_speech methods cleanly handle the speech-to-speech flow
  2. Code reuse: Leverages existing deduplicate_memories and format_memories_to_text utilities effectively
  3. Error handling: Properly catches MemoryRetrievalError and logs warnings without breaking the pipeline
  4. Consistent patterns: The transcription handling mirrors the existing context frame handling logic

⚠️ Issues & Recommendations

1. Code Duplication (Medium Priority)

The _inject_memories_for_speech_to_speech method (lines 330-375) duplicates significant logic from _enhance_context_with_memories (lines 239-296):

  • Same deduplication logic
  • Same memory counting
  • Same mode-based filtering
  • Nearly identical memory formatting

Recommendation: Extract shared logic into a private method like _format_memory_content that both methods can call:

# Assumes `from typing import Any, Dict, Optional, Tuple` at module level.
def _format_memory_content(self, memories_data: Dict[str, Any]) -> Optional[Tuple[str, int]]:
    """Format memory data into text; return (memory_text, total_memories), or None if empty."""
    profile = memories_data["profile"]

    deduplicated = deduplicate_memories(
        static=profile["static"],
        dynamic=profile["dynamic"],
        search_results=memories_data["search_results"],
    )

    total_memories = (
        len(deduplicated["static"])
        + len(deduplicated["dynamic"])
        + len(deduplicated["search_results"])
    )

    if total_memories == 0:
        return None

    include_profile = self.params.mode in ("profile", "full")
    include_search = self.params.mode in ("query", "full")

    memory_text = format_memories_to_text(
        deduplicated,
        system_prompt=self.params.system_prompt,
        include_static=include_profile,
        include_dynamic=include_profile,
        include_search=include_search,
    )

    return memory_text, total_memories

2. Fire-and-Forget Task (Low Priority - Current Pattern)

Line 328: asyncio.create_task(self._store_messages([user_message])) creates an untracked background task.

Current behavior: This matches the existing pattern in line 424, so it's consistent with the codebase
Note: Tasks might be orphaned if the event loop closes, but this appears to be the intended design pattern for non-blocking memory storage
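
If orphaned tasks ever become a concern, a common mitigation (not something this PR needs to adopt) is to hold strong references to the created tasks and drain them on shutdown. A minimal sketch, with hypothetical names:

import asyncio
from typing import Any, Coroutine, Set


class BackgroundTasks:
    """Hypothetical helper: keeps strong references to fire-and-forget tasks."""

    def __init__(self) -> None:
        self._tasks: Set[asyncio.Task] = set()

    def create(self, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        # Drop the reference once the task finishes so the set does not grow.
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self) -> None:
        # Await any in-flight storage tasks, e.g. when the pipeline shuts down.
        if self._tasks:
            await asyncio.gather(*self._tasks, return_exceptions=True)

The service could then route its asyncio.create_task calls through tasks.create(...) and await tasks.drain() in its teardown path.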

3. _last_query Shared State (Low Priority - Design Question)

The _last_query deduplication check is shared between transcription handling and context enhancement. This means:

  • A transcription frame sets _last_query
  • A subsequent context frame with the same text will be skipped (or vice versa)

Question: Is this the intended behavior? If transcriptions and context frames flow through the pipeline separately for the same user input, you might want separate tracking variables:

  • _last_transcription_query
  • _last_context_query
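
If separate tracking turns out to be the right call, one way to structure it is a tiny helper holding both values (the attribute names are the ones suggested above; the helper class itself is hypothetical):

from typing import Optional


class QueryDedup:
    """Hypothetical helper: independent dedup state per frame source."""

    def __init__(self) -> None:
        self._last_transcription_query: Optional[str] = None
        self._last_context_query: Optional[str] = None

    def is_repeat_transcription(self, text: str) -> bool:
        if text == self._last_transcription_query:
            return True
        self._last_transcription_query = text
        return False

    def is_repeat_context(self, text: str) -> bool:
        if text == self._last_context_query:
            return True
        self._last_context_query = text
        return False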

4. Missing Tests (High Priority - Per Instructions)

The packages/pipecat-sdk-python/tests/ directory only contains an empty __init__.py. Since this is a published package, it should have test coverage for:

  • ✅ Transcription frame handling
  • ✅ Memory injection for speech-to-speech models
  • ✅ Deduplication logic with shared _last_query state
  • ✅ Error handling when memory retrieval fails

Recommendation: Add pytest tests before merging, especially for the new transcription flow.
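
A rough starting point for the transcription-path test, mocking the frame and the memory retrieval call. The class name SupermemoryService and the assumption that _handle_transcription takes a single frame argument are guesses based on this review, not the package's actual API, and the async test requires pytest-asyncio:

# test_transcription.py — rough sketch; names and signatures are assumptions.
from unittest.mock import AsyncMock, MagicMock

import pytest


@pytest.mark.asyncio  # needs pytest-asyncio installed
async def test_handle_transcription_skips_whitespace_only_text():
    # Stand-in for the real processor instance; attribute access is mocked.
    service = MagicMock()
    service._retrieve_memories = AsyncMock()

    frame = MagicMock()
    frame.text = "   "  # whitespace-only transcription from the STT service

    # Class name assumed — adjust to whatever the package actually exports.
    from supermemory_pipecat.service import SupermemoryService

    await SupermemoryService._handle_transcription(service, frame)

    # The early return for empty text should mean no memory retrieval happens.
    service._retrieve_memories.assert_not_awaited()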

5. Version Bump & Lock File (Minor - Good Practice)

  • ✅ Version bumped from 0.1.0 to 0.1.1 in pyproject.toml
  • uv.lock added for reproducible builds (2190 lines, mostly dependencies)

No issues here, but consider documenting what changed in a CHANGELOG if this is a published package.


🔒 Security Check

  • ✅ No injection vulnerabilities introduced
  • ✅ No credential exposure
  • ✅ Input validation: transcription text is stripped and checked for empty values
  • ✅ Error handling doesn't leak sensitive information

🎯 Breaking Changes

None - this is a backward-compatible addition.


Recommendation

Request Changes - Address the code duplication (#1) and add tests (#4) before merging. The other issues are minor and can be addressed in follow-ups if needed.


cc @Dhravya - Nice work on the Gemini Live integration! The approach is solid. Let me know if you want me to submit a follow-up PR for the refactoring suggestion.

Inline review comment on packages/pipecat-sdk-python/src/supermemory_pipecat/service.py (context from the diff):

        static=profile["static"],
        dynamic=profile["dynamic"],
        search_results=memories_data["search_results"],
    )


Bug: The code uses dictionary-style access r["memory"] on search result objects, which require attribute access such as r.content. This will raise a TypeError at runtime.
Severity: CRITICAL

Suggested Fix

Change the dictionary-style access r["memory"] to attribute-style access. Based on usage in documentation and other parts of the codebase, the correct form is likely r.content. The line should be updated to search_results = [r.content for r in response.search_results.results].
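
In diff form, the change would look roughly like this (the exact attribute name — content vs. memory — should be confirmed against the SDK's search result model):

# Before: dictionary-style access raises TypeError on result objects.
search_results = [r["memory"] for r in response.search_results.results]

# After: attribute-style access (field name assumed to be `content`).
search_results = [r.content for r in response.search_results.results]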

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: packages/pipecat-sdk-python/src/supermemory_pipecat/service.py#L344

Potential issue: The list comprehension `[r["memory"] for r in response.search_results.results]` uses dictionary-style indexing to access memory content from search results. However, documentation and usage examples across the codebase show that the search results are objects, not dictionaries. The correct way to access the content is via an attribute, such as `r.content` or `r.memory`. This mismatch will cause a `TypeError` whenever the memory retrieval process finds and attempts to process search results, crashing the operation.

Did we get this right? 👍 / 👎 to inform future reviews.

claude bot commented Jan 16, 2026

PR Review: Gemini Live Speech-to-Speech Support

Summary

This PR adds support for speech-to-speech models (specifically Gemini Live) by handling TranscriptionFrame events in the pipecat integration. Version bumped from 0.1.0 to 0.1.1.


Code Quality ✅

Good:

  • Clean separation of concerns with new methods _handle_transcription() and _inject_memories_for_speech_to_speech()
  • Proper error handling with try/except blocks
  • Good logging for debugging
  • Follows existing code patterns in the module

Issues:
None significant - code follows Python conventions and Pydantic patterns consistently.


Security 🔒

Good:

  • No new security vulnerabilities introduced
  • Uses existing safe API methods (_retrieve_memories, _store_messages)
  • Text is properly stripped before processing
  • Fire-and-forget task pattern for storage (asyncio.create_task) is appropriate here

Logic & Implementation 🤔

Issues Found:

  1. Duplicate processing risk (service.py:395-398):
    The TranscriptionFrame handler calls _handle_transcription() and then immediately passes the frame downstream. If the transcription eventually generates an LLMMessagesFrame or LLMContextFrame, the same user message could be processed twice - once for transcription and once for the context frame.

    Consider tracking processed transcriptions or ensuring the downstream frames don't duplicate the retrieval (a sketch of one such tracking helper follows this list).

  2. Missing import in original file (service.py:16):
    The diff shows new imports added (TranscriptionFrame, LLMMessagesUpdateFrame), but I notice the original file at line 16 only imports:

    from pipecat.frames.frames import Frame, LLMContextFrame, LLMMessagesFrame

    Verify these new frame types are available in your pipecat version to avoid import errors at runtime.

  3. Empty transcription handling (service.py:304-306):
    Early return for empty text is good. The short-circuit in if not frame.text or not frame.text.strip() keeps the check safe even when frame.text is None, but a brief comment explaining why empty or whitespace-only transcriptions are skipped would help future readers (a commented version of the guard is sketched after this list).
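
For issue 1 above, one possible shape for tracking already-processed transcriptions is a small TTL map keyed by the transcription text (the class and its names are hypothetical, not part of the PR):

import time
from typing import Dict


class ProcessedTranscriptions:
    """Hypothetical helper: remembers which user text already triggered memory
    injection via a TranscriptionFrame, so a follow-up context frame carrying
    the same utterance can skip a second retrieval."""

    def __init__(self, ttl_seconds: float = 10.0) -> None:
        self._seen: Dict[str, float] = {}
        self._ttl = ttl_seconds

    def mark(self, text: str) -> None:
        self._seen[text] = time.monotonic()

    def was_recently_processed(self, text: str) -> bool:
        timestamp = self._seen.get(text)
        return timestamp is not None and (time.monotonic() - timestamp) < self._ttl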
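
For issue 3, the guard plus the suggested comment could read as follows (only the comment is new; the check itself is unchanged from the PR):

# Skip empty or whitespace-only transcriptions: streaming STT services can emit
# blank or partial results, and retrieving memories for them would waste a
# round trip. The short-circuit keeps this safe even if frame.text is None.
if not frame.text or not frame.text.strip():
    return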


Performance 🚀

Good:

  • Async/await properly used throughout
  • Deduplication logic reused from existing deduplicate_memories()
  • Early returns prevent unnecessary processing

Minor concern:

  • The _last_query deduplication only checks if the current text equals the previous one. For speech-to-speech, you might get incremental transcriptions or repeated phrases that aren't identical but still represent the same query. Consider using a similarity threshold or time-based deduplication.
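
A lightweight way to absorb near-identical transcriptions, if that turns out to be needed, is a ratio check with difflib from the standard library (the 0.9 threshold is an arbitrary placeholder):

from difflib import SequenceMatcher
from typing import Optional


def is_near_duplicate(previous: Optional[str], current: str, threshold: float = 0.9) -> bool:
    """Treat two queries as duplicates when they are near-identical, not only
    exactly equal, so minor transcription variations do not trigger repeat
    memory retrievals."""
    if previous is None:
        return False
    return SequenceMatcher(None, previous.lower(), current.lower()).ratio() >= threshold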

Test Coverage 🧪

Issue:
This package is listed as a published package with pytest test setup according to the repo structure, but:

  • The tests/ directory only contains an empty __init__.py
  • No tests exist for this new feature
  • No tests exist for existing functionality either

Recommendation:
Since this is a published Python package (supermemory-pipecat), it should have test coverage. Consider adding:

  • Unit tests for _handle_transcription() mocking the frame and memory retrieval
  • Unit tests for _inject_memories_for_speech_to_speech()
  • Integration test with mock TranscriptionFrame to verify the full flow

Breaking Changes 📦

None - this is additive functionality. The 0.1.0 → 0.1.1 version bump is appropriate for a minor feature addition.

However, verify the uv.lock file (2190 lines added) is intentional. Lock files should generally be in .gitignore for libraries (kept for applications only).


Verdict

Approve with minor concerns

The implementation is solid, but address:

  1. Potential duplicate processing of transcriptions
  2. Add test coverage for the new functionality
  3. Consider more robust deduplication for speech patterns
  4. Verify uv.lock should be committed for a library package
