
Conversation

@Dhravya (Member) commented Jan 16, 2026

No description provided.

cloudflare-workers-and-pages bot commented Jan 16, 2026

Deploying with Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status | Name | Latest Commit | Preview URL | Updated (UTC)
✅ Deployment successful! (View logs) | supermemory-app | 9465f57 | Commit Preview URL | Jan 16 2026, 08:13 PM

@Dhravya (Member, Author) commented Jan 16, 2026


How to use the Graphite Merge Queue

Add the label Main to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@Dhravya Dhravya marked this pull request as ready for review January 16, 2026 20:09
@Dhravya Dhravya requested a review from Prasanna721 January 16, 2026 20:09
claude bot commented Jan 16, 2026

Code Review - Gemini Live Pipecat Integration

Summary

This PR adds support for speech-to-speech models (specifically Gemini Live) to the pipecat-sdk-python package by introducing transcription frame handling. The implementation looks solid overall, with a few considerations below.


✅ Positives

  1. Good separation of concerns: New _handle_transcription and _inject_memories_for_speech_to_speech methods cleanly handle the speech-to-speech flow
  2. Code reuse: Leverages existing deduplicate_memories and format_memories_to_text utilities effectively
  3. Error handling: Properly catches MemoryRetrievalError and logs warnings without breaking the pipeline
  4. Consistent patterns: The transcription handling mirrors the existing context frame handling logic

⚠️ Issues & Recommendations

1. Code Duplication (Medium Priority)

The _inject_memories_for_speech_to_speech method (lines 330-375) duplicates significant logic from _enhance_context_with_memories (lines 239-296):

  • Same deduplication logic
  • Same memory counting
  • Same mode-based filtering
  • Nearly identical memory formatting

Recommendation: Extract shared logic into a private method like _format_memory_content that both methods can call:

# Assumes `from typing import Any, Dict, Optional, Tuple` at module level.
def _format_memory_content(self, memories_data: Dict[str, Any]) -> Optional[Tuple[str, int]]:
    """Format memory data into text; return (memory_text, total_memories), or None if empty."""
    profile = memories_data["profile"]

    deduplicated = deduplicate_memories(
        static=profile["static"],
        dynamic=profile["dynamic"],
        search_results=memories_data["search_results"],
    )

    total_memories = (
        len(deduplicated["static"])
        + len(deduplicated["dynamic"])
        + len(deduplicated["search_results"])
    )

    if total_memories == 0:
        return None

    include_profile = self.params.mode in ("profile", "full")
    include_search = self.params.mode in ("query", "full")

    memory_text = format_memories_to_text(
        deduplicated,
        system_prompt=self.params.system_prompt,
        include_static=include_profile,
        include_dynamic=include_profile,
        include_search=include_search,
    )

    return memory_text, total_memories

2. Fire-and-Forget Task (Low Priority - Current Pattern)

Line 328: asyncio.create_task(self._store_messages([user_message])) creates an untracked background task.

Current behavior: This matches the existing pattern in line 424, so it's consistent with the codebase
Note: Tasks might be orphaned if the event loop closes, but this appears to be the intended design pattern for non-blocking memory storage
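
If orphaned tasks ever become a concern, a common mitigation (not something this PR needs to adopt) is to hold strong references to the created tasks and drain them on shutdown. A minimal sketch, with hypothetical names:

import asyncio
from typing import Any, Coroutine, Set


class BackgroundTasks:
    """Hypothetical helper: keeps strong references to fire-and-forget tasks."""

    def __init__(self) -> None:
        self._tasks: Set[asyncio.Task] = set()

    def create(self, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        # Drop the reference once the task finishes so the set does not grow.
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self) -> None:
        # Await any in-flight storage tasks, e.g. when the pipeline shuts down.
        if self._tasks:
            await asyncio.gather(*self._tasks, return_exceptions=True)

The service could then route its asyncio.create_task calls through tasks.create(...) and await tasks.drain() in its teardown path.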

3. _last_query Shared State (Low Priority - Design Question)

The _last_query deduplication check is shared between transcription handling and context enhancement. This means:

  • A transcription frame sets _last_query
  • A subsequent context frame with the same text will be skipped (or vice versa)

Question: Is this the intended behavior? If transcriptions and context frames flow through the pipeline separately for the same user input, you might want separate tracking variables:

  • _last_transcription_query
  • _last_context_query
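
If separate tracking turns out to be the right call, one way to structure it is a tiny helper holding both values (the attribute names are the ones suggested above; the helper class itself is hypothetical):

from typing import Optional


class QueryDedup:
    """Hypothetical helper: independent dedup state per frame source."""

    def __init__(self) -> None:
        self._last_transcription_query: Optional[str] = None
        self._last_context_query: Optional[str] = None

    def is_repeat_transcription(self, text: str) -> bool:
        if text == self._last_transcription_query:
            return True
        self._last_transcription_query = text
        return False

    def is_repeat_context(self, text: str) -> bool:
        if text == self._last_context_query:
            return True
        self._last_context_query = text
        return False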

4. Missing Tests (High Priority - Per Instructions)

The packages/pipecat-sdk-python/tests/ directory only contains an empty __init__.py. Since this is a published package, it should have test coverage for:

  • ✅ Transcription frame handling
  • ✅ Memory injection for speech-to-speech models
  • ✅ Deduplication logic with shared _last_query state
  • ✅ Error handling when memory retrieval fails

Recommendation: Add pytest tests before merging, especially for the new transcription flow.
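
A rough starting point for the transcription-path test, mocking the frame and the memory retrieval call. The class name SupermemoryService and the assumption that _handle_transcription takes a single frame argument are guesses based on this review, not the package's actual API, and the async test requires pytest-asyncio:

# test_transcription.py — rough sketch; names and signatures are assumptions.
from unittest.mock import AsyncMock, MagicMock

import pytest


@pytest.mark.asyncio  # needs pytest-asyncio installed
async def test_handle_transcription_skips_whitespace_only_text():
    # Stand-in for the real processor instance; attribute access is mocked.
    service = MagicMock()
    service._retrieve_memories = AsyncMock()

    frame = MagicMock()
    frame.text = "   "  # whitespace-only transcription from the STT service

    # Class name assumed — adjust to whatever the package actually exports.
    from supermemory_pipecat.service import SupermemoryService

    await SupermemoryService._handle_transcription(service, frame)

    # The early return for empty text should mean no memory retrieval happens.
    service._retrieve_memories.assert_not_awaited()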

5. Version Bump & Lock File (Minor - Good Practice)

  • ✅ Version bumped from 0.1.0 to 0.1.1 in pyproject.toml
  • uv.lock added for reproducible builds (2190 lines, mostly dependencies)

No issues here, but consider documenting what changed in a CHANGELOG if this is a published package.


🔒 Security Check

  • ✅ No injection vulnerabilities introduced
  • ✅ No credential exposure
  • ✅ Input validation: transcription text is stripped and checked for empty values
  • ✅ Error handling doesn't leak sensitive information

🎯 Breaking Changes

None - this is a backward-compatible addition.


Recommendation

Request Changes - Address the code duplication (#1) and add tests (#4) before merging. The other issues are minor and can be addressed in follow-ups if needed.


cc @Dhravya - Nice work on the Gemini Live integration! The approach is solid. Let me know if you want me to submit a follow-up PR for the refactoring suggestion.

Inline review comment on packages/pipecat-sdk-python/src/supermemory_pipecat/service.py (context from the diff):

        static=profile["static"],
        dynamic=profile["dynamic"],
        search_results=memories_data["search_results"],
    )


Bug: The code uses dictionary-style access r["memory"] on search result objects, which require attribute access such as r.content. This will raise a TypeError at runtime.
Severity: CRITICAL

Suggested Fix

Change the dictionary-style access r["memory"] to attribute-style access. Based on usage in documentation and other parts of the codebase, the correct form is likely r.content. The line should be updated to search_results = [r.content for r in response.search_results.results].
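
In diff form, the change would look roughly like this (the exact attribute name — content vs. memory — should be confirmed against the SDK's search result model):

# Before: dictionary-style access raises TypeError on result objects.
search_results = [r["memory"] for r in response.search_results.results]

# After: attribute-style access (field name assumed to be `content`).
search_results = [r.content for r in response.search_results.results]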

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: packages/pipecat-sdk-python/src/supermemory_pipecat/service.py#L344

Potential issue: The list comprehension `[r["memory"] for r in response.search_results.results]` uses dictionary-style indexing to access memory content from search results. However, documentation and usage examples across the codebase show that the search results are objects, not dictionaries. The correct way to access the content is via an attribute, such as `r.content` or `r.memory`. This mismatch will cause a `TypeError` whenever the memory retrieval process finds and attempts to process search results, crashing the operation.

Did we get this right? 👍 / 👎 to inform future reviews.

claude bot commented Jan 16, 2026

PR Review: Gemini Live Speech-to-Speech Support

Summary

This PR adds support for speech-to-speech models (specifically Gemini Live) by handling TranscriptionFrame events in the pipecat integration. Version bumped from 0.1.0 to 0.1.1.


Code Quality ✅

Good:

  • Clean separation of concerns with new methods _handle_transcription() and _inject_memories_for_speech_to_speech()
  • Proper error handling with try/except blocks
  • Good logging for debugging
  • Follows existing code patterns in the module

Issues:
None significant - code follows Python conventions and Pydantic patterns consistently.


Security 🔒

Good:

  • No new security vulnerabilities introduced
  • Uses existing safe API methods (_retrieve_memories, _store_messages)
  • Text is properly stripped before processing
  • Fire-and-forget task pattern for storage (asyncio.create_task) is appropriate here

Logic & Implementation 🤔

Issues Found:

  1. Duplicate processing risk (service.py:395-398):
    The TranscriptionFrame handler calls _handle_transcription() and then immediately passes the frame downstream. If the transcription eventually generates an LLMMessagesFrame or LLMContextFrame, the same user message could be processed twice - once for transcription and once for the context frame.

    Consider tracking processed transcriptions or ensuring the downstream frames don't duplicate the retrieval (a sketch of one such tracking helper follows this list).

  2. Missing import in original file (service.py:16):
    The diff shows new imports added (TranscriptionFrame, LLMMessagesUpdateFrame), but I notice the original file at line 16 only imports:

    from pipecat.frames.frames import Frame, LLMContextFrame, LLMMessagesFrame

    Verify these new frame types are available in your pipecat version to avoid import errors at runtime.

  3. Empty transcription handling (service.py:304-306):
    Early return for empty text is good. The short-circuit in if not frame.text or not frame.text.strip() keeps the check safe even when frame.text is None, but a brief comment explaining why empty or whitespace-only transcriptions are skipped would help future readers (a commented version of the guard is sketched after this list).
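
For issue 1 above, one possible shape for tracking already-processed transcriptions is a small TTL map keyed by the transcription text (the class and its names are hypothetical, not part of the PR):

import time
from typing import Dict


class ProcessedTranscriptions:
    """Hypothetical helper: remembers which user text already triggered memory
    injection via a TranscriptionFrame, so a follow-up context frame carrying
    the same utterance can skip a second retrieval."""

    def __init__(self, ttl_seconds: float = 10.0) -> None:
        self._seen: Dict[str, float] = {}
        self._ttl = ttl_seconds

    def mark(self, text: str) -> None:
        self._seen[text] = time.monotonic()

    def was_recently_processed(self, text: str) -> bool:
        timestamp = self._seen.get(text)
        return timestamp is not None and (time.monotonic() - timestamp) < self._ttl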
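
For issue 3, the guard plus the suggested comment could read as follows (only the comment is new; the check itself is unchanged from the PR):

# Skip empty or whitespace-only transcriptions: streaming STT services can emit
# blank or partial results, and retrieving memories for them would waste a
# round trip. The short-circuit keeps this safe even if frame.text is None.
if not frame.text or not frame.text.strip():
    return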


Performance 🚀

Good:

  • Async/await properly used throughout
  • Deduplication logic reused from existing deduplicate_memories()
  • Early returns prevent unnecessary processing

Minor concern:

  • The _last_query deduplication only checks if the current text equals the previous one. For speech-to-speech, you might get incremental transcriptions or repeated phrases that aren't identical but still represent the same query. Consider using a similarity threshold or time-based deduplication.
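
A lightweight way to absorb near-identical transcriptions, if that turns out to be needed, is a ratio check with difflib from the standard library (the 0.9 threshold is an arbitrary placeholder):

from difflib import SequenceMatcher
from typing import Optional


def is_near_duplicate(previous: Optional[str], current: str, threshold: float = 0.9) -> bool:
    """Treat two queries as duplicates when they are near-identical, not only
    exactly equal, so minor transcription variations do not trigger repeat
    memory retrievals."""
    if previous is None:
        return False
    return SequenceMatcher(None, previous.lower(), current.lower()).ratio() >= threshold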

Test Coverage 🧪

Issue:
This package is listed as a published package with pytest test setup according to the repo structure, but:

  • The tests/ directory only contains an empty __init__.py
  • No tests exist for this new feature
  • No tests exist for existing functionality either

Recommendation:
Since this is a published Python package (supermemory-pipecat), it should have test coverage. Consider adding:

  • Unit tests for _handle_transcription() mocking the frame and memory retrieval
  • Unit tests for _inject_memories_for_speech_to_speech()
  • Integration test with mock TranscriptionFrame to verify the full flow

Breaking Changes 📦

None - this is additive functionality. The 0.1.0 → 0.1.1 version bump is appropriate for a minor feature addition.

However, verify the uv.lock file (2190 lines added) is intentional. Lock files should generally be in .gitignore for libraries (kept for applications only).


Verdict

Approve with minor concerns

The implementation is solid, but address:

  1. Potential duplicate processing of transcriptions
  2. Add test coverage for the new functionality
  3. Consider more robust deduplication for speech patterns
  4. Verify uv.lock should be committed for a library package
