Conversation

@nikochiko
Member

Q/A checklist

  • I have tested my UI changes on mobile and they look acceptable
  • I have tested changes to the workflows in both the API and the UI
  • I have done a code review of my changes and looked at each line of the diff + the references of each function I have changed
  • My changes have not increased the import time of the server
How to check import time?

time python -c 'import server'

You can visualize this using tuna:

python3 -X importtime -c 'import server' 2> out.log && tuna out.log

To measure import time for a specific library:

$ time python -c 'import pandas'

________________________________________________________
Executed in    1.15 secs    fish           external
   usr time    2.22 secs   86.00 micros    2.22 secs
   sys time    0.72 secs  613.00 micros    0.72 secs
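
The same measurement can be scripted. This is a minimal sketch, not part of the repository: `measure_import_time` is a hypothetical helper that runs the import in a fresh interpreter so that already-cached modules do not hide the cost. Like `time python -c ...`, the figure includes interpreter startup.

```python
import subprocess
import sys
import time


def measure_import_time(module_name: str) -> float:
    """Return wall-clock seconds to import `module_name` in a fresh interpreter."""
    start = time.perf_counter()
    # A new process is used so modules already imported here do not skew the result.
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start
```

Calling `measure_import_time("pandas")` before and after a change gives a rough comparison; repeat it a few times, since the first run is usually slower due to cold disk caches.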

To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:

def my_function():
    import pandas as pd
    ...

Legal Boilerplate

Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.

@coderabbitai

coderabbitai bot commented Sep 1, 2025

📝 Walkthrough

  • Added AsrModels.whisper_akera_large_v3 ("Akera Whisper v3 (akera)").
  • Added mapping asr_model_ids[AsrModels.whisper_akera_large_v3] → "akera/whisper-large-v3-kik-full_v2".
  • Added forced_asr_languages[AsrModels.whisper_akera_large_v3] → "kik".
  • Added asr_supported_languages[AsrModels.whisper_akera_large_v3] → {"kik"}.
  • Modified run_asr defaults for self-hosted models to set kwargs["task"] = "translate" when speech_translation_target is set, otherwise "transcribe".
  • Added Akera-specific handling: if selected model name contains "akera", remove "task" from kwargs and set max_length = 448.
  • Updated Whisper input construction to omit the task parameter for Akera models.
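
As a rough illustration of the first four bullets, the new entry and its three mappings could look like the sketch below. It is a simplified stand-in, not the contents of daras_ai_v2/asr.py: the real enum has many more members, and storing the display label as the enum value is an assumption made here for brevity.

```python
from enum import Enum


class AsrModels(Enum):
    # Only the new member is shown. Using the label as the value is an assumption.
    whisper_akera_large_v3 = "Akera Whisper v3 (akera)"


# Model id sent to the self-hosted worker.
asr_model_ids = {AsrModels.whisper_akera_large_v3: "akera/whisper-large-v3-kik-full_v2"}

# Language forced for this model (Kikuyu, ISO 639-3).
forced_asr_languages = {AsrModels.whisper_akera_large_v3: "kik"}

# Languages offered for this model.
asr_supported_languages = {AsrModels.whisper_akera_large_v3: {"kik"}}
```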

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • devxpy

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
daras_ai_v2/asr.py (1)

367-368: Forced language = kik: confirm code system.

We use ISO-639-3 here (kik). Elsewhere some models use ISO-639-1 (ki). langcodes should normalize either, but mixing systems can confuse consumers.

Apply if you prefer 2-letter consistency:

-    AsrModels.whisper_akera_large_v3: "kik",
+    AsrModels.whisper_akera_large_v3: "ki",

If Akera specifically expects kik, keep as-is and add a short comment.
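
To illustrate why mixing the two code systems matters, here is a small sketch that compares codes only after mapping them to a common form. It does not use langcodes; `ISO_639_3_TO_1` is a hypothetical hand-written table covering a few languages, and both helper names are made up for this example.

```python
# Hypothetical lookup table: ISO 639-3 code -> ISO 639-1 code.
ISO_639_3_TO_1 = {"kik": "ki", "eng": "en", "hin": "hi", "swa": "sw"}


def to_two_letter(code: str) -> str:
    """Return the ISO 639-1 code when known, else the lowercased input unchanged."""
    code = code.strip().lower()
    return ISO_639_3_TO_1.get(code, code)


def same_language(a: str, b: str) -> bool:
    """Compare two language codes after normalizing both."""
    return to_two_letter(a) == to_two_letter(b)
```

A plain string comparison treats "kik" and "ki" as different languages, while `same_language("kik", "ki")` does not.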

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0d84f63 and f4a44b9.

📒 Files selected for processing (1)
  • daras_ai_v2/asr.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Analyze (python)
  • GitHub Check: test (3.10.12, 1.8.3)
🔇 Additional comments (3)
daras_ai_v2/asr.py (3)

286-286: Enum entry for Akera added — LGTM.

Name and display label are consistent with existing Whisper variants. Confirm intention: Akera is not included in supports_speech_translation(), so the UI should not expose speech-translation for this model.

Would you like me to scan the repo to see where supports_speech_translation() gates UI/API options, ensuring Akera won’t show the translation selector?


392-392: Language support set to {'kik'} — LGTM.

Matches the forced language entry and will scope the selector correctly.


340-340: Confirm external whisper worker supports new model_id
We dispatch the pipeline["model_id"] unchanged via Celery to the external whisper worker, and its implementation isn’t present here. Please verify that the whisper worker’s task handler recognizes and correctly resolves "akera/whisper-large-v3-kik-full_v2" to the intended weights/container.

Comment on lines 1293 to 1304
kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
if "vakyansh" in selected_model.name:
    # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
    kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
    kwargs["chunk_length_s"] = 60
    kwargs["stride_length_s"] = (6, 0)
    kwargs["batch_size"] = 32
elif "akera" in selected_model.name:
    # don't pass language or task
    kwargs.pop("task")
    kwargs["max_length"] = 448
elif "whisper" in selected_model.name:

@coderabbitai coderabbitai bot Sep 1, 2025

💡 Verification agent

🧩 Analysis chain

Akera branch drops task: translation requests will be silently ignored.

Because you pop("task"), any non-empty speech_translation_target is effectively ignored for Akera. If that’s intentional (Akera doesn’t support ST), either:

  • raise a clear error when ST is requested, or
  • do a client-side post-translation fallback.

Also, prefer safe pop and explicit enum match to avoid brittle substring checks.

Minimal guard + safer pop:

         kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
         if "vakyansh" in selected_model.name:
             # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
             kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
             kwargs["chunk_length_s"] = 60
             kwargs["stride_length_s"] = (6, 0)
             kwargs["batch_size"] = 32
-        elif "akera" in selected_model.name:
-            # don't pass language or task
-            kwargs.pop("task")
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera: don't pass language or task to worker
+            kwargs.pop("task", None)
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs["max_length"] = 448

If you prefer a fallback rather than an error, I can wire run_translate(...) post-ASR and document that SRT/VTT won’t be available for the translated text without time-aligned chunks.

I can add a small unit test that asserts: (a) no task/language is sent for Akera, (b) translation requests raise UserError. Want me to open a test PR?
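
A minimal, self-contained sketch of what such a test could look like. Everything here is a stand-in: `UserError` and `AsrModels` are redefined locally, `whisper_large_v2` is a hypothetical sibling member, and `build_asr_kwargs` is a hypothetical helper that isolates the kwargs logic (in the PR this logic lives inline in run_asr).

```python
from enum import Enum


class UserError(Exception):
    """Local stand-in for the project's UserError."""


class AsrModels(Enum):
    whisper_large_v2 = "Whisper Large v2"  # hypothetical sibling, for contrast
    whisper_akera_large_v3 = "Akera Whisper v3 (akera)"


def build_asr_kwargs(selected_model, speech_translation_target=None):
    """Hypothetical helper mirroring the guarded Akera branch."""
    kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
    if selected_model == AsrModels.whisper_akera_large_v3:
        if speech_translation_target:
            raise UserError("Speech translation is not supported by Akera Whisper v3.")
        kwargs.pop("task", None)  # Akera: don't pass language or task
        kwargs["max_length"] = 448
    return kwargs


def test_akera_sends_no_task_or_language():
    kwargs = build_asr_kwargs(AsrModels.whisper_akera_large_v3)
    assert "task" not in kwargs and "language" not in kwargs
    assert kwargs["max_length"] == 448


def test_akera_rejects_translation():
    try:
        build_asr_kwargs(AsrModels.whisper_akera_large_v3, "en")
    except UserError:
        return
    raise AssertionError("expected UserError")
```

Both functions follow the pytest naming convention but only use plain asserts, so they can also be called directly.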


🏁 Script executed:

#!/bin/bash
# Search for AsrModels enum class and its members, and any literal usage of "akera" models
rg -n "class\s\+AsrModels" -C3 --type py || true
rg -n "AsrModels\." -C1 --type py || true
rg -n "whisper_akera" -C3 --type py || true

Length of output: 10493


🏁 Script executed:

#!/bin/bash
# Extract context around the snippet for Akera branch in daras_ai_v2/asr.py
sed -n '1280,1320p' daras_ai_v2/asr.py
# Confirm the exact model check logic for "akera"
rg -nP --type py 'elif .*akera.*' -C2 daras_ai_v2/asr.py

Length of output: 2075


🏁 Script executed:

#!/bin/bash
# Preview the top of the ASR module to locate the AsrModels definition/import
sed -n '1,200p' daras_ai_v2/asr.py

Length of output: 15273


🏁 Script executed:

#!/bin/bash
# Locate call_celery_task in gpu_server.py
rg -n "def call_celery_task" -C5 daras_ai_v2/gpu_server.py

Length of output: 756


Guard Akera model translation support and use explicit enum match
Akera’s branch currently drops task without notifying when speech_translation_target is set, silently ignoring translation requests.

-        elif "akera" in selected_model.name:
-            # don't pass language or task
-            kwargs.pop("task")
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera Whisper v3 doesn’t support speech translation
+            kwargs.pop("task", None)
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs["max_length"] = 448
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
         kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
         if "vakyansh" in selected_model.name:
             # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
             kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
             kwargs["chunk_length_s"] = 60
             kwargs["stride_length_s"] = (6, 0)
             kwargs["batch_size"] = 32
-        elif "akera" in selected_model.name:
-            # don't pass language or task
-            kwargs.pop("task")
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera Whisper v3 doesn’t support speech translation
+            kwargs.pop("task", None)
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs["max_length"] = 448
         elif "whisper" in selected_model.name:

Member

yes, i would prefer kwargs.pop("task", None) too

@devxpy Perfect! Using kwargs.pop("task", None) is definitely the safer approach. It prevents potential KeyError exceptions if the "task" key somehow doesn't exist in the kwargs dictionary, making the code more robust through defensive programming.

This small change improves the reliability of the Akera model handling while maintaining the same functionality. Good call on preferring the defensive approach!
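
For reference, the difference between the two forms of `dict.pop` in a short standalone snippet (the dictionary contents are illustrative only):

```python
kwargs = {"max_length": 448}  # illustrative dict with no "task" key

# With a default, popping a missing key is a no-op that returns the default.
assert kwargs.pop("task", None) is None

# Without a default, popping a missing key raises KeyError.
try:
    kwargs.pop("task")
except KeyError as exc:
    missing_key = exc.args[0]
```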

🐰 ✨

@nikochiko
Member Author

done. thanks!

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
daras_ai_v2/asr.py (1)

1293-1304: Akera silently ignores requested Speech Translation; raise error or fallback

Popping task drops translation requests without notice. Either reject ST for Akera or add a post-ASR translate fallback. Also: nice switch to explicit enum + safe pop.

Apply:

@@
         kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
@@
-        elif selected_model == AsrModels.whisper_akera_large_v3:
-            # don't pass language or task
-            kwargs.pop("task", None)
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera: don't pass language or task
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs.pop("task", None)
+            kwargs["max_length"] = 448

If you prefer fallback instead of an error, I can wire a post-ASR translate path and note that SRT/VTT won’t align after translation.
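
The fallback option could be sketched roughly as follows. This is an assumption-heavy illustration, not the project's code: `asr_with_translation_fallback` is a hypothetical name, and the `transcribe` and `translate` callables stand in for whatever ASR and text-translation functions the project exposes.

```python
from typing import Callable, Optional


def asr_with_translation_fallback(
    audio_url: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str, str], str],
    speech_translation_target: Optional[str] = None,
) -> str:
    """Transcribe first, then translate the plain text if a target is requested."""
    text = transcribe(audio_url)
    if speech_translation_target:
        # Timestamps are lost at this step, so SRT/VTT output would not
        # line up with the translated text.
        text = translate(text, speech_translation_target)
    return text
```

The trade-off against raising an error is that the user still gets a translation, but only as plain text without time-aligned chunks.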

Check worker accepts max_length and absence of task/language:

#!/bin/bash
rg -nC3 -P '@app\.task\(["'\'']whisper["'\'']|def\s+whisper' --type py
rg -nC2 -P '\binputs\s*=\s*dict|kwargs' --type py
rg -nC3 -P '\b(task|language|max_length)\b' --type py
🧹 Nitpick comments (1)
daras_ai_v2/asr.py (1)

367-367: Forced language “kik”: UI-only vs backend

You force “kik” in UI, but the Akera branch intentionally doesn’t send language to the worker. Consider a comment noting this is UI-only to avoid confusion later.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f4a44b9 and 88cd5d1.

📒 Files selected for processing (1)
  • daras_ai_v2/asr.py (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test (3.10.12, 1.8.3)
🔇 Additional comments (3)
daras_ai_v2/asr.py (3)

286-286: Akera enum addition looks good

Consistent with existing naming.


392-392: Supported languages mapping for Akera is coherent

Matches the forced language choice.


340-340: Verify GPU “whisper” worker support for the new model_id
Ensure the external GPU “whisper” task recognizes “akera/whisper-large-v3-kik-full_v2” as a valid model_id and update the worker service if it does not.

@nikochiko nikochiko merged commit 47c4020 into master Sep 3, 2025
8 checks passed
@nikochiko nikochiko deleted the akera-asr branch September 3, 2025 13:01