feat: add support for akera ASR model #788
Actionable comments posted: 1
🧹 Nitpick comments (1)
daras_ai_v2/asr.py (1)
367-368: Forced language = kik: confirm code system.
We use ISO-639-3 here (kik). Elsewhere some models use ISO-639-1 (ki). langcodes should normalize either, but mixing systems can confuse consumers.
Apply if you prefer 2-letter consistency:
-    AsrModels.whisper_akera_large_v3: "kik",
+    AsrModels.whisper_akera_large_v3: "ki",
If Akera specifically expects kik, keep as-is and add a short comment.
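To illustrate why mixing code systems can trip up consumers, here is a minimal sketch that compares language codes only after mapping them to one system. The lookup table and both helper names are hypothetical and cover just three languages; the real module would rely on langcodes rather than a hand-written table.

```python
# Hypothetical lookup table: a few ISO 639-3 codes and their ISO 639-1 equivalents.
ISO_639_3_TO_1 = {"kik": "ki", "eng": "en", "hin": "hi"}


def to_two_letter(code: str) -> str:
    """Return the 2-letter code when the table knows one, else the cleaned input."""
    code = code.strip().lower()
    return ISO_639_3_TO_1.get(code, code)


def same_language(a: str, b: str) -> bool:
    """Compare two codes after mapping both to the same system."""
    return to_two_letter(a) == to_two_letter(b)
```

A plain string comparison of "kik" and "ki" would report different languages, which is the consumer-side confusion the nitpick is pointing at.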
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
- daras_ai_v2/asr.py(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Analyze (python)
- GitHub Check: test (3.10.12, 1.8.3)
🔇 Additional comments (3)
daras_ai_v2/asr.py (3)
286-286: Enum entry for Akera added. LGTM.
Name and display label are consistent with existing Whisper variants. Confirm intention: Akera is not included in supports_speech_translation(), so the UI should not expose speech-translation for this model.
Would you like me to scan the repo to see where supports_speech_translation() gates UI/API options, ensuring Akera won’t show the translation selector?
392-392: Language support set to {'kik'}. LGTM.
Matches the forced language entry and will scope the selector correctly.
340-340: Confirm external whisper worker supports new model_id
We dispatch the pipeline["model_id"] unchanged via Celery to the external whisper worker, and its implementation isn’t present here. Please verify that the whisper worker’s task handler recognizes and correctly resolves "akera/whisper-large-v3-kik-full_v2" to the intended weights/container.
    kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
    if "vakyansh" in selected_model.name:
        # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
        kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
        kwargs["chunk_length_s"] = 60
        kwargs["stride_length_s"] = (6, 0)
        kwargs["batch_size"] = 32
    elif "akera" in selected_model.name:
        # don't pass language or task
        kwargs.pop("task")
        kwargs["max_length"] = 448
    elif "whisper" in selected_model.name:
Akera branch drops task: translation requests will be silently ignored.
Because you pop("task"), any non-empty speech_translation_target is effectively ignored for Akera. If that’s intentional (Akera doesn’t support ST), either:
- raise a clear error when ST is requested, or
- do a client-side post-translation fallback.
Also, prefer safe pop and explicit enum match to avoid brittle substring checks.
Minimal guard + safer pop:
-        kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
+        kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
         if "vakyansh" in selected_model.name:
             # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
             kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
             kwargs["chunk_length_s"] = 60
             kwargs["stride_length_s"] = (6, 0)
             kwargs["batch_size"] = 32
-        elif "akera" in selected_model.name:
-            # don't pass language or task
-            kwargs.pop("task")
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera: don't pass language or task to worker
+            kwargs.pop("task", None)
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs["max_length"] = 448
If you prefer a fallback rather than an error, I can wire run_translate(...) post-ASR and document that SRT/VTT won’t be available for the translated text without time-aligned chunks.
I can add a small unit test that asserts: (a) no task/language is sent for Akera, (b) translation requests raise UserError. Want me to open a test PR?
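The guard and the two proposed assertions can be sketched in isolation as follows. This is an illustrative stand-in, not the code in daras_ai_v2/asr.py: the UserError class, the trimmed AsrModels enum, and the build_kwargs helper are all assumptions made so the snippet runs on its own.

```python
from enum import Enum


class UserError(Exception):
    """Stand-in for the project's user-facing error type (assumed)."""


class AsrModels(Enum):
    # Only the members this sketch needs; the values are placeholders.
    whisper_large_v3 = "whisper_large_v3"
    whisper_akera_large_v3 = "whisper_akera_large_v3"


def build_kwargs(selected_model, speech_translation_target=None):
    """Hypothetical helper mirroring the kwargs logic discussed in the review."""
    kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
    if selected_model == AsrModels.whisper_akera_large_v3:
        # Reject translation up front instead of silently dropping the request.
        if speech_translation_target:
            raise UserError("Speech translation is not supported by Akera Whisper v3.")
        # Akera: don't pass task to the worker.
        kwargs.pop("task", None)
        kwargs["max_length"] = 448
    return kwargs
```

Matching on the enum member rather than a substring of the name keeps the branch from accidentally catching a future model whose name happens to contain "akera".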
🏁 Script executed:
#!/bin/bash
# Search for AsrModels enum class and its members, and any literal usage of "akera" models
rg -n "class\s+AsrModels" -C3 --type py || true
rg -n "AsrModels\." -C1 --type py || true
rg -n "whisper_akera" -C3 --type py || true
Length of output: 10493
🏁 Script executed:
#!/bin/bash
# Extract context around the snippet for Akera branch in daras_ai_v2/asr.py
sed -n '1280,1320p' daras_ai_v2/asr.py
# Confirm the exact model check logic for "akera"
rg -nP --type py 'elif .*akera.*' -C2 daras_ai_v2/asr.py
Length of output: 2075
🏁 Script executed:
#!/bin/bash
# Preview the top of the ASR module to locate the AsrModels definition/import
sed -n '1,200p' daras_ai_v2/asr.py
Length of output: 15273
🏁 Script executed:
#!/bin/bash
# Locate call_celery_task in gpu_server.py
rg -n "def call_celery_task" -C5 daras_ai_v2/gpu_server.py
Length of output: 756
Guard Akera model translation support and use explicit enum match
Akera’s branch currently drops task without notifying when speech_translation_target is set, silently ignoring translation requests.
-        elif "akera" in selected_model.name:
-            # don't pass language or task
-            kwargs.pop("task")
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera Whisper v3 doesn’t support speech translation
+            kwargs.pop("task", None)
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs["max_length"] = 448
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Current:
    kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
    if "vakyansh" in selected_model.name:
        # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
        kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
        kwargs["chunk_length_s"] = 60
        kwargs["stride_length_s"] = (6, 0)
        kwargs["batch_size"] = 32
    elif "akera" in selected_model.name:
        # don't pass language or task
        kwargs.pop("task")
        kwargs["max_length"] = 448
    elif "whisper" in selected_model.name:
Suggested:
    kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
    if "vakyansh" in selected_model.name:
        # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
        kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
        kwargs["chunk_length_s"] = 60
        kwargs["stride_length_s"] = (6, 0)
        kwargs["batch_size"] = 32
    elif selected_model == AsrModels.whisper_akera_large_v3:
        # Akera Whisper v3 doesn’t support speech translation
        kwargs.pop("task", None)
        if speech_translation_target:
            raise UserError("Speech translation is not supported by Akera Whisper v3.")
        kwargs["max_length"] = 448
    elif "whisper" in selected_model.name:
yes, i would prefer kwargs.pop("task", None) too
@devxpy Perfect! Using kwargs.pop("task", None) is definitely the safer approach. It prevents potential KeyError exceptions if the "task" key somehow doesn't exist in the kwargs dictionary, making the code more robust through defensive programming.
This small change improves the reliability of the Akera model handling while maintaining the same functionality. Good call on preferring the defensive approach!
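For reference, the difference between the two forms of dict.pop is small enough to show directly. The dictionary below is a made-up example that lacks the "task" key on purpose.

```python
# Example dict without a "task" key, chosen to exercise the missing-key case.
kwargs = {"max_length": 448}

# With a default, pop returns the default and leaves the dict untouched.
removed = kwargs.pop("task", None)

# Without a default, pop raises KeyError when the key is absent.
try:
    kwargs.pop("task")
    raised = False
except KeyError:
    raised = True
```

In the branch under review the key is set one statement earlier, so the default mostly protects against future refactors that reorder or remove that assignment.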
🐰 ✨
done. thanks!
Actionable comments posted: 0
♻️ Duplicate comments (1)
daras_ai_v2/asr.py (1)
1293-1304: Akera silently ignores requested Speech Translation; raise error or fallback
Popping task drops translation requests without notice. Either reject ST for Akera or add a post-ASR translate fallback. Also: nice switch to explicit enum + safe pop.
Apply:
@@
-        kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
+        kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
@@
-        elif selected_model == AsrModels.whisper_akera_large_v3:
-            # don't pass language or task
-            kwargs.pop("task", None)
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera: don't pass language or task
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs.pop("task", None)
+            kwargs["max_length"] = 448
If you prefer fallback instead of an error, I can wire a post-ASR translate path and note that SRT/VTT won’t align after translation.
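The fallback option could look roughly like the sketch below: transcribe first, then translate the plain text as a second step. The function name and both callables are hypothetical placeholders; this does not use the project's actual run_translate signature, which is not shown in this thread.

```python
from typing import Callable, Optional


def transcribe_then_translate(
    audio_url: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str, str], str],
    target_language: Optional[str] = None,
) -> str:
    """Run ASR, then translate the transcript only if a target was requested.

    Both callables are assumed interfaces: `transcribe(audio_url) -> text` and
    `translate(text, target_language) -> text`.
    """
    text = transcribe(audio_url)
    if not target_language:
        return text
    # Translation operates on plain text, so per-chunk timestamps are lost here.
    return translate(text, target_language)
```

Because the translation step sees only the joined transcript, subtitle formats that need time-aligned chunks cannot be produced from its output, which matches the caveat in the comment above.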
Check worker accepts max_length and absence of task/language:
#!/bin/bash
rg -nC3 -P '@app\.task\(["'\'']whisper["'\'']|def\s+whisper' --type py
rg -nC2 -P '\binputs\s*=\s*dict|kwargs' --type py
rg -nC3 -P '\b(task|language|max_length)\b' --type py
🧹 Nitpick comments (1)
daras_ai_v2/asr.py (1)
367-367: Forced language “kik”: UI-only vs backend
You force “kik” in UI, but the Akera branch intentionally doesn’t send language to the worker. Consider a comment noting this is UI-only to avoid confusion later.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.10.12, 1.8.3)
🔇 Additional comments (3)
daras_ai_v2/asr.py (3)
286-286: Akera enum addition looks good
Consistent with existing naming.
392-392: Supported languages mapping for Akera is coherent
Matches the forced language choice.
340-340: Verify GPU “whisper” worker support for the new model_id
Ensure the external GPU “whisper” task recognizes “akera/whisper-large-v3-kik-full_v2” as a valid model_id and update the worker service if it does not.
Q/A checklist
How to check import time?
You can visualize this using tuna:
To measure import time for a specific library:
To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:
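As a small sketch of the deferred-import pattern described in this checklist, the snippet below times an import and moves one inside the function that needs it. The helper names are made up for illustration, and json stands in for a genuinely slow library.

```python
import importlib
import time


def timed_import(module_name: str) -> float:
    """Return the seconds spent importing a module (near zero if already cached)."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


def parse_payload(raw: str) -> dict:
    # Deferred import: the cost is paid on the first call to this function,
    # not when the enclosing module is imported.
    import json

    return json.loads(raw)
```

Repeated calls do not pay the cost again, since Python caches loaded modules in sys.modules.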
Legal Boilerplate
Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.