feat: add support for akera ASR model #788

nikochiko · 2025-09-01T11:20:26Z

Q/A checklist

I have tested my UI changes on mobile and they look acceptable
I have tested changes to the workflows in both the API and the UI
I have done a code review of my changes and looked at each line of the diff + the references of each function I have changed
My changes have not increased the import time of the server

How to check import time?

time python -c 'import server'

You can visualize this using tuna:

python3 -X importtime -c 'import server' 2> out.log && tuna out.log

To measure import time for a specific library:

$ time python -c 'import pandas'

________________________________________________________
Executed in    1.15 secs    fish           external
   usr time    2.22 secs   86.00 micros    2.22 secs
   sys time    0.72 secs  613.00 micros    0.72 secs

To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:

def my_function():
    import pandas as pd
    ...
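
To decide which imports are worth deferring, the `-X importtime` output shown above can also be summarized programmatically. The following is a minimal sketch, not part of this repository: the helper name `slowest_imports` is hypothetical, and it assumes the interpreter writes one `import time: self | cumulative | name` row per import to stderr.

```python
import subprocess
import sys


def slowest_imports(module: str, top: int = 5) -> list:
    """Return up to `top` (name, cumulative microseconds) pairs, slowest first.

    Hypothetical helper: runs a fresh interpreter so already-cached imports
    in the current process do not hide the real cost.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        # Data rows look like: "import time:       123 |        456 |   name"
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        _self_us, cumulative_us, name = (part.strip() for part in parts)
        if not cumulative_us.isdigit():
            continue  # skip the header row
        rows.append((name, int(cumulative_us)))
    return sorted(rows, key=lambda row: row[1], reverse=True)[:top]


if __name__ == "__main__":
    for name, micros in slowest_imports("json"):
        print(f"{micros:>10} us  {name}")
```

Modules that dominate this list are the natural candidates for moving inside the functions that use them.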

Legal Boilerplate

Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.

coderabbitai · 2025-09-01T11:20:33Z

📝 Walkthrough

Added AsrModels.whisper_akera_large_v3 ("Akera Whisper v3 (akera)").
Added mapping asr_model_ids[AsrModels.whisper_akera_large_v3] → "akera/whisper-large-v3-kik-full_v2".
Added forced_asr_languages[AsrModels.whisper_akera_large_v3] → "kik".
Added asr_supported_languages[AsrModels.whisper_akera_large_v3] → {"kik"}.
Modified run_asr defaults for self-hosted models to set kwargs["task"] = "translate" when speech_translation_target is set, otherwise "transcribe".
Added Akera-specific handling: if selected model name contains "akera", remove "task" from kwargs and set max_length = 448.
Updated Whisper input construction to omit the task parameter for Akera models.
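
The enum member and the three mappings listed above fit together roughly as follows. This is a simplified sketch: the dictionary names and values come from the walkthrough, but the one-member enum and the helper `resolve_language` are assumptions for illustration and are not taken from daras_ai_v2/asr.py.

```python
from enum import Enum
from typing import Optional


class AsrModels(Enum):
    # Only the member added in this PR; the real enum has many more entries.
    whisper_akera_large_v3 = "Akera Whisper v3 (akera)"


# Mappings named in the walkthrough, reduced to the new entries.
asr_model_ids = {
    AsrModels.whisper_akera_large_v3: "akera/whisper-large-v3-kik-full_v2",
}
forced_asr_languages = {AsrModels.whisper_akera_large_v3: "kik"}
asr_supported_languages = {AsrModels.whisper_akera_large_v3: {"kik"}}


def resolve_language(model: AsrModels, requested: Optional[str]) -> Optional[str]:
    """Hypothetical helper: a forced language overrides whatever was requested."""
    forced = forced_asr_languages.get(model)
    if forced is not None:
        return forced
    supported = asr_supported_languages.get(model, set())
    if requested is not None and requested not in supported:
        raise ValueError(f"{requested!r} is not supported by {model.value}")
    return requested


if __name__ == "__main__":
    model = AsrModels.whisper_akera_large_v3
    print(asr_model_ids[model], resolve_language(model, "en"))
```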

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: add mbaza asr model #780 — Adds AsrModels enum members and updates asr_model_ids/asr_supported_languages in daras_ai_v2/asr.py; overlaps enum/mapping changes.
fix: run nemo_asr task for mbaza_ctc_large model #781 — Modifies run_asr model-selection/branching logic; related to the Akera-specific kwargs/branching here.
feat: add sunbird ugandan whisper & jacaranda health swahili to ASR models #759 — Previously added AsrModels entries and mapping updates in daras_ai_v2/asr.py, overlapping with these mapping changes.

Suggested reviewers

devxpy

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

daras_ai_v2/asr.py (1)
367-368: Forced language = kik: confirm code system.

We use ISO-639-3 here (kik). Elsewhere some models use ISO-639-1 (ki). langcodes should normalize either, but mixing systems can confuse consumers.

Apply if you prefer 2-letter consistency:
-    AsrModels.whisper_akera_large_v3: "kik",
+    AsrModels.whisper_akera_large_v3: "ki",
If Akera specifically expects kik, keep as-is and add a short comment.
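
One way to keep mixed code systems from confusing consumers is to compare codes only after normalizing them. The sketch below is an assumption-heavy illustration: it uses a small hand-written table instead of the langcodes library mentioned above, and the helper names `to_two_letter` and `same_language` are hypothetical.

```python
# Hand-written ISO 639-3 -> ISO 639-1 table covering only a few languages.
# A real implementation would rely on a complete registry instead.
ISO_639_3_TO_1 = {
    "kik": "ki",  # Kikuyu
    "swa": "sw",  # Swahili
    "lug": "lg",  # Ganda
}


def to_two_letter(code: str) -> str:
    """Map a known 3-letter code to its 2-letter form; pass others through."""
    code = code.strip().lower()
    return ISO_639_3_TO_1.get(code, code)


def same_language(a: str, b: str) -> bool:
    """Compare two language codes regardless of which system each uses."""
    return to_two_letter(a) == to_two_letter(b)


if __name__ == "__main__":
    print(to_two_letter("kik"), same_language("kik", "ki"))
```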

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d84f63 and f4a44b9.

📒 Files selected for processing (1)

daras_ai_v2/asr.py (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Analyze (python)
GitHub Check: test (3.10.12, 1.8.3)

🔇 Additional comments (3)

daras_ai_v2/asr.py (3)

286-286: Enum entry for Akera added — LGTM.

Name and display label are consistent with existing Whisper variants. Confirm intention: Akera is not included in supports_speech_translation(), so the UI should not expose speech-translation for this model.

Would you like me to scan the repo to see where supports_speech_translation() gates UI/API options, ensuring Akera won’t show the translation selector?

392-392: Language support set to {'kik'} — LGTM.

Matches the forced language entry and will scope the selector correctly.

340-340: Confirm external whisper worker supports new model_id

We dispatch the pipeline["model_id"] unchanged via Celery to the external whisper worker, and its implementation isn’t present here. Please verify that the whisper worker’s task handler recognizes and correctly resolves "akera/whisper-large-v3-kik-full_v2" to the intended weights/container.

coderabbitai · 2025-09-01T11:28:16Z

daras_ai_v2/asr.py

+        kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
        if "vakyansh" in selected_model.name:
            # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
            kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
            kwargs["chunk_length_s"] = 60
            kwargs["stride_length_s"] = (6, 0)
            kwargs["batch_size"] = 32
+        elif "akera" in selected_model.name:
+            # don't pass language or task
+            kwargs.pop("task")
+            kwargs["max_length"] = 448
        elif "whisper" in selected_model.name:


💡 Verification agent

🧩 Analysis chain

Akera branch drops task: translation requests will be silently ignored.

Because you pop("task"), any non-empty speech_translation_target is effectively ignored for Akera. If that’s intentional (Akera doesn’t support ST), either:

raise a clear error when ST is requested, or

do a client-side post-translation fallback.

Also, prefer safe pop and explicit enum match to avoid brittle substring checks.

Minimal guard + safer pop:

-        kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
+        kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
         if "vakyansh" in selected_model.name:
             # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
             kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
             kwargs["chunk_length_s"] = 60
             kwargs["stride_length_s"] = (6, 0)
             kwargs["batch_size"] = 32
-        elif "akera" in selected_model.name:
-            # don't pass language or task
-            kwargs.pop("task")
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera: don't pass language or task to worker
+            kwargs.pop("task", None)
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs["max_length"] = 448

If you prefer a fallback rather than an error, I can wire run_translate(...) post-ASR and document that SRT/VTT won’t be available for the translated text without time-aligned chunks.

I can add a small unit test that asserts: (a) no task/language is sent for Akera, (b) translation requests raise UserError. Want me to open a test PR?
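
The guard and the two assertions described above could look roughly like this. It is a self-contained sketch under stated assumptions: `UserError` and `AsrModels` are stand-ins for the project's own classes, `other_whisper` is a placeholder member, and `build_asr_kwargs` is a hypothetical extraction of the branch from run_asr, not the actual function.

```python
from enum import Enum


class UserError(Exception):
    """Stand-in for the project's UserError."""


class AsrModels(Enum):
    other_whisper = "placeholder non-Akera model"  # illustration only
    whisper_akera_large_v3 = "Akera Whisper v3 (akera)"


def build_asr_kwargs(selected_model, speech_translation_target=None) -> dict:
    """Hypothetical extraction of the kwargs logic discussed in this review."""
    kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
    if selected_model == AsrModels.whisper_akera_large_v3:
        # Reject translation explicitly instead of silently ignoring it.
        if speech_translation_target:
            raise UserError("Speech translation is not supported by Akera Whisper v3.")
        kwargs.pop("task", None)
        kwargs["max_length"] = 448
    return kwargs


def test_akera_sends_no_task_or_language():
    kwargs = build_asr_kwargs(AsrModels.whisper_akera_large_v3)
    assert "task" not in kwargs
    assert "language" not in kwargs
    assert kwargs["max_length"] == 448


def test_akera_rejects_translation():
    try:
        build_asr_kwargs(AsrModels.whisper_akera_large_v3, "en")
    except UserError:
        return
    raise AssertionError("expected UserError")


if __name__ == "__main__":
    test_akera_sends_no_task_or_language()
    test_akera_rejects_translation()
```

Raising before any kwargs are mutated keeps the failure path free of side effects; whether an error or a post-ASR translation fallback is preferable remains the open question in this thread.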

🏁 Script executed:

#!/bin/bash
# Search for AsrModels enum class and its members, and any literal usage of "akera" models
rg -n "class\s\+AsrModels" -C3 --type py || true
rg -n "AsrModels\." -C1 --type py || true
rg -n "whisper_akera" -C3 --type py || true

Length of output: 10493

🏁 Script executed:

#!/bin/bash
# Extract context around the snippet for Akera branch in daras_ai_v2/asr.py
sed -n '1280,1320p' daras_ai_v2/asr.py
# Confirm the exact model check logic for "akera"
rg -nP --type py 'elif .*akera.*' -C2 daras_ai_v2/asr.py

Length of output: 2075

🏁 Script executed:

#!/bin/bash
# Preview the top of the ASR module to locate the AsrModels definition/import
sed -n '1,200p' daras_ai_v2/asr.py

Length of output: 15273

🏁 Script executed:

#!/bin/bash
# Locate call_celery_task in gpu_server.py
rg -n "def call_celery_task" -C5 daras_ai_v2/gpu_server.py

Length of output: 756

Guard Akera model translation support and use explicit enum match

Akera’s branch currently drops task without notifying when speech_translation_target is set, silently ignoring translation requests.

-        elif "akera" in selected_model.name:
-            # don't pass language or task
-            kwargs.pop("task")
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera Whisper v3 doesn’t support speech translation
+            kwargs.pop("task", None)
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs["max_length"] = 448

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

         kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
         if "vakyansh" in selected_model.name:
             # fixes https://github.com/huggingface/transformers/issues/15275#issuecomment-1624879632
             kwargs["decoder_kwargs"] = dict(skip_special_tokens=True)
             kwargs["chunk_length_s"] = 60
             kwargs["stride_length_s"] = (6, 0)
             kwargs["batch_size"] = 32
-        elif "akera" in selected_model.name:
-            # don't pass language or task
-            kwargs.pop("task")
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera Whisper v3 doesn’t support speech translation
+            kwargs.pop("task", None)
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs["max_length"] = 448
         elif "whisper" in selected_model.name:

devxpy · 2025-09-02

yes, i would prefer kwargs.pop("task", None) too

coderabbitai · 2025-09-02

@devxpy Perfect! Using kwargs.pop("task", None) is definitely the safer approach. It prevents potential KeyError exceptions if the "task" key somehow doesn't exist in the kwargs dictionary, making the code more robust through defensive programming.

This small change improves the reliability of the Akera model handling while maintaining the same functionality. Good call on preferring the defensive approach!

🐰 ✨
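
The difference between the two forms of `dict.pop` discussed in this thread can be shown in isolation. This is a minimal sketch with hypothetical helper names (`drop_task`, `strict_pop_raises`). In the code under review, "task" is always set just before the branch, so the default mainly protects against future refactors.

```python
def drop_task(kwargs: dict) -> dict:
    """Remove "task" if present; do nothing when the key is absent."""
    kwargs.pop("task", None)  # the default suppresses KeyError
    return kwargs


def strict_pop_raises(kwargs: dict) -> bool:
    """Return True if pop() without a default raises KeyError."""
    try:
        kwargs.pop("task")
    except KeyError:
        return True
    return False


if __name__ == "__main__":
    print(drop_task({"task": "transcribe", "max_length": 448}))
    print(drop_task({"max_length": 448}))
    print(strict_pop_raises({"max_length": 448}))
```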

nikochiko · 2025-09-02T15:10:45Z

done. thanks!

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

daras_ai_v2/asr.py (1)

1293-1304: Akera silently ignores requested Speech Translation; raise error or fallback

Popping task drops translation requests without notice. Either reject ST for Akera or add a post-ASR translate fallback. Also: nice switch to explicit enum + safe pop.

Apply:

@@
-        kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
+        kwargs = {"task": "translate" if speech_translation_target else "transcribe"}
@@
-        elif selected_model == AsrModels.whisper_akera_large_v3:
-            # don't pass language or task
-            kwargs.pop("task", None)
-            kwargs["max_length"] = 448
+        elif selected_model == AsrModels.whisper_akera_large_v3:
+            # Akera: don't pass language or task
+            if speech_translation_target:
+                raise UserError("Speech translation is not supported by Akera Whisper v3.")
+            kwargs.pop("task", None)
+            kwargs["max_length"] = 448

If you prefer fallback instead of an error, I can wire a post-ASR translate path and note that SRT/VTT won’t align after translation.

Check worker accepts max_length and absence of task/language:

#!/bin/bash
rg -nC3 -P '@app\.task\(["'\'']whisper["'\'']|def\s+whisper' --type py
rg -nC2 -P '\binputs\s*=\s*dict|kwargs' --type py
rg -nC3 -P '\b(task|language|max_length)\b' --type py

🧹 Nitpick comments (1)

daras_ai_v2/asr.py (1)

367-367: Forced language “kik”: UI-only vs backend

You force “kik” in UI, but the Akera branch intentionally doesn’t send language to the worker. Consider a comment noting this is UI-only to avoid confusion later.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between f4a44b9 and 88cd5d1.

📒 Files selected for processing (1)

daras_ai_v2/asr.py (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: test (3.10.12, 1.8.3)

🔇 Additional comments (3)

daras_ai_v2/asr.py (3)

286-286: Akera enum addition looks good

Consistent with existing naming.

392-392: Supported languages mapping for Akera is coherent

Matches the forced language choice.

340-340: Verify GPU “whisper” worker support for the new model_id

Ensure the external GPU “whisper” task recognizes “akera/whisper-large-v3-kik-full_v2” as a valid model_id and update the worker service if it does not.

feat: add support for akera ASR model

f4a44b9

coderabbitai bot reviewed Sep 1, 2025

devxpy approved these changes Sep 2, 2025

refactor: s/kwargs.pop("task")/kwargs.pop("task", None)

88cd5d1

coderabbitai bot reviewed Sep 2, 2025

nikochiko merged commit 47c4020 into master Sep 3, 2025
8 checks passed

nikochiko deleted the akera-asr branch September 3, 2025 13:01

This was referenced Sep 3, 2025

fix: change title for akera/kikuyu finetuned whisper model #791

Open

fix: sunbird: use tokens for languages instead of langcodes #794

Merged

fix: handle case for sunbird when language is None #796

Merged
