
Conversation

@zhiyuan1i zhiyuan1i (Collaborator) commented Sep 9, 2025

Summary by CodeRabbit

  • New Features

    • More consistent generation behavior across models with improved handling for cache-aware decoding and clearer errors for unsupported decoding strategies.
  • Dependency

    • Transformers requirement relaxed to >=4.53.0 (no upper bound) for broader compatibility.
  • Refactor

    • Many causal‑LM models moved to the new unified generation backend while preserving the public generate API.
  • Chores

    • CI: smarter Conda detection, env propagation, dependency installs, and extra diagnostic logging.
    • Runtime: added warning when using an untested Triton conv1d backend.

@coderabbitai coderabbitai bot (Contributor) commented Sep 9, 2025

Warning

Rate limit exceeded

@zhiyuan1i has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 27 minutes and 37 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between bfc34d3 and 4839b6a.

📒 Files selected for processing (1)
  • tests/models/test_modeling_mom.py (1 hunks)

Walkthrough

Replaces per-model GenerationMixin with a new FLAGenerationMixin, centralizes version-gated prepare_inputs_for_generation and Cache logic in fla/models/utils.py, relaxes the transformers version bound, adjusts CI to install transformers and discover/export Conda envs, adds a Triton warning in fla/layers/mamba2.py, and updates a test to use cuda backend.
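
As a rough illustration of the version gating described in the walkthrough, the flag and the conditional Cache base might look like the following. This is a sketch only; the exact imports, fallbacks, and class body in fla/models/utils.py may differ.

```python
# Illustrative sketch of the version gate; not the verbatim contents of fla/models/utils.py.
import transformers
from packaging import version

# Pre-computed once so prepare_inputs_for_generation can branch cheaply.
_IS_TRANSFORMERS_4_56_PLUS = version.parse(transformers.__version__) >= version.parse("4.56.0")

try:
    # Recent transformers releases expose the Cache base class here.
    from transformers.cache_utils import Cache as _BaseCache
except ImportError:
    # Assumed fallback for versions without it; the real module may handle this differently.
    _BaseCache = object


class Cache(_BaseCache):
    """Cache type re-exported by fla.models.utils; subclassing depends on the installed version."""
```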

Changes

Cohort / File(s) Summary of changes
CI workflow
.github/workflows/reusable-ci-tests.yml
Add transformers to dependency installs across multiple PyTorch paths; add dynamic Conda discovery/export (CONDA, CONDA_ENV_NAME, CONDA_BIN_PATH), runner-specific TARGET_CONDA_ENV selection, environment verification prints, and extra logging/env propagation for subsequent steps.
Dependency constraints
pyproject.toml, setup.py
Loosen the transformers constraint from transformers>=4.53.0,<4.56.0 to transformers>=4.53.0 (remove upper bound).
Central generation utilities
fla/models/utils.py
Add FLAGenerationMixin(GenerationMixin) with a version-gated prepare_inputs_for_generation (supports Transformers >=4.56 behavior and legacy behavior), introduce _IS_TRANSFORMERS_4_56_PLUS flag, and make Cache subclassing conditional on transformers version. (Note: file now contains paired/duplicated public declarations introduced by the change.)
Model mixin swap (bulk)
fla/models/*/modeling_*.py
Replace GenerationMixin imports/usage with FLAGenerationMixin and add FLAGenerationMixin/Cache imports from fla.models.utils; remove per-model prepare_inputs_for_generation helpers across many CausalLM model files (e.g., abc, bitnet, comba, delta_net, forgetting_transformer, gated_deltanet, gated_deltaproduct, gla, gsa, hgrn, hgrn2, lightnet, linear_attn, log_linear_mamba2, mamba, mamba2, mesa_net, mla, mom, nsa, path_attn, retnet, rodimus, rwkv6, rwkv7, samba, transformer).
Generate wrappers / error handling
fla/models/*/modeling_*.py
For several models (examples: gated_deltaproduct, gla, hgrn, rwkv7, ...) add/adjust generate wrappers to catch AttributeError related to past_key_values and raise clearer errors or re-raise.
RWKV7 state migration
fla/models/rwkv7/modeling_rwkv7.py
Switch to FLAGenerationMixin, remove per-model prepare_inputs_for_generation, and add state-dict migration logic in load_state_dict to remap v1→v2 parameter keys before delegating to superclass load.
Mamba2 runtime warning & test
fla/layers/mamba2.py, tests/models/test_modeling_mamba2.py
Add runtime warning when Triton conv1d backend is selected in Mamba2.__init__; change test parameterization to use cuda instead of triton for one case.
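
For the Mamba2 row above, the added runtime warning is presumably something along these lines. This is a hedged sketch: the attribute name conv_backend, the trigger condition, and the message text are assumptions, not the actual code in fla/layers/mamba2.py.

```python
import warnings


class Mamba2Sketch:
    """Stand-in for the real Mamba2 layer; only the warning pattern is shown."""

    def __init__(self, conv_backend: str = "cuda"):
        if conv_backend == "triton":
            # Assumed wording: flag the untested path and point at the env-var override
            # (FLA_CONV_BACKEND is the switch exercised by tests/models/test_modeling_mamba2.py).
            warnings.warn(
                "The Triton conv1d backend for Mamba2 is untested; "
                "consider FLA_CONV_BACKEND=cuda instead.",
                stacklevel=2,
            )
        self.conv_backend = conv_backend
```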

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Model as CausalLM Model
  participant Mixin as FLAGenerationMixin
  participant HF as Transformers Runtime

  User->>Model: generate(input_ids, past_key_values?, cache_position?, ...)
  Model->>Mixin: prepare_inputs_for_generation(...)
  alt Transformers >= 4.56
    Mixin->>Mixin: if _cache_dependant_input_preparation exists
    opt cache_position provided
      Mixin->>Model: _cache_dependant_input_preparation(...)
      Model-->>Mixin: model_inputs
    end
    Mixin->>Mixin: slice by cache_position or last-token, include cache_position/logits_to_keep
    Mixin-->>Model: model_inputs
  else Older Transformers
    Mixin->>Mixin: legacy last-token / inputs_embeds handling
    Mixin-->>Model: model_inputs
  end
  Model->>HF: forward(**model_inputs)
  HF-->>User: logits / generated_ids
sequenceDiagram
  autonumber
  participant Caller as Caller
  participant RWKV as RWKV7Model
  participant PT as PreTrainedModel

  Caller->>RWKV: load_state_dict(state_dict, strict=..., ...)
  RWKV->>RWKV: detect v1 key patterns under model.layers.*
  alt v1 keys present
    RWKV->>RWKV: map v1 attn/* keys → v2 names (x_r, x_w, x_k, x_v, x_a, x_g)
  end
  RWKV->>PT: super().load_state_dict(migrated_state_dict, strict=...)
  PT-->>Caller: LoadResult

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • yzhangcs

Pre-merge checks (1 passed, 1 warning, 1 inconclusive)

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 4.76%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Description Check: ❓ Inconclusive. The pull request description is not present in the provided context, so its relevance to the changeset cannot be assessed. Resolution: update the pull request with a clear description summarizing the major changes, including the dependency upgrade to transformers 4.56.x, the introduction and application of FLAGenerationMixin, and any CI pipeline adjustments.
✅ Passed checks (1 passed)
  • Title Check: ✅ Passed. The title "[Deps] Upgrade to transformers 4.56.x" succinctly captures the primary intent of loosening the transformers version constraint to support the 4.56.x release, which is the core focus of the changeset. It is concise and clearly conveys the main change without implementation specifics.

Poem

A rabbit hops through code at speed,
Swapping mixins where new rules lead.
Versions loosen, caches align,
Conda found, CI prints a sign.
Models generate with tidy grace — hooray! 🐇✨


@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @zhiyuan1i, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on updating the project's dependency on the Hugging Face Transformers library to support newer versions, specifically 4.56.x and beyond. To maintain compatibility and abstract away API differences, a custom generation mixin has been implemented. This ensures that all models within the project can leverage the latest transformers features while gracefully handling older versions, without requiring individual model adjustments for generation input preparation.

Highlights

  • Dependency Upgrade: The transformers library dependency has been updated to allow versions 4.56.x and higher, removing the previous upper bound.
  • Custom Generation Mixin: A new FLAGenerationMixin has been introduced in fla/models/utils.py to encapsulate generation logic and ensure compatibility across different transformers library versions.
  • Model Class Refactoring: All causal language model classes (e.g., ABCForCausalLM, MambaForCausalLM) now inherit from the newly introduced FLAGenerationMixin instead of the native transformers.generation.GenerationMixin.
  • Generation Input Preparation: The FLAGenerationMixin provides a version-aware prepare_inputs_for_generation method, handling API changes introduced in transformers versions 4.56.0 and above, particularly concerning cache_position.
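
A minimal sketch of that version-aware dispatch, assuming the FLAGenerationMixin name from this PR. The real method handles more cases (for example, delegating to _cache_dependant_input_preparation and additional inputs_embeds edge cases); this is an illustration, not the shipped implementation.

```python
from typing import Optional

import torch
import transformers
from packaging import version
from transformers.generation import GenerationMixin

_IS_TRANSFORMERS_4_56_PLUS = version.parse(transformers.__version__) >= version.parse("4.56.0")


class FLAGenerationMixinSketch(GenerationMixin):
    """Simplified illustration of the version-gated input preparation, not the shipped code."""

    def prepare_inputs_for_generation(
        self,
        input_ids: torch.LongTensor,
        past_key_values=None,
        attention_mask: Optional[torch.Tensor] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        use_cache: bool = True,
        cache_position: Optional[torch.LongTensor] = None,
        logits_to_keep: Optional[int] = None,
        **kwargs,
    ):
        if _IS_TRANSFORMERS_4_56_PLUS and cache_position is not None:
            # 4.56+ path: slice the prompt down to the positions the cache has not seen yet.
            if input_ids is not None and input_ids.shape[1] != cache_position.shape[-1]:
                input_ids = input_ids[:, cache_position]
        elif past_key_values is not None and input_ids is not None:
            # Legacy path: during decoding, only the last token is fed.
            input_ids = input_ids[:, -1:]

        model_inputs = {
            "past_key_values": past_key_values,
            "use_cache": use_cache,
            "attention_mask": attention_mask,
            "cache_position": cache_position,
            "logits_to_keep": logits_to_keep,
        }
        # Prefer inputs_embeds only while they still cover the positions being decoded.
        if inputs_embeds is not None and (cache_position is None or cache_position.shape[-1] == inputs_embeds.shape[1]):
            model_inputs["inputs_embeds"] = inputs_embeds
            model_inputs["input_ids"] = None
        else:
            model_inputs["input_ids"] = input_ids
        return model_inputs
```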
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review: /gemini review performs a code review for the current pull request in its current state.
  • Pull Request Summary: /gemini summary provides a summary of the current pull request in its current state.
  • Comment: @gemini-code-assist responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help: /gemini help displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request upgrades the transformers dependency to support version 4.56.x and introduces a FLAGenerationMixin to handle API changes for text generation, which is a good approach for maintaining compatibility. However, I've identified a critical bug in the new mixin that could cause crashes with older transformers versions, and an incomplete refactoring where many models won't benefit from this new compatibility layer. Please see my detailed comments for suggestions on how to address these issues.

@zhiyuan1i zhiyuan1i force-pushed the upstream-transformers branch from a982c2e to fec46c7 on September 9, 2025 at 16:59
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
setup.py (1)

41-47: Add missing runtime dependency: packaging

This repo imports from packaging import version (e.g., fla/models/utils.py) but packaging isn’t in install_requires; end users may hit ImportError. Add it here.

   install_requires=[
     'torch>=2.5',
-    'transformers>=4.53.0',
+    'transformers>=4.53.0',
+    'packaging>=23.0',
     'datasets>=3.3.0',
     'einops',
     'pytest'
   ],
.github/workflows/reusable-ci-tests.yml (1)

80-87: Unify runner-to-env mapping: handle nvidia-h100-3 in both jobs

test-models handles nvidia-h100-3 (non-nightly) with _2 suffix; test-ops does not. Mirror the logic to avoid selecting the wrong env on H100-3.

-          if [[ "${{ runner.name }}" == "nvidia-h100-1" ]]; then
+          if [[ "${{ runner.name }}" == "nvidia-h100-1" ]]; then
             TARGET_CONDA_ENV="${{ inputs.conda_env_name }}"
           elif [[ "${{ runner.name }}" == "nvidia-h100-2" ]]; then
             TARGET_CONDA_ENV="${{ inputs.conda_env_name }}_1"
+          elif [[ "${{ runner.name }}" == "nvidia-h100-3" && ! "${{ inputs.conda_env_name }}" == *"nightly"* ]]; then
+            TARGET_CONDA_ENV="${{ inputs.conda_env_name }}_2"
           else
             TARGET_CONDA_ENV="${{ inputs.conda_env_name }}"
             echo "Runner is not a special case, using input env: '${TARGET_CONDA_ENV}'"
           fi

Also applies to: 244-257

fla/models/comba/modeling_comba.py (1)

80-99: Propagate cache_position through Attention and Comba
Add cache_position: Optional[torch.LongTensor] = None to the forward signatures in fla/layers/attn.py (around line 80) and fla/layers/comba.py (around line 208), and pass it into your calls to get_seq_length/get_mask_sizes and past_key_values.update() so that the cache_position forwarded by CombaBlock’s **kwargs is actually consumed.
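
Schematically, the change being asked for here is just threading cache_position through the layer forward. The following is an abbreviated illustration; the real forward signatures in fla/layers/attn.py and fla/layers/comba.py carry many more arguments and use their own cache objects rather than a plain dict.

```python
from typing import Optional

import torch
import torch.nn as nn


class AttentionSketch(nn.Module):
    """Illustration of accepting and consuming cache_position; not the real attention layer."""

    def forward(
        self,
        hidden_states: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values: Optional[dict] = None,
        use_cache: bool = False,
        cache_position: Optional[torch.LongTensor] = None,  # newly accepted instead of being swallowed by **kwargs
        **kwargs,
    ) -> torch.Tensor:
        if past_key_values is not None and cache_position is not None:
            # Hand the positions to whatever cache bookkeeping the layer does
            # (seq-length queries, mask sizing, past_key_values.update(...)).
            past_key_values["cache_position"] = cache_position
        return hidden_states
```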

fla/models/transformer/modeling_transformer.py (1)

326-327: Handle logits_to_keep=None to avoid TypeError and match other models.

Other CausalLMs in this PR accept None to skip slicing. Here it would error if None is passed.

Apply:

-        logits = None if self.config.fuse_linear_cross_entropy else self.lm_head(hidden_states[:, -logits_to_keep:])
+        logits = None if self.config.fuse_linear_cross_entropy else self.lm_head(
+            hidden_states if logits_to_keep is None else hidden_states[:, -int(logits_to_keep):]
+        )
fla/models/path_attn/modeling_path_attention.py (1)

328-349: Guard logits slicing when logits_to_keep is None to avoid TypeError.
Other models handle None explicitly; here a None value would break the negative slice.

Apply:

-        logits = None if self.config.fuse_linear_cross_entropy else self.lm_head(hidden_states[:, -logits_to_keep:])
+        logits = None if self.config.fuse_linear_cross_entropy else self.lm_head(
+            hidden_states if logits_to_keep is None else hidden_states[:, -logits_to_keep:]
+        )
🧹 Nitpick comments (11)
setup.py (1)

41-47: Consider moving pytest to extras

pytest is a test-only dependency; including it in install_requires bloats downstream envs. Suggest moving to an extra like extras_require['dev'].

.github/workflows/reusable-ci-tests.yml (1)

166-188: Log transformers version during env verification

Quick sanity check to ensure the intended version is active.

           echo "Python executable path: $CONDA_BIN_PATH/python"
           echo "PyTorch version: $($CONDA_BIN_PATH/python -c 'import torch; print(torch.__version__)')"
+          echo "Transformers version: $($CONDA_BIN_PATH/python -c 'import transformers; print(transformers.__version__)')"
fla/models/utils.py (3)

391-393: Remove no-op __init__ from mixin

Mixins generally avoid __init__; this one does nothing and can confuse MRO. Safe to drop.

-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-

395-405: Silence Ruff ARG002 without altering behavior

kwargs is intentionally accepted for signature parity. Explicitly consume it to appease linters.

     def prepare_inputs_for_generation(
         self,
@@
-        **kwargs
+        **kwargs
     ):
+        # Intentionally unused: kept for signature compatibility with HF APIs
+        _ = kwargs

416-427: Guard against missing input_ids when inputs_embeds provided mid-generation

If inputs_embeds is provided with nonzero cache_position, current fallback may try to use input_ids=None. Mirror HF behavior: in that case, slice last tokens instead of indexing by cache_position.

-                elif cache_position is not None:
-                    # Fallback: manually slice using cache_position
-                    if input_ids is not None and input_ids.shape[1] != cache_position.shape[0]:
-                        input_ids = input_ids[:, cache_position]
+                elif cache_position is not None:
+                    # Fallback: manually slice using cache_position (if input_ids present), else keep last tokens
+                    if input_ids is not None:
+                        if input_ids.shape[1] != cache_position.shape[-1]:
+                            input_ids = input_ids[:, cache_position]
+                    else:
+                        # Keep last token(s) matching cache length to avoid None propagation
+                        step = int(cache_position.shape[-1]) if cache_position is not None else 1
+                        input_ids = None if step == 0 else input_ids
fla/models/bitnet/modeling_bitnet.py (1)

365-386: Double-check logits slicing when logits_to_keep=0

hidden_states[:, -0:] returns the full sequence; intended? If you expect “no slicing” when 0, consider None as sentinel instead.

-        logits = None if self.config.fuse_linear_cross_entropy else self.lm_head(hidden_states[:, -logits_to_keep:])
+        if self.config.fuse_linear_cross_entropy:
+            logits = None
+        else:
+            to_slice = None if logits_to_keep in (None, 0) else logits_to_keep
+            logits = self.lm_head(hidden_states if to_slice is None else hidden_states[:, -to_slice:])
fla/models/rwkv6/modeling_rwkv6.py (1)

418-437: Consistent logits slicing handling

Same optional refactor as BitNet to treat logits_to_keep=0 as “no slice”.

fla/models/retnet/modeling_retnet.py (1)

350-370: Optional: unify logits_to_keep behavior

Mirror the suggested pattern to treat 0 as “no slice”.

fla/models/hgrn2/modeling_hgrn2.py (1)

346-366: Optional: same logits_to_keep treatment

Consider the same small refactor as other heads for clarity with 0 vs None.

fla/models/transformer/modeling_transformer.py (1)

184-185: Type hint nit: past_key_values annotation could include Cache.

Forward converts legacy caches to Cache; consider annotating as Union[Cache, List[Tensor]] for accuracy (purely typing).

fla/models/path_attn/modeling_path_attention.py (1)

191-193: Return type annotation mismatch.
Function returns BaseModelOutputWithPast but type hints say CausalLMOutputWithPast. Align for clarity.

-    ) -> Union[Tuple, CausalLMOutputWithPast]:
+    ) -> Union[Tuple, BaseModelOutputWithPast]:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a982c2e and fec46c7.

📒 Files selected for processing (31)
  • .github/workflows/reusable-ci-tests.yml (3 hunks)
  • fla/models/abc/modeling_abc.py (2 hunks)
  • fla/models/bitnet/modeling_bitnet.py (2 hunks)
  • fla/models/comba/modeling_comba.py (2 hunks)
  • fla/models/delta_net/modeling_delta_net.py (2 hunks)
  • fla/models/forgetting_transformer/modeling_forgetting_transformer.py (2 hunks)
  • fla/models/gated_deltanet/modeling_gated_deltanet.py (2 hunks)
  • fla/models/gated_deltaproduct/modeling_gated_deltaproduct.py (2 hunks)
  • fla/models/gla/modeling_gla.py (2 hunks)
  • fla/models/gsa/modeling_gsa.py (2 hunks)
  • fla/models/hgrn/modeling_hgrn.py (2 hunks)
  • fla/models/hgrn2/modeling_hgrn2.py (2 hunks)
  • fla/models/lightnet/modeling_lightnet.py (2 hunks)
  • fla/models/linear_attn/modeling_linear_attn.py (2 hunks)
  • fla/models/log_linear_mamba2/modeling_log_linear_mamba2.py (2 hunks)
  • fla/models/mamba/modeling_mamba.py (2 hunks)
  • fla/models/mamba2/modeling_mamba2.py (2 hunks)
  • fla/models/mesa_net/modeling_mesa_net.py (2 hunks)
  • fla/models/mla/modeling_mla.py (2 hunks)
  • fla/models/mom/modeling_mom.py (2 hunks)
  • fla/models/nsa/modeling_nsa.py (2 hunks)
  • fla/models/path_attn/modeling_path_attention.py (2 hunks)
  • fla/models/retnet/modeling_retnet.py (2 hunks)
  • fla/models/rodimus/modeling_rodimus.py (2 hunks)
  • fla/models/rwkv6/modeling_rwkv6.py (2 hunks)
  • fla/models/rwkv7/modeling_rwkv7.py (2 hunks)
  • fla/models/samba/modeling_samba.py (2 hunks)
  • fla/models/transformer/modeling_transformer.py (2 hunks)
  • fla/models/utils.py (2 hunks)
  • pyproject.toml (1 hunks)
  • setup.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • fla/models/mom/modeling_mom.py
  • fla/models/gsa/modeling_gsa.py
  • fla/models/forgetting_transformer/modeling_forgetting_transformer.py
  • pyproject.toml
  • fla/models/lightnet/modeling_lightnet.py
  • fla/models/mamba/modeling_mamba.py
  • fla/models/rodimus/modeling_rodimus.py
  • fla/models/mamba2/modeling_mamba2.py
  • fla/models/mla/modeling_mla.py
🧰 Additional context used
🧬 Code graph analysis (20)
fla/models/samba/modeling_samba.py (1)
fla/models/utils.py (1)
  • FLAGenerationMixin (385-462)
fla/models/nsa/modeling_nsa.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/hgrn2/modeling_hgrn2.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/path_attn/modeling_path_attention.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/mesa_net/modeling_mesa_net.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/gla/modeling_gla.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/rwkv7/modeling_rwkv7.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/rwkv6/modeling_rwkv6.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/transformer/modeling_transformer.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/bitnet/modeling_bitnet.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/comba/modeling_comba.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/abc/modeling_abc.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/linear_attn/modeling_linear_attn.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/utils.py (2)
fla/utils.py (1)
  • deprecate_kwarg (190-334)
fla/models/mom/modeling_mom.py (1)
  • prepare_inputs_for_generation (394-426)
fla/models/gated_deltanet/modeling_gated_deltanet.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/gated_deltaproduct/modeling_gated_deltaproduct.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/log_linear_mamba2/modeling_log_linear_mamba2.py (1)
fla/models/utils.py (1)
  • FLAGenerationMixin (385-462)
fla/models/hgrn/modeling_hgrn.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/retnet/modeling_retnet.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
fla/models/delta_net/modeling_delta_net.py (1)
fla/models/utils.py (3)
  • Cache (466-468)
  • Cache (470-472)
  • FLAGenerationMixin (385-462)
🪛 Ruff (0.12.2)
fla/models/utils.py

404-404: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test H100 (PyTorch 2.7) / test-models
🔇 Additional comments (41)
fla/models/utils.py (3)

454-462: Approve: common fields assembly looks good

The merged model_inputs block aligns with HF expectations (past_key_values/use_cache/attention_mask/logits_to_keep).


15-18: Minor: version gate constant is fine

Pre-computing _IS_TRANSFORMERS_4_56_PLUS avoids repeated parsing.


395-463: Sanity tests for 4.56+ generation path

The sandbox lacked the transformers package (dry-run errored) and there’s still a per-model override in fla/models/mom/modeling_mom.py you’ll need to remove or update to use the unified mixin. Please re-run these checks in an environment with transformers>=4.56.0 installed:

  • invoke prepare_inputs_for_generation with and without cache_position
  • supply inputs_embeds mid-generation and confirm it’s passed through
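
For reference, a throwaway check along those lines might look like this. It is hypothetical: any FLA *ForCausalLM class would do, and the asserted keys follow the model_inputs contract described above rather than a verified API.

```python
import torch

# Hypothetical setup: replace with any FLA causal LM, e.g.
#   from fla.models import GLAConfig, GLAForCausalLM
#   model = GLAForCausalLM(GLAConfig(hidden_size=64, num_hidden_layers=2))

def sanity_check_prepare_inputs(model):
    input_ids = torch.randint(0, 100, (1, 8))

    # 1) with and without cache_position
    without_pos = model.prepare_inputs_for_generation(input_ids)
    with_pos = model.prepare_inputs_for_generation(input_ids, cache_position=torch.arange(8))
    assert "past_key_values" in without_pos and "past_key_values" in with_pos

    # 2) inputs_embeds mid-generation should be passed through, not silently dropped
    embeds = torch.randn(1, 8, model.config.hidden_size)
    prepared = model.prepare_inputs_for_generation(
        input_ids, inputs_embeds=embeds, cache_position=torch.arange(8)
    )
    assert prepared.get("inputs_embeds") is not None
```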
fla/models/comba/modeling_comba.py (1)

19-20: LGTM: migrate to FLAGenerationMixin

Import and base-class swap look correct; generation routes should now be version-aware via the mixin.

Also applies to: 268-269

fla/models/bitnet/modeling_bitnet.py (1)

18-19: LGTM: mixin/import swap

BitNetForCausalLM now inherits the shared, version-aware generation path.

Also applies to: 298-299

fla/models/rwkv6/modeling_rwkv6.py (1)

19-20: LGTM: mixin/import swap

RWKV6 adopts the shared mixin; generate override still guards against past_key_values usage.

Also applies to: 336-336

fla/models/retnet/modeling_retnet.py (2)

19-20: LGTM: mixin/import swap

RetNet model now relies on FLAGenerationMixin.

Also applies to: 266-266


298-313: Good error message on unsupported past_key_values strategies

Helpful UX when users select incompatible strategies.

fla/models/hgrn2/modeling_hgrn2.py (1)

19-20: LGTM: mixin/import swap

HGRN2 integrates with FLAGenerationMixin; generate override remains intact.

Also applies to: 264-265

fla/models/transformer/modeling_transformer.py (2)

18-18: Migration to FLAGenerationMixin looks correct.

Import switch aligns with the PR intent; no issues spotted here.


259-259: Class now inherits FLAGenerationMixin — good alignment with utils.FLAGenerationMixin.

This enables centralized, version-aware generation handling.

fla/models/nsa/modeling_nsa.py (3)

18-18: Import migration to FLAGenerationMixin — OK.


242-242: NSAForCausalLM inheriting FLAGenerationMixin — OK.

Consistent with repo-wide change; no functional concerns here.


274-288: Helpful generate() error wrapping.

Clearer message for unsupported past_key_values strategies.

fla/models/delta_net/modeling_delta_net.py (3)

19-19: Import migration to FLAGenerationMixin — OK.


258-258: DeltaNetForCausalLM now using FLAGenerationMixin — OK.


290-304: generate() wrapper improves UX for unsupported strategies.

fla/models/mesa_net/modeling_mesa_net.py (3)

19-19: Import migration to FLAGenerationMixin — OK.


257-257: MesaNetForCausalLM now using FLAGenerationMixin — OK.


289-303: generate() wrapper matches pattern elsewhere — good.

fla/models/hgrn/modeling_hgrn.py (3)

19-19: Import migration to FLAGenerationMixin — OK.


263-263: HGRNForCausalLM now using FLAGenerationMixin — OK.


295-309: generate() wrapper with targeted error is good.

fla/models/abc/modeling_abc.py (3)

19-19: Import migration to FLAGenerationMixin — OK.


261-261: ABCForCausalLM now using FLAGenerationMixin — OK.


293-307: generate() wrapper matches others — good.

fla/models/samba/modeling_samba.py (2)

19-19: Switched to FLAGenerationMixin (import) — OK.


356-359: Safe slicing with logits_to_keep=None is already handled — good.

No action needed here.

fla/models/gated_deltanet/modeling_gated_deltanet.py (1)

19-19: Switch to FLAGenerationMixin/Cache import looks good.
No issues spotted with the new dependency points.

fla/models/gated_deltaproduct/modeling_gated_deltaproduct.py (2)

19-19: Import swap to utils.FLAGenerationMixin/Cache — LGTM.


261-261: Class now mixes in FLAGenerationMixin — LGTM.
Matches the repo-wide migration.

fla/models/log_linear_mamba2/modeling_log_linear_mamba2.py (2)

14-14: Switch to FLAGenerationMixin import — LGTM.


152-163: Stateful generation contract correctly implemented; no overrides needed. Mamba2’s prepare_inputs_for_generation and _update_model_kwargs_for_generation already handle cache_params/cache_position as expected by Transformers 4.56’s GenerationMixin, so the mixin interop is sound. Optionally, you can add the same AttributeError guard in generate for UX consistency.
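
The AttributeError guard mentioned here is the same generate() wrapper pattern used across the other models in this PR. A sketch of it follows; the message wording is illustrative, and in the real models the method sits on the *ForCausalLM classes that also inherit their PreTrainedModel base.

```python
from transformers.generation import GenerationMixin


class GuardedGenerateSketch(GenerationMixin):
    """Sketch only: shows the error-wrapping pattern, not the shipped model code."""

    def generate(self, *args, **kwargs):
        try:
            return super().generate(*args, **kwargs)
        except AttributeError as exc:
            # Re-raise with a clearer hint when a decoding strategy tries to
            # manipulate past_key_values on a cache that does not support it.
            if "past_key_values" in str(exc):
                raise AttributeError(
                    "You tried to use a decoding strategy that manipulates past_key_values, "
                    "which this model's recurrent cache does not support. "
                    "Try greedy search or sampling instead."
                ) from exc
            raise
```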

fla/models/gla/modeling_gla.py (2)

19-19: Import consolidation to utils.FLAGenerationMixin/Cache — LGTM.


261-261: GLAForCausalLM now uses FLAGenerationMixin — LGTM.
Behavior aligned with other models.

fla/models/linear_attn/modeling_linear_attn.py (2)

19-19: Import change to FLAGenerationMixin/Cache — LGTM.


256-256: LinearAttentionForCausalLM adopts FLAGenerationMixin — LGTM.

fla/models/path_attn/modeling_path_attention.py (2)

18-18: Import swap to utils.FLAGenerationMixin/Cache — LGTM.


261-261: PaTHAttentionForCausalLM now uses FLAGenerationMixin — LGTM.

fla/models/rwkv7/modeling_rwkv7.py (2)

19-19: Import change to utils.FLAGenerationMixin/Cache — LGTM.


428-428: RWKV7ForCausalLM inherits FLAGenerationMixin — LGTM.
Consistent with the rest of the migration.

Comment on lines +140 to 147
$CONDA_BIN_PATH/pip install -U numpy packaging psutil ninja einops datasets transformers
$CONDA_BIN_PATH/pip install --no-deps .
else
STABLE_URL="https://download.pytorch.org/whl/${{ inputs.pytorch_cuda_version }}"
echo "Using stable index URL: $STABLE_URL"
$CONDA_BIN_PATH/pip install -U torch~=${{ inputs.pytorch_version }} triton --index-url $STABLE_URL
$CONDA_BIN_PATH/pip install -U numpy packaging psutil ninja einops datasets
$CONDA_BIN_PATH/pip install -U numpy packaging psutil ninja einops datasets transformers
$CONDA_BIN_PATH/pip install .

🛠️ Refactor suggestion

Install transformers consistently across jobs/branches

Nightly paths install transformers, but stable paths in test-models (both NVIDIA and Intel) do not. This can leave an older preinstalled version and mask issues. Install it in those paths too.

-              $CONDA_BIN_PATH/pip install -U torch~=${{ inputs.pytorch_version }} triton --index-url $STABLE_URL
-              $CONDA_BIN_PATH/pip install -U numpy
+              $CONDA_BIN_PATH/pip install -U torch~=${{ inputs.pytorch_version }} triton --index-url $STABLE_URL
+              $CONDA_BIN_PATH/pip install -U numpy packaging psutil ninja einops datasets transformers
               $CONDA_BIN_PATH/pip install .
-            $CONDA_BIN_PATH/pip install -U torch~=${{ inputs.pytorch_version }} pytorch-triton-xpu --index-url $XPU_URL
-            $CONDA_BIN_PATH/pip install -U numpy
+            $CONDA_BIN_PATH/pip install -U torch~=${{ inputs.pytorch_version }} pytorch-triton-xpu --index-url $XPU_URL
+            $CONDA_BIN_PATH/pip install -U numpy packaging psutil ninja einops datasets transformers
             $CONDA_BIN_PATH/pip install .

Also applies to: 158-160, 321-329, 340-342

🤖 Prompt for AI Agents
.github/workflows/reusable-ci-tests.yml around lines 140-147 (also apply same
change at 158-160, 321-329, 340-342): the stable-path branches do not explicitly
install the transformers package, which can leave an older preinstalled version
and hide regressions; modify those stable-path install blocks to include
transformers in the pip install commands (matching the nightly paths) so that
transformers is installed/updated explicitly in the stable flows as well,
ensuring each affected block lists transformers alongside numpy packaging psutil
ninja einops datasets (and any other packages already present).



-class GatedDeltaNetForCausalLM(GatedDeltaNetPreTrainedModel, GenerationMixin):
+class GatedDeltaNetForCausalLM(GatedDeltaNetPreTrainedModel, FLAGenerationMixin):

💡 Verification agent

🧩 Analysis chain

Adopt FLAGenerationMixin for generation — OK. Verify no stale per-model overrides remain.
Since the mixin centralizes prepare_inputs_for_generation, ensure no lingering model-level overrides cause shadowing.

Run to confirm no stray definitions remain:


🏁 Script executed:

#!/bin/bash
rg -nP --type=py '\bdef\s+prepare_inputs_for_generation\b' fla/models | sed -n '1,200p'

Length of output: 136


Remove per-model override: Delete or delegate the prepare_inputs_for_generation method in fla/models/mom/modeling_mom.py (line 394) so it no longer shadows FLAGenerationMixin.

🤖 Prompt for AI Agents
In fla/models/gated_deltanet/modeling_gated_deltanet.py around line 269: the
class GatedDeltaNetForCausalLM is being shadowed by a per-model override of
prepare_inputs_for_generation in fla/models/mom/modeling_mom.py at line 394;
remove that method from modeling_mom.py or change it to delegate to the mixin by
replacing its body with a simple return
super().prepare_inputs_for_generation(*args, **kwargs) (or calling
FLAGenerationMixin.prepare_inputs_for_generation(self, *args, **kwargs)) so the
mixin implementation is used and no longer shadowed.



-class SambaForCausalLM(SambaPreTrainedModel, GenerationMixin):
+class SambaForCausalLM(SambaPreTrainedModel, FLAGenerationMixin):

⚠️ Potential issue

Breaking: removed Samba’s prepare_inputs_for_generation but forward expects cache_params, not past_key_values.

FLAGenerationMixin’s default prepare_inputs_for_generation returns past_key_values; Samba forward does not accept it and instead needs cache_params. Generation will raise “got an unexpected keyword argument 'past_key_values'”.

Add Samba-specific prepare_inputs_for_generation to map to cache_params:

 class SambaForCausalLM(SambaPreTrainedModel, FLAGenerationMixin):
@@
     def _update_model_kwargs_for_generation(
         self, outputs: ModelOutput, model_kwargs: Dict[str, Any], **kwargs
     ) -> Dict[str, Any]:
         model_kwargs["cache_params"] = outputs.get("cache_params", None)
         return model_kwargs
+
+    def prepare_inputs_for_generation(
+        self,
+        input_ids: torch.LongTensor = None,
+        cache_params: Optional[MambaCache] = None,
+        inputs_embeds: Optional[torch.Tensor] = None,
+        attention_mask: Optional[torch.Tensor] = None,
+        use_cache: bool = True,
+        logits_to_keep: Optional[int] = None,
+        **kwargs: Any,
+    ) -> Dict[str, Any]:
+        model_inputs: Dict[str, Any] = {}
+        # If continuing with cache, only feed the next step token unless using inputs_embeds
+        if inputs_embeds is not None:
+            model_inputs["inputs_embeds"] = inputs_embeds
+        else:
+            # Keep static stride during decoding
+            model_inputs["input_ids"] = input_ids.contiguous() if input_ids is not None else None
+        if logits_to_keep is not None:
+            model_inputs["logits_to_keep"] = logits_to_keep
+        model_inputs.update(
+            {
+                "cache_params": cache_params,
+                "use_cache": use_cache,
+                "attention_mask": attention_mask,
+            }
+        )
+        return model_inputs

Comment on lines +12 to 14
from transformers.generation import GenerationMixin
from transformers.utils.deprecation import deprecate_kwarg


🛠️ Refactor suggestion

Harden GenerationMixin import for cross-version compatibility

Some Transformers versions relocate GenerationMixin; a single import path can break (seen intermittently around 4.52–4.56). Add a fallback chain.

-from transformers.generation import GenerationMixin
-from transformers.utils.deprecation import deprecate_kwarg
+try:
+    from transformers.generation import GenerationMixin
+except Exception:
+    try:
+        from transformers.generation.utils import GenerationMixin
+    except Exception:
+        # Final fallback for edge versions
+        from transformers import GenerationMixin
+from transformers.utils.deprecation import deprecate_kwarg
🤖 Prompt for AI Agents
In fla/models/utils.py around lines 12 to 14, the direct import of
GenerationMixin can break across Transformers versions; replace the single
import with a fallback chain that first tries from transformers.generation
import GenerationMixin, then from transformers.generation_utils import
GenerationMixin, and finally from transformers import GenerationMixin (or other
plausible locations) inside try/except blocks to catch ImportError; ensure the
final fallback raises a clear ImportError if none succeed and keep the rest of
the file using the resolved GenerationMixin name.

Comment on lines +429 to +437
if inputs_embeds is not None and (cache_position is None or len(cache_position) == inputs_embeds.shape[1]):
model_inputs['inputs_embeds'] = inputs_embeds
model_inputs['input_ids'] = None
else:
model_inputs['input_ids'] = input_ids.contiguous() if input_ids is not None else None
model_inputs['inputs_embeds'] = None

model_inputs['cache_position'] = cache_position


🛠️ Refactor suggestion

Handle cache_position shape robustly when using inputs_embeds

Comparing len(cache_position) to inputs_embeds.shape[1] breaks if cache_position is 2D. Use last-dim length.

-            if inputs_embeds is not None and (cache_position is None or len(cache_position) == inputs_embeds.shape[1]):
+            cache_len = None
+            if cache_position is not None:
+                cache_len = cache_position.shape[-1]
+            if inputs_embeds is not None and (cache_len is None or cache_len == inputs_embeds.shape[1]):
                 model_inputs['inputs_embeds'] = inputs_embeds
                 model_inputs['input_ids'] = None
             else:
                 model_inputs['input_ids'] = input_ids.contiguous() if input_ids is not None else None
                 model_inputs['inputs_embeds'] = None
🤖 Prompt for AI Agents
In fla/models/utils.py around lines 429 to 437, the code compares
len(cache_position) to inputs_embeds.shape[1], which fails if cache_position is
a 2D tensor; change the comparison to use the last dimension length (e.g.
cache_position.shape[-1]) and guard for None so the condition becomes: if
inputs_embeds is not None and (cache_position is None or
cache_position.shape[-1] == inputs_embeds.shape[1]); leave the rest of the
branch logic unchanged.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/models/test_modeling_mamba2.py (3)

15-25: Guard the suite when CUDA isn’t available to avoid spurious CI failures

All parametrizations now require conv_backend='cuda'. Add a collection-time skip so CPU-only environments don’t fail this test.

+@pytest.mark.skipif(not torch.cuda.is_available(), reason="Requires CUDA for conv_backend='cuda'")
 @pytest.mark.parametrize(
     ['L', 'B', 'T', 'H', 'D', 'use_l2warp', 'dtype', 'conv_backend'],

36-41: Use pytest’s monkeypatch to avoid leaking environment variables across tests

Setting os.environ directly is process-global and can bleed into other tests. Use monkeypatch.setenv, which auto-restores after the test.

-def test_modeling(
+def test_modeling(
     L: int,
     B: int,
     T: int,
     H: int,
     D: int,
     use_l2warp: bool,
     dtype: torch.dtype,
-    conv_backend: str,
+    conv_backend: str,
+    monkeypatch,
 ):
@@
-    os.environ['FLA_CONV_BACKEND'] = conv_backend
+    monkeypatch.setenv('FLA_CONV_BACKEND', conv_backend)

36-73: Skip BF16 runs on pre-Ampere GPUs to prevent backend errors

CUDA BF16 requires SM80+ (Ampere). Add a runtime skip for that combo to reduce flakiness on mixed CI fleets.

 def test_modeling(
@@
-    monkeypatch.setenv('FLA_CONV_BACKEND', conv_backend)
+    monkeypatch.setenv('FLA_CONV_BACKEND', conv_backend)
+
+    # CUDA BF16 requires Ampere+ (SM80)
+    if (
+        conv_backend == 'cuda'
+        and dtype == torch.bfloat16
+        and torch.cuda.is_available()
+    ):
+        major, _ = torch.cuda.get_device_capability(0)
+        if major < 8:
+            pytest.skip("CUDA bfloat16 requires SM80+ (Ampere or newer).")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fec46c7 and bfc34d3.

📒 Files selected for processing (2)
  • fla/layers/mamba2.py (1 hunks)
  • tests/models/test_modeling_mamba2.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • fla/layers/mamba2.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test H100 (PyTorch 2.7) / test-models
  • GitHub Check: check-pt-python-compatibility
🔇 Additional comments (1)
tests/models/test_modeling_mamba2.py (1)

21-21: Switching to CUDA backend for this case looks good

Aligns the test with the project’s guidance to prefer CUDA over Triton for Mamba2 conv1d. No functional concerns.

@zhiyuan1i zhiyuan1i merged commit 0ca1288 into main Sep 9, 2025
4 of 5 checks passed
@zhiyuan1i zhiyuan1i deleted the upstream-transformers branch September 9, 2025 18:44
zhiyuan1i added a commit that referenced this pull request Sep 11, 2025
* [Deps] Upgrade to transformers 4.56.x

* remove triton conv1d backend for mamba2

* skip mom test