
Conversation

@tlrmchlsmth (Member) commented Jan 16, 2025

Update Transformers to 4.48. Split off from #10909 to isolate any 4.48-related changes here and for easier debugging.

From @fabianlim, we have the following open issues:

  • Basic Correctness Test: somehow we are observing memory access errors with Gemma, probably related to this (skip and wait for a patch fix?).
  • Language Model Test: this happens for a custom model whose remote code imports something from transformers/pytorch_utils.py that is no longer there (hard to fix since it's not public code).
  • Multi GPU Test: I can't really repro this one locally yet, as I'm running into some Ray troubles. My guess is that it's 4.48-related since it passed before I upgraded Transformers, or it's an intermittent error (is it possible to retrigger the test?).
  • Quantization Test: I reproduced this one locally; it actually comes from this PR. It happens on the checkpoint neuralmagic/Llama-3.2-1B-quantized.w8a8, where quantization_config.quantization_status == "frozen". In 4.47, it would reach this line and run as compressed, but in 4.48 the new logic determines that, because of the frozen tag, it is not compressed, and skips to the other line (see the sketch after this list).
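
To make the behavior change concrete, here is a minimal sketch of the check described above (the function names and simplified logic are illustrative, not the actual transformers source):

```python
# Minimal sketch of the 4.47 -> 4.48 behavior change described above.
# Function names and logic are illustrative, not the actual transformers source.

def is_compressed_4_47(quantization_config) -> bool:
    # 4.47-era behavior: a checkpoint tagged "frozen" still reached the
    # compressed-loading path.
    return quantization_config.quantization_status in ("compressed", "frozen")


def is_compressed_4_48(quantization_config) -> bool:
    # 4.48 behavior: only an explicit "compressed" status takes the
    # compressed path, so a checkpoint carrying a stale "frozen" tag
    # falls through to the other branch.
    return quantization_config.quantization_status == "compressed"
```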

Signed-off-by: Tyler Michael Smith <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify bot added the ci/build label Jan 16, 2025
@tlrmchlsmth (Member, Author):

FYI @mgoin, @dsikka, @robertgshaw2-redhat on the neuralmagic/Llama-3.2-1B-quantized.w8a8 issue

@dsikka (Contributor) commented Jan 16, 2025

The Transformers 4.48 release includes the required change in how AutoModel is expected to handle compressed vs. frozen models. The model in question has the frozen status when it really should be compressed; it is likely just outdated.

I'll look into updating the model
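
A rough sketch of what that checkpoint update amounts to (a hypothetical script; the actual edit to the hosted model may differ):

```python
# Hypothetical sketch of the checkpoint fix: flip the stale "frozen"
# quantization status to "compressed" in the model's config.json so that
# 4.48's stricter check still takes the compressed-loading path.
import json

with open("config.json") as f:
    config = json.load(f)

config["quantization_config"]["quantization_status"] = "compressed"

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```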

@fabianlim (Contributor):

The custom model in question that is causing the Language Model Test to fail is this one in particular. What can we do about this, since this code is only modifiable by its maintainers?

```
[2025-01-15T23:58:08Z]     from transformers.modeling_outputs import BaseModelOutputWithPast, CausalLMOutputWithPast, SequenceClassifierOutputWithPast
[2025-01-15T23:58:08Z]     from transformers.modeling_utils import PreTrainedModel
[2025-01-15T23:58:08Z] >   from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS, is_torch_greater_or_equal_than_1_13
[2025-01-15T23:58:08Z] E   ImportError: cannot import name 'is_torch_greater_or_equal_than_1_13' from 'transformers.pytorch_utils' (/usr/local/lib/python3.12/dist-packages/transformers/pytorch_utils.py)
[2025-01-15T23:58:08Z]
[2025-01-15T23:58:08Z] /root/.cache/huggingface/modules/transformers_modules/openbmb/MiniCPM3-4B/e5715484011d723a1892db91da8b59d979d14aee/modeling_minicpm.py:41: ImportError
```

@dsikka (Contributor) commented Jan 16, 2025

> The custom model in question that is causing the Language Model Test to fail is this one in particular. What can we do about this, since this code is only modifiable by its maintainers? [...]

The same issue has appeared in a few other models as a result of the release, e.g. DeepSeek.

@tlrmchlsmth (Member, Author) commented Jan 16, 2025


> The custom model in question that is causing the Language Model Test to fail is this one in particular. What can we do about this, since this code is only modifiable by its maintainers? [...]
>
> The same issue has appeared in a few other models as a result of the release, e.g. DeepSeek.

I have a PR up to fix it just for MiniCPM here: https://huggingface.co/openbmb/MiniCPM3-4B/discussions/39

Alternatively, if there are multiple models with the same issue, I think it would be better to fix it via huggingface/transformers#35734 -- @dsikka, do you have links to other models with the same issue?
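
For reference, the kind of guard that resolves the import error above in remote modeling code looks roughly like this (a hedged sketch; the actual upstream patches may differ):

```python
# Illustrative compatibility shim for remote code that imports a helper
# removed from transformers.pytorch_utils in 4.48. In older transformers
# the helper was a module-level bool, so a computed constant is a drop-in
# replacement.
try:
    from transformers.pytorch_utils import is_torch_greater_or_equal_than_1_13
except ImportError:
    import torch
    from packaging import version

    is_torch_greater_or_equal_than_1_13 = version.parse(
        torch.__version__
    ) >= version.parse("1.13")
```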

@dsikka (Contributor) commented Jan 16, 2025

FYI, the model has been updated and the tests now pass (at least locally).
@tlrmchlsmth @fabianlim

@ani300 commented Jan 17, 2025

The basic correctness tests pass for me when using this: huggingface/transformers#35681. For now, we can wait until it gets merged, or skip the tests until the next point release of Transformers.

@tlrmchlsmth added the ready label Jan 18, 2025
@tlrmchlsmth (Member, Author):

Added the ready label to see if the Multi GPU test is intermittent

@tlrmchlsmth (Member, Author):

Looks like the multi-GPU tests are passing, so we should be good to go once huggingface/transformers#35681 lands. I'm inclined to wait for a Transformers point release, but let me know if you disagree.

@fabianlim (Contributor):

@tlrmchlsmth thanks! I'm absolutely on board with waiting for the point release.

@ani300 commented Jan 28, 2025

huggingface/transformers#35681 is now merged, so that's one less error as soon as they make a point release after it!

@noldorj commented Jan 29, 2025

It still happens with:
Torch version: 2.5.1+cu124
Transformers version: 4.48.1

@ArthurZucker (Contributor):

Patch release 4.48.2 is out, btw!

@fabianlim (Contributor) commented Jan 30, 2025

@tlrmchlsmth seems like a few tests failed. It's weird, because some of these tests (e.g., OpenAI, TPU test) passed before when I first tried to perform the upgrade to 4.48.0 in my other PR #10909. I wonder if some of them are intermittent.

@tlrmchlsmth (Member, Author) commented Jan 31, 2025

The CI for TPUs is broken due to the disks being full. Not sure about the others; I'm rerunning the failed tests.

@DarkLight1337 (Member):

Closing as superseded by #12599
