Conversation

@hlky hlky commented Oct 14, 2024

What does this PR do?

This PR adds support for Florence-2.

Compared to the existing remote code, the main differences are the removal of MySequential, the removal of einops.rearrange, and that trunc_normal_ and DropPath are copied from timm.
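
For reference, DropPath implements stochastic depth; a paraphrased sketch of timm's implementation (for illustration, not necessarily the exact code vendored here) looks like:

import torch
from torch import nn


def drop_path(x: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
    # Randomly zero entire residual paths per sample (stochastic depth).
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1 - drop_prob
    # One mask value per sample, broadcast over all remaining dims.
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    random_tensor = x.new_empty(shape).bernoulli_(keep_prob)
    if keep_prob > 0.0:
        random_tensor.div_(keep_prob)  # rescale so expected activations are unchanged
    return x * random_tensor


class DropPath(nn.Module):
    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return drop_path(x, self.drop_prob, self.training)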

Fixes #34155

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@hlky hlky force-pushed the add-florence2 branch 2 times, most recently from 033979d to af0fd69 on October 14, 2024 16:10
@LysandreJik
Member

Thanks! @Rocketknight1 would you be down to do a first review?

@Rocketknight1
Member

Sure! Taking it today.

@Rocketknight1 Rocketknight1 left a comment

Hi! I just finished a partial review of this PR. However, I've realized that probably over 50% of the modelling code is copied from BART. We actually have a brand new method for handling cases like this, to both simplify the library and the PR! That method is Modular transformers.

If you want to try this in modular transformers style, you can replace the modeling_florence2.py file with modular_florence2.py. In this file, you can import classes from other models in transformers, like BartEncoderLayer, and then simply define your layers like this, without any Copied from statements:

from ..bart.modeling_bart import BartEncoderLayer


class Florence2EncoderLayer(BartEncoderLayer):
    pass

Thus, the only layers the modular_florence2.py file needs to contain are the ones that are unique, and everything else can be imported. You can see an example of this in action in this PR. If you do it this way, the modeling_florence2.py file will be autogenerated for you, and we don't need to review it in detail. You can leave the tokenization/processing files as-is for now.

Would you be willing to refactor this PR to modular style? We'd be happy to help, and it should make things much simpler.

Some other comments below as well, but you can ignore the copied from comments if you refactor to modular style - except as a guide for which classes to inherit from in your modular file!

Comment on lines 82 to 89
Member

Lists as default arguments are highly dangerous, because they're mutable, and the same mutable list will be shared by the init method and any objects of this class. For example:

config = Florence2VisionConfig()
config.patch_size[3] = 4  # This actually mutates the same list object held by the init method!
new_config = Florence2VisionConfig()  # The new config will inherit the mutated patch size!
new_config.patch_size[3] = 5  # This will affect both the init and the first config object!

I recommend either of the following (sketched below):

  1. Replace default list args with default tuples, which are immutable, and convert them with list() in the body of the method. This creates a new list each time, so there is no shared list to mutate.
  2. Replace default list args with None, and in the body of the method create a new list with the default values when None is passed. Again, this ensures there is no shared list to mutate.
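
A minimal sketch of both options; the class and argument names here are illustrative, not taken from the PR:

# Option 1: immutable tuple default, converted to a fresh list in the body.
class ConfigWithTupleDefault:
    def __init__(self, patch_size=(7, 3, 3, 3)):
        self.patch_size = list(patch_size)  # new list per instance, nothing shared

# Option 2: None default, replaced with a new list in the body.
class ConfigWithNoneDefault:
    def __init__(self, patch_size=None):
        if patch_size is None:
            patch_size = [7, 3, 3, 3]  # fresh list created on every call
        self.patch_size = patch_size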

Member

This is also a list as a default argument - it's less likely to be mutated, but still kind of dangerous.

Member

Suggested change
- self.vocab_size = self.vocab_size

Redundant line

Member

We generally prefer to avoid asserts in favour of errors, like if ...: raise ValueError()
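
For example, a generic illustration of the preferred pattern (the condition and message here are hypothetical; note that asserts are also stripped entirely when Python runs with -O, so they should not guard user input):

# Instead of:
#     assert config.num_attention_heads > 0, "num_attention_heads must be positive"
# prefer:
if config.num_attention_heads <= 0:
    raise ValueError(f"`num_attention_heads` must be positive, got {config.num_attention_heads}")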

Member

More asserts

Member

Copied from BART

Member

Copied from BART

Member

Copied from BART

Member

Copied from BART

Member

Copied from BART

hlky commented Oct 15, 2024

Thanks for the review, I'll rework this using modular.

@hlky hlky marked this pull request as draft October 16, 2024 17:03

hlky commented Oct 16, 2024

Traceback (most recent call last):
  File "utils/check_modular_conversion.py", line 73, in <module>
    non_matching_files += compare_files(modular_file_path, args.fix_and_overwrite)
  File "utils/check_modular_conversion.py", line 53, in compare_files
    generated_modeling_content = convert_modular_file(modular_file_path)
  File "/home/user/transformers/utils/modular_model_converter.py", line 1141, in convert_modular_file
    wrapper.visit(cst_transformers)
  File "/home/user/transformers/.venv/lib/python3.8/site-packages/libcst/metadata/wrapper.py", line 204, in visit
    return self.module.visit(visitor)
  File "/home/user/transformers/.venv/lib/python3.8/site-packages/libcst/_nodes/module.py", line 89, in visit
    result = super(Module, self).visit(visitor)
  File "/home/user/transformers/.venv/lib/python3.8/site-packages/libcst/_nodes/base.py", line 236, in visit
    leave_result = visitor.on_leave(self, with_updated_children)
  File "/home/user/transformers/.venv/lib/python3.8/site-packages/libcst/_visitors.py", line 71, in on_leave
    updated_node = leave_func(original_node, updated_node)
  File "/home/user/transformers/utils/modular_model_converter.py", line 1115, in leave_Module
    self._recursively_add_all_new_needed_functions_in_files()
  File "/home/user/transformers/utils/modular_model_converter.py", line 1101, in _recursively_add_all_new_needed_functions_in_files
    dependency, body, self.all_definitions[dependency], parent=parent
KeyError: 'ValueError'

There seems to be an issue with the modular conversion.

@Rocketknight1
Member

@hlky we think we know what's happening there. There is a fix open for it, give us a little bit!

@Rocketknight1
Member

@hlky modular conversions should be working now - I pulled in changes from another branch! Please let me know if you encounter any other problems.

To use the updated converter, pull the latest changes and then run python utils/modular_model_converter.py --files_to_parse src/transformers/models/florence2/modular_florence2.py to generate the modeling_florence2.py file.

Comment on lines 868 to 869
@hlky hlky Oct 20, 2024

The modular converter generates Florence2PreTrainedModel here instead of Florence2LanguagePreTrainedModel. I'll look into a fix, but given the similarity to Bart I'm thinking we can just use Bart directly in Florence2ForConditionalGeneration, as sketched below.
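
A rough sketch of that idea; the nn.Module base and config.text_config wiring here are assumptions for illustration, not the final PR code:

from torch import nn
from transformers import BartForConditionalGeneration

class Florence2ForConditionalGeneration(nn.Module):
    def __init__(self, config):
        super().__init__()
        # Reuse Bart directly as the language model instead of generating a
        # renamed Florence2Language* copy via the modular converter.
        self.language_model = BartForConditionalGeneration(config.text_config)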

Collaborator

Yep, otherwise I can check out the PR and push the fix! 🤗

Contributor Author

Should we implement DaViT separately to follow the Transformers philosophy of "one model per file"?

Member

I would normally lean towards yes, but I'm not sure how that interacts with modular transformers - cc @ArthurZucker

Collaborator

It's kinda up to you: if you think DaViT will be used alone, maybe, but otherwise we can have FlorenceViT!

@ducviet00
Contributor

Hi @hlky, could you provide an update on the status of this PR?
Since timm is compatible with transformers, would it be okay for me to continue integrating Florence-2 into transformers based on your work?

hlky commented Jan 14, 2025

> would it be okay for me to continue integrating Florence-2

Hi @ducviet00. That would be great, thank you! 🤗

EFord36 commented Feb 4, 2025

> Hi @hlky, could you provide an update on the status of this PR? Since timm is compatible with transformers, would it be okay for me to continue integrating Florence-2 into transformers based on your work?

Hi, how are you getting on with this?

No problem if there's no progress; I just thought I'd check whether you could use any help, as I'm using Florence-2 for a project and support would be very helpful for me!

@ducviet00
Contributor

@EFord36
I have a draft version that works well and is compatible with torch.compile.
However, it depends on #35314.
I'll make a PR soon.

@ArthurZucker
Collaborator

In the meantime I will review the PR to make sure it's ready! 🚀

@hlky hlky closed this Apr 15, 2025
@hlky hlky deleted the add-florence2 branch April 15, 2025 12:30

atbe commented May 12, 2025

Hi @hlky, why was this PR closed? I noticed that support for the Florence2ForConditionalGeneration architecture still hasn't made it into the latest transformers version, and attempting to run the model with transformers results in the following error:

    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 259, in build_async_engine_client_from_engine_args
    mq_engine_client = await asyncio.get_running_loop().run_in_executor(
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/client.py", line 101, in __init__
    self.tokenizer = init_tokenizer_from_configs(
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer_group.py", line 101, in init_tokenizer_from_configs
    return TokenizerGroup(
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer_group.py", line 23, in __init__
    self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer.py", line 242, in get_tokenizer
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/tokenizer.py", line 221, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 973, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.microsoft.Florence-2-large.00d2f1570b00c6dea5df998f5635db96840436bc.configuration_florence2.Florence2Config'> to build an AutoTokenizer.
Model type should be one of AlbertConfig, AlignConfig, AriaConfig, BarkConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BlipConfig, Blip2Config, BloomConfig, BridgeTowerConfig, BrosConfig, CamembertConfig, CanineConfig, ChameleonConfig, ChineseCLIPConfig, ClapConfig, CLIPConfig, CLIPSegConfig, ClvpConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, ColPaliConfig, ConvBertConfig, CpmAntConfig, CTRLConfig, Data2VecAudioConfig, Data2VecTextConfig, DbrxConfig, DebertaConfig, DebertaV2Config, DiffLlamaConfig, DistilBertConfig, DPRConfig, ElectraConfig, Emu3Config, ErnieConfig, ErnieMConfig, EsmConfig, FalconConfig, FalconMambaConfig, FastSpeech2ConformerConfig, FlaubertConfig, FNetConfig, FSMTConfig, FunnelConfig, GemmaConfig, Gemma2Config, GitConfig, GlmConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GPTSanJapaneseConfig, GroundingDinoConfig, GroupViTConfig, HeliumConfig, HubertConfig, IBertConfig, IdeficsConfig, Idefics2Config, Idefics3Config, InstructBlipConfig, InstructBlipVideoConfig, JambaConfig, JetMoeConfig, JukeboxConfig, Kosmos2Config, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LlamaConfig, LlavaConfig, LlavaNextConfig, LlavaNextVideoConfig, LlavaOnevisionConfig, LongformerConfig, LongT5Config, LukeConfig, LxmertConfig, M2M100Config, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MgpstrConfig, MistralConfig, MixtralConfig, MllamaConfig, MobileBertConfig, ModernBertConfig, MoonshineConfig, MoshiConfig, MPNetConfig, MptConfig, MraConfig, MT5Config, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, NezhaConfig, NllbMoeConfig, NystromformerConfig, OlmoConfig, Olmo2Config, OlmoeConfig, OmDetTurboConfig, OneFormerConfig, OpenAIGPTConfig, OPTConfig, Owlv2Config, OwlViTConfig, PaliGemmaConfig, PegasusConfig, PegasusXConfig, PerceiverConfig, PersimmonConfig, PhiConfig, Phi3Config, PhimoeConfig, Pix2StructConfig, PixtralVisionConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2_5_VLConfig, Qwen2AudioConfig, Qwen2MoeConfig, Qwen2VLConfig, RagConfig, RealmConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RetriBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeamlessM4TConfig, SeamlessM4Tv2Config, SiglipConfig, Speech2TextConfig, Speech2Text2Config, SpeechT5Config, SplinterConfig, SqueezeBertConfig, StableLmConfig, Starcoder2Config, SwitchTransformersConfig, T5Config, TapasConfig, TransfoXLConfig, TvpConfig, UdopConfig, UMT5Config, VideoLlavaConfig, ViltConfig, VipLlavaConfig, VisualBertConfig, VitsConfig, Wav2Vec2Config, Wav2Vec2BertConfig, Wav2Vec2ConformerConfig, WhisperConfig, XCLIPConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, YosoConfig, ZambaConfig, Zamba2Config.
