InstantStyle is proving to be a high-quality and very simple approach: the authors realized that certain blocks of the IP-Adapter are responsible for composition and style, so the image features only need to be injected into those blocks:
# target_blocks=["block"] for original IP-Adapter
# target_blocks=["up_blocks.0.attentions.1"] for style blocks only
# target_blocks=["up_blocks.0.attentions.1", "down_blocks.2.attentions.1"] for style+layout blocks
ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, device, target_blocks=["up_blocks.0.attentions.1"])
The same approach (picking specific blocks to control style/composition) is also validated by the B-LoRA paper (they arrived at the same blocks, seemingly independently).
Given this validation, IMO it would make sense to support this behavior natively in diffusers, i.e. to let users pick the target blocks for any IP-Adapter.
(InstantStyle also has a neg_content_prompt feature to further disentangle structure and style. I think that is a nice extra, but too specific to be universal; the core is the target blocks.)
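To make the request concrete, here is a rough sketch of the selection logic I have in mind. It is not existing diffusers API: `ip_adapter_scales` is a hypothetical helper, and I am assuming that layer names look like the keys of `unet.attn_processors`, that cross-attention layers can be recognized by `attn2` in the name, and that a block is selected by simple substring matching (which is also why `["block"]` would select everything, as in the original IP-Adapter setting).

```python
from typing import Dict, Iterable, List


def ip_adapter_scales(
    attn_processor_names: Iterable[str],
    target_blocks: List[str],
    scale: float = 1.0,
) -> Dict[str, float]:
    """Hypothetical helper: map each cross-attention layer to an IP-Adapter scale.

    Layers whose name contains one of `target_blocks` get `scale`;
    every other cross-attention layer gets 0.0, i.e. image features are skipped.
    """
    scales: Dict[str, float] = {}
    for name in attn_processor_names:
        if "attn2" not in name:
            # Assumption: only cross-attention ("attn2") layers receive image features.
            continue
        selected = any(block in name for block in target_blocks)
        scales[name] = scale if selected else 0.0
    return scales


# Illustrative layer names in the assumed `unet.attn_processors` key format.
names = [
    "down_blocks.2.attentions.1.transformer_blocks.0.attn1.processor",
    "down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor",
    "up_blocks.0.attentions.0.transformer_blocks.0.attn2.processor",
    "up_blocks.0.attentions.1.transformer_blocks.0.attn2.processor",
]

style_only = ip_adapter_scales(names, ["up_blocks.0.attentions.1"])
style_and_layout = ip_adapter_scales(
    names, ["up_blocks.0.attentions.1", "down_blocks.2.attentions.1"]
)
```

A per-layer scale (rather than a hard on/off switch) would also leave room for partially weighting blocks later, but that is just one possible design.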