
Bad Image Quality generated by Qwen Image #12175

@vagitablebirdcode

Description


Describe the bug

I installed the latest version of the diffusers library from the main branch and ran the official inference example, but the output quality was very poor, as shown in the following figure:

Image

However, the official Qwen-Image application produces results like the following:

Image

Reproduction

First, install the latest version of diffusers from the main branch:
pip install git+https://github.com/huggingface/diffusers

The reproduction code is as follows:

import torch.nn.functional as F

# Monkey-patch SDPA to drop the `enable_gqa` keyword before calling the
# original implementation (a compatibility workaround for PyTorch builds
# that do not accept this argument).
_orig_sdpa = F.scaled_dot_product_attention

def _sdpa_drop_enable_gqa(*args, **kwargs):
    kwargs.pop("enable_gqa", None)
    return _orig_sdpa(*args, **kwargs)

F.scaled_dot_product_attention = _sdpa_drop_enable_gqa
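As a side note, one way to check whether this patch is needed at all is to probe whether the running PyTorch build accepts `enable_gqa` (a minimal sketch; the name `supports_enable_gqa` is introduced here for illustration):

```python
import torch
import torch.nn.functional as F

# Probe whether this PyTorch build accepts the `enable_gqa` keyword
# (grouped-query attention support, added in PyTorch 2.5). The patch
# above should only be needed when this probe fails.
q = torch.randn(1, 2, 4, 8)   # 2 query heads
k = torch.randn(1, 1, 4, 8)   # 1 key/value head (GQA layout)
v = torch.randn(1, 1, 4, 8)
try:
    F.scaled_dot_product_attention(q, k, v, enable_gqa=True)
    supports_enable_gqa = True
except TypeError:
    supports_enable_gqa = False
print(supports_enable_gqa)
```

If this prints `True`, the patch silently disables grouped-query attention handling, which could itself affect output quality.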


from diffusers import DiffusionPipeline
import torch

model_name = "Qwen/Qwen-Image"
torch_dtype = torch.bfloat16
device = "cuda"
pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
pipe = pipe.to(device)
# Chinese prompt, roughly: a young woman in a pale pink crossed-collar ruqun
# sits with her back to the camera, writing the four characters "通義千問"
# with a brush on white xuan paper in an elegant, antique-styled interior.
prompt = "一位身着淡雅水粉色交领襦裙的年轻女子背对镜头而坐,俯身专注地手持毛笔在素白宣纸上书写“通義千問”四个遒劲汉字。古色古香的室内陈设典雅考究,案头错落摆放着青瓷茶盏与鎏金香炉,一缕熏香轻盈升腾;柔和光线洒落肩头,勾勒出她衣裙的柔美质感与专注神情,仿佛凝固了一段宁静温润的旧时光。"

image = pipe(
    prompt=prompt,
    width=1024,
    height=1024,
    num_inference_steps=30,
    true_cfg_scale=4.0,
    guidance_scale=1.0,
    generator=torch.Generator(device="cuda").manual_seed(42)
).images[0]
image.save("res.jpg")

Logs

System Info

- 🤗 Diffusers version: 0.35.0.dev0
- Platform: Linux-5.15.0-58-generic-x86_64-with-glibc2.35
- Running on Google Colab?: No
- Python version: 3.10.10
- PyTorch version (GPU?): 2.6.0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.34.4
- Transformers version: 4.55.2
- Accelerate version: 1.10.0
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.5.3
- xFormers version: 0.0.22
- Accelerator: NA
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Who can help?

No response


Labels: bug (Something isn't working)
