https://github.com/huggingface/diffusers/blob/e607a582cfaa7dfaf7913fc3bb54c35eceee583c/src/diffusers/models/transformer_2d.py#L228-L230 Shouldn't the shape here be `(batch_size, seq_length, embedding_dim)`? The docstring also implies a float tensor is expected — or does this input accept both a long tensor (discrete token indices) and a float tensor (continuous embeddings)?
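If both are supported, I'd expect the two input conventions to look roughly like this (a minimal sketch with illustrative names and sizes, not the actual diffusers API — the discrete case assumes the long indices get mapped to float embeddings internally):

```python
import torch

batch_size, seq_length, embedding_dim = 2, 16, 64

# Continuous case: a float tensor of shape (batch_size, seq_length, embedding_dim).
continuous_input = torch.randn(batch_size, seq_length, embedding_dim)
assert continuous_input.dtype == torch.float32

# Discrete case: a long tensor of token indices, shape (batch_size, num_latent_pixels).
# An embedding layer would then map it to float of shape
# (batch_size, num_latent_pixels, embedding_dim).
num_latent_pixels, num_vector_embeds = 16, 512
discrete_input = torch.randint(0, num_vector_embeds, (batch_size, num_latent_pixels))
assert discrete_input.dtype == torch.long

embed = torch.nn.Embedding(num_vector_embeds, embedding_dim)
embedded = embed(discrete_input)
assert embedded.shape == (batch_size, num_latent_pixels, embedding_dim)
```

If only one of these is actually accepted, the docstring should probably say which one explicitly.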