Feature request
Although flex attention support was added through #36643, encoder-only models still lack this feature.
XLMRoberta and ModernBERT (and, in the future, EuroBERT) are very common in RAG setups (embedding + reranker).
Allowing them to support arbitrary attention patterns would be useful; see the sketch below.
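For illustration, a minimal sketch of the usage this request would enable, assuming encoder-only models would reuse the same `attn_implementation` loading switch that other architectures already accept (today this is rejected for these models):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

# Hypothetical once flex attention lands for encoder-only models:
# the same attn_implementation switch used elsewhere in transformers.
model = AutoModel.from_pretrained(
    "xlm-roberta-base",
    attn_implementation="flex_attention",  # currently unsupported for this architecture
)

inputs = tokenizer("a query and a passage to embed", return_tensors="pt")
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state.mean(dim=1)
```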
Motivation
Support for arbitrary attention patterns (e.g., document masking for packed sequences, or custom sparse masks) is useful in both research and production.
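As a concrete example of such a pattern, here is a minimal PyTorch FlexAttention sketch (independent of transformers) that builds a document mask for packed sequences, so tokens only attend within their own document; the sequence split is illustrative:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# FlexAttention primarily targets CUDA; fall back to CPU for this sketch.
device = "cuda" if torch.cuda.is_available() else "cpu"
B, H, S, D = 1, 4, 128, 64  # batch, heads, sequence length, head dim

# Two documents packed into one sequence: tokens 0-63 belong to doc 0,
# tokens 64-127 to doc 1 (illustrative split).
document_ids = torch.zeros(S, dtype=torch.long, device=device)
document_ids[64:] = 1

def document_mask(b, h, q_idx, kv_idx):
    # Permit attention only between tokens of the same packed document.
    return document_ids[q_idx] == document_ids[kv_idx]

block_mask = create_block_mask(
    document_mask, B=None, H=None, Q_LEN=S, KV_LEN=S, device=device
)

q, k, v = (torch.randn(B, H, S, D, device=device) for _ in range(3))
out = flex_attention(q, k, v, block_mask=block_mask)
print(out.shape)  # torch.Size([1, 4, 128, 64])
```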
Your contribution
I can help test an implementation.