Optimized lowering of the llvm.fshr and llvm.fshl intrinsics would be nice. The Xtensa core has a dedicated 64-bit to 32-bit funnel shift (the SRC instruction) that these intrinsics could map onto, whereas currently 7 instructions (including a branch) are emitted.
These intrinsics are useful e.g. in handling unaligned data.
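For reference, here is a minimal C sketch of what the 32-bit funnel shifts compute and how one of them can assemble an unaligned value from two aligned words. The names fshl32, fshr32 and load_unaligned_le are hypothetical, chosen for illustration only; the sketch models the intrinsic semantics (shift amount taken modulo 32) and says nothing about how the backend actually lowers them.

```c
#include <stdint.h>

/* Model of llvm.fshl.i32: concatenate hi:lo into 64 bits, shift left by
   (amt mod 32), and return the upper 32 bits. */
static uint32_t fshl32(uint32_t hi, uint32_t lo, uint32_t amt) {
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((wide << (amt & 31)) >> 32);
}

/* Model of llvm.fshr.i32: concatenate hi:lo into 64 bits, shift right by
   (amt mod 32), and return the lower 32 bits. */
static uint32_t fshr32(uint32_t hi, uint32_t lo, uint32_t amt) {
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (amt & 31));
}

/* Hypothetical helper: extract the 32-bit little-endian value that starts
   byte_off (0..3) bytes into aligned[0] and spills into aligned[1]. */
static uint32_t load_unaligned_le(const uint32_t *aligned, unsigned byte_off) {
    return fshr32(aligned[1], aligned[0], 8u * byte_off);
}
```

With words {0x44332211, 0x88776655} and byte_off = 1, the helper returns 0x55443322. Each such extraction is a single funnel shift, which is why a one-instruction lowering would matter for this kind of code.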
(A further possible optimization, applicable to all shift instructions: load the SAR register only when the required value differs from the one it already holds.)
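The potential saving from skipping redundant SAR loads can be illustrated with a toy model in C. This is only a sketch under simplifying assumptions: it looks at a straight-line sequence of shifts, treats the shift amounts as known values, and ignores details such as how left and right shifts encode the amount in SAR. The function name count_sar_loads is hypothetical.

```c
#include <stddef.h>

/* Toy model: count the SAR loads needed for a straight-line sequence of
   shifts when a load is skipped whenever SAR already holds the wanted amount. */
static size_t count_sar_loads(const unsigned *amounts, size_t n) {
    size_t loads = 0;
    int sar_known = 0;   /* SAR contents are unknown on entry */
    unsigned sar = 0;

    for (size_t i = 0; i < n; i++) {
        if (!sar_known || sar != amounts[i]) {
            sar = amounts[i];   /* models one SAR-setting instruction */
            sar_known = 1;
            loads++;
        }
    }
    return loads;
}
```

For the amounts 8, 8, 8, 24, 24, 8 this model needs 3 loads instead of 6.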