Description
Bugzilla Link | 48110 |
Version | trunk |
OS | All |
Blocks | #31672 |
CC | @adibiagio,@topperc,@RKSimon,@MattPD,@phoebewang,@rotateright |
Extended Description
Hi there,
It looks like llvm-mca models icelake-client and icelake-server as having only one shuffle port, which produces incorrect cost calculations for any operation that uses the shuffle ports. A block diagram of the architecture is available here: https://en.wikichip.org/wiki/intel/microarchitectures/sunny_cove (Sunny Cove is the core used by both icelake-server and icelake-client). The uop count and port usage for each shuffle instruction are listed here: https://uops.info/table.html?search=shuf&cb_lat=on&cb_tp=on&cb_uops=on&cb_ports=on&cb_SKL=on&cb_ICL=on&cb_measurements=on&cb_base=on&cb_avx=on
If the code that generates the costs for llvm-mca is used elsewhere in the toolchain, fixing the port model will likely also yield a performance benefit for other users targeting icelake-server and icelake-client.
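
To illustrate why the number of modelled shuffle ports matters, here is a rough back-of-the-envelope sketch. It is not how llvm-mca computes costs. It assumes a deliberately simplified model in which every shuffle is a single uop, shuffles are the only bottleneck, and any shuffle can issue on any shuffle port. The function name, the loop size of 4 shuffles, and the port count of two are illustrative assumptions, not measurements.

```python
from fractions import Fraction


def reciprocal_throughput(num_uops: int, num_ports: int) -> Fraction:
    """Cycles per iteration for num_uops identical single-cycle uops
    that can issue on any of num_ports ports (simplified model)."""
    if num_ports < 1:
        raise ValueError("need at least one port")
    return Fraction(num_uops, num_ports)


# Hypothetical loop body containing 4 independent single-uop shuffles.
shuffles_per_iteration = 4

# Assumption: the current model exposes one shuffle port.
one_port = reciprocal_throughput(shuffles_per_iteration, 1)
# Assumption: the hardware has two eligible shuffle ports.
two_ports = reciprocal_throughput(shuffles_per_iteration, 2)

print(f"one shuffle port:  {one_port} cycles/iteration")
print(f"two shuffle ports: {two_ports} cycles/iteration")
```

Under these assumptions, a one-port model predicts twice the cycle count for shuffle-bound code, so anything that relies on the model's costs would see shuffles as more expensive than they are on the hardware.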