Description
The Intel copy of the remove-layout-conversion pass uses the same cost model as the community version:
```cpp
// We measure costs in standardised milli-SM-cycles. The smem load
// and store each cost 8 * convertLayoutBytes, and then we double
// it to account for extra cost due to synchronisation.
int64_t convertLayoutCost = 32 * convertLayoutBytes;
```
(1) The cost of shared memory (smem) load/store and synchronization is higher on Intel GPUs than on NVIDIA GPUs; this needs to be measured.
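The constant 32 in the quoted code can be read as (smem load cost + smem store cost) × synchronisation factor, i.e. (8 + 8) × 2. A minimal sketch of how these factors could be exposed per target, so that Intel-measured values replace the NVIDIA-derived defaults. The names `SmemCostParams` and `convertLayoutCostFor` are hypothetical and are not part of the Triton or Intel XPU backend API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-target cost parameters, in milli-SM-cycles per byte.
// The defaults reproduce the upstream constants: load 8 + store 8,
// doubled for synchronisation, i.e. 32 * convertLayoutBytes.
struct SmemCostParams {
  int64_t loadCostPerByte = 8;
  int64_t storeCostPerByte = 8;
  int64_t syncMultiplier = 2;
};

// Cost of a layout conversion that round-trips through shared memory.
int64_t convertLayoutCostFor(int64_t convertLayoutBytes,
                             const SmemCostParams &params = {}) {
  return params.syncMultiplier *
         (params.loadCostPerByte + params.storeCostPerByte) *
         convertLayoutBytes;
}
```

With the defaults, `convertLayoutCostFor(1024)` gives the upstream 32 × 1024. Setting, say, `syncMultiplier = 4` (an illustrative value, not a measurement) doubles the conversion cost, which in turn makes rematerialisation more attractive relative to keeping the `convert_layout`.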
```cpp
} else if (isa<arith::ArithDialect, math::MathDialect>(dialect)) {
  // this is an arithmetic operation; we distinguish between cheap
  // operations (such as floating point add/mul which can be fused
  // as halves of a single-cycle FMA instruction) and expensive
  // operations which use the special function unit and/or involve
  // multiple instructions.
  int64_t multiplier = isExpensiveMathOp(op) ? 8 : 1;
  for (Value result : op->getResults()) {
    rematerialisationCost += multiplier * getByteCount(result);
  }
```
(2) It is unclear why the rematerialization cost increases when the number of elements does not change between the old and the new layout.
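A minimal sketch of the rematerialisation side of the comparison, to make point (2) concrete. It assumes that `getByteCount` returns element count × bit width / 8 for the result tensor, and that rematerialisation is skipped when its cost exceeds `convertLayoutCost`; both are assumptions about the upstream pass, not confirmed here. `ResultInfo`, `OpInfo`, `getByteCountSketch`, `rematerialisationCostFor` and `shouldRematerialise` are hypothetical stand-ins for the MLIR values and helpers. Under these assumptions every rematerialised arith/math op adds its full result byte count, whether or not the element count differs between the old and new layouts.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a result Value: a ranked tensor described by
// its element count and element bit width.
struct ResultInfo {
  int64_t numElements;
  int64_t bitWidth;
};

// Hypothetical stand-in for an arith/math op in the slice being rematerialised.
struct OpInfo {
  bool isExpensiveMathOp;  // e.g. ops that use the special function unit
  std::vector<ResultInfo> results;
};

// Assumed byte count: elements * bit width, rounded down to whole bytes.
int64_t getByteCountSketch(const ResultInfo &r) {
  return r.numElements * r.bitWidth / 8;
}

// Mirrors the quoted loop: each result adds multiplier * bytes, where the
// multiplier is 8 for expensive math ops and 1 for cheap ones.
int64_t rematerialisationCostFor(const std::vector<OpInfo> &slice) {
  int64_t cost = 0;
  for (const OpInfo &op : slice) {
    int64_t multiplier = op.isExpensiveMathOp ? 8 : 1;
    for (const ResultInfo &r : op.results)
      cost += multiplier * getByteCountSketch(r);
  }
  return cost;
}

// Assumed decision rule: keep the convert_layout if rematerialising costs more.
bool shouldRematerialise(int64_t rematCost, int64_t convertLayoutCost) {
  return rematCost <= convertLayoutCost;
}
```

For a 128×128 `f32` tensor (65,536 bytes), a slice with one cheap op and one expensive op costs 65,536 + 8 × 65,536 = 589,824, well below the 32 × 65,536 = 2,097,152 charged for the conversion, so the sketch would rematerialise.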