[RemoveLayoutConversion] Refine cost model #5476

@whitneywhtsang

The Intel copy of the RemoveLayoutConversion pass uses the same cost model as the community version:

  // We measure costs in standardised milli-SM-cycles. The smem load
  // and store each cost 8 * convertLayoutBytes, and then we double
  // it to account for extra cost due to synchronisation.
  int64_t convertLayoutCost = 32 * convertLayoutBytes;

(1) The cost of smem load/store and synchronization is higher on Intel GPUs than on NVIDIA GPUs; the actual costs need to be measured.
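One way to make the constant in the excerpt above tunable per target is to split the factor of 32 into its stated parts (8 for the load, 8 for the store, doubled for synchronisation). The following is only a sketch: `SmemCostParams` and its field names are hypothetical and do not exist in the pass, and no measured Intel values are implied.

```cpp
#include <cstdint>

// Hypothetical per-target knobs (not part of the pass). The defaults
// reproduce the community constants quoted above.
struct SmemCostParams {
  int64_t loadCostPerByte = 8;   // smem load cost per byte
  int64_t storeCostPerByte = 8;  // smem store cost per byte
  int64_t syncFactor = 2;        // multiplier for synchronisation overhead
};

// Cost of a layout conversion through shared memory, in the same
// "milli-SM-cycles" unit as the original comment.
constexpr int64_t convertLayoutCost(int64_t convertLayoutBytes,
                                    SmemCostParams p = {}) {
  return p.syncFactor * (p.loadCostPerByte + p.storeCostPerByte) *
         convertLayoutBytes;
}

// With the defaults this matches `32 * convertLayoutBytes`.
static_assert(convertLayoutCost(1) == 32, "defaults must match upstream");
```

Once measurements are available, an Intel-specific `SmemCostParams` could be passed in place of the defaults without touching the rest of the comparison.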

    } else if (isa<arith::ArithDialect, math::MathDialect>(dialect)) {
      // this is an arithmetic operation; we distinguish between cheap
      // operations (such as floating point add/mul which can be fused
      // as halves of a single-cycle FMA instruction) and expensive
      // operations which use the special function unit and/or involve
      // multiple instructions.
      int64_t multiplier = isExpensiveMathOp(op) ? 8 : 1;
      for (Value result : op->getResults()) {
        rematerialisationCost += multiplier * getByteCount(result);
      }

(2) It is unclear why the rematerialization cost increases when the number of elements does not increase between the old and new layouts.
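To make question (2) concrete, the accumulation in the excerpt can be modelled outside MLIR as below. This is a simplified stand-in, not the pass's code: `ModelOp`, `modelRematCost`, and `modelShouldRematerialise` are hypothetical names, and the final comparison against `32 * convertLayoutBytes` is an assumption about how the two costs are weighed. It shows that every arithmetic/math op in the slice adds its result byte count to the cost, whether or not the element count differs between the old and new layouts.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an arith/math op in the rematerialised slice.
struct ModelOp {
  bool expensive;                    // plays the role of isExpensiveMathOp(op)
  std::vector<int64_t> resultBytes;  // plays the role of getByteCount(result)
};

// Mirrors the loop in the excerpt: each result contributes its byte
// count, scaled by 8 for expensive ops and by 1 otherwise.
inline int64_t modelRematCost(const std::vector<ModelOp> &slice) {
  int64_t cost = 0;
  for (const ModelOp &op : slice) {
    const int64_t multiplier = op.expensive ? 8 : 1;
    for (int64_t bytes : op.resultBytes)
      cost += multiplier * bytes;
  }
  return cost;
}

// Assumed decision rule: rematerialise only if it is no more costly
// than converting the layout through shared memory.
inline bool modelShouldRematerialise(const std::vector<ModelOp> &slice,
                                     int64_t convertLayoutBytes) {
  return modelRematCost(slice) <= 32 * convertLayoutBytes;
}
```

In this model, a cheap op and an expensive op with 64-byte results each contribute 64 and 512 respectively, independent of any change in element count.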
