[RemoveLayoutConversion] Refine cost model

The Intel copy of the RemoveLayoutConversion pass uses the same cost model as the community version:
```
  // We measure costs in standardised milli-SM-cycles. The smem load
  // and store each cost 8 * convertLayoutBytes, and then we double
  // it to account for extra cost due to synchronisation.
  int64_t convertLayoutCost = 32 * convertLayoutBytes;
```
(1) The cost of smem load/store and synchronization is expected to be higher on Intel GPUs than on NVIDIA GPUs, so the factor of 32 per byte likely underestimates it. The actual cost needs to be measured.

```
    } else if (isa<arith::ArithDialect, math::MathDialect>(dialect)) {
      // this is an arithmetic operation; we distinguish between cheap
      // operations (such as floating point add/mul which can be fused
      // as halves of a single-cycle FMA instruction) and expensive
      // operations which use the special function unit and/or involve
      // multiple instructions.
      int64_t multiplier = isExpensiveMathOp(op) ? 8 : 1;
      for (Value result : op->getResults()) {
        rematerialisationCost += multiplier * getByteCount(result);
      }
```
(2) It is unclear why the rematerialization cost is increased for every arithmetic op, even when the number of elements does not increase between the old and new layouts.

[RemoveLayoutConversion] Refine cost model #5476

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[RemoveLayoutConversion] Refine cost model #5476

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions