Adds support for Open Compute Project (OCP) floating-point Microscaling (MX) formats.
Provides cast and matrix multiply operators that work with the microscaling formats.
CONST supports constants of the MXFP data types.
CAST supports casting the MXFP data types to and from bf16.
Co-Authored-By: Eric Kunze <[email protected]>
Signed-off-by: Dominic Symes <[email protected]>
Signed-off-by: Eric Kunze <[email protected]>
Change-Id: Ifb05503937f3d5c74cebe106156c60bff9af21dc
chapters/introduction.adoc
@@ -271,9 +271,33 @@ Number formats not required for any operators in a profile do not need to be implemented.
| (1<<47)-1
|Signed 48-bit two's-complement value.

|fp4e2m1_t
| -6.0
| +6.0
| 4-bit floating-point defined by <<OCP-MX,OCP-MX>> with two bits of exponent and one bit of mantissa. +
Normal values must be supported. +
Subnormal values must be supported. +
Signed zero must be supported.

|fp6e3m2_t
| -28.0
| +28.0
| 6-bit floating-point defined by <<OCP-MX,OCP-MX>> with three bits of exponent and two bits of mantissa. +
Normal values must be supported. +
Subnormal values must be supported. +
Signed zero must be supported.

|fp6e2m3_t
| -7.5
| +7.5
| 6-bit floating-point defined by <<OCP-MX,OCP-MX>> with two bits of exponent and three bits of mantissa. +
Normal values must be supported. +
Subnormal values must be supported. +
Signed zero must be supported.

|fp8e4m3_t
| -448
| +448
| 8-bit floating-point defined by <<OCP-OFP8,OCP-OFP8>> with four bits of exponent and three bits of mantissa. +
Normal values must be supported. +
Subnormal values must be supported. +
@@ -292,6 +316,12 @@ Positive and negative infinity must be supported. +
NaN encodings must be supported. +
Signed zero must be supported.

|fp8ue8m0_t
| exp2(-127)
| exp2(+127)
| 8-bit floating-point value defined by <<OCP-MX,OCP-MX>> with no sign bit, eight bits of exponent, and no mantissa bits. +
The NaN encoding must be supported. +

|fp16_t
| -infinity
| +infinity
@@ -331,6 +361,11 @@ Subnormal values must either be supported or flushed to sign-preserved zero. +
Positive and negative infinity must be supported. +
At least one NaN encoding must be supported. +
Signed zero must be supported.

|mxint8_t
| -2
| +1 + 63/64
| 8-bit integer format with an implicit 1/64 scale defined by <<OCP-MX,OCP-MX>>. +
|===

Note: In this specification, minimum<type> and maximum<type> will denote the minimum and maximum values of the data as stored in memory (ignoring the zero point).
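The minimum and maximum columns above follow directly from the OCP-MX bit layouts. As a non-normative illustration only (the decoder names, the assumed exponent biases of 1 for fp4e2m1_t and 127 for fp8ue8m0_t, and 0xFF as the fp8ue8m0_t NaN encoding are assumptions, not text from this change), a short Python sketch can decode a few of these formats and reproduce the table's ranges:

[source,python]
----
# Non-normative sketch: decode some OCP-MX element encodings to Python floats.

def decode_fp4e2m1(bits: int) -> float:
    """fp4e2m1_t: 1 sign bit, 2 exponent bits (assumed bias 1), 1 mantissa bit."""
    sign = -1.0 if (bits >> 3) & 0x1 else 1.0
    exp = (bits >> 1) & 0x3
    man = bits & 0x1
    if exp == 0:                              # subnormal: 0.M * 2^(1 - bias)
        return sign * (man / 2.0)
    return sign * (1.0 + man / 2.0) * 2.0 ** (exp - 1)

def decode_fp8ue8m0(bits: int) -> float:
    """fp8ue8m0_t: unsigned scale, 8 exponent bits, no mantissa (assumed bias 127)."""
    if bits == 0xFF:                          # assumed NaN encoding
        return float("nan")
    return 2.0 ** (bits - 127)                # covers exp2(-127) .. exp2(+127)

def decode_mxint8(bits: int) -> float:
    """mxint8_t: two's-complement 8-bit integer with an implicit 1/64 scale."""
    value = bits - 256 if bits >= 128 else bits
    return value / 64.0                       # covers -2 .. +1 + 63/64

assert max(decode_fp4e2m1(b) for b in range(16)) == 6.0
assert min(decode_fp4e2m1(b) for b in range(16)) == -6.0
assert decode_fp8ue8m0(0) == 2.0 ** -127 and decode_fp8ue8m0(254) == 2.0 ** 127
assert decode_mxint8(128) == -2.0 and decode_mxint8(127) == 1.0 + 63.0 / 64.0
----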
@@ -450,15 +485,21 @@ This section assumes an operation acting on tensors named 'input', 'weight' and 'bias'.
Each output tensor element can be expressed as a dot product of elements between the 'input' and 'weight' tensors with optional bias addition.
The dot product has length KS, the kernel size.
If the operation does not specify a bias then 'bias' is taken to be zero in this section.
If the dot product is of a block-scaled tensor, then 'input_scale' and 'weight_scale' are inputs to the dot product.

Note: KS is defined for each relevant operator in the appendix section <<Floating-Point Operator Test Data>>.

Each output element `out` can be expressed as a dot product between input elements `in[k]`, weight elements `w[k]`, bias `b`:
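The formula that follows the colon is not shown in this excerpt. As a rough sketch only (the per-block indexing and the block size parameter `BS` are assumptions, not text from this change), the relationship could be written as:

[source,python]
----
# Non-normative sketch of the dot product described above.
# Plain tensors:        out = b + sum_k in[k] * w[k]
# Block-scaled tensors: input_scale and weight_scale are assumed to hold one
# scale per block of BS consecutive elements along the dot-product axis.

def dot_product(in_, w, b, input_scale=None, weight_scale=None, BS=32):
    KS = len(in_)                             # kernel size
    acc = b
    for k in range(KS):
        x, y = in_[k], w[k]
        if input_scale is not None:
            x *= input_scale[k // BS]
        if weight_scale is not None:
            y *= weight_scale[k // BS]
        acc += x * y
    return acc
----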
Performs two dimensional matrix multiplications using block scaled tensors.
The block dimension is always the last dimension of the tensor, so the result is effectively a matrix multiply of A by the transposed B matrix.
If the D dimension of input B is of size 1, the B matrix will be broadcast.
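As a rough illustration of the description above (the operand shapes, argument names, block size, and the reading of D as B's batch dimension are assumptions made for this sketch, not definitions from this change), the output indexing could look like:

[source,python]
----
# Non-normative sketch: A_data is assumed to have shape [N, H, C] and
# B_data shape [D, W, C], with C the last (block-scaled) dimension and
# one scale per block of BS elements along C. If D == 1, B is broadcast.

def matmul_block_scaled(A_data, A_scale, B_data, B_scale, BS=32):
    N, H, C = len(A_data), len(A_data[0]), len(A_data[0][0])
    D, W = len(B_data), len(B_data[0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(N)]
    for n in range(N):
        bn = 0 if D == 1 else n               # broadcast B when its D dimension is 1
        for h in range(H):
            for w in range(W):
                acc = 0.0
                for c in range(C):
                    a = A_data[n][h][c] * A_scale[n][h][c // BS]
                    b = B_data[bn][w][c] * B_scale[bn][w][c // BS]
                    acc += a * b              # A times the transposed B matrix
                out[n][h][w] = acc
    return out
----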
*Precision Requirements*

* Each output can be expressed as a dot product of two input vectors multiplied by the scale factors for the A and B tensors.
* The dot product must meet the <<Dot product accuracy requirements>>.
* When generating the data sets for the Dot product accuracy requirements, the data should be generated as fp32 and converted to a scale/value tensor pair using the scale calculation defined in CAST_TO_BLOCK_SCALED, as sketched below.
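The normative scale calculation is the one defined by CAST_TO_BLOCK_SCALED and is not reproduced in this excerpt. Purely as a hedged illustration of the kind of conversion the last bullet describes (the choice of element format and the power-of-two scale rule below are assumptions, not the specification's definition), a per-block conversion could look like:

[source,python]
----
import math

# Non-normative sketch: convert one block of fp32 data to a (scale, values) pair.
# The normative rule is the scale calculation defined in CAST_TO_BLOCK_SCALED.

FP4E2M1_MAX = 6.0   # largest magnitude of the assumed element format (fp4e2m1_t)

def block_to_scale_and_values(block_fp32):
    max_abs = max(abs(v) for v in block_fp32)
    if max_abs == 0.0:
        exp = -127                            # smallest e8m0 scale for an all-zero block
    else:
        # Power-of-two scale chosen so the largest element fits the element format.
        exp = math.floor(math.log2(max_abs)) - math.floor(math.log2(FP4E2M1_MAX))
        exp = max(-127, min(127, exp))
    scale = 2.0 ** exp                        # would be stored as an fp8ue8m0_t scale
    values = [v / scale for v in block_fp32]  # then rounded/cast to the element format
    return scale, values
----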