[Bugfix] Compilation Error in q4f32_1 (mlc-ai#1078)

junrushao · web-flow · commit 3aefd9f9f25d · 2023-10-16T21:16:27.000-07:00
The pass `fuse-split-rotary` assumes the compute dtype is fp16, which
it usually is, but in certain cases, e.g. `q0f32` and `q4f32_1`, the
compute dtype is fp32 instead. This PR strengthens the guard so the
pass is only applied when `config.dtype` is `float16` (or is unset).
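
To illustrate the strengthened guard, here is a minimal standalone sketch
of the condition. The helper name `can_fuse_split_rotary` and the example
config values are hypothetical and are not part of `mlc_llm`; only the
four boolean checks are taken from the diff below.

```python
from types import SimpleNamespace


def can_fuse_split_rotary(config) -> bool:
    """Hypothetical helper mirroring the guard condition in this PR."""
    return (
        hasattr(config, "num_attention_heads")
        and hasattr(config, "hidden_size")
        and hasattr(config, "position_embedding_base")
        # New check: a missing `dtype` attribute is treated as "float16".
        and getattr(config, "dtype", "float16") == "float16"
    )


# Illustrative config values only; real configs carry many more fields.
base = dict(num_attention_heads=32, hidden_size=4096, position_embedding_base=10000)
fp16_config = SimpleNamespace(**base, dtype="float16")
fp32_config = SimpleNamespace(**base, dtype="float32")
no_dtype_config = SimpleNamespace(**base)

print(can_fuse_split_rotary(fp16_config))
print(can_fuse_split_rotary(fp32_config))
print(can_fuse_split_rotary(no_dtype_config))
```

Because `getattr` falls back to `"float16"`, configs without a `dtype`
attribute keep the previous behavior, while fp32 configs now fail the guard.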
diff --git a/mlc_llm/core.py b/mlc_llm/core.py
@@ -405,6 +405,7 @@ def mod_transform_before_build(
         hasattr(config, "num_attention_heads")
         and hasattr(config, "hidden_size")
         and hasattr(config, "position_embedding_base")
+        and getattr(config, "dtype", "float16") == "float16"
     ):
         max_seq_len = None
         if args.max_seq_len > 0: