Fix draft_top_p fallback: 1.0 → 0.95 (enables nucleus)

yuz207 · yuz207 · commit a530c97a7033 · 2025-09-27T17:41:38.000-07:00
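CRITICAL: Line 261 had TWO 1.0 fallbacks (the getattr default and the
`or` operand), so nucleus sampling was disabled whenever draft_top_p was
missing or falsy on opt_config, even though the config default is 0.95.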

Before: getattr(..., 1.0) or 1.0   → 1.0 when draft_top_p is unset or falsy → nucleus disabled
After:  getattr(..., 0.95) or 0.95 → 0.95 in the same case → nucleus enabled
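
A minimal sketch of the fallback pattern on the changed line, to show which
inputs each fallback covers. `resolve_top_p` and the `SimpleNamespace`
stand-ins for `opt_config` are hypothetical names used only for illustration;
they are not part of eagle.py.

```python
from types import SimpleNamespace


def resolve_top_p(opt_config, fallback):
    """Mirror the changed line: a getattr default plus an `or` fallback."""
    # getattr's default covers a missing attribute; `or` covers falsy
    # values such as None or 0.0 that are present on the object.
    return float(getattr(opt_config, "draft_top_p", fallback) or fallback)


# Hypothetical configs standing in for self.opt_config.
missing = SimpleNamespace()                    # attribute absent
unset = SimpleNamespace(draft_top_p=None)      # attribute present but None
explicit = SimpleNamespace(draft_top_p=0.8)    # attribute set by the user

for name, cfg in [("missing", missing), ("unset", unset), ("explicit", explicit)]:
    before = resolve_top_p(cfg, 1.0)   # old fallbacks
    after = resolve_top_p(cfg, 0.95)   # new fallbacks
    print(f"{name}: before={before} after={after}")
```

An explicitly set, truthy value wins under both versions; only the missing and
falsy cases change. One consequence of the `or` form is that an explicit
`draft_top_p=0.0` is also replaced by the fallback.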

This is why survivors=32000 (full vocab) instead of ~hundreds: with
tp = 1.0 the `0.0 < tp < 1.0` guard is false, so the top-p filter never runs.
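
A rough, dependency-free sketch of why the survivor count collapses once the
guard passes. `count_survivors` and the toy logits are assumptions made for
illustration; the actual filter in eagle.py operates on torch tensors and may
differ in tie handling and in which boundary token it keeps.

```python
import math


def count_survivors(logits, tp):
    """Count tokens kept by a simple top-p (nucleus) filter."""
    # Same guard as in the diff: tp outside (0, 1) means no filtering.
    if not (0.0 < tp < 1.0):
        return len(logits)

    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = sorted((e / total for e in exps), reverse=True)

    # Keep the smallest prefix whose cumulative mass reaches tp.
    cumulative = 0.0
    for kept, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= tp:
            return kept
    return len(probs)


# Toy logits over a 32000-token vocabulary with a slowly decaying tail.
toy_logits = [-0.01 * i for i in range(32000)]

print(count_survivors(toy_logits, 1.0))   # guard fails: whole vocabulary survives
print(count_survivors(toy_logits, 0.95))  # guard passes: only the head survives
```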
diff --git a/vllm/v1/spec_decode/eagle.py b/vllm/v1/spec_decode/eagle.py
@@ -258,7 +258,7 @@ def _sample_draft_tokens(
                 x = masked.scatter(-1, topi, topv)
 
             # --- top-p (nucleus) ---
-            tp = float(getattr(self.opt_config, "draft_top_p", 1.0) or 1.0)
+            tp = float(getattr(self.opt_config, "draft_top_p", 0.95) or 0.95)
 
             if 0.0 < tp < 1.0:
                 p = torch.softmax(x, dim=-1)