[BugFix] Lazily import XgrammarBackend to avoid early cuda init
Importing xgrammar appears to initialize the CUDA context, which we don't want to do in the front-end process. It also means that the server can't be started with the (default) multiproc context mode of fork, since CUDA cannot be re-initialized in a forked child process.
I guess this is what LazyLoader is meant to help with, but it doesn't seem to be working as intended since vllm-project#14694 was merged.
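
For illustration, the general idea of deferring an import until first use can be sketched as below. This is a minimal sketch, not vLLM's actual LazyLoader: the `LazyModule` class is hypothetical, and the stdlib module `colorsys` stands in for a package whose import has side effects such as CUDA initialization.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Hypothetical proxy that imports the real module on first attribute access."""

    def __init__(self, module_name: str):
        super().__init__(module_name)
        self._module_name = module_name
        self._module = None  # real module, filled in on first use

    def _load(self):
        # Import once, then reuse the cached module object.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return self._module

    def __getattr__(self, item):
        # Only reached when normal lookup on the proxy fails,
        # i.e. for names that live in the real module.
        return getattr(self._load(), item)


# Creating the proxy does not import the target module
# ("colorsys" is only a stand-in for a heavy package).
heavy = LazyModule("colorsys")

# The real import happens here, at first attribute access.
hsv = heavy.rgb_to_hsv(1.0, 0.0, 0.0)
```

A proxy like this only helps if nothing touches its attributes at import time; a module-level `from pkg import name` or a class-body reference would still trigger the real import early.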
Signed-off-by: Nick Hill <[email protected]>