Status: Open
Labels: bug
Description
Your current environment
Versions:
>>> import torch; torch.__version__
'2.7.0+cu126'
>>> import transformers; transformers.__version__
'4.52.2'
>>> import vllm; vllm.__version__
'0.9.1.dev59+gb6a6e7a52'
🐛 Describe the bug
My script is very basic: it prepares the model input IDs with tokenizers, constructs a vllm.LLM, and then calls .generate(...) once.
However, running it produces several different errors and warnings (output below). Are they expected, and is it possible to eliminate them?
Thanks!
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[W523 21:36:39.228028808 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=170, addr=[localhost]:60438, remote=[localhost]:50427): failed to recv, got 0 bytes
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:678 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f4a777785e8 in /mnt/fs/venv_cu126_py312/lib/python3.12/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5ba8afe (0x7f4a6065aafe in /mnt/fs/venv_cu126_py312/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x5baae40 (0x7f4a6065ce40 in /mnt/fs/venv_cu126_py312/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x5bab74a (0x7f4a6065d74a in /mnt/fs/venv_cu126_py312/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #4: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x2a9 (0x7f4a606571a9 in /mnt/fs/venv_cu126_py312/lib/python3.12/site-packages/torch/lib/libtorch_cpu.so)
frame #5: c10d::ProcessGroupNCCL::heartbeatMonitor() + 0x379 (0x7f4a1d8509a9 in /mnt/fs/venv_cu126_py312/lib/python3.12/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xdc253 (0x7f4b1dda3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #7: <unknown function> + 0x94ac3 (0x7f4b238e1ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: <unknown function> + 0x126850 (0x7f4b23973850 in /lib/x86_64-linux-gnu/libc.so.6)
[W523 21:36:40.236201644 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 3] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: failed to recv, got 0 bytes
[rank6]:[W523 21:36:40.236202281 ProcessGroupNCCL.cpp:1659] [PG ID 0 PG GUID 0 Rank 6] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: failed to recv, got 0 bytes
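The first warning names its own workaround: set TOKENIZERS_PARALLELISM explicitly. A minimal sketch, assuming the variable is set at the very top of the script, before tokenizers (or anything that imports it, such as transformers or vllm) is used. The subprocess call is only an illustrative check that child processes inherit the setting.

```python
import os
import subprocess
import sys

# Assumption: this runs before `tokenizers` is first used in this process.
# The warning accepts either "true" or "false"; "false" disables tokenizer
# parallelism, so no parallel work has happened before any later fork.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Child processes inherit the parent's environment, so worker processes
# started afterwards see the same explicit setting. Quick check:
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('TOKENIZERS_PARALLELISM'))"],
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())  # false
```

This only targets the tokenizers warning; it does not account for the TCPStore / ProcessGroupNCCL warnings that follow it.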