Status: Closed
Labels: ci-failure (issue about an unexpected test failure in CI)
Description
Name of failing test
tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_shared_storage_connector_consistency
Basic information
- Flaky test
- Can reproduce locally
- Caused by external libraries (e.g. bug in `transformers`)
🧪 Describe the failing test
python -m pytest -v -s tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_shared_storage_connector_consistency
INFO 06-06 13:08:08 [__init__.py:244] Automatically detected platform cuda.
/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"
warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.12.8, pytest-8.3.5, pluggy-1.5.0 -- /home/ubuntu/vllm/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/vllm
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-0.24.0, forked-1.6.0, rerunfailures-14.0, mock-3.14.0, shard-0.1.2, buildkite-test-collector-0.1.9
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 1 item
Running 1 items in this shard: tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_shared_storage_connector_consistency
tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_shared_storage_connector_consistency INFO 06-06 13:08:20 [config.py:822] This model supports multiple tasks: {'embed', 'generate', 'reward', 'classify', 'score'}. Defaulting to 'generate'.
INFO 06-06 13:08:20 [config.py:2182] Chunked prefill is enabled with max_num_batched_tokens=8192.
WARNING 06-06 13:08:20 [cuda.py:91] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
INFO 06-06 13:08:20 [core.py:455] Waiting for init message from front-end.
INFO 06-06 13:08:20 [core.py:70] Initializing a V1 LLM engine (v0.8.5.dev601+g05a4324f8) with config: model='meta-llama/Llama-3.2-1B-Instruct', speculative_config=None, tokenizer='meta-llama/Llama-3.2-1B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=meta-llama/Llama-3.2-1B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":false,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
WARNING 06-06 13:08:21 [utils.py:2723] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7fbd572d2570>
INFO 06-06 13:08:21 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 06-06 13:08:21 [factory.py:74] Creating v1 connector with name: MultiConnector and engine_id: d8788489-4e2a-4069-a146-5f81b52c3509
WARNING 06-06 13:08:21 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:21 [factory.py:74] Creating v1 connector with name: TestSharedStorageConnector and engine_id: 232b73b2-e975-413b-a12c-b056d03432f9
WARNING 06-06 13:08:21 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:21 [shared_storage_connector.py:85] KVTransferConfig(kv_connector='TestSharedStorageConnector', engine_id='232b73b2-e975-413b-a12c-b056d03432f9', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={'shared_storage_path': 'storage_1', 'name': 'storage1'}, kv_connector_module_path=None)
INFO 06-06 13:08:21 [shared_storage_connector.py:86] Shared storage path is storage_1
INFO 06-06 13:08:21 [factory.py:74] Creating v1 connector with name: TestSharedStorageConnector and engine_id: 5cf558ea-1f57-4093-8e55-7346e74db7b5
WARNING 06-06 13:08:21 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:21 [shared_storage_connector.py:85] KVTransferConfig(kv_connector='TestSharedStorageConnector', engine_id='5cf558ea-1f57-4093-8e55-7346e74db7b5', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={'shared_storage_path': 'storage_2', 'name': 'storage2'}, kv_connector_module_path=None)
INFO 06-06 13:08:21 [shared_storage_connector.py:86] Shared storage path is storage_2
WARNING 06-06 13:08:21 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 06-06 13:08:22 [gpu_model_runner.py:1586] Starting to load model meta-llama/Llama-3.2-1B-Instruct...
INFO 06-06 13:08:22 [gpu_model_runner.py:1591] Loading model from scratch...
INFO 06-06 13:08:22 [cuda.py:249] Using Flash Attention backend on V1 engine.
INFO 06-06 13:08:22 [weight_utils.py:292] Using model weights format ['*.safetensors']
INFO 06-06 13:08:22 [weight_utils.py:345] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.36it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 3.36it/s]
INFO 06-06 13:08:22 [default_loader.py:272] Loading weights took 0.38 seconds
INFO 06-06 13:08:23 [gpu_model_runner.py:1615] Model loading took 2.3185 GiB and 0.662524 seconds
INFO 06-06 13:08:24 [kv_cache_utils.py:715] GPU KV cache size: 236,048 tokens
INFO 06-06 13:08:24 [kv_cache_utils.py:719] Maximum concurrency for 131,072 tokens per request: 1.80x
INFO 06-06 13:08:24 [core.py:171] init engine (profile, create kv cache, warmup model) took 0.85 seconds
INFO 06-06 13:08:24 [factory.py:74] Creating v1 connector with name: MultiConnector and engine_id: d8788489-4e2a-4069-a146-5f81b52c3509
WARNING 06-06 13:08:24 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:24 [factory.py:74] Creating v1 connector with name: TestSharedStorageConnector and engine_id: eb978b1b-8387-4c10-b105-78682747a6c1
WARNING 06-06 13:08:24 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:24 [shared_storage_connector.py:85] KVTransferConfig(kv_connector='TestSharedStorageConnector', engine_id='eb978b1b-8387-4c10-b105-78682747a6c1', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={'shared_storage_path': 'storage_1', 'name': 'storage1'}, kv_connector_module_path=None)
INFO 06-06 13:08:24 [shared_storage_connector.py:86] Shared storage path is storage_1
INFO 06-06 13:08:24 [factory.py:74] Creating v1 connector with name: TestSharedStorageConnector and engine_id: e6e9a18f-d74b-49b2-b3ec-254fac9dba68
WARNING 06-06 13:08:24 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:24 [shared_storage_connector.py:85] KVTransferConfig(kv_connector='TestSharedStorageConnector', engine_id='e6e9a18f-d74b-49b2-b3ec-254fac9dba68', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={'shared_storage_path': 'storage_2', 'name': 'storage2'}, kv_connector_module_path=None)
INFO 06-06 13:08:24 [shared_storage_connector.py:86] Shared storage path is storage_2
Adding requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 438.44it/s]
Processed prompts: 0%| | 0/2 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]ERROR 06-06 13:08:24 [core.py:517] EngineCore encountered a fatal error.
ERROR 06-06 13:08:24 [core.py:517] Traceback (most recent call last):
ERROR 06-06 13:08:24 [core.py:517] File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 508, in run_engine_core
ERROR 06-06 13:08:24 [core.py:517] engine_core.run_busy_loop()
ERROR 06-06 13:08:24 [core.py:517] File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 535, in run_busy_loop
ERROR 06-06 13:08:24 [core.py:517] self._process_engine_step()
ERROR 06-06 13:08:24 [core.py:517] File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 560, in _process_engine_step
ERROR 06-06 13:08:24 [core.py:517] outputs, model_executed = self.step_fn()
ERROR 06-06 13:08:24 [core.py:517] ^^^^^^^^^^^^^^
ERROR 06-06 13:08:24 [core.py:517] File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 230, in step
ERROR 06-06 13:08:24 [core.py:517] scheduler_output = self.scheduler.schedule()
ERROR 06-06 13:08:24 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-06 13:08:24 [core.py:517] File "/home/ubuntu/vllm/vllm/v1/core/sched/scheduler.py", line 433, in schedule
ERROR 06-06 13:08:24 [core.py:517] self.connector.update_state_after_alloc(
ERROR 06-06 13:08:24 [core.py:517] File "/home/ubuntu/vllm/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 164, in update_state_after_alloc
ERROR 06-06 13:08:24 [core.py:517] KVCacheBlocks.create_empty(), 0)
ERROR 06-06 13:08:24 [core.py:517] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-06 13:08:24 [core.py:517] AttributeError: type object 'KVCacheBlocks' has no attribute 'create_empty'
Process EngineCore_0:
Traceback (most recent call last):
File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 519, in run_engine_core
raise e
File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 508, in run_engine_core
engine_core.run_busy_loop()
File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 535, in run_busy_loop
self._process_engine_step()
File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 560, in _process_engine_step
outputs, model_executed = self.step_fn()
^^^^^^^^^^^^^^
File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 230, in step
scheduler_output = self.scheduler.schedule()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/vllm/vllm/v1/core/sched/scheduler.py", line 433, in schedule
self.connector.update_state_after_alloc(
File "/home/ubuntu/vllm/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 164, in update_state_after_alloc
KVCacheBlocks.create_empty(), 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'KVCacheBlocks' has no attribute 'create_empty'
FAILED
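The traceback shows the call site in `multi_connector.py` (line 164) still invoking a `KVCacheBlocks.create_empty()` classmethod that no longer exists on the class, so every scheduling step dies with an `AttributeError` before the first token is produced. The sketch below reproduces that pattern and shows one possible shape of a repair; the class body and the fixed helper are illustrative stand-ins, not vLLM's actual `KVCacheBlocks` implementation.

```python
# Minimal reproduction of the failure pattern. Everything here except the
# name create_empty is a hypothetical stand-in for vLLM's real classes.

class KVCacheBlocks:
    """Stand-in for vllm.v1.core KVCacheBlocks after a refactor that
    removed (or renamed) the create_empty classmethod."""

    def __init__(self, blocks):
        self.blocks = blocks


def update_state_after_alloc_buggy():
    # Mirrors the failing call site: the caller still expects the old
    # classmethod, so Python raises AttributeError at runtime.
    return KVCacheBlocks.create_empty()


def update_state_after_alloc_fixed():
    # One possible repair (assumption, not the actual vLLM fix):
    # construct an empty instance directly instead of relying on the
    # removed classmethod.
    return KVCacheBlocks(blocks=[])


if __name__ == "__main__":
    try:
        update_state_after_alloc_buggy()
    except AttributeError as e:
        print(f"reproduced: {e}")
    print(update_state_after_alloc_fixed().blocks)
```

Because the exception is raised inside the `EngineCore_0` subprocess, pytest only sees the engine die, which is why the test reports a fatal engine error rather than a plain assertion failure.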
📝 History of failing test
CC List.