[CI Failure]: buildkite/ci/v1-test #19281

@NickLucche

Description

Name of failing test

tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_shared_storage_connector_consistency

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

python -m pytest -v -s tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_shared_storage_connector_consistency
INFO 06-06 13:08:08 [__init__.py:244] Automatically detected platform cuda.
/home/ubuntu/vllm/.venv/lib/python3.12/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.12.8, pytest-8.3.5, pluggy-1.5.0 -- /home/ubuntu/vllm/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/vllm
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-0.24.0, forked-1.6.0, rerunfailures-14.0, mock-3.14.0, shard-0.1.2, buildkite-test-collector-0.1.9
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 1 item                                                                                                                                                                                                   
Running 1 items in this shard: tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_shared_storage_connector_consistency

tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_shared_storage_connector_consistency INFO 06-06 13:08:20 [config.py:822] This model supports multiple tasks: {'embed', 'generate', 'reward', 'classify', 'score'}. Defaulting to 'generate'.
INFO 06-06 13:08:20 [config.py:2182] Chunked prefill is enabled with max_num_batched_tokens=8192.
WARNING 06-06 13:08:20 [cuda.py:91] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
INFO 06-06 13:08:20 [core.py:455] Waiting for init message from front-end.
INFO 06-06 13:08:20 [core.py:70] Initializing a V1 LLM engine (v0.8.5.dev601+g05a4324f8) with config: model='meta-llama/Llama-3.2-1B-Instruct', speculative_config=None, tokenizer='meta-llama/Llama-3.2-1B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=meta-llama/Llama-3.2-1B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":false,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
WARNING 06-06 13:08:21 [utils.py:2723] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7fbd572d2570>
INFO 06-06 13:08:21 [parallel_state.py:1065] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 06-06 13:08:21 [factory.py:74] Creating v1 connector with name: MultiConnector and engine_id: d8788489-4e2a-4069-a146-5f81b52c3509
WARNING 06-06 13:08:21 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:21 [factory.py:74] Creating v1 connector with name: TestSharedStorageConnector and engine_id: 232b73b2-e975-413b-a12c-b056d03432f9
WARNING 06-06 13:08:21 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:21 [shared_storage_connector.py:85] KVTransferConfig(kv_connector='TestSharedStorageConnector', engine_id='232b73b2-e975-413b-a12c-b056d03432f9', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={'shared_storage_path': 'storage_1', 'name': 'storage1'}, kv_connector_module_path=None)
INFO 06-06 13:08:21 [shared_storage_connector.py:86] Shared storage path is storage_1
INFO 06-06 13:08:21 [factory.py:74] Creating v1 connector with name: TestSharedStorageConnector and engine_id: 5cf558ea-1f57-4093-8e55-7346e74db7b5
WARNING 06-06 13:08:21 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:21 [shared_storage_connector.py:85] KVTransferConfig(kv_connector='TestSharedStorageConnector', engine_id='5cf558ea-1f57-4093-8e55-7346e74db7b5', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={'shared_storage_path': 'storage_2', 'name': 'storage2'}, kv_connector_module_path=None)
INFO 06-06 13:08:21 [shared_storage_connector.py:86] Shared storage path is storage_2
WARNING 06-06 13:08:21 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 06-06 13:08:22 [gpu_model_runner.py:1586] Starting to load model meta-llama/Llama-3.2-1B-Instruct...
INFO 06-06 13:08:22 [gpu_model_runner.py:1591] Loading model from scratch...
INFO 06-06 13:08:22 [cuda.py:249] Using Flash Attention backend on V1 engine.
INFO 06-06 13:08:22 [weight_utils.py:292] Using model weights format ['*.safetensors']
INFO 06-06 13:08:22 [weight_utils.py:345] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  3.36it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  3.36it/s]

INFO 06-06 13:08:22 [default_loader.py:272] Loading weights took 0.38 seconds
INFO 06-06 13:08:23 [gpu_model_runner.py:1615] Model loading took 2.3185 GiB and 0.662524 seconds
INFO 06-06 13:08:24 [kv_cache_utils.py:715] GPU KV cache size: 236,048 tokens
INFO 06-06 13:08:24 [kv_cache_utils.py:719] Maximum concurrency for 131,072 tokens per request: 1.80x
INFO 06-06 13:08:24 [core.py:171] init engine (profile, create kv cache, warmup model) took 0.85 seconds
INFO 06-06 13:08:24 [factory.py:74] Creating v1 connector with name: MultiConnector and engine_id: d8788489-4e2a-4069-a146-5f81b52c3509
WARNING 06-06 13:08:24 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:24 [factory.py:74] Creating v1 connector with name: TestSharedStorageConnector and engine_id: eb978b1b-8387-4c10-b105-78682747a6c1
WARNING 06-06 13:08:24 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:24 [shared_storage_connector.py:85] KVTransferConfig(kv_connector='TestSharedStorageConnector', engine_id='eb978b1b-8387-4c10-b105-78682747a6c1', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={'shared_storage_path': 'storage_1', 'name': 'storage1'}, kv_connector_module_path=None)
INFO 06-06 13:08:24 [shared_storage_connector.py:86] Shared storage path is storage_1
INFO 06-06 13:08:24 [factory.py:74] Creating v1 connector with name: TestSharedStorageConnector and engine_id: e6e9a18f-d74b-49b2-b3ec-254fac9dba68
WARNING 06-06 13:08:24 [base.py:62] Initializing KVConnectorBase_V1. This API is experimental and subject to change in the future as we iterate the design.
INFO 06-06 13:08:24 [shared_storage_connector.py:85] KVTransferConfig(kv_connector='TestSharedStorageConnector', engine_id='e6e9a18f-d74b-49b2-b3ec-254fac9dba68', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={'shared_storage_path': 'storage_2', 'name': 'storage2'}, kv_connector_module_path=None)
INFO 06-06 13:08:24 [shared_storage_connector.py:86] Shared storage path is storage_2
Adding requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 438.44it/s]
Processed prompts:   0%|                                                                                                                  | 0/2 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]ERROR 06-06 13:08:24 [core.py:517] EngineCore encountered a fatal error.
ERROR 06-06 13:08:24 [core.py:517] Traceback (most recent call last):
ERROR 06-06 13:08:24 [core.py:517]   File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 508, in run_engine_core
ERROR 06-06 13:08:24 [core.py:517]     engine_core.run_busy_loop()
ERROR 06-06 13:08:24 [core.py:517]   File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 535, in run_busy_loop
ERROR 06-06 13:08:24 [core.py:517]     self._process_engine_step()
ERROR 06-06 13:08:24 [core.py:517]   File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 560, in _process_engine_step
ERROR 06-06 13:08:24 [core.py:517]     outputs, model_executed = self.step_fn()
ERROR 06-06 13:08:24 [core.py:517]                               ^^^^^^^^^^^^^^
ERROR 06-06 13:08:24 [core.py:517]   File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 230, in step
ERROR 06-06 13:08:24 [core.py:517]     scheduler_output = self.scheduler.schedule()
ERROR 06-06 13:08:24 [core.py:517]                        ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-06 13:08:24 [core.py:517]   File "/home/ubuntu/vllm/vllm/v1/core/sched/scheduler.py", line 433, in schedule
ERROR 06-06 13:08:24 [core.py:517]     self.connector.update_state_after_alloc(
ERROR 06-06 13:08:24 [core.py:517]   File "/home/ubuntu/vllm/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 164, in update_state_after_alloc
ERROR 06-06 13:08:24 [core.py:517]     KVCacheBlocks.create_empty(), 0)
ERROR 06-06 13:08:24 [core.py:517]     ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-06 13:08:24 [core.py:517] AttributeError: type object 'KVCacheBlocks' has no attribute 'create_empty'
Process EngineCore_0:
Traceback (most recent call last):
  File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 519, in run_engine_core
    raise e
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 508, in run_engine_core
    engine_core.run_busy_loop()
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 535, in run_busy_loop
    self._process_engine_step()
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 560, in _process_engine_step
    outputs, model_executed = self.step_fn()
                              ^^^^^^^^^^^^^^
  File "/home/ubuntu/vllm/vllm/v1/engine/core.py", line 230, in step
    scheduler_output = self.scheduler.schedule()
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/vllm/vllm/v1/core/sched/scheduler.py", line 433, in schedule
    self.connector.update_state_after_alloc(
  File "/home/ubuntu/vllm/vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py", line 164, in update_state_after_alloc
    KVCacheBlocks.create_empty(), 0)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'KVCacheBlocks' has no attribute 'create_empty'
FAILED
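
For anyone triaging: this looks like an API mismatch rather than flakiness. `MultiConnector.update_state_after_alloc` (multi_connector.py line 164) calls `KVCacheBlocks.create_empty()` to hand an empty allocation to sub-connectors that received no blocks, but `KVCacheBlocks` on this commit has no such classmethod. Below is a minimal sketch of the shape such a factory could take; the `blocks` field name and layout are assumptions for illustration, not the upstream `vllm/v1/core/kv_cache_utils.py` definition.

```python
# Hedged sketch only -- not the upstream vLLM definition. It illustrates
# the classmethod that multi_connector.py:164 expects on KVCacheBlocks.
from dataclasses import dataclass, field


@dataclass
class KVCacheBlocks:
    # Assumption: the real class wraps the block allocations made by the
    # scheduler's KV cache manager; the exact fields may differ upstream.
    blocks: list = field(default_factory=list)

    @classmethod
    def create_empty(cls) -> "KVCacheBlocks":
        """Factory for a zero-block allocation, letting MultiConnector
        signal a sub-connector that nothing was allocated for it."""
        return cls(blocks=[])


# The call site in update_state_after_alloc would then resolve, e.g.:
empty = KVCacheBlocks.create_empty()
assert empty.blocks == []
```

Until such a classmethod (or an equivalent empty-allocation constructor) exists on this branch, the test fails deterministically, consistent with "Can reproduce locally" above.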

📝 History of failing test

#17996

CC List.

@heheda12345 @WoosukKwon @njhill

Metadata

Labels: ci-failure (Issue about an unexpected test failure in CI)

Status: Done