[RFC]: Reduce Unit Test to Speed Up CI #22041

@yewentao256

Motivation.

#21764

Currently, CI spends a lot of time on kernel unit tests, sometimes more than 3 hours, even though these tests only exercise individual kernels. We aim to simplify some of the unit tests to make CI faster.

Proposed Change.

1. Reduce Test Cases

For example, we can reduce the number of shapes:

MNK_FACTORS = [
    (1, 128, 128),
    (1, 512, 512),
    (1, 128, 7168),
    (1, 1024, 7168),
    (1, 4608, 128),
    (1, 4608, 512),
    (1, 4608, 7168),
    (83, 128, 128),
    (83, 512, 512),
    (83, 1024, 7168),
    (83, 4608, 512),
    (83, 4608, 7168),
    (128, 128, 128),
    (128, 512, 512),
    (128, 1024, 7168),
    (128, 4608, 512),
    (128, 4608, 7168),
    (2048, 128, 128),
    (2048, 1024, 7168),
    (2048, 4608, 512),
    (2048, 4608, 7168),
    (8192, 128, 128),
    (8192, 512, 512),
    (8192, 128, 7168),
    (8192, 1024, 7168),
    (8192, 4608, 512),
    (8192, 4608, 7168),
]
->
MNK_FACTORS = [
    (1, 128, 128),
    (1, 1024, 7168),
    (1, 4608, 128),
    (83, 512, 512),
    (128, 128, 128),
    (128, 1024, 7168),
    (2048, 1024, 7168),
    (2048, 4608, 512),
    (8192, 512, 512),
    (8192, 4608, 512),
    (8192, 4608, 7168),
]
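
As a rough sanity check on the reduction above, the sketch below confirms that the shorter list is a subset of the original and still exercises every distinct M, N, and K value at least once. The names `FULL`, `REDUCED`, and `axis_values` are hypothetical and used only for illustration. Per-axis coverage is just one possible criterion, not necessarily the one used to pick the list.

```python
# Original 27 shapes and the proposed 11 shapes, copied from the lists above.
FULL = [
    (1, 128, 128), (1, 512, 512), (1, 128, 7168), (1, 1024, 7168),
    (1, 4608, 128), (1, 4608, 512), (1, 4608, 7168),
    (83, 128, 128), (83, 512, 512), (83, 1024, 7168),
    (83, 4608, 512), (83, 4608, 7168),
    (128, 128, 128), (128, 512, 512), (128, 1024, 7168),
    (128, 4608, 512), (128, 4608, 7168),
    (2048, 128, 128), (2048, 1024, 7168), (2048, 4608, 512),
    (2048, 4608, 7168),
    (8192, 128, 128), (8192, 512, 512), (8192, 128, 7168),
    (8192, 1024, 7168), (8192, 4608, 512), (8192, 4608, 7168),
]
REDUCED = [
    (1, 128, 128), (1, 1024, 7168), (1, 4608, 128), (83, 512, 512),
    (128, 128, 128), (128, 1024, 7168), (2048, 1024, 7168),
    (2048, 4608, 512), (8192, 512, 512), (8192, 4608, 512),
    (8192, 4608, 7168),
]


def axis_values(shapes):
    """Return the sets of distinct M, N, and K values used by `shapes`."""
    return tuple(set(axis) for axis in zip(*shapes))


# Every reduced shape already exists in the original list.
assert set(REDUCED) <= set(FULL)
# Each distinct M, N, and K value is still covered by at least one shape.
assert axis_values(REDUCED) == axis_values(FULL)

print(f"{len(FULL)} shapes -> {len(REDUCED)} shapes")
```

Because the shape list is typically multiplied by other test parameters, cutting it from 27 to 11 entries reduces the number of generated test cases by the same ratio for every combination of the remaining parameters.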

2. Refactor test type

wentao@gpu66:~/vllm-source/tests/kernels/moe$ ls
__init__.py                      test_modular_kernel_combinations.py
__pycache__                      test_moe.py
modular_kernel_tools             test_moe_align_block_size.py
parallel_utils.py                test_moe_permute_unpermute.py
test_batched_moe.py              test_mxfp4_moe.py
test_block_fp8.py                test_nvfp4_moe.py
test_block_int8.py               test_pplx_cutlass_moe.py
test_count_expert_num_tokens.py  test_pplx_moe.py
test_cutlass_grouped_gemm.py     test_rocm_aiter_topk.py
test_cutlass_moe.py              test_silu_mul_fp8_quant_deep_gemm.py
test_deepep_deepgemm_moe.py      test_triton_moe_ptpc_fp8.py
test_deepep_moe.py               utils.py
test_deepgemm.py

Tests for some communication kernels, such as deepep and pplx, are time-consuming and should be moved to a separate test folder, where they are triggered only when necessary.
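
One way to picture "triggered only when necessary" is a path-based gate: run the slow communication tests only when a change touches related files. The sketch below is an illustration under assumptions, not the project's actual CI configuration. `COMM_TEST_TRIGGERS` and `should_run_comm_tests` are hypothetical names, and the patterns only cover the test files from the listing above; a real setup would also need to list the kernel sources those tests depend on.

```python
from fnmatch import fnmatch

# Hypothetical trigger patterns, based only on the test file names listed above.
COMM_TEST_TRIGGERS = [
    "tests/kernels/moe/test_deepep_*.py",
    "tests/kernels/moe/test_pplx_*.py",
]


def should_run_comm_tests(changed_files):
    """Return True if any changed path matches a communication-test trigger."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in COMM_TEST_TRIGGERS
    )


# A change to an unrelated kernel test does not trigger the slow tests.
print(should_run_comm_tests(["tests/kernels/moe/test_block_fp8.py"]))
# A change to a pplx test does.
print(should_run_comm_tests(["tests/kernels/moe/test_pplx_moe.py"]))
```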

3. Remove Test

Possibly remove some tests that provide little value entirely.

Feedback Period.

No response

CC List.

@mgoin

Any Other Things.

No response
