
Commit 4abccaf

bump version to v0.11.0 (#4155)
- bump version to v0.11.0
- fix
- fix sm70 sm75 compilation
- update according to comments
Parent: 91e8414

9 files changed: +9, −21 lines

CMakeLists.txt

Lines changed: 1 addition & 1 deletion

````diff
@@ -222,7 +222,7 @@ if(ARCH STREQUAL "x86_64")
     if (NOT CMAKE_CUDA_ARCHITECTURES)
         set(CMAKE_CUDA_ARCHITECTURES "")
         if (${CMAKE_CUDA_COMPILER_VERSION} VERSION_LESS "13.0")
-            list(APPEND CMAKE_CUDA_ARCHITECTURES 70-real 75-real) # V100, 2080
+            list(APPEND CMAKE_CUDA_ARCHITECTURES 70-real 75-real) # V100, 2080
         endif()
         if (${CMAKE_CUDA_COMPILER_VERSION} VERSION_GREATER_EQUAL "11")
             list(APPEND CMAKE_CUDA_ARCHITECTURES 80-real) # A100
````
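
This hunk builds the sm70/sm75 (V100, RTX 2080) targets only when the CUDA compiler is older than 13.0, matching the "fix sm70 sm75 compilation" note in the commit message. Because the list is auto-populated only when `CMAKE_CUDA_ARCHITECTURES` is unset, you can check which target your own GPU needs before overriding it; a minimal sketch, assuming `torch` is installed (LMDeploy depends on PyTorch):

```python
# Minimal sketch: report the GPU's compute capability so you can tell
# whether the sm70/sm75 targets gated above apply to your hardware.
# Assumes torch is installed; lmdeploy itself depends on PyTorch.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f'GPU 0 is sm{major}{minor}')
    if (major, minor) in ((7, 0), (7, 5)):
        print('needs the 70-real/75-real targets, i.e. a CUDA < 13.0 build')
else:
    print('no CUDA device visible')
```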

README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -214,7 +214,7 @@ The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
 For the GeForce RTX 50 series, please install the LMDeploy prebuilt package complied with **CUDA 12.8**
 
 ```shell
-export LMDEPLOY_VERSION=0.10.2
+export LMDEPLOY_VERSION=0.11.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````

README_zh-CN.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -215,7 +215,7 @@ pip install lmdeploy
 若使用 GeForce RTX 50 系列显卡,请安装基于 **CUDA 12.8** 编译的 LMDeploy 预编译包。
 
 ```shell
-export LMDEPLOY_VERSION=0.10.2
+export LMDEPLOY_VERSION=0.11.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````

docs/en/faq.md

Lines changed: 0 additions & 2 deletions

````diff
@@ -20,8 +20,6 @@ It may have been caused by the following reasons.
 pip install lmdeploy[all]
 ```
 
-If you want to install the nightly build of LMDeploy's whl package, you can download and install it from the latest release at https://github.com/zhyncs/lmdeploy-build according to your CUDA and Python versions. Currently the update frequency of whl is once a day.
-
 2. If you have installed it and still encounter this issue, it is probably because you are executing turbomind-related command in the root directory of lmdeploy source code. Switching to another directory will fix it.
 
 But if you are a developer, you often need to develop and compile locally. The efficiency of installing whl every time is too low. You can specify the path of lib after compilation through symbolic links.
````

docs/en/get_started/installation.md

Lines changed: 2 additions & 6 deletions

````diff
@@ -23,15 +23,11 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:
 
 ```shell
-export LMDEPLOY_VERSION=0.10.2
+export LMDEPLOY_VERSION=0.11.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
 
-## Install nightly-build package with pip
-
-The release frequency of LMDeploy is approximately once or twice monthly. If your desired feature has been merged to LMDeploy main branch but hasn't been published yet, you can experiment with the nightly-built package available [here](https://github.com/zhyncs/lmdeploy-build) according to your CUDA and Python versions
-
 ## Install from source
 
 By default, LMDeploy will build with NVIDIA CUDA support, utilizing both the Turbomind and PyTorch backends. Before installing LMDeploy, ensure you have successfully installed the CUDA Toolkit.
@@ -51,7 +47,7 @@ DISABLE_TURBOMIND=1 pip install git+https://github.com/InternLM/lmdeploy.git
 If you prefer a specific version instead of the `main` branch of LMDeploy, you can specify it in your command:
 
 ```shell
-pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.2.zip
+pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.11.0.zip
 ```
 
 If you want to build LMDeploy with support for Ascend, Cambricon, or MACA, install LMDeploy with the corresponding `LMDEPLOY_TARGET_DEVICE` environment variable.
````
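
After upgrading with either command above, it is worth confirming which wheel actually got installed; a minimal sketch, assuming the `lmdeploy` package re-exports `__version__` from `lmdeploy/version.py` at the top level:

```python
# Minimal post-install check; the printed value should match the
# LMDEPLOY_VERSION exported above (0.11.0 as of this commit).
# Assumes the package re-exports __version__ at the top level.
import lmdeploy

print(lmdeploy.__version__)
```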

docs/zh_cn/faq.md

Lines changed: 0 additions & 2 deletions

````diff
@@ -20,8 +20,6 @@ pip install --upgrade mmengine
 pip install lmdeploy[all]
 ```
 
-如果您想安装 LMDeploy 预编译包的 nightly 版本,可以根据您的 CUDA 和 Python 版本从 https://github.com/zhyncs/lmdeploy-build 下载并安装最新发布的包。目前更新频率是每天一次。
-
 2. 如果已经安装了,还是出现这个问题,请检查下执行目录。不要在 lmdeploy 的源码根目录下执行 python -m lmdeploy.turbomind.\*下的package,换到其他目录下执行。
 
 但是如果您是开发人员,通常需要在本地进行开发和编译。每次安装 whl 的效率太低了。您可以通过符号链接在编译后指定 lib 的路径。
````

docs/zh_cn/get_started/installation.md

Lines changed: 2 additions & 6 deletions

````diff
@@ -23,15 +23,11 @@ pip install lmdeploy
 默认的预构建包是在 **CUDA 12** 上编译的。如果需要 CUDA 11+ (>=11.3),你可以使用以下命令安装 lmdeploy:
 
 ```shell
-export LMDEPLOY_VERSION=0.10.2
+export LMDEPLOY_VERSION=0.11.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
 
-## 使用 pip 安装夜间构建包
-
-LMDeploy 的发布频率大约是每月一次或两次。如果你所需的功能已经被合并到 LMDeploy 的主分支但还没有发布,你可以环境中的 CUDA 和 Python 版本,尝试使用[这里](https://github.com/zhyncs/lmdeploy-build)提供的夜间构建包。
-
 ## 从源码安装
 
 默认情况下,LMDeploy 将面向 NVIDIA CUDA 环境进行编译安装,并同时启用 Turbomind 和 PyTorch 两种后端引擎。在安装 LMDeploy 之前,请确保已成功安装 CUDA 工具包。
@@ -51,7 +47,7 @@ DISABLE_TURBOMIND=1 pip install git+https://github.com/InternLM/lmdeploy.git
 如果您希望使用特定版本,而不是 LMDeploy 的 `main` 分支,可以在命令行中指定:
 
 ```shell
-pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.10.2.zip
+pip install https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.11.0.zip
 ```
 
 如果您希望构建支持昇腾、寒武纪或沐熙的 LMDeploy,请使用相应的 `LMDEPLOY_TARGET_DEVICE` 环境变量进行安装。
````

lmdeploy/serve/async_engine.py

Lines changed: 1 addition & 1 deletion

````diff
@@ -842,7 +842,7 @@ async def generate(
         gen_config.max_new_tokens = max(0, self.session_len - self.id2step[session_id] - len(input_ids))
         if gen_config.max_new_tokens == 0:
             logger.error(f'run out of tokens. session={session_id}.')
-            yield GenOut(response='run out of tokens',
+            yield GenOut(response='',
                          history_token_len=self.id2step[session_id],
                          input_token_len=len(input_ids),
                          generate_token_len=0,
````
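
With this change, a session that has exhausted its token budget yields an empty `response` (the condition is still recorded via `logger.error`), so streaming clients that concatenate chunks no longer see the literal sentinel text leak into generated output. A minimal consumer sketch; the `engine.generate(...)` arguments here are illustrative, not the exact signature:

```python
# Illustrative consumer of the async generator above; argument names are
# hypothetical. Before this commit, the sentinel text 'run out of tokens'
# would be concatenated into `text`; now only the log records the condition.
async def collect(engine, session_id: int, prompt: str) -> str:
    text = ''
    async for out in engine.generate(prompt, session_id=session_id):
        text += out.response  # empty string when the token budget is exhausted
    return text
```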

lmdeploy/version.py

Lines changed: 1 addition & 1 deletion

````diff
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple
 
-__version__ = '0.10.2'
+__version__ = '0.11.0'
 short_version = __version__
 
 
````
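
Downstream code that gates features on this release can parse the `__version__` string directly; a minimal sketch, where the `parse` helper is hypothetical and only `__version__` and `short_version` come from the diff above:

```python
# Hypothetical downstream version gate; only __version__ and short_version
# are guaranteed by lmdeploy/version.py as shown in the diff above.
from lmdeploy.version import __version__


def parse(version: str) -> tuple:
    # '0.11.0' -> (0, 11, 0); pre-release suffixes are ignored for brevity
    return tuple(int(part) for part in version.split('.')[:3] if part.isdigit())


if parse(__version__) < (0, 11, 0):
    raise RuntimeError('this code path needs lmdeploy >= 0.11.0')
```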
