@Clement-Wang26
Collaborator

Details:

  1. Model loading merges tensors uniformly in DRAM, then manually allocates memory on the device (malloc) and copies the data into it (memcpy).
  2. Support overlapping model weight loading with the forward pass.
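
The two points above can be sketched roughly as follows. This is a host-only illustration, not the code in this PR: `DeviceBuffer`, `device_malloc`, `device_memcpy`, `merge_on_host`, `load_layer`, and `forward_with_overlap` are hypothetical names, the "device" is ordinary heap memory, and `std::async` stands in for the dedicated load stream, thread pool, and events that the PR actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <future>
#include <memory>
#include <vector>

using Tensor = std::vector<float>;

// Hypothetical stand-in for a device allocation (plain heap memory here).
struct DeviceBuffer {
  std::unique_ptr<float[]> data;
  std::size_t size = 0;
};

// Stand-in for the platform's device malloc.
DeviceBuffer device_malloc(std::size_t n) {
  return DeviceBuffer{std::make_unique<float[]>(n), n};
}

// Stand-in for the platform's host-to-device memcpy.
void device_memcpy(DeviceBuffer& dst, const Tensor& src) {
  if (!src.empty()) {
    std::memcpy(dst.data.get(), src.data(), src.size() * sizeof(float));
  }
}

// Step 1: merge a layer's tensors into one contiguous host buffer.
Tensor merge_on_host(const std::vector<Tensor>& tensors) {
  Tensor merged;
  for (const auto& t : tensors) {
    merged.insert(merged.end(), t.begin(), t.end());
  }
  return merged;
}

// Step 2: a single allocation and a single copy per layer.
DeviceBuffer load_layer(const std::vector<Tensor>& tensors) {
  Tensor merged = merge_on_host(tensors);
  DeviceBuffer buf = device_malloc(merged.size());
  device_memcpy(buf, merged);
  return buf;
}

// Overlap: all layer loads start in the background; the forward loop
// blocks only on the layer it is about to consume.
float forward_with_overlap(const std::vector<std::vector<Tensor>>& layers) {
  std::vector<std::future<DeviceBuffer>> pending;
  for (const auto& layer : layers) {
    pending.push_back(std::async(std::launch::async,
                                 [&layer] { return load_layer(layer); }));
  }
  float acc = 0.0f;  // toy "forward": sum the weights of each layer
  for (auto& f : pending) {
    DeviceBuffer w = f.get();  // waits until this layer's copy is done
    for (std::size_t i = 0; i < w.size; ++i) acc += w.data[i];
  }
  return acc;
}
```

Calling `forward_with_overlap` with a list of layers lets layer 0 be consumed while later layers are still being merged and copied, which is the overlap described in point 2.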


c10_npu::NPUStream load_stream_;
std::unique_ptr<ThreadPool> threadpool_;
std::vector<aclrtEvent> events_;
Collaborator

aclrtEvent depends on the Ascend platform, so the event needs to be abstracted here. You can implement an abstract EventInterface class, a concrete NpuEvent, and an EventFactory in core/platform.
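
One possible shape for that abstraction is sketched below. Only the names `EventInterface`, `NpuEvent`, and `EventFactory` come from the comment above; the method set (`record`, `synchronize`, `query`), the string-keyed factory, and the host-only `HostEvent` used to keep the example self-contained are assumptions. A real `NpuEvent` would wrap an `aclrtEvent` and is only described in a comment, since it cannot be built without the Ascend toolkit.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>

// Platform-neutral event interface (method names are assumptions).
class EventInterface {
 public:
  virtual ~EventInterface() = default;
  virtual void record() = 0;       // mark the event as reached
  virtual void synchronize() = 0;  // block until the event is recorded
  virtual bool query() const = 0;  // non-blocking completion check
};

// Host-only implementation, used here so the sketch runs anywhere.
class HostEvent final : public EventInterface {
 public:
  void record() override {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_all();
  }
  void synchronize() override {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return done_; });
  }
  bool query() const override {
    std::lock_guard<std::mutex> lock(mu_);
    return done_;
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
};

// An NpuEvent would implement the same interface on top of aclrtEvent
// (creating, recording, synchronizing, and destroying it through the
// AscendCL runtime). It is omitted because it needs the Ascend headers.

// Factory keyed by a backend name (the key scheme is an assumption).
class EventFactory {
 public:
  static std::unique_ptr<EventInterface> create(const std::string& backend) {
    if (backend == "host") return std::make_unique<HostEvent>();
    // A platform build would add: if (backend == "npu") return NpuEvent.
    throw std::invalid_argument("unsupported backend: " + backend);
  }
};
```

With this in place the loader could hold `std::vector<std::unique_ptr<EventInterface>>` instead of `std::vector<aclrtEvent>`, so the platform-specific type stays inside core/platform.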

@liujinguang0125 liujinguang0125 self-requested a review November 26, 2025 14:42