feat: second attempt to support DDS and NonZero op #3388
Conversation
    if (
        node != output_node
        and len(node.users) == 0
        and len(node.all_input_nodes) > 0
Probably better to add an assert checking that it has only one input (and print the actual number in the message if it fails).
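For illustration, a minimal sketch of such an assert, reusing the node variable from the snippet above:

    assert len(node.all_input_nodes) == 1, (
        f"Expected node {node.name} to have exactly one input, "
        f"got {len(node.all_input_nodes)}"
    )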
I previously reused the code from another lowering pass. It looks like we can directly remove unused ops, right?
TensorRT/py/torch_tensorrt/dynamo/lowering/passes/remove_num_users_is_0_nodes.py
Lines 20 to 28 in eed420a
    if (
        node != output_node
        and len(node.users) == 0
        and len(node.all_input_nodes) > 0
    ):
        gm.graph.erase_node(node)
    gm = clean_up_graph_after_modifications(gm)
    logger.debug(f"Removed ops that [num_users=0] nodes:\n{gm.graph}")
Do you think there are any potential issues with that?
        need_cudagraphs_reset,
    ) = self.runtime_states.set_runtime_states(
-       cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
+       self.cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
Is use_pre_allocated_outputs still valid now that you're adding the OA (OutputAllocator) feature?
I think the OA feature will not affect use_pre_allocated_outputs, because I didn't change the behavior of CUDA Graphs, and use_pre_allocated_outputs has its own context manager as well.
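For reference, a rough sketch of how the pre-allocated-outputs toggle is used; the entry point torch_tensorrt.runtime.enable_pre_allocated_outputs is assumed here, and trt_module / inputs are placeholders for a compiled module and its sample inputs:

    import torch_tensorrt

    # trt_module: a compiled Torch-TensorRT module (placeholder).
    # inputs: a list of sample input tensors (placeholder).
    # This context manager only toggles pre-allocated outputs; it is
    # independent of the OutputAllocator path added in this PR.
    with torch_tensorrt.runtime.enable_pre_allocated_outputs(trt_module):
        out = trt_module(*inputs)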
        raise RuntimeError(
            "Both CUDA Graphs and OutputAllocator are enabled. Please disable either one."
        )
    if self.use_output_allocator_outputs:
How is use_output_allocator_outputs set? Is it set by the user via the with context manager?
Yes, it is set by the user via the with context manager. If users don't set it, the runtime chooses standard execution or OA according to the converter decorator.
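A rough usage sketch of that context manager; the name enable_output_allocator is an assumption for illustration (see the PR diff for the actual API), and trt_module / inputs are placeholders:

    import torch_tensorrt

    # trt_module: a compiled Torch-TensorRT module containing a DDS op such as nonzero (placeholder).
    # Explicitly force the OutputAllocator path (context manager name assumed):
    with torch_tensorrt.runtime.enable_output_allocator(trt_module):
        out = trt_module(*inputs)

    # Without the context manager, the runtime picks standard execution or the
    # OutputAllocator path based on each op's converter decorator.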
peri044
left a comment
LGTM
narendasan
left a comment
LGTM after minor change
Description
This PR adds a new path to support Data Dependent Shape (DDS) outputs and the NonZero op.
Static and dynamic shapes go through the original path; DDS goes through the new path with IOutputAllocator.
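A hedged end-to-end sketch of the intended usage; the module and compile settings below are illustrative only:

    import torch
    import torch_tensorrt

    class NonZeroModel(torch.nn.Module):
        def forward(self, x):
            # nonzero produces a data-dependent output shape (DDS)
            return torch.nonzero(x)

    model = NonZeroModel().cuda().eval()
    inputs = [torch.randn(4, 16, device="cuda")]
    trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
    print(trt_model(*inputs))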
Fixes #2516
Type of change
Checklist: