zewenli98
Collaborator

Description

Fixes #2774 #2726

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@zewenli98 zewenli98 requested a review from peri044 May 7, 2024 00:38
@github-actions github-actions bot added component: conversion Issues re: Conversion stage component: converters Issues re: Specific op converters component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels May 7, 2024
@zewenli98 zewenli98 requested a review from chohk88 May 7, 2024 00:38
@zewenli98 zewenli98 self-assigned this May 7, 2024
@chohk88
Collaborator

chohk88 commented May 7, 2024

This looks good to me. It seems like it could resolve the CI/CD issues mentioned in #2726.

@chohk88
Collaborator

chohk88 commented May 7, 2024

It seems that we've encountered an accuracy issue in the CI/CD test, which wasn't quite what we expected. Could this be the error you mentioned occurring in TRT-10?

@github-actions github-actions bot added the component: tests Issues re: Tests label May 7, 2024
@zewenli98
Collaborator Author

Could this be the error you mentioned occurring in TRT-10?

@chohk88 Yes, that's it! The issue is that the original broadcast() function in fx/converter_utils.py only broadcasts two ITensors to the same number of dimensions (rank) by prepending 1s, which causes the error in the subsequent ctx.net.add_elementwise() call. To fix this, I wrote a similar function, broadcast_to_same_shape, that broadcasts two ITensors to exactly the same shape.

For example, given the original ITensors with lhs_val.shape: (2, 3) and rhs_val.shape: (2, 2, 1, 3):

  • After calling fx/converter_utils.broadcast, lhs_val.shape: (1, 1, 2, 3) and rhs_val.shape: (2, 2, 1, 3).
  • After calling broadcast_to_same_shape, lhs_val.shape: (2, 2, 2, 3) and rhs_val.shape: (2, 2, 2, 3).
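
To make the difference between the two behaviors concrete, here is a minimal sketch in plain NumPy. It is not the converter code from this PR: NumPy arrays stand in for ITensors, and the helper names prepend_ones and common_shape are hypothetical, chosen only for illustration.

```python
import numpy as np


def prepend_ones(shape, rank):
    """Rank-only broadcast: pad `shape` with leading 1s up to `rank`."""
    return (1,) * (rank - len(shape)) + tuple(shape)


def common_shape(lhs_shape, rhs_shape):
    """Full broadcast: the single shape both operands expand to."""
    rank = max(len(lhs_shape), len(rhs_shape))
    out = []
    for l, r in zip(prepend_ones(lhs_shape, rank), prepend_ones(rhs_shape, rank)):
        # Dimensions are compatible only if they match or one of them is 1.
        if l != r and 1 not in (l, r):
            raise ValueError(f"cannot broadcast {lhs_shape} with {rhs_shape}")
        out.append(r if l == 1 else l)
    return tuple(out)


lhs_shape, rhs_shape = (2, 3), (2, 2, 1, 3)

# Rank-only broadcast: the ranks now match, but the shapes still differ.
print(prepend_ones(lhs_shape, 4), prepend_ones(rhs_shape, 4))
# (1, 1, 2, 3) (2, 2, 1, 3)

# Full broadcast: both operands are expanded to one common shape.
target = common_shape(lhs_shape, rhs_shape)
assert target == np.broadcast_shapes(lhs_shape, rhs_shape)

lhs = np.broadcast_to(np.zeros(lhs_shape), target)
rhs = np.broadcast_to(np.zeros(rhs_shape), target)
print(lhs.shape, rhs.shape)
# (2, 2, 2, 3) (2, 2, 2, 3)
```

The check against np.broadcast_shapes is only there to confirm that the hand-written rule matches NumPy's own broadcasting semantics for this example.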

I think this fix should also work for #2726

@chohk88
Collaborator

chohk88 commented May 8, 2024

This looks good to me.

@zewenli98 zewenli98 merged commit cd61e54 into main May 8, 2024
@zewenli98 zewenli98 deleted the fix_elementwise_base branch May 8, 2024 21:43
@zewenli98
Collaborator Author

@peri044 Does this PR need to be cherry-picked to release/2.3?
