Skip to content

Remove TORCH_API from inline at::internal::lazy_init_num_thread #89511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

ankurvdev
Copy link
Contributor

@ankurvdev ankurvdev commented Nov 22, 2022

The function signature in its current state is ambiguous.
Its an inline function that is also declared to be imported from the DLL.
which leaves it subject to compilers decision to choose one or the other and depending on what the compiler/linker may choose we may get one of the two behaviors for the aten::init_num_threads call:

  1. Once-per-dll-in-a-thread (if its inlined)
  2. Once-per-thread (if its imported)

I suspect once-per-dll-in-a-thread is already the case currently because it being tagged inline
So removing the inline will simply make it a little more consistent and clear.

The function exists to avoid repeated calls to aten::init_num_threads.
Being in an "internal" namespace, the function isnt expected to be called by external plugins which means that the "once-per-dll-in-a-thread" behavior isn't that much of a problem anyway

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 22, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89511

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 2 Pending

As of commit 7ae1144:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Nov 22, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: ankurvdev / name: Ankur Verma (a7822db)

@BowenBao
Copy link
Collaborator

@malfet @kit1980 Please take a look.

@kit1980 kit1980 requested a review from Blackhex November 22, 2022 22:11
@Blackhex
Copy link
Collaborator

Note that this VS 17.4.0 regression is being tracked as thread_local causing fatal error LNK1161: invalid export specification on VS 2022 - Visual Studio Feedback

@@ -29,7 +29,7 @@ TORCH_API bool in_parallel_region();
namespace internal {

// Initialise num_threads lazily at first parallel call
inline TORCH_API void lazy_init_num_threads() {
inline void lazy_init_num_threads() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change seems OK in a sense that it makes the code compile. Albeit I am not so sure if it won't break the DLL interface when the __declspec(dllexport) would be missing. According to my search through the codebase, it does not seem that this function is actually used accross the DLL interface but my knowledge here is limited.

cc @kit1980

@kit1980 kit1980 added ciflow/trunk Trigger trunk jobs on your pull request ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR labels Nov 22, 2022
@kit1980
Copy link
Contributor

kit1980 commented Nov 22, 2022

Note that this VS 17.4.0 regression is being tracked as thread_local causing fatal error LNK1161: invalid export specification on VS 2022 - Visual Studio Feedback

@ankurvdev Is it possible to make the change affect only that specific compiler version? Otherwise the change seems to be too broad as it will affect every compiler on every platform.

@ankurvdev
Copy link
Contributor Author

Thanks for providing the reference to the bug.
IMO it's probably not worth wrapping this in a compiler version specific change.
If the VS team is ready to roll out a fix for this, then I'm happy to wait it out.

However, that said, IMO the semantics this piece of code tries to express (inlining as well as importing) are extremely weird.
And that should probably be changed regardless.

The code change here would simply imply that aten::init_num_threads is called once-per-dll-in-a-thread (that call the lazy function) as opposed to once-per-thread.
Given that plugin dlls rarely call this directly .... thats probably never going to happen and even if it does, I suspect multiple calls to init_num_threads wont be a huge performance hit (why is it exported anyway?)

Even so ... if there are concerns
I can move this function into a cpp file so we can reliably switch to once-per-thread.

The semantics in its current state make it unclear which mode we'll get

  1. Once-per-dll-in-a-thread
  2. Once-per-thread
    And depending on the compiler version it could be either.
    I suspect once-per-dll-in-a-thread is already the case currently because it being tagged inline.

So choosing one way or the other should be done regardless of the VS bugfix.

Thoughts ?

@malfet
Copy link
Contributor

malfet commented Nov 23, 2022

lazy_init_num_thread was introduced in #37461 for the purpose of reducing number of calls to the heavier at::init_num_threads function, but on the other hand, if its called repeatedly is not an end of the world and just a minor performance overhead.

With that in mind, removing TORCH_API decorator sounds reasonable to me, but please change PR title to something like: "Remove TORCH_API" from inline at::internal::lazy_init_num_thread`" and in PR description explain, that behavior of inline functions with external linkage is not defined across shared library boundary and very likely does not result in a singleton.

Imo a better change , would be to move thread_local bool init to the namespace level and rename it to something like extern TORCH_API thread_local bool _num_thread_initialized and reference it from the inline function. This way it will have an unambiguous call-once-per-thread semantic across all shared libraries.

@ankurvdev ankurvdev changed the title VS2022 17.4 Compilation Failures for Torch extensions Remove TORCH_API from inline at::internal::lazy_init_num_thread Nov 23, 2022
@zou3519 zou3519 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Nov 28, 2022
@zou3519 zou3519 requested a review from malfet November 28, 2022 14:41
@NiklasGustafsson
Copy link

FYI - a MS C++ bug fix is underway, but it may not be included until version 17.4.4.

@ankurvdev
Copy link
Contributor Author

Ping ...
I changed the title and description as requested.
Is there any other change needed before this can merge. ?

@malfet
Copy link
Contributor

malfet commented Dec 6, 2022

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

Do not try to export inline apis as that causes linking issues
@pytorchmergebot
Copy link
Collaborator

Successfully rebased vs2022_17_4_compilation_fix onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout vs2022_17_4_compilation_fix && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the vs2022_17_4_compilation_fix branch from a7822db to 7ae1144 Compare December 6, 2022 18:32
@malfet
Copy link
Contributor

malfet commented Dec 8, 2022

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
…rch#89511)

The function signature in its current state is ambiguous.
Its an inline function that is also declared to be imported from the DLL.
which leaves it subject to compilers decision to choose one or the other and depending on what the compiler/linker may choose we may get one of the two behaviors for the `aten::init_num_threads` call:

1. Once-per-dll-in-a-thread (if its inlined)
2. Once-per-thread (if its imported)

I suspect once-per-dll-in-a-thread is already the case currently because it being tagged inline
So removing the inline will simply make it a little more consistent and clear.

The function exists to avoid repeated calls to aten::init_num_threads.
Being in an "internal" namespace, the function isnt expected to be called by external plugins which means that the "once-per-dll-in-a-thread" behavior isn't that much of a problem anyway

Pull Request resolved: pytorch#89511
Approved by: https://github.com/malfet
atalman pushed a commit to pytorch/test-infra that referenced this pull request Jun 8, 2023
This adds VS 2022 to Windows AMI. The line with
[https://aka.ms/](https://aka.ms/) link might be reverted once
`vs16.11.21_BuildTools.exe` and `vs17.4.1_BuildTools.exe` files will be
uploaded to
[https://s3.amazonaws.com/ossci-windows/](https://s3.amazonaws.com/ossci-windows/.).

Note that there is a fix of [thread\_local causing fatal error LNK1161:
invalid export specification on VS 2022 - Visual Studio
Feedback](https://developercommunity.visualstudio.com/t/thread_local-causing-fatal-error-LNK1161/10199441)
pending, required e.g. for pytorch/pytorch#89511, which will require
another VS 2022 version update. Alternatively, the VS 2022 version
installed by this change can be downgraded to 16.3.6 which does not
suffer this issue.

This is needed to finish pytorch/pytorch#86591
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged open source triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants