Skip to content

Conversation

willyd
Copy link

@willyd willyd commented Dec 2, 2020

  • Fix debug build by ignoring the python debug library. An option IGNORE_PYTHON_DEBUG_LIBRARY is introduced for that.
  • Fix symbol registration issues with redefinition of cuda_version and linker optimizations in Release build.

Fixes #3086

* Fix debug build by ignoring the python debug library. An option IGNORE_PYTHON_DEBUG_LIBRARY is introduced for that.
* Fix symbol registration issues with redefinition of cuda_version and linker optimizations in Release build.
@willyd
Copy link
Author

willyd commented Dec 2, 2020

Looks like the Python build is not happening like it is supposed to on Windows. I found the culprit, the models/*.cpp files are not included in the .pyd extension. I did not make any change to this file. I suspect that the change to the vision header broke this.

How can we reconcile this? Shouldn't the models.cpp be included in the .pyd extension?

This can be fixed with the following change, but I am not sure what was intended in the first place. So I will wait for comments before making further changes.

diff --git a/setup.py b/setup.py
index 82c93be..05eb199 100644
--- a/setup.py
+++ b/setup.py
@@ -136,6 +136,8 @@ def get_extensions():

     main_file = glob.glob(os.path.join(extensions_dir, '*.cpp'))
     source_cpu = glob.glob(os.path.join(extensions_dir, 'cpu', '*.cpp'))
+    models_dir = os.path.join(this_dir, 'torchvision', 'csrc', 'models')
+    source_models = glob.glob(os.path.join(models_dir, '*.cpp'))

     is_rocm_pytorch = False
     if torch.__version__ >= '1.5':
@@ -158,7 +160,7 @@ def get_extensions():
     else:
         source_cuda = glob.glob(os.path.join(extensions_dir, 'cuda', '*.cu'))

-    sources = main_file + source_cpu
+    sources = main_file + source_cpu + source_models
     extension = CppExtension

     compile_cpp_tests = os.getenv('WITH_CPP_MODELS_TEST', '0') == '1'

@datumbox datumbox requested a review from fmassa December 2, 2020 16:03
@datumbox
Copy link
Contributor

datumbox commented Dec 2, 2020

@willyd Thanks for your investigation.

I believe that this is intentional because the model C++ files are only compiled if WITH_CPP_MODELS_TEST=1:

vision/setup.py

Lines 164 to 174 in 74de51d

compile_cpp_tests = os.getenv('WITH_CPP_MODELS_TEST', '0') == '1'
if compile_cpp_tests:
test_dir = os.path.join(this_dir, 'test')
models_dir = os.path.join(this_dir, 'torchvision', 'csrc', 'models')
test_file = glob.glob(os.path.join(test_dir, '*.cpp'))
source_models = glob.glob(os.path.join(models_dir, '*.cpp'))
test_file = [os.path.join(test_dir, s) for s in test_file]
source_models = [os.path.join(models_dir, s) for s in source_models]
tests = test_file + source_models
tests_include_dirs = [test_dir, models_dir]

Adding @fmassa, who could perhaps shed some light on the intended behaviour.

@fmassa
Copy link
Member

fmassa commented Dec 2, 2020

Hi,

We don't want to ship models.cpp together with the python binaries for torchvision. They are currently present in setuptools only as a testing artifact (as we didn't have C++-only tests back then), and shouldn't be present with python builds.

Ideally, we wouldn't need to include Python.h at all in the torchvision C++ libraries. It is currently a requirement probably due to the fact that we are using setuptools to compile the extensions.

The current situation with the models.cpp is that it's not actively maintained, and we are just now starting to improve support for the usage on the C++ api (thanks to work from @bmanga and @datumbox ).

@datumbox
Copy link
Contributor

datumbox commented Dec 2, 2020

@willyd We just merged to master the PR #3097 that might have fixed some of the header issues you describe here. Could you try if the latest master solves your problem at #3086?

@willyd
Copy link
Author

willyd commented Dec 2, 2020

@fmassa

We don't want to ship models.cpp together with the python binaries for torchvision. They are currently present in setuptools only as a testing artifact (as we didn't have C++-only tests back then), and shouldn't be present with python builds.

Understood.

Ideally, we wouldn't need to include Python.h at all in the torchvision C++ libraries. It is currently a requirement probably due to the fact that we are using setuptools to compile the extensions.

Thus we should not rely on the extension vision.cpp to register the operators but have a separate cpp file for operator registration, which I think happen in the TORCH_LIBRARY macros?

@willyd
Copy link
Author

willyd commented Dec 2, 2020

@datumbox

Sure. I will try and get back to you.

@willyd
Copy link
Author

willyd commented Dec 2, 2020

@datumbox

I just merged and updated my branch with the latest master. The header issues are indeed fixed.

The only remaining issues with master are:

  • Inclusion of the Python debug library. Ignoring it like I propose fixes the problem.
  • Without the /OPT:NOREF flag consuming projects have to reference the operators directly to have them included.

This PR addresses both the above issues.

@datumbox datumbox requested review from peterjc123 and removed request for fmassa December 8, 2020 10:18
@datumbox
Copy link
Contributor

datumbox commented Dec 8, 2020

@peterjc123 Could you please have a look?

if(MSVC)
# This is required for consuming projects otherwise, the linker
# will strip the operators registration symbols.
target_link_options(${PROJECT_NAME} INTERFACE /OPT:NOREF)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope torchvision will remain small like now forever. When it grows larger, this will definitely stop working.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to this, we thought that #2798 had fixed the linker issues once and for all, but we still need it for Windows as can be seen in

#ifdef _WIN32
// Windows only
// This is necessary until operators are automatically registered on include
static auto _nms = &vision::ops::nms;
#endif
Do you have an idea on how to properly fix this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have time to look into it carefully. So at least for now, I will approve this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @bmanga

# A typical build on Windows needs to have Release and Debug libraries but most
# people don't have the python debug library. Hence, we ignore it here and link
# against the Release python library even in debug mode.
set(PYTHON3_DEBUG_LIBRARY_NAME python${Python3_VERSION_MAJOR}${Python3_VERSION_MINOR}_d.lib)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the quesion may be that why do we need Python for Libtorchvision? It is better to introduce a new variable BUILD_PYTHON and stop using those tricks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need Python for torchvision and we would love to remove it.

It is currently required only on Windows builds as we need

// If we are in a Windows environment, we need to define
// initialization functions for the _custom_ops extension
#ifdef _WIN32
PyMODINIT_FUNC PyInit__C(void) {
// No need to do anything.
return NULL;
}
#endif
, but I think this might be an artifact of us using setuptools to compile the C++ extensions.

@datumbox has tried removing this recently to see if it was still needed, but Windows builds stopped working.

Do you have any ideas on how we could remove this Python dependency on Windows once and for all?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmassa We can pass in a variable USE_PYTHON. Apparently, this function is unneeded for non-python builds.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterjc123 Just to make sure I understand what you suggest:

  • Add a CMake option USE_PYTHON which defines a preprocessor macro USE_PYTHON or WITH_PYTHON that would not be defined for C++-only builds done via CMake.
  • That preprocessor macro would be defined when building with setuptools

If this is what you suggest I can try to add it to this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willyd Yeah, that's exactly what I mean.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datumbox Thanks for letting me know. Yes I still intend to give it a shot. I am very busy at work lately, but I should have time to put on this PR before the end of the year.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the integration of WITH_PYTHON variable to the CMake project and it works in C++. I still need to rebase on master to get the latest changes and test again. It adds quite a few #ifdef so it is probably best to add a new header to avoid those. I will update this PR when I have something that works with the latest master.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willyd that would be great, thanks. Please ping me once you complete the PR to make sure we see it. Since this is quite old, it's in risk of missing any updates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willyd Hi there! Wondering if we can close this PR or if you are still interested in sending the patch. Let me know, thanks! :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datumbox Unfortunately, I don't have time to work on the patch. Let's close the PR and if I still have problems when I get back to it, I will submit a new PR.

@codecov
Copy link

codecov bot commented Dec 15, 2020

Codecov Report

Merging #3087 (855e8d1) into master (1a300d8) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3087   +/-   ##
=======================================
  Coverage   73.48%   73.48%           
=======================================
  Files          99       99           
  Lines        9230     9230           
  Branches     1476     1476           
=======================================
  Hits         6783     6783           
  Misses       1991     1991           
  Partials      456      456           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1a300d8...855e8d1. Read the comment docs.

@bmanga
Copy link
Contributor

bmanga commented Feb 11, 2021

An alternative solution for the windows registration failure is in #3380

@datumbox
Copy link
Contributor

An alternative solution for the windows registration failure is in #3380

@willyd @bmanga Does it make sense to close this PR then?

@bmanga
Copy link
Contributor

bmanga commented Feb 11, 2021

I haven't looked into the debug python part, but ideally I think we should get rid of the python dep.

@datumbox
Copy link
Contributor

As discussed with @willyd on the comments above (see #3087 (comment)) we will close the PR and tackle the problem on the future on a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Windows C++ library build issues
6 participants