
feat: support dynamic metadata #197


Merged: 17 commits, Mar 17, 2023

Conversation

@bennyrowland (Collaborator) commented Feb 13, 2023

This is a first pass at a scheme for supporting dynamic metadata in a generic way. It keeps everything as simple as possible by using functions in modules with a very simple signature to provide each value. I am happy to discuss the proposed interface and make changes, but thought it would be best to get an implementation out there to have something concrete to discuss.

Scikit-build-core's settings have been extended with an additional metadata field, which is an optional mapping of string keys to string values. For each entry in this mapping, if the key is also included in the project.dynamic list, scikit-build-core will look up a module providing a function. This should be a function that accepts the parsed pyproject.toml (this seemed like a minimum, but the final signature is very much up for debate; for example, maybe the config settings dict should also be passed) and returns the dict that should be merged into the pyproject["project"] entry in the loaded TOML config.
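To make that concrete, here is a minimal sketch of what a plugin function might look like under this proposal (the function name is hypothetical, and the signature is exactly what is debated below):

    # Hypothetical plugin under the proposed scheme: it receives the parsed
    # pyproject.toml and returns a dict to merge into pyproject["project"].
    def dynamic_metadata(pyproject: dict) -> dict:
        # A real plugin would compute this, e.g. from the VCS or a file.
        return {"version": "1.2.3"}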

Also included are a couple of core "plugins" which provide entry-points for setuptools_scm and hatch-fancy-pypi-readme, essentially just adapters as neither tool provides an explicit function matching the proposed entry-point signature.

For tests, I have created a fixture which mocks the entry-points list with fake functions, giving more direct control over the values we want to test. I have duplicated most of the "simplest_c" test package so the complete build process can be exercised, but it is also possible to test with only a pyproject.toml file; in fact, most of the tests I have implemented so far use only the pyproject.toml file and stop at checking the calculated metadata, although this can of course be changed. There is also a simple test for each of the setuptools_scm and hatch-fancy-pypi-readme plugins to make sure they are working correctly.

Adding setuptools_scm as a test dependency to test the plugin had the unexpected effect of making the setuptools backend tests fail, because of the file finder hook setuptools_scm provides, which includes unexpected files (anything under source control) in the sdist. To solve this, I have modified those tests to copy the package files to a temporary location before building, so that the build source is not under git.

Comment on lines 27 to 41
def fancy_pypi_readme(pyproject_path: pathlib.Path) -> str | dict[str, str | None]:
    from hatch_fancy_pypi_readme._builder import build_text
    from hatch_fancy_pypi_readme._config import load_and_validate_config

    with pyproject_path.open("rb") as ft:
        pyproject = tomllib.load(ft)

    config = load_and_validate_config(
        pyproject["tool"]["hatch"]["metadata"]["hooks"]["fancy-pypi-readme"]
    )

    return {
        "content-type": config.content_type,
        "text": build_text(config.fragments, config.substitutions),
    }
Collaborator

@hynek, would it be possible to expose this as a public function so we don't have to dip into internals? Happy to contribute it if so.

Collaborator Author

Do we want to decide on a signature for the plugin function first? For example, I proposed passing the pyproject path, both as a reasonable approximation of the project root and to allow it to be loaded for the plugin's own config. But maybe passing the project root as a string plus the already-parsed pyproject contents makes more sense: pyproject.toml could conceivably not be at the project root sometimes (perhaps not in the same folder as .git anyway, for setuptools_scm), and it saves reparsing the pyproject contents, which we had to parse to find these plugins anyway. Having the pyproject dict already would, however, mean requiring a different signature from the underlying packages.

Collaborator

Yes, passing a parsed pyproject.toml seems cleaner. That would be easy here (you'd just pick out ["tool"]["hatch"]["metadata"]["hooks"]["fancy-pypi-readme"]). For setuptools_scm, you'd need to build the Configuration from a parsed pyproject.toml, but maybe the public API coming soon makes that possible?

Collaborator Author

My thought here was that some tools might not want to use pyproject.toml; for example, your conjectured CMake-based plugin would probably want to know where the top-level CMakeLists.txt was instead. I felt that passing the path to pyproject.toml meant that people could get the root directory of the project as the parent of that path, which they could not do from the contents alone. I wondered about passing both the project root and the contents, which complicates the signature but saves reparsing pyproject.toml. However, there seems to be an implicit assumption that build_wheel will only ever be called with the current working directory set to the project root, so plugins could probably just use e.g. Path("CMakeLists.txt") anyway. Unless anyone knows a reason not to do this, I guess we can just pass the pyproject dict by itself.

Collaborator

The pieces of information that might be useful: root directory (as you said, that's the current directory), the pyproject.toml info (though this could be read from the root directory, it's probably easy to pass and saves reparsing), and maybe the config settings.
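Put together, that suggests a signature along these lines (purely illustrative; the name and details are not settled):

    # Hypothetical signature combining the pieces above; the project root is
    # the current working directory by PEP 517 convention, so it is omitted.
    def dynamic_metadata(
        pyproject: dict,                      # already-parsed pyproject.toml
        config_settings: dict | None = None,  # passed through from the frontend
    ) -> dict:
        return {}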

@hynek

Sorry, I'm extremely busy for the foreseeable future:

(screenshot omitted)

But feel free to meanwhile open an issue on my tracker with an outline of what you'd like and why!

Comment on lines 18 to 27
def setuptools_scm_version(pyproject_path: pathlib.Path) -> str:
    from setuptools_scm import Configuration, _get_version

    config = Configuration.from_file(str(pyproject_path))
    version: str = _get_version(config)

    return version
Collaborator

@RonnyPfannschmidt Would you be okay with a public way to access _get_version? The problem is that it's really hard to respect tool.setuptools_scm's config without it, and we don't want to have to reinvent new names for everything on every tool (like hatch-vcs does). It would be nice to just let people configure setuptools_scm the way it's normally configured by reading its docs, and then we simply ask it for the version.

@RonnyPfannschmidt

I almost have that ready on the current mainline; it can be integrated into the next release.

Note that I'm also preparing to supersede setuptools_scm with vcs-versioning within the next couple of weeks.

@henryiii (Collaborator) Feb 14, 2023

Sweet, thanks! Will it avoid the file finder hook that makes setuptools_scm so hard to use for core packaging tools? Things like Spack aren't smart enough to build things in isolation or separate build and install requirements, so a single package depending on setuptools_scm causes every build of every unrelated package to suddenly get different files.

For this PR, we can check the setuptools_scm version, and use the new version if present - it's safer to use the private API if it's only for old versions. We also might want to wait on (at least advertising) this until vcs-versioning is out. There's an experimental flag we could hide it behind.
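A minimal sketch of that version gating via an import fallback (the public name here is entirely hypothetical; only the private _get_version existed at the time):

    try:
        # Hypothetical public API a future setuptools_scm release might expose.
        from setuptools_scm import get_version_from_config
    except ImportError:
        # Fall back to the private API on older versions.
        from setuptools_scm import _get_version as get_version_from_config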

@RonnyPfannschmidt

setuptools_scm currently has no easy/simple way to opt out of file finders.

It will still trigger.

Maybe it's time to add a contextvar to enable opting in/out.

Collaborator

If vcs-versioning didn't have the setuptools plugin entry point, and only setuptools_scm did, that would already be very helpful. hatch-vcs and our usage could then depend only on the non-invasive vcs-versioning. Not a complete fix (packages that want to use the SCM discovery would still leak setuptools_scm), but it would help.

@RonnyPfannschmidt

The rough plan is to provide a file-finder extra and only enable the file finder if the current project has the extra in the build deps.

@henryiii mentioned this pull request Feb 14, 2023
@henryiii (Collaborator)

One thing I'm very curious about would be adding a function / hook that would read metadata from CMake. Not sure there's a good way to reuse the initial CMake run in this system, but you could run cmake, get the FileAPI output, then read it to get the version and such. Not required for an initial implementation, but something to keep in mind.

@bennyrowland (Collaborator Author)

One thing I'm very curious about would be adding a function / hook that would read metadata from CMake. Not sure there's a good way to reuse the initial CMake run in this system, but you could run cmake, get the FileAPI output, then read it to get the version and such. Not required for an initial implementation, but something to keep in mind.

The way I originally envisaged this was that the plugin entrypoint would return a dict of all the dynamic metadata that it knew how to generate - perhaps both the version and the readme, for example. That would obviously fit better with running CMake in the hook, where you would probably only want to run it once to pick up all the metadata you wanted. You could still specify that a particular metadata parameter take its value from a different entrypoint, though: if you had two plugins that both provided version, you could take version from one and readme from the other. Again, it makes sense to agree on this early on.
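For illustration, a multi-value plugin under that envisaged interface might look like this (names and values are made up):

    def get_metadata(pyproject: dict) -> dict:
        # One expensive step (e.g. a CMake configure run) can yield several
        # fields at once; the caller picks out the entries it was asked for.
        return {
            "version": "1.2.3",
            "readme": {"content-type": "text/markdown", "text": "# Example"},
        }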

To be honest, I wonder whether this is a proposal which could become a PEP and part of the standard way to define dynamic metadata. It seems to me that, with the proliferation of build backends, we are going to end up with plugins like setuptools_scm either having to write hooks for various different backends, or a whole bunch of adapter packages, whereas a single specification describing how to resolve dynamic metadata would simplify the whole process. Backends could still offer their own custom interfaces as well, of course, but this simple pattern would at least be sorted out.

@henryiii (Collaborator)

I like the idea of returning a dict with all metadata a plugin knows about, with it then being opt-in via the config. If you listed the same item twice, we could just run it once and pull both pieces of information from the return dict.

Writing a more general spec for multiple backends would be great, though we might want to have a proof of principle running first. This might be the only way Flit would ever gain support for dynamic versions. ;)
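On the backend side, that run-once behaviour might look roughly like this (a sketch: scikit_build.metadata is the entry-point group from this PR; everything else is assumed):

    from importlib.metadata import entry_points

    def load_plugin(name: str):
        # Python 3.10+ form: select the named plugin from the group.
        (ep,) = entry_points(group="scikit_build.metadata", name=name)
        return ep.load()

    def resolve_dynamic(metadata_config: dict, pyproject: dict) -> dict:
        # Call each requested plugin once, even if it fills several fields.
        per_plugin: dict = {}
        resolved = {}
        for field, plugin_name in metadata_config.items():
            if plugin_name not in per_plugin:
                per_plugin[plugin_name] = load_plugin(plugin_name)(pyproject)
            resolved[field] = per_plugin[plugin_name][field]
        return resolved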

pyproject.toml Outdated
@@ -87,6 +91,9 @@ Examples = "https://github.com/scikit-build/scikit-build-core/tree/main/tests/pa
cmake_extensions = "scikit_build_core.setuptools.extension:cmake_extensions"
cmake_source_dir = "scikit_build_core.setuptools.extension:cmake_source_dir"

[project.entry-points."scikit_build.metadata"]
setuptools_scm = "scikit_build_core.settings.metadata:setuptools_scm_version"
@henryiii (Collaborator) Feb 16, 2023

Suggested change:
- setuptools_scm = "scikit_build_core.settings.metadata:setuptools_scm_version"
+ setuptools-scm = "scikit_build_core.settings.metadata:setuptools_scm_version"

Aren't these supposed to use dashes, according to the linked page?

For new entry points, it is recommended to use only letters, numbers, underscores, dots and dashes (regex [\w.-]+).

Ahh, that does mention both underscores and dashes.

Collaborator Author

Yep, both are allowed, but I am happy to switch paradigms and use dashes instead. Hopefully setuptools_scm will morph into vcs-versioning, but the dash/underscore question remains either way.

Collaborator

This one would still require scikit-build-core to be built in a git repository, right? That would make it difficult for package managers, because they tend to use tarballs with no .git. Is it possible to avoid the dynamic versioning by putting a static file with the version number somewhere (from the package manager script)?

Collaborator Author

This is declaring that scikit-build-core provides an entrypoint called setuptools_scm which can be used to fill in dynamic metadata in the projects that it builds. It is not used by scikit-build-core itself; rather, projects using scikit-build-core as a build backend can specify that it should call the entrypoint on their repos during the build, to achieve dynamic versioning. So only users whose packages are in git repos will specify that this entrypoint should be used. It might also be sensible for the entrypoint to look at config_settings for some kind of version override passed in via the build frontend, and it would also be straightforward to write another entrypoint that reads a version from a file (if that file wasn't just pyproject.toml) for projects which wanted that option.
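For illustration, the "version from a file" variant mentioned above could be as small as this (hypothetical, not part of this PR):

    import pathlib

    def version_from_file(pyproject: dict) -> dict:
        # build_wheel is assumed to run from the project root (see the
        # signature discussion above), so a bare relative path suffices.
        return {"version": pathlib.Path("VERSION").read_text().strip()}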

Collaborator

I am still thinking about moving scikit-build-core over to (well, back to) manual versioning and maybe even going to Flit, just to keep the build process as simple as possible. That wouldn't affect this, which only enables users of scikit-build-core to use dynamic metadata.

@bennyrowland (Collaborator Author) commented Feb 17, 2023

I am having some issues getting the constraints.txt file sorted out - bringing in the hatch-fancy-pypi-readme project adds hatchling as a dependency, which in turn requires at least a bump to tomli==1.2.2, plus apparently something else that pip has been working on my system for over an hour so far to resolve (and which seems likely to take even longer, based on some of the cancelled CI test runs). Obviously this is only a test-time dependency unless the user wants to use the plugin, so perhaps it makes sense to exclude the [test] extra from the constraints testing rather than bumping the minimum versions higher? Otherwise we probably need to add some extra minimum constraints on other test dependencies like build - I feel like setting reasonably recent version constraints on the test dependencies is more acceptable than on the runtime dependencies?

Edit: Ok, the required minimum bumps are tomli==1.2.2 and packaging==21.3. I have also specified build[virtualenv] >= 7; older versions don't support the virtualenv extra, so we probably don't want them anyway.

@henryiii (Collaborator)

You could (and maybe should?) make those tests importorskip on the packages (setuptools_scm & hatch-fancy-readme); then the minimal test job could simply not include them. That's probably a good idea also because some systems (like Spack) can't avoid leaking dependencies, and you never want to leak setuptools_scm, as it changes builds for packages that don't use it.
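That pattern is a one-liner at the top of the test module, e.g.:

    import pytest

    # Skips every test in this module if the optional dependency is absent.
    setuptools_scm = pytest.importorskip("setuptools_scm")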

@bennyrowland (Collaborator Author)

Ok, I think we are pretty close to something useful now. Sorry for the noise with the last few commits - I have been struggling mainly with the minimum-version tests, where it turns out we need at least pyproject-metadata 0.6.0 in order for the license information to get written out, as this is one of the metadata fields I was setting dynamically in tests. All test envs are now passing except one, Python 3.8 on ubuntu-latest, where bizarrely the test_abi3_wheel test has started failing and I have no idea why. From the logs, the wheel is correctly built, and even installed, but then we get a PytestUnraisableExceptionWarning that seems to emerge from the subprocess.run() executing the command in the venv. I am developing on Windows and don't have ready access to an Ubuntu box to debug this - @henryiii, do you have any ideas about what might be going on?

@codecov bot commented Feb 19, 2023

Codecov Report

Merging #197 (e1fbcbe) into main (1e0dcb9) will increase coverage by 0.23%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
+ Coverage   89.23%   89.47%   +0.23%     
==========================================
  Files          46       50       +4     
  Lines        2081     2128      +47     
==========================================
+ Hits         1857     1904      +47     
  Misses        224      224              
Impacted Files Coverage Δ
src/scikit_build_core/build/sdist.py 98.33% <100.00%> (ø)
src/scikit_build_core/build/wheel.py 80.16% <100.00%> (+0.16%) ⬆️
src/scikit_build_core/metadata/__init__.py 100.00% <100.00%> (ø)
...rc/scikit_build_core/metadata/fancy_pypi_readme.py 100.00% <100.00%> (ø)
src/scikit_build_core/metadata/setuptools_scm.py 100.00% <100.00%> (ø)
src/scikit_build_core/settings/metadata.py 100.00% <100.00%> (ø)
src/scikit_build_core/settings/skbuild_model.py 100.00% <100.00%> (ø)


@bennyrowland (Collaborator Author)

Ok, I figured out the problem: having setuptools-scm installed is what causes the test_setuptools_abi test to fail, but only on Py3.8 on Ubuntu, for some reason. My latest commit copies the ABI test directory out of the source tree before building it in the test, which fixes the problem, so all tests should now pass. At this point we need to decide whether we are happy with the current interface specification and usage, in which case I will draft some documentation to go with this change, and we should be pretty much done.

@henryiii (Collaborator) commented Feb 20, 2023

So, the first question: Why entry points over directly listing the module and function to call? If this was implemented in a PEP, I'd highly expect it would be something like this:

[project.metadata.plugins]  # ([tool.scikit-build-core.metadata] for now for us)
version = "setuptools_scm.dynamic:metadata"

(Where this would be implemented in setuptools_scm/dynamic.py as metadata(...).) If we did something similar, we would obviously have to call scikit_build_core.<stuff> for our built-in methods, but if setuptools_scm or hatch-fancy-readme implemented this, there wouldn't be a conflict in entry-point names, and it's a lot more likely we could talk them into providing a function somewhere with the right API than into adding a custom entry point with scikit-build-core in it.

I think the main point of entry points is to allow something to happen automatically without opting in, which we don't want - we want to explicitly opt into dynamic metadata from a specific source.

Totally okay to be wrong here, just wondering if you've thought of this and if there's some special reason entry points are superior for this.
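For reference, resolving such a string needs no entry-point machinery, just importlib; a sketch, using get_dynamic_metadata as the conventional default name floated below:

    import importlib

    def load_provider(spec: str):
        # "pkg.module:func" -> that function; a bare "pkg.module" falls back
        # to a conventional default name, mirroring how build-backend works.
        module_name, _, func_name = spec.partition(":")
        module = importlib.import_module(module_name)
        return getattr(module, func_name or "get_dynamic_metadata")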


Second question: Do we want some way for these to be able to declare dynamic dependencies? I don't think it's very important for metadata right now, but it is important for implementing an extension-building plugin (extensionlib), and I could see some utility here too - for example, if we did implement such a system, we could use it to add dependencies automatically when you use the built-in plugins we provide. No need to require setuptools_scm or hatch-fancy-readme unless you were using their (eventual/hypothetical) implementation. Maybe an SCM plugin would want to add gitpython if git wasn't available, etc.

If we did want to do this, then we'd have a specification for the object that is pointed to by the entry above or by the entry point; maybe .__call__() (or a required method name we pick) would be used for the normal usage, and if .get_plugin_requires_for_*() were defined, those would be called and added to the build system's matching hooks.

That would be a followup I could do later if we did want it, though. The question really is: do we? In fact, I think we might want to relegate this entirely to a future PEP and not make it part of this PR, but trying to be forward compatible would be best, especially if we ask packages to add methods for this. For example, we could suggest the method be called get_dynamic_metadata if we hope to make that the required hook name in a future PEP that also has other optional hooks.


Quicker question: should these plugins have access to config-settings? This is more thinking about possible PEP material than details here. Validating the completeness of the settings, etc. would be hard, so it's tempting to just say no.


Last thought: It might be worth bringing this up with a few backend developers to see if they'd be interested in supporting such a design as a PEP.

@bennyrowland (Collaborator Author)

Why entry points over directly listing the module and function to call?

What you describe is, of course, exactly what an entry point is under the hood. My thoughts:

  1. Using entry points is kind of a contract by the provider that they implement the required specification of the function call, so you know that any entry point you use is designed to be used in that way, as opposed to taking a random function and hoping it works correctly.
  2. It is also a bit more convenient for a user to enter an entry-point name rather than a package:function string, e.g. version=vcs-versioning instead of version=vcs_versioning:get_dynamic_metadata.
  3. With the modified plugin interface, where we return a dictionary of metadata entries per plugin that is cached and used to look up the individual items specifying that plugin, it no longer makes sense to have functions like vcs_versioning:get_version. We could revert to the previous approach of returning a single metadata value, but we would lose the ability to run e.g. CMake once and extract multiple pieces of metadata. Note that it is difficult to support both, because how do we tell the difference between a dictionary of metadata values and a dictionary that is the value of one piece of metadata? (I suppose we could make a generally good guess based on keys.) That means getting direct support from vcs-versioning et al. to provide their own implementations of our function, entry point or otherwise, is less likely unless it were to become a universal standard. (Follow-up thought: we could have a specification like vcs_versioning:get_version where we expect a single entry, or scikit_build.cmakemeta:get_metadata.version where we specify a key of version from the value returned by the get_metadata() function, with the other values being cached for use in other dynamic metadata fields. That starts to get complicated, but is very flexible.)
  4. My thought for a PEP was perhaps to go directly into the dynamic list, like:
    dynamic = ["version=vcs-versioning", "readme=hatch-fancy-pypi-readme", "license"]
    (where license would in this case have to be filled in via some other backend-specific mechanism). That would work with either specification method, of course.

So I don't have strong feelings about that at all - happy to implement whichever solution is preferred.

Do we want some way for these to be able to declare dynamic dependencies?

I believe that dependencies can be dynamic in the same way as other metadata. What you are proposing is more like appending to a list, rather than having one single source the way we do for e.g. version. We could special-case the dependencies list (or indeed allow any field to have a list of entry points or package:function entries that would be merged), but it becomes complicated when different versions of the same dependency are requested - particularly if two plugins are asking, which one gets precedence, etc. I would prefer to avoid this if possible. Also, I do not see it as a problem in general to have to add vcs-versioning to the [build-system] requires field if I want to use the vcs-versioning entry point for version; that avoids the potential for the "dependency bleeding" that we have seen so much of with setuptools-scm. I am happy to be persuaded of the need, but only if we can define how to handle all the edge cases.

should these plugins have access to config-settings?

I hadn't considered that; having not really used config-settings myself, I had just ignored it so far. I suspect that they probably should - I can certainly imagine more complicated plugins, like the putative CMake-based one, wanting to allow some extra config values. Arguably just keeping everything in pyproject.toml is cleaner, but wiser heads than mine decided on the need for config_settings to start with, so we might as well pass it through. I don't think we would need to bother with validating it ourselves; it would be the job of a plugin to decide if the bits it cared about were valid (which is also true for pyproject.toml anyway).

It might be worth bringing this up with a few backend developers to see if they'd be interested in supporting such a design as a PEP

I completely agree - submitting this PR was the first step in that process, and hopefully involving @RonnyPfannschmidt and @hynek, by trying to get them to provide plugins directly, will also give some guidance. If you have any good contacts with backend developers, do invite them to get involved; otherwise, I will probably open a few issues on e.g. hatch to see if we get any interest.

@RonnyPfannschmidt

Hatch also has some ideas around that.

It would be nice if a common standard could be agreed on.

Then all build tools could integrate common metadata sources.

I would recommend some outreach on the PyPA Discord; ideally setuptools and flit would also join.

@henryiii (Collaborator) commented Feb 20, 2023

exactly what an entry point is under the hood

Not exactly. An entry point is a mapping with several layers (and it can be really slow to iterate over them due to file accesses if you have many of them - not that we'd be likely to, but this is why Jupyter can't use them).

random function and hoping it works correctly

You'd no more be taking a random function than build-backend is a random function. You'd specify a location that follows the protocol we've defined. You still have to look up the entry point too, as it's an arbitrary string - not even tied to where it comes from. I think most users would be more comfortable with something they can look up and investigate themselves, vs. an entry point that could come from anywhere. The main feature of an entry point is that you can iterate over all of them, which is useful for flake8/pytest plugins, but isn't a feature we are using here.

If we chose reasonable names, it could be exactly like how build-backend is now. Specifying a function name could be optional.

it no longer makes sense to have functions like

I'd follow the current approach, with caching if the same method is called for multiple metadata entries. There'd still be one function that gets metadata and returns all that it can. In fact, I don't think the implementation would change at all, other than removing the one layer of indirection and removing the question about dashes vs. underscores in the name.

perhaps to go directly into the dynamic list

This could break existing tooling reading the dynamic list. I think it would probably be more backward compatible as a new field a build backend could learn, rather than changing an existing one and forcing all tooling to adapt.

I believe that dependencies can be dynamic in the same way as other metadata.

Sorry, I wasn't clear - I meant allowing a plugin to have custom dependencies for itself; think adding "gitpython" to the get_requires_for_build_wheel hook if git is not installed, for example. This is very important for a plugin building binaries (allowing it to add ninja and cmake); I'm not sure it's all that critical for a metadata plugin. But maybe we could design metadata and extension (extensionlib) plugins to share the same interface.


I can make a PR to a PR to demonstrate the idea, or I can merge this then make a new PR to discuss it (since this is in a separate repo, the PR to PR would be in that repo).

PS: documentation is pretty light at the moment, since it's still in flux, so this doesn't need too much in the way of docs until we build all the docs, which will happen after editable installs are in.

@henryiii (Collaborator)

You can see what I'm thinking in bennyrowland#7.

@bennyrowland (Collaborator Author)

This could break existing tooling reading the dynamic list.

Agreed, although a dynamic list as currently used would continue to be valid (and continue to expect some other mechanism to provide the actual value). I suspect you are right that a new field is probably necessary; it is just a bit less elegant because you end up specifying the dynamic fields twice (once in each field).

allowing a plugin to have custom dependencies for itself

I am afraid I still don't really understand the issue that needs solving here - surely if you want to use e.g. the vcs-versioning plugin, you add vcs-versioning to the [build-system] requires list in pyproject.toml. That would then install its own dependencies as normal. So if we created a CMake metadata-reading plugin (as a standalone package), it would have cmake as a dependency, which would install automatically. I will take a look at extensionlib and see if it makes more sense to me after that.

@henryiii (Collaborator)

it is just a bit less elegant

I don't disagree that it would potentially have been elegant, but it's probably too late - we should try to see what it might break first. I'm not sure how many consumers of pyproject.toml other than build backends there are and whether they'd break - for example, the brand-new GitHub dependency graph...

So if we created a CMake metadata reading plugin (as a standalone package), that would have cmake as a dependency, and install automatically.

That's exactly the problem - you can't specify cmake as a dependency. As soon as you do, you are broken on every platform that doesn't ship binary wheels for cmake, like FreeBSD, Android, ClearLinux, etc. And you don't actually require the cmake package; what you actually require (or at least should) is the cmake command-line app - if it is present and of a sufficient version, you can avoid injecting cmake as a dependency. The best places to look for this are scikit-build-core itself, or meson-python. The "fix" is to put logic in get_requires_for_build_wheel and similar hooks, which can check whether a sufficient version of whatever command-line tool you need is present. If we specified plugin versions of these hooks that plugin-aware backends had to respect, we could extend that to plugins.
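A condensed sketch of that hook logic, loosely modeled on what scikit-build-core does (the version pin and details are simplified):

    import shutil
    import subprocess

    def get_requires_for_build_wheel(config_settings=None):
        # Only request the cmake wheel when no usable cmake is on PATH.
        return [] if _cmake_on_path() else ["cmake>=3.15"]

    def _cmake_on_path() -> bool:
        exe = shutil.which("cmake")
        if exe is None:
            return False
        try:
            subprocess.run([exe, "--version"], check=True, capture_output=True)
        except (OSError, subprocess.CalledProcessError):
            return False
        return True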

@bennyrowland (Collaborator Author)

Ok, I now understand your point. Of course, if you are running on e.g. Android and don't have CMake, then your proposed plugin will try to install it at that point and fail anyway. I would probably approach that by having a [cmake] extra in the plugin: users who want to install CMake independently can do that, and others can add the extra to the build requires for the plugin. Running the plugin would then produce a message saying "can't find CMake; either install it manually or add the cmake extra to your build requires". Granted, this adds one level of complexity for users on a system where the CMake wheel is available, but it allows the metadata plugin system to stay simpler.

I do agree that there may be other examples where the system you propose would be necessary for some use case, and I am not opposed to adding such functionality, particularly with the "module providing interface functions" paradigm, but I think we need to solicit some more opinions before writing that code.

Just a quick heads up also that I am away for the next week or so, so don't expect much progress on this PR during that time, but I haven't lost interest and will definitely get it finished as soon as I get back.

@henryiii (Collaborator) commented Feb 22, 2023

plugin, then users that want to install CMake independently can do that

This doesn't work, because the person who decides to put the plugin in pyproject.toml's build requires is not the person on WebAssembly (or wherever) trying to install it. And if they pip install cmake, that still doesn't work, since the build runs in an isolated environment, not the host environment. You'd literally have to edit the pyproject.toml of the package you are trying to install. And when cmake is listed, it's never going to pass dependency checks, which will have all the package managers down our backs about shipping fake "cmake" and "ninja" packages that just silence those tests.

Checking to see if there's a system cmake/ninja and only requesting the Python package if they are not present provides an easy solution for these other systems - they can just make sure they pre-install these apps. And "normal" (wheel-supported) systems don't have to have them pre-installed.

This is one of the key reasons to use scikit-build-core: it auto-includes cmake and ninja using these hooks only if they are not present, enabling it to support all sorts of environments that often are not supported properly. It's also why I'm pushing forward with the 3.6 drop on scikit-build classic, so users can use scikit-build-core's include system with scikit-build classic if they want.

A "extension builder" plugin would clearly need to be able to do this as well. A metadata one, not so sure. The "cmake" one would likely only be used inside a system like scikit-build, which already ensures cmake is present. The git one is the only "realistic" example I can think of so far, and I don't think git packages are binary usually, so just having a dependency on a pure Python package wouldn't be that bad.

We don't need to think about or support this until working on a PEP, and then we'll have a lot more options and people with packaging experience that likely would be able to provide more input on this.

@LecrisUT (Collaborator)

Checking to see if there's a system cmake/ninja and only requesting the Python package if they are not present provides an easy solution for these other systems

Small comment: if the user first installs scikit-build-core without having cmake installed (or with it bundled, as with CLion), then it will pip install cmake, right? But if they then install cmake on the system (or want to link it to the CLion version, for example), will scikit-build-core default to the system version, or would it still use the pip-installed one?

@henryiii (Collaborator) commented Feb 22, 2023

No, scikit-build-core will never install cmake or ninja when you pip install it; they are not dependencies. It will only add those when called as a PEP 517 backend in an isolated environment, and only if no system version is available. If the cmake or ninja packages are separately pip installed, though, it will use those versions over the system ones.

@henryiii (Collaborator)

(The hook is get_requires_for_build_wheel, which is called by all PEP 517 builders in isolated mode - it gets to return a list of requirements for building, and being a function, it is dynamic. There are also sdist and editable versions of that hook.)

@LecrisUT (Collaborator)

Ok, so that hook controls how you get scikit-build-core within pyproject.toml, which is for production packaging. That makes sense. But otherwise, for local development, if the user simply installs scikit-build-core in their own environment, it will not be fully functional unless they provide cmake, either from the system or from pip. And if they added it via pip, they can go back to the system one by uninstalling it. Ok, this makes sense, thanks for the clarification.

Maybe this point should be documented because it is not entirely intuitive.

@henryiii (Collaborator)

Yes, exactly.

And documentation is planned. :) As soon as we get editable installs going, docs will be fair game.

This works exactly the same way in meson-python (ninja only, obviously).

henryiii added a commit that referenced this pull request Mar 15, 2023
Pulling out a change from #197, which will reduce the diff there. This could
go into 0.2.2; though it changes the signature of SettingsReader, that's
not really public.

Signed-off-by: Henry Schreiner <[email protected]>
Co-authored-by: Ben Rowland <[email protected]>
Ben Rowland and others added 17 commits March 15, 2023 14:26
This is a first pass at a scheme for supporting dynamic metadata in a
generic way. It keeps everything as simple as possible by using entry-points
with a very simple signature to provide each value.

Scikit-build-core's settings have been extended to have an additional metadata
field which is just an optional list mapping string keys to string values.
For each entry in this list, if the key is also included in the
project.dynamic list, scikit-build-core will look up an entry-point from the
"skbuild" group where the name matches the value. This entry-point should be a
function that accepts the path to pyproject.toml (this seemed like a minimum
but the final chosen signature is very much up for debate) and returns the
entry that should be inserted into the pyproject["project"][key] entry in the
loaded toml config.

Also included are a couple of core "plugins" which provide entry-points for
setuptools_scm and hatch-fancy-pypi-readme, essentially just adapters as
neither tool provides an explicit function matching the proposed entry-point
signature.

For tests, I have created a fixture which mocks the entrypoints list with fake
functions to give more direct control over what values we want to test. I have
duplicated most of the "simplest_c" test package to allow the complete build
process to be applied, but it is also possible to test with only a
pyproject.toml file, and in fact most of the tests I have currently implemented
do only use the pyproject.toml file and stop at testing the calculated
metadata, although this can of course be changed. There is also a simple test
for the setuptools_scm and hatch-fancy-pypi-readme plugins to make sure they
are working correctly.

Adding setuptools_scm as a test dependency to test the plugin had the
unexpected effect of making setuptools backend tests fail because of the file
finder hook setuptools_scm provides which includes unexpected files (because
they are under source control) in the sdist. To solve this, I have modified
those tests to copy the package files to a temporary location before building
so that the build source is not under git.

Also changes the name of the entrypoint group to scikit_build.metadata.

This commit also changes the SettingsReader to load from a pyproject dict rather than loading from file; this saves parsing the file multiple times to access the metadata as well. A new class method `from_file()` provides the previous functionality of loading from a path.

In later versions of Python (>=3.11) the internals of working with entry points have changed enough to break the mock-based approach previously used to test dynamic metadata plugins. This commit replaces that with a version using real EntryPoint objects which load real functions in the test_dynamic_metadata.py file; the only mocking is switching the real list of entry points with the list of test versions.

This commit removes hatch-fancy-pypi-readme and setuptools-scm from the [test] dependency extra, and makes the test that exercises them run only when the packages are installed by some other process. This prevents them both from leaking into the wider dependency space.

This commit modifies the interface for metadata plugins to return a dictionary of metadata keys and values, rather than just a single value. This is useful for plugins which may run costly functions (e.g. CMake) that can provide multiple metadata values; these now require only a single plugin implementation and a single run during the build.

The minimum dependency versions are not compatible with the metadata plugins, notably hatch-fancy-pypi-readme. In the CI tests of the minimum versions of the package, we therefore remove these plugins from the environment before installing the constrained dependencies and testing.

To test the license metadata being correctly written, we need to upgrade to pyproject-metadata >= 0.6.0.

When setuptools-scm is installed, building this package in the source tree with Python 3.8 on Ubuntu breaks.
Signed-off-by: Henry Schreiner <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
Signed-off-by: Henry Schreiner <[email protected]>
@henryiii henryiii added this to the v0.3.0 milestone Mar 15, 2023
@henryiii henryiii merged commit 6540cf6 into scikit-build:main Mar 17, 2023
@henryiii (Collaborator)

Thanks! I'll open a discussion to prepare a more general proposal.

@henryiii (Collaborator)

Opened (as an issue, actually, so I don't have to enable discussions in this repo) at #230.

henryiii added a commit that referenced this pull request Mar 20, 2023
Followup to #197. Supports hooks providing dynamic dependencies. This does
not (officially, anyway, without inspecting the call stack) provide a method
for differentiating between the calling hooks.

Signed-off-by: Henry Schreiner <[email protected]>