feat: support dynamic metadata #197
```python
def fancy_pypi_readme(pyproject_path: pathlib.Path) -> str | dict[str, str | None]:
    from hatch_fancy_pypi_readme._builder import build_text
    from hatch_fancy_pypi_readme._config import load_and_validate_config

    with pyproject_path.open("rb") as ft:
        pyproject = tomllib.load(ft)

    config = load_and_validate_config(
        pyproject["tool"]["hatch"]["metadata"]["hooks"]["fancy-pypi-readme"]
    )

    return {
        "content-type": config.content_type,
        "text": build_text(config.fragments, config.substitutions),
    }
```
@hynek, would it be possible to expose this as a public function so we don't have to dip into internals? Happy to contribute it if so.
Do we want to decide on a signature for the plugin function first? For example, I proposed to pass the pyproject path as a reasonable approximation of the project root as well as allowing it to be loaded for its own config, but maybe passing the project root as a string plus the already parsed pyproject contents makes more sense as it could conceivably not be at the project root sometimes (perhaps not in the same folder as .git anyway, for setuptools_scm), plus it saves having to reparse the pyproject contents which we had to do to find these plugins anyway. But having the pyproject dict already would mean a different signature required from the underlying packages.
Yes, passing a parsed pyproject.toml seems cleaner. That would be easy here (you'd just pick out `["tool"]["hatch"]["metadata"]["hooks"]["fancy-pypi-readme"]`). For setuptools_scm, you'd need to build the Configuration from a parsed pyproject.toml, but maybe the public API coming soon makes that possible?
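A minimal sketch (not part of this PR) of what the dict-based variant could look like, reusing the same hatch-fancy-pypi-readme internals as the snippet above:

```python
from typing import Any


def fancy_pypi_readme(pyproject: dict[str, Any]) -> dict[str, str]:
    # Same internals as the diff above, but taking the already-parsed pyproject.toml.
    from hatch_fancy_pypi_readme._builder import build_text
    from hatch_fancy_pypi_readme._config import load_and_validate_config

    config = load_and_validate_config(
        pyproject["tool"]["hatch"]["metadata"]["hooks"]["fancy-pypi-readme"]
    )
    return {
        "content-type": config.content_type,
        "text": build_text(config.fragments, config.substitutions),
    }
```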
My thought here was that some tools might not want to use pyproject.toml; for example, your conjectured CMake-based plugin would probably want to know where the top-level CMakeLists.txt was instead. I felt that passing the path to pyproject.toml meant that people could get the root directory of the project as the parent of that path, which they could not do from the contents. I wondered about passing both the project root and the contents, which complicates the signature but saves reparsing pyproject.toml. However, it seems like there is an implicit assumption that build_wheel will only ever be called with the current working directory set to the project root, so plugins could probably just use e.g. `Path("CMakeLists.txt")` anyway. Unless anyone knows a reason not to do this, I guess we can just pass the pyproject_dict by itself.
The pieces of information that might be useful: root directory (as you said, that's the current directory), the pyproject.toml info (though this could be read from the root directory, it's probably easy to pass and saves reparsing), and maybe the config settings.
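A hedged sketch of a hook signature carrying those three pieces of information; the parameter names are placeholders rather than a decided API:

```python
from __future__ import annotations

import pathlib
from typing import Any


def dynamic_metadata(
    project_root: pathlib.Path,  # in practice, the current working directory
    pyproject: dict[str, Any],  # the already-parsed pyproject.toml contents
    config_settings: dict[str, Any] | None = None,  # optional front-end settings
) -> dict[str, Any]:
    # A real plugin would compute and return the dynamic fields it provides.
    raise NotImplementedError("signature sketch only")
```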
```python
def setuptools_scm_version(pyproject_path: pathlib.Path) -> str:
    from setuptools_scm import Configuration, _get_version

    config = Configuration.from_file(str(pyproject_path))
    version: str = _get_version(config)

    return version
```
@RonnyPfannschmidt Would you be okay with a public way to access `_get_version`? The problem is that it's really hard to respect `tool.setuptools_scm`'s config without it, and we don't want to have to reinvent new names for everything in every tool (like hatch-vcs does). It would be nice to just let people configure setuptools_scm the way it's normally configured by reading its docs, and then we just ask it for the version.
I almost have that ready on the current mainline; it can be integrated into the next release.
Note that I'm also preparing to supersede setuptools_scm with vcs-versioning within the next couple of weeks.
Sweet, thanks! Will it avoid the file finder hook that makes setuptools_scm so hard to use for core packaging tools? Things like Spack aren't smart enough to build things in isolation or to separate build and install requirements, so a single package depending on setuptools_scm causes every build of every unrelated package to suddenly get different files.
For this PR, we can check the setuptools_scm version and use the new API if present - it's safer to use the private API only for old versions. We also might want to wait on (or at least not advertise) this until vcs-versioning is out. There's an experimental flag we could hide it behind.
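For example, the experimental-flag idea could be as simple as the sketch below (the environment variable name is purely hypothetical):

```python
import os
import pathlib


def setuptools_scm_version(pyproject_path: pathlib.Path) -> str:
    # Hide the plugin behind an opt-in flag until vcs-versioning lands;
    # the flag name here is a placeholder, not an agreed interface.
    if not os.environ.get("SKBUILD_EXPERIMENTAL_METADATA"):
        msg = (
            "setuptools_scm metadata support is experimental; "
            "set SKBUILD_EXPERIMENTAL_METADATA=1 to enable it"
        )
        raise RuntimeError(msg)

    from setuptools_scm import Configuration, _get_version  # private API for now

    config = Configuration.from_file(str(pyproject_path))
    return str(_get_version(config))
```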
Setuptools_scm currently has no easy/simple way to opt out of file finders; it will still trigger.
Maybe it's time to add a contextvar to enable the opt-in/out.
If vcs-versioning didn't have the setuptools plugin entrypoint, and only setuptools_scm did, that would already be very helpful. hatch-vcs and our usage could then depend only on the non-invasive vcs-versioning. Not a complete fix (packages that want to use the SCM discovery would still leak setuptools_scm), but it would help.
the rough plan is to provide a file-finder extra and only enable the file finder if the current project has the extra in the build deps
One thing I'm very curious about would be adding a function / hook that would read metadata from CMake. Not sure there's a good way to reuse the initial CMake run in this system, but you could run cmake, get the FileAPI output, then read it to get the version and such. Not required for an initial implementation, but something to keep in mind.
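For reference, a rough sketch of what the File API route might look like (paths follow the documented cmake-file-api layout; the CMAKE_PROJECT_VERSION cache entry assumes a top-level project() call with VERSION):

```python
import json
import pathlib
import subprocess


def cmake_project_version(source_dir: pathlib.Path, build_dir: pathlib.Path) -> str:
    # Request the "cache" object from CMake's File API, run the configure
    # step, then read the project version back out of the cache reply.
    query = build_dir / ".cmake/api/v1/query/cache-v2"
    query.parent.mkdir(parents=True, exist_ok=True)
    query.touch()  # an empty query file is enough to request this object

    subprocess.run(["cmake", "-S", str(source_dir), "-B", str(build_dir)], check=True)

    reply_dir = build_dir / ".cmake/api/v1/reply"
    cache_file = max(reply_dir.glob("cache-v2-*.json"), key=lambda p: p.stat().st_mtime)
    entries = json.loads(cache_file.read_text())["entries"]
    return {e["name"]: e["value"] for e in entries}["CMAKE_PROJECT_VERSION"]
```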
The way that I originally envisaged this was that the plugin entrypoint would return a dict of all the dynamic metadata that it knew how to generate, so perhaps both the version and the readme, for example. That would obviously fit better with running CMake in the hook, where you would probably only want to run it once to pick up all the metadata you wanted. You could still specify that a particular metadata parameter take its value from a different entrypoint, though, if you had two plugins that both provided version but you wanted version from one and readme from the other. Again, it makes sense to agree on this early on.
To be honest, I wonder whether this is a proposal which could become a PEP and part of the standard way to define dynamic metadata. It seems to me that with the proliferation of build backends, we are going to end up with plugins like setuptools_scm either having to write hooks for various different backends, or a whole bunch of adapter packages, whereas if we had a single specification that could be used to describe how to resolve dynamic metadata, that would simplify the whole process. Backends could still offer their own custom interfaces as well, of course, but this simple pattern would at least be sorted out.
I like the idea of returning a dict with all metadata a plugin knows about, with it then being opt-in via the config. If you listed the same item twice, we could just run it once and pull both pieces of information from the return dict. Writing a more general spec for multiple backends would be great, though we might want to have a proof of principle running first. This might be the only way Flit would ever gain support for dynamic versions. ;)
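A sketch of what that could look like on the backend side, with illustrative names (not this PR's final API): each provider runs at most once, and the requested fields are picked out of its cached result.

```python
from typing import Any, Callable

# A provider maps the parsed pyproject.toml to all metadata it can supply.
Provider = Callable[[dict[str, Any]], dict[str, Any]]


def resolve_dynamic_metadata(
    pyproject: dict[str, Any],
    field_to_provider: dict[str, str],  # e.g. {"version": "scm", "readme": "fancy"}
    providers: dict[str, Provider],
) -> dict[str, Any]:
    dynamic = set(pyproject["project"].get("dynamic", []))
    cache: dict[str, dict[str, Any]] = {}
    resolved: dict[str, Any] = {}
    for field, name in field_to_provider.items():
        if field not in dynamic:
            raise KeyError(f"{field!r} must be listed in project.dynamic")
        if name not in cache:
            cache[name] = providers[name](pyproject)  # run each provider once
        resolved[field] = cache[name][field]
    return resolved
```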
pyproject.toml (outdated diff)

```diff
@@ -87,6 +91,9 @@ Examples = "https://github.com/scikit-build/scikit-build-core/tree/main/tests/pa
 cmake_extensions = "scikit_build_core.setuptools.extension:cmake_extensions"
 cmake_source_dir = "scikit_build_core.setuptools.extension:cmake_source_dir"
 
+[project.entry-points."scikit_build.metadata"]
+setuptools_scm = "scikit_build_core.settings.metadata:setuptools_scm_version"
```
Suggested change:

```diff
-setuptools_scm = "scikit_build_core.settings.metadata:setuptools_scm_version"
+setuptools-scm = "scikit_build_core.settings.metadata:setuptools_scm_version"
```
Aren't these supposed to use dashes, according to the linked page?

> For new entry points, it is recommended to use only letters, numbers, underscores, dots and dashes (regex `[\w.-]+`).

Ahh, that does mention both underscores and dashes.
Yep, both are allowed, but I am happy to switch paradigms and use dashes instead. Hopefully setuptools_scm will morph into vcs-versioning, but there is still the question of the dash/underscore issue.
This one would still require scikit-build-core to be built in a git repository, right? That would make it difficult for package managers, because they tend to use tarballs with no .git. Is it possible to avoid the dynamic versioning by putting a static file with the version number somewhere (from the package manager script)?
This is declaring that scikit-build-core provides an entrypoint called setuptools_scm which can be used to fill in dynamic metadata in the projects that it builds. It is not used by scikit-build-core itself; rather, projects that use scikit-build-core as a build backend can specify that it should call the entrypoint on their repos during the build to achieve dynamic versioning. So only users whose packages are in git repos will specify that this entrypoint should be used. It might also be sensible for the entrypoint to look at the `config_settings` for some kind of version override which could be passed in via the build front end, and it would also be straightforward to write another entrypoint which would read a version from a file (if that file weren't just pyproject.toml) for projects which wanted that option.
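The file-reading entrypoint mentioned here could be as small as the following sketch (the VERSION filename and the config_settings override key are hypothetical):

```python
from __future__ import annotations

import pathlib
from typing import Any


def version_from_file(
    pyproject_path: pathlib.Path,
    config_settings: dict[str, Any] | None = None,
) -> str:
    # Allow a front-end override, e.g. --config-setting version=1.2.3
    if config_settings and "version" in config_settings:
        return str(config_settings["version"])
    # Otherwise read a plain-text version file next to pyproject.toml.
    return (pyproject_path.parent / "VERSION").read_text().strip()
```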
I am still thinking about moving scikit-build-core over to (well, back to) manual versioning & maybe even going to Flit, just to keep the build process as simple as possible. That wouldn't affect this, which only enables users of scikit-build-core to use dynamic metadata.
I am having some issues getting the constraints.txt file sorted out - bringing in the hatch-fancy-pypi-readme project adds hatchling as a dependency, which in turn requires at least a bump of

Edit: Ok, required minimum bumps are
You could (and maybe should?) make those tests importorskip on the packages (setuptools_scm & hatch-fancy-readme), then the minimal test could just not include them. That's probably a good idea also because some systems (like Spack) can't avoid leaking dependencies, and you never want to leak setuptools_scm, as it changes builds for packages that don't use it.
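A minimal sketch of the importorskip suggestion, at module level so the whole test file is skipped when the optional backends are missing:

```python
import pytest

# Skip every test in this module unless the optional plugins are installed.
setuptools_scm = pytest.importorskip("setuptools_scm")
fancy_readme = pytest.importorskip("hatch_fancy_pypi_readme")


def test_plugins_importable() -> None:
    # Placeholder assertion; the real tests exercise the plugin adapters.
    assert setuptools_scm is not None
    assert fancy_readme is not None
```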
Ok, I think we are pretty close to something useful now. Sorry for the noise with the last few commits - I have been struggling mainly with the minimum version tests, where it turns out that we need at least pyproject-metadata 0.6.0 in order for the license information to get written out, as this is one of the metadata fields I was testing setting dynamically. All test envs are now passing except for one, with Python 3.8 on ubuntu-latest, where bizarrely the
Codecov Report
```text
@@            Coverage Diff             @@
##             main     #197      +/-   ##
==========================================
+ Coverage   89.23%   89.47%   +0.23%
==========================================
  Files          46       50       +4
  Lines        2081     2128      +47
==========================================
+ Hits         1857     1904      +47
  Misses        224      224
```
Ok, I figured out the problem, having
So, the first question: why entry points over directly listing the module and function to call? If this was implemented in a PEP, I'd highly expect it would be something like this:

```toml
[project.metadata.plugins]  # ([tool.scikit-build-core.metadata] for now for us)
version = "setuptools_scm.dynamic:metadata"
```

(Where this was implemented in

I think the main point of entry points is to allow something to automatically happen without opting into it, which we don't want - we want to explicitly opt into dynamic metadata from a specific source. Totally okay to be wrong here, just wondering if you've thought of this and if there's some special reason entry points are superior for this.

Second question: do we want some way for these to be able to declare dynamic dependencies? I don't think it's very important for metadata right now, but it's important for implementing an extension-building plugin (extensionlib), and I could see some utility here too - for example, if we did implement such a system, we could use it to add dependencies automatically if you use the built-in plugins we provide. No need to request setuptools_scm or hatch-fancy-readme unless you were using their (eventual/hypothetical) implementation. Maybe an SCM plugin would want to add gitpython if git wasn't available, etc. If we did want to do this, then we'd have a specification for the object that is pointed to by the entry above or by the entrypoint; maybe

That would be a followup I could do later if we did want it, though. The question really is, do we? In fact, I think we might want to relegate this entirely to a future PEP and not be part of this PR, but trying to be forward compatible would be best, especially if we ask packages to add methods for this. For example, we could suggest the method be called "get_dynamic_metadata" if we hope to make that the required hook name in a future PEP that also has other optional hooks.

Quicker question: should these plugins have access to config-settings? This is more thinking about possible PEP material than details here. Validating the completeness of the settings, etc. would be hard, so tempting to just say no.

Last thought: it might be worth bringing this up with a few backend developers to see if they'd be interested in supporting such a design as a PEP.
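For comparison, resolving a directly-specified "module.sub:function" string (the alternative to entry points floated above) is only a few lines; defaulting the attribute name is one possible convention, not a decided one:

```python
import importlib
from typing import Any, Callable


def load_dynamic_metadata_hook(spec: str, default_attr: str = "dynamic_metadata") -> Callable[..., Any]:
    # "pkg.module:func" -> import pkg.module and fetch func;
    # "pkg.module" alone falls back to the (hypothetical) default attribute name.
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr or default_attr)
```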
What you describe is, of course, exactly what an entry point is under the hood. My thoughts:

- So I don't have strong feelings about that at all - happy to implement whichever solution is preferred.
- I believe that dependencies can be dynamic in the same way as other metadata. What you are proposing is more like appending to a list, rather than having one single source of dependencies the way that we do for e.g. version. We could special-case the dependencies list (or indeed allow any field to have a list of entry points or package:function entries that would be merged), but it does become complicated when you start seeing different versions of the same dependency being requested, particularly if two plugins are asking - which one gets precedence, etc. I would prefer to avoid this if possible. Also, I do not see it as a problem in general to have to add
- I hadn't considered that; having not really used config-settings myself, I had just ignored it so far. I suspect that they probably should - I can certainly imagine more complicated plugins, like the putative CMake-based one, might like to allow some extra config values. Arguably just keeping everything in pyproject.toml is cleaner, but wiser heads than mine decided on the need for config_settings to start with, so we might as well pass it through. I don't think we would need to bother with validating it ourselves; it would be the job of a plugin to decide if the bits it cared about were valid (which is also true for pyproject.toml anyway).
- I completely agree. Submitting this PR was the first step in that process, and hopefully involving @RonnyPfannschmidt and @hynek by trying to get them to agree to provide plugins directly will also give some guidance, but if you have any good contacts with backend developers then do invite them to get involved. Otherwise, I will probably open a few issues on e.g. hatch etc. to see if we get any interest.
Hatch also has some ideas around that. It would be nice if a common standard could be agreed on; then all build tools could integrate common metadata sources. I would recommend some reach-out on the PyPA Discord - ideally setuptools and flit also join.
Not exactly. An entry point is a mapping with several layers (and it can be really slow to iterate over them due to file accesses if you have many of them - not that we'd be likely to, but this is why Jupyter can't use them).

You'd not take a random function any more than

If we chose reasonable names, it could be exactly like how build-backend is now. Specifying a function name could be optional.

I'd follow the current approach, with caching if the same method is called for multiple metadata entries. There'd still be one function that gets metadata and returns all that it can. In fact, I don't think the implementation would change at all, other than removing the one layer of indirection and removing the question about the name's dashes vs. underscores.

This could break existing tooling reading the dynamic list. I think it would probably be more backward compatible as a new field a build backend could learn, rather than changing an existing one and forcing all tooling to adapt.

Sorry, I wasn't clear - I meant allowing a plugin to have custom dependencies for itself - think adding "gitpython" to the

I can make a PR to a PR to demonstrate the idea, or I can merge this and then make a new PR to discuss it (since this is in a separate repo, the PR to a PR would be in that repo).

PS: documentation is pretty light at the moment, since it's still in flux, so this doesn't need too much in the way of docs until we build all the docs, which will happen after editable installs are in.
You can see what I'm thinking in bennyrowland#7.
Agreed, although a dynamic list as currently used would continue to be valid (and continue to expect some other mechanism to provide the actual value). I suspect you are right that a new field is probably necessary; it is just a bit less elegant because you end up specifying the dynamic fields twice (once in each field).

I am afraid I still don't really understand what the issue is that needs solving here - surely if you want to use e.g. the
I don't disagree that it would potentially have been elegant, but it's probably too late - we should try to see what it might break first. I'm not sure how many uses of pyproject.toml other than build backends there are and whether they'd break - for example, the brand new GitHub dependency graph...

That's exactly the problem - you can't specify
Ok, I now understand your point. Of course, if you are running on e.g. Android and don't have CMake, then your proposed plugin will try to install it at that point and fail anyway. I would probably approach that by having a

I do agree that there may be other examples where the system you propose would be necessary for some use case, and I am not opposed to adding such functionality, particularly with the "module providing interface functions" paradigm, but I think we need to solicit some more opinions before writing that code.

Just a quick heads up also that I am away for the next week or so, so don't expect much progress on this PR during that time, but I haven't lost interest and will definitely get it finished as soon as I get back.
This doesn't work, because the person who decides to put the plugin in pyproject.toml's build requires is not the person on WebAssembly or whatever who is trying to install it. And if they

Checking to see if there's a system cmake/ninja and only requesting the Python package if they are not present provides an easy solution for these other systems - they can just make sure they pre-install these apps. And "normal" (wheel-supported) systems don't have to have them pre-installed. This is one of the key reasons to use scikit-build-core, since it auto-includes cmake and ninja using these hooks only if not present, enabling it to support all sorts of environments that often are not supported properly. It's also why I'm pushing forward with the 3.6 drop on scikit-build classic, so users can use scikit-build-core's include system with scikit-build classic if they want.

An "extension builder" plugin would clearly need to be able to do this as well. A metadata one, not so sure. The "cmake" one would likely only be used inside a system like scikit-build, which already ensures cmake is present. The git one is the only "realistic" example I can think of so far, and I don't think git packages are usually binary, so just having a dependency on a pure Python package wouldn't be that bad.

We don't need to think about or support this until working on a PEP, and then we'll have a lot more options and people with packaging experience who would likely be able to provide more input on this.
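A sketch of the dependency-declaring hook being discussed (the hook name follows PEP 517-style naming and is an assumption here, not a settled interface):

```python
import shutil


def get_requires_for_dynamic_metadata(config_settings=None):
    # Only request the PyPI cmake/ninja wheels when no system copy is on PATH,
    # so platforms without wheels can simply pre-install the real tools.
    requires = []
    if shutil.which("cmake") is None:
        requires.append("cmake")
    if shutil.which("ninja") is None:
        requires.append("ninja")
    return requires
```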
Small comment, if the user first installs

No,

(The hook is
Ok, so that hook would control how you get cmake and ninja. Maybe this point should be documented, because it is not entirely intuitive.

Yes, exactly. And documentation is planned. :) As soon as we get editable installs going, docs will be fair game. This works exactly the same way in meson-python (ninja only, obviously).
Pulling out a change from #197, which will reduce the diff there. This could go into 0.2.2; though it changes the signature of SettingsReader, that's not really public. Signed-off-by: Henry Schreiner <[email protected]> Co-authored-by: Ben Rowland <[email protected]>
This is a first pass at a scheme for supporting dynamic metadata in a generic way. It keeps everything as simple as possible by using entry-points with a very simple signature to provide each value.

Scikit-build-core's settings have been extended to have an additional metadata field which is just an optional list mapping string keys to string values. For each entry in this list, if the key is also included in the project.dynamic list, scikit-build-core will look up an entry-point from the "skbuild" group where the name matches the value. This entry-point should be a function that accepts the path to pyproject.toml (this seemed like a minimum but the final chosen signature is very much up for debate) and returns the entry that should be inserted into the pyproject["project"][key] entry in the loaded toml config.

Also included are a couple of core "plugins" which provide entry-points for setuptools_scm and hatch-fancy-pypi-readme, essentially just adapters as neither tool provides an explicit function matching the proposed entry-point signature.

For tests, I have created a fixture which mocks the entrypoints list with fake functions to give more direct control over what values we want to test. I have duplicated most of the "simplest_c" test package to allow the complete build process to be applied, but it is also possible to test with only a pyproject.toml file, and in fact most of the tests I have currently implemented do only use the pyproject.toml file and stop at testing the calculated metadata, although this can of course be changed. There is also a simple test for the setuptools_scm and hatch-fancy-pypi-readme plugins to make sure they are working correctly.

Adding setuptools_scm as a test dependency to test the plugin had the unexpected effect of making setuptools backend tests fail because of the file finder hook setuptools_scm provides which includes unexpected files (because they are under source control) in the sdist. To solve this, I have modified those tests to copy the package files to a temporary location before building so that the build source is not under git.
Also changes the name of the entrypoint group to scikit_build.metadata. This commit also changes the SettingsReader to load from a pyproject dict rather than loading from a file; this saves parsing the file multiple times to access the metadata as well. A new class method `from_file()` provides the previous functionality of loading from a path.
In later versions of Python (>=3.11) the internals of working with entry points have changed enough to break the mock based approach previously being used to test dynamic metadata plugins. This commit replaces that with a version using real EntryPoint objects which load real functions in the test_dynamic_metadata.py file, the only mocking is switching the real list of entry points with the list of test versions.
This commit removes hatch-fancy-pypi-readme and setuptools-scm from the [test] dependency extra, and makes the test that tests them both only be applied when the packages are installed by some other process. This prevents them both from leaking into the wider dependency space.
This commit modifies the interface for metadata plugins to return a dictionary of metadata keys and values, rather than just a single value. This is useful for plugins which may run costly functions (e.g. CMake) that can provide multiple metadata values, which will now only require a single plugin implementation and a single run during the build.
The min dependency versions are not compatible with the metadata plugins, notably hatch-fancy-pypi-readme. In the CI tests of the minimum version of the package, we therefore remove these plugins from the environment before installing the constrained dependencies and testing.
To test the license metadata being correctly written, we need to upgrade to pyproject-metadata >= 0.6.0
When setuptools-scm is installed, building this package in the source tree with Python 3.8 on Ubuntu breaks.
Thanks! I'll open a discussion to prepare a more general proposal.

Opened (as issue, actually, so I don't have to enable discussions in this repo) at #230.

Followup to #197. Supports hooks providing dynamic dependencies. This does not (officially, anyway, without inspecting the call stack) provide a method supporting differentiating between calling hooks. Signed-off-by: Henry Schreiner <[email protected]>
This is a first pass at a scheme for supporting dynamic metadata in a generic way. It keeps everything as simple as possible by using functions in modules with a very simple signature to provide each value. I am happy to discuss the proposed interface and make changes, but thought it would be best to get an implementation out there to have something concrete to discuss.
Scikit-build-core's settings have been extended to have an additional metadata field which is just an optional list mapping string keys to string values. For each entry in this list, if the key is also included in the project.dynamic list, scikit-build-core will look up a module with a function. This should be a function that accepts the parsed pyproject.toml (this seemed like a minimum, but the final chosen signature is very much up for debate; for example, maybe the config settings dict should also be passed) and returns the dict that should be merged into the pyproject["project"] entry in the loaded toml config.
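Under that interface, a plugin reduces to something like the sketch below (the function and tool-table names are illustrative only):

```python
from typing import Any


def dynamic_metadata(pyproject: dict[str, Any]) -> dict[str, Any]:
    # Read this plugin's own configuration out of the parsed pyproject.toml
    # and return the fields to merge into pyproject["project"].
    tool_config = pyproject.get("tool", {}).get("example-metadata-plugin", {})
    return {"version": tool_config.get("version", "0.0.0")}
```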
Also included are a couple of core "plugins" which provide entry-points for setuptools_scm and hatch-fancy-pypi-readme, essentially just adapters as neither tool provides an explicit function matching the proposed entry-point signature.
For tests, I have created a fixture which mocks the entrypoints list with fake functions to give more direct control over what values we want to test. I have duplicated most of the "simplest_c" test package to allow the complete build process to be applied, but it is also possible to test with only a pyproject.toml file, and in fact most of the tests I have currently implemented do only use the pyproject.toml file and stop at testing the calculated metadata, although this can of course be changed. There is also a simple test for the setuptools_scm and hatch-fancy-pypi-readme plugins to make sure they are working correctly.
Adding setuptools_scm as a test dependency to test the plugin had the unexpected effect of making setuptools backend tests fail because of the file finder hook setuptools_scm provides which includes unexpected files (because they are under source control) in the sdist. To solve this, I have modified those tests to copy the package files to a temporary location before building so that the build source is not under git.