
Rethink bundling of requirements #5074

Open
OriolAbril opened this issue Oct 13, 2021 · 17 comments
@OriolAbril
Member

OriolAbril commented Oct 13, 2021

I want to take this opportunity to rethink all these conda environments we have. I understand they are helpful and convenient for creating our dev environments, and they are also used for CI. However, at least in my case, the extra work and burden of updating 4 (now 8) nearly equivalent files, with a pre-commit check that seems to only partially work (e.g. pandas has a lower bound in 3.7 and 3.8 but not in 3.9 nor in requirements-dev.txt), is clearly larger than the benefit I get. Moreover, issues with these files blocked installation and pre-commit: #5060. #5062 was only needed because #5060 was blocked.

I think there are several issues at hand regarding all this.

  • No single source of truth. From what I understand, the pre-commit script somehow takes all the conda env files and generates the pip file from them. However, it seems the conda files don't need to be equivalent to each other (see for example in add 404 page to docs #5057 how nbsphinx was in some conda files but not all of them, and how several versions did not match between files).
  • Overly bloated and non-specific dependency files. We have 2/3 types of requirements files (I still don't completely understand the new test file, but it seems like a step in the right direction): library requirements (user facing), dev ones and test ones. But we use those files to do much more than 3 types of things: we generate local envs for devs, we set up CI envs in GitHub Actions for testing, we set up the env on RTD to build documentation... So here are some things we are doing that I think we should avoid, as they also make our CI more brittle and prone to conda finding version conflicts:
    • We are installing sphinx and sphinx extensions everywhere, even on github actions
    • We are installing pytest on rtd
    • We are installing libraries like watermark or sphinx-autobuild everywhere too, but these are only used locally! watermark is only needed when writing/running example notebooks, not to build the docs that contain such notebooks. sphinx-autobuild is only needed to get an auto-updating preview of the docs locally.
  • Most of us (or maybe even all of us) have no idea what some of these dependencies are for. I, for example, have no idea what cachetools is for. I think this is both an issue in itself and a consequence of the previous two, but I think it's important to put it on the table, because in my opinion, unless we make an effort to explain all that in the developer guide and keep it up to date, it will only get worse.

Ideas for possible solutions:

  • Being more strict about CI. I have the feeling that some PRs are blocked when only pre-commit fails, while others are green-lit with failing tests and/or a failing RTD job. I haven't written many tests myself here, and I imagine testing distributions is inherently hard, but I think we need some way of getting rid of flaky tests or grouping them in a "potentially flaky" job; any idea here is welcome. Having flaky tests leaves us open to reasonings like "maybe this is a flaky test too, I don't recall changing anything that should affect this test". On that end we might also benefit from skipping jobs either manually or automatically: e.g. if we only update the README we don't need to run any CI job, but if we modify requirements we should run all jobs, even if the modified library isn't directly used everywhere, as the change could generate a corrupted environment.
  • Being more strict about reviews. In a similar manner, I have the feeling that some PRs are merged without approving reviews, or with approving reviews even though there were still open questions in the PR comments.
  • Split requirements into task-related chunks. This by design eliminates the one-line dev environment / single-file-serves-all setup, so we'll need to install 2-4 files. I'm also not sure how that would fit with all the pre-commit scripts, but I'm sure we'll figure it out, maybe even write a helper script to keep creating dev envs a single-line operation. Here are some ideas:
    • requirements.txt -> requirements like we have now for pymc
    • requirements-optional.txt -> optional requirements for pymc; we could also update setup.py so that running pip install pymc[all] installs both requirements and requirements-optional
    • requirements-test.txt -> requirements for testing
    • requirements-docs.txt -> requirements for building the docs
    • requirements-write.txt -> requirements for writing example notebooks, see https://github.com/pymc-devs/pymc-examples/blob/main/requirements-write.txt
    • others?
  • Documenting everything. Explain in the developer guide/contributing guide when each file should be used, plus maybe also extra recommendations for local-env-only tools (doc-related examples of that would be the sphinx-autobuild or sphobjinv libraries)
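The pip install pymc[all] idea from the list above could be wired up roughly like this. This is a minimal sketch, assuming the proposed split files exist; the file names, the read_requirements helper, and the "all" extra are all hypothetical, not existing PyMC code:

```python
# Sketch: load the proposed split requirements files and expose them as
# setuptools extras, so `pip install pymc[all]` also installs the optional
# dependencies. File names here are hypothetical.
from pathlib import Path


def read_requirements(fname):
    """Return the non-empty, non-comment lines of a requirements file."""
    lines = Path(fname).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")]


# In setup.py, something along these lines:
# setup(
#     ...,
#     install_requires=read_requirements("requirements.txt"),
#     extras_require={
#         "all": read_requirements("requirements-optional.txt"),
#     },
# )
```

One nice property of this layout is that the requirements files stay the single source of truth and setup.py only reads them.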

cc @Sayam753

@twiecki
Member

twiecki commented Oct 14, 2021

I realized last night that we could easily cut that down to 5 as we only need a single dev environment + testing envs.
CC @MarcoGorelli

@michaelosthege
Member

Yes, we should be more strict about the CI. A red ❌ CI on main should be an immediate call to action to get it fixed.
Something like the past 6 days with a ❌ on main should never happen again.

On splitting the dependencies, I think we should try to keep it with two requirements.txt files. A basic one and one with all testing & dev & docs dependencies relevant within the scope of this repo. The pymc-examples repo has its own.

I think the third element is a restructuring of the CI workflows:

  • Get all jobs into the same workflow YAML
  • Re-think our matrix strategy: Some things should be tested across OSes (installation, floatX) and for others we might want to test different numpy/scipy versions but the OS doesn't matter (rng/logp stuff).
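As an illustrative sketch of that matrix split (job names, OS list, and versions here are invented for illustration, not the actual PyMC workflow):

```yaml
jobs:
  # Things that genuinely differ per OS: installation, floatX behaviour, ...
  install:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - run: echo "run installation / floatX tests here"
  # OS-independent things (rng/logp): vary numpy instead and pin one OS
  logp:
    strategy:
      matrix:
        numpy-version: ["1.21", "1.23"]
    runs-on: ubuntu-latest
    steps:
      - run: echo "run rng / logp tests against numpy ${{ matrix.numpy-version }}"
```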

@ricardoV94
Member

Let's start by cutting down the 3 dev envs to a single one?

@ricardoV94
Member

For the tests, it seems we can easily specify the python version outside of the conda environment file: https://github.com/conda-incubator/setup-miniconda#usage-examples
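Based on the linked setup-miniconda usage examples, that could look something like the following sketch; the file path, environment name, and version matrix are assumptions, not the actual PyMC workflow:

```yaml
jobs:
  pytest:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10"]
    steps:
      - uses: actions/checkout@v2
      - uses: conda-incubator/setup-miniconda@v2
        with:
          environment-file: conda-envs/environment-test.yml
          activate-environment: pymc-test
          python-version: ${{ matrix.python-version }}
```

With this, the environment file no longer needs to pin python, and one file can serve the whole version matrix.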

@michaelosthege
Member

@OriolAbril would you say the simplifications made by #5911 close this issue?

If not, which are the most important items that we can track in separate smaller issues?

@OriolAbril
Member Author

OriolAbril commented Jul 23, 2022

It still feels less than ideal that to bump Aesara or ArviZ we need to update over 4 files.

If possible, I think it would be best to have something like env-base.yml, env-test.yml, env-docs.yml, windows-extras.yml, and then in GitHub Actions (or in the install instructions) do something like:

conda env create -f conda-envs/env-base.yml  # has pymc dependencies
conda activate pymc-dev
conda env update --file env-test.yml  # for testing, has *only* test dependencies
# instead of duplicating the dependencies and adding the test ones
conda env update --file env-docs.yml # for docs, likewise, can be concatenated

I don't know much about Windows, but it might be needed to initialize the env with windows-extras and then update it twice, with the base dependencies and with the specific test or docs dependencies.

@twiecki
Member

twiecki commented Jul 25, 2022

I like that. Although I think we can already improve the current situation without that: as all the compiler and BLAS packages are handled in Aesara now (thanks to @maresb), I think we could just have a package for test and one for docs.

@maresb
Contributor

maresb commented Jul 26, 2022

In order to move towards a single source of truth, could we generate requirements.txt from environment-dev.yml like we do with requirements-dev.txt?

I also like @OriolAbril's idea of breaking things up into .env files with independent requirements.
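The environment-dev.yml to requirements.txt generation suggested above could be sketched like this. This is a hypothetical, dependency-free stand-in (a real script would likely use PyYAML and handle the nested pip: section properly); the function name and conversion rules are assumptions, not the existing pre-commit script:

```python
# Hypothetical sketch of deriving pip requirements from a conda environment
# YAML, in the spirit of the existing requirements-dev.txt generation.
# Only handles a simple flat `dependencies:` list.
def conda_env_to_requirements(env_text):
    """Extract pip-style pins from the dependencies section of a conda
    environment file, skipping python itself and the pip: sub-section."""
    reqs = []
    in_deps = False
    for line in env_text.splitlines():
        stripped = line.strip()
        if stripped == "dependencies:":
            in_deps = True
            continue
        if in_deps:
            if not stripped.startswith("-"):
                break  # left the dependencies block
            dep = stripped.lstrip("- ").strip()
            if dep.startswith(("python", "pip")):
                continue
            # conda pins use a single "=", pip needs "=="
            if "=" in dep and not any(op in dep for op in ("==", ">=", "<=", "!=")):
                dep = dep.replace("=", "==")
            reqs.append(dep)
    return reqs
```

Running such a script from pre-commit would keep the conda file as the single source of truth and make requirements.txt a generated artifact.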

@maresb
Contributor

maresb commented Jul 26, 2022

Regarding the pip: dependencies in environment-dev.yml, polyagamma already exists as a Conda-Forge package, and sphinx-remove-toctrees will exist soon.

@twiecki
Member

twiecki commented Jul 27, 2022

Is there an env file for building docs? I can't find it.

Without that, I think we can merge all current ones into one env file.

@maresb
Contributor

maresb commented Aug 25, 2022

Regarding the pip: dependencies in environment-dev.yml, polyagamma already exists as a Conda-Forge package, and sphinx-remove-toctrees will exist soon.

Done in #6060

Is there an env file for building docs?

Yes, we're using environment-dev.yml for the docs.

@twiecki
Member

twiecki commented Aug 25, 2022

We can probably unify all our env files at this point, yes?

@maresb
Contributor

maresb commented Aug 25, 2022

Possibly yes, but it's not so clear to me yet. It would require sitting down and tracking down what's safe to use, and also leaning on the implicit dependencies of Aesara to avoid platform-specific issues. But I think it's worth a try.

To clarify, the goal would be to have a single env file which combines dev, test, and docs dependencies, and works on all platforms?

@twiecki
Member

twiecki commented Aug 25, 2022

To clarify, the goal would be to have a single env file which combines dev, test, and docs dependencies, and works on all platforms?

The goal is to simplify the env files as much as possible, but not more than that ;).

@maresb
Contributor

maresb commented Sep 10, 2022

I have been thinking about this, and I've come to the conclusion that the available tooling is very close, but not quite there yet. Rather than throw together some flaky scripts to close the gap, I think the biggest impact would be contributing to conda-lock, and the maintainer there is really responsive. That way, PyMC (plus a few other things I'm working on) would be my motivating use case.

That said, I don't have any time horizon on this, so if anyone wants to contribute some quick-wins in the meantime, then please go ahead.

@twiecki
Member

twiecki commented Sep 12, 2022

@maresb That sounds like a fine approach to me. Our current setup is totally fine and works, so there's no rush and we can just wait for the proper tools.

Having said that, I suppose I'm a bit unsure why we need OS-specific envs in the first place, as we're noarch now for PyMC.

@maresb
Contributor

maresb commented Sep 12, 2022

@twiecki, great point! That is indeed a quick-win which we could easily tackle right away.

I'm probably not the best person to implement this since I don't have Windows, but one possible way forward is to make incremental changes to windows-environment-*.yml and environment-*.yml until they have the same contents. We should also preferably add comments explaining what the individual dependencies do and why we need them. (That should make future maintenance much easier, since currently I don't know what might break if I modify something.)
