
Rethink bundling of requirements #5074

Open
OriolAbril opened this issue Oct 13, 2021 · 17 comments
@OriolAbril
Member

OriolAbril commented Oct 13, 2021

I want to take this opportunity to rethink all these conda environments we have. I understand they are helpful and convenient for creating our dev environments, and they are also used for CI. However, at least in my case, the extra work and burden of updating 4 (now 8) nearly equivalent files, with a pre-commit check that seems to only partially work (e.g. pandas has a lower bound in 3.7 and 3.8 but not in 3.9 nor in requirements-dev.txt), is clearly larger than the benefit I get. Moreover, issues with these files blocked installation and pre-commit: #5060. #5062 was only needed because #5060 was blocked.

I think there are several issues at hand regarding all this.

  • No single source of truth. From what I understand, the pre-commit script somehow takes all the conda env files and generates the pip file from them. However, it seems the conda files don't need to be equivalent to each other (see for example in add 404 page to docs #5057 how nbsphinx was in some conda files but not all of them, and how several versions did not match between files).
  • Overly bloated and non-specific dependency files. We have 2/3 types of requirements files (I still don't completely understand the new test file, but it seems like a step in the right direction): library requirements (user facing), dev ones and test ones. But we use those files to do much more than 3 types of things: we generate local envs for devs, we set up CI envs in GitHub Actions for testing, we set up the env on RTD to build documentation... So here are some things we are doing that I think we should avoid, as they also make our CI more brittle and prone to conda finding version conflicts:
    • We are installing sphinx and sphinx extensions everywhere, even on github actions
    • We are installing pytest on rtd
    • We are installing libraries like watermark or sphinx-autobuild everywhere too, but these are only used locally! watermark is only needed when writing/running example notebooks, not to build the docs that contain such notebooks. sphinx-autobuild is only needed to get an auto-updating preview of the docs locally.
  • Most of us (or maybe even all of us) have no idea what some of these dependencies are for. I, for example, have no idea what cachetools is for. I think this is both an issue in itself and a consequence of the previous two, but I think it's important to put it on the table, because in my opinion, unless we make an effort to explain all that in the developer guide and keep it up to date, it will only get worse.

Ideas for possible solutions:

  • Being more strict about CI. I have the feeling that some PRs are blocked when only pre-commit fails, while others are green-lit with failing tests and/or a failing RTD job. I haven't written many tests myself here, and I imagine testing distributions is inherently hard, but I think we need some way of getting rid of flaky tests or grouping them in a "potentially flaky" job; any idea here is welcome. Having flaky tests leaves us open to reasonings like "maybe this is a flaky test too, I don't recall changing anything that should affect this test". On that end we might also benefit from skipping jobs either manually or automatically: e.g. if we only update the README we don't need to run any CI job, but if we modify requirements we should run all jobs, even if the modified library isn't directly used everywhere, as the change could generate a corrupted environment.
  • Being more strict about reviews. In a similar manner, I have the feeling that some PRs are merged without approving reviews, or with approving reviews even though there were still open questions in the PR comments.
  • Split requirements into task-related chunks. This by design eliminates the one-line dev environment / single-file-serves-all setup, so we'll need to install 2-4 files. I'm also not sure how that would fit with all the pre-commit scripts, but I'm sure we'll figure it out, maybe even write a helper script to keep creating dev envs a single-line operation. Here are some ideas:
    • requirements.txt -> requirements like we have now for pymc
    • requirements-optional.txt -> optional requirements for pymc; we could also update setup.py so that running pip install pymc[all] installs both requirements and requirements-optional
    • requirements-test.txt -> requirements for testing
    • requirements-docs.txt -> requirements for building the docs
    • requirements-write.txt -> requirements for writing example notebooks, see https://github.com/pymc-devs/pymc-examples/blob/main/requirements-write.txt
    • others?
  • Documenting everything. Explain in the developer guide/contributing guide when each file should be used, plus maybe also extra recommendations for local-env-only tools (doc-related examples of that would be the sphinx-autobuild or sphobjinv libraries)
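The pip install pymc[all] idea from the list above could be wired up roughly like this. This is a minimal sketch, assuming the proposed split files exist; the file names, the read_requirements helper, and the "all" extra are all hypothetical, not existing PyMC code:

```python
# Sketch: load the proposed split requirements files and expose them as
# setuptools extras, so `pip install pymc[all]` also installs the optional
# dependencies. File names here are hypothetical.
from pathlib import Path


def read_requirements(fname):
    """Return the non-empty, non-comment lines of a requirements file."""
    lines = Path(fname).read_text().splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")]


# In setup.py, something along these lines:
# setup(
#     ...,
#     install_requires=read_requirements("requirements.txt"),
#     extras_require={
#         "all": read_requirements("requirements-optional.txt"),
#     },
# )
```

One nice property of this layout is that the requirements files stay the single source of truth and setup.py only reads them.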

cc @Sayam753

@twiecki
Member

twiecki commented Oct 14, 2021

I realized last night that we could easily cut that down to 5 as we only need a single dev environment + testing envs.
CC @MarcoGorelli

@michaelosthege
Member

Yes, we should be more strict about the CI. A red ❌ CI on main should be an immediate call to action to get it fixed.
Something like the past 6 days with a ❌ on main should never happen again.

On splitting the dependencies, I think we should try to keep it with two requirements.txt files. A basic one and one with all testing & dev & docs dependencies relevant within the scope of this repo. The pymc-examples repo has its own.

I think the third element is a restructuring of the CI workflows:

  • Get all jobs into the same workflow YAML
  • Re-think our matrix strategy: Some things should be tested across OSes (installation, floatX) and for others we might want to test different numpy/scipy versions but the OS doesn't matter (rng/logp stuff).
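As an illustrative sketch of that matrix split (job names, OS list, and versions here are invented for illustration, not the actual PyMC workflow):

```yaml
jobs:
  # Things that genuinely differ per OS: installation, floatX behaviour, ...
  install:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - run: echo "run installation / floatX tests here"
  # OS-independent things (rng/logp): vary numpy instead and pin one OS
  logp:
    strategy:
      matrix:
        numpy-version: ["1.21", "1.23"]
    runs-on: ubuntu-latest
    steps:
      - run: echo "run rng / logp tests against numpy ${{ matrix.numpy-version }}"
```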

@ricardoV94
Member

Let's start by cutting down the 3 dev envs to a single one?

@ricardoV94
Member

For the tests, it seems we can easily specify the python version outside of the conda environment file: https://github.com/conda-incubator/setup-miniconda#usage-examples
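Based on the linked setup-miniconda usage examples, that could look something like the following sketch; the file path, environment name, and version matrix are assumptions, not the actual PyMC workflow:

```yaml
jobs:
  pytest:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10"]
    steps:
      - uses: actions/checkout@v2
      - uses: conda-incubator/setup-miniconda@v2
        with:
          environment-file: conda-envs/environment-test.yml
          activate-environment: pymc-test
          python-version: ${{ matrix.python-version }}
```

With this, the environment file no longer needs to pin python, and one file can serve the whole version matrix.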

@michaelosthege
Member

@OriolAbril would you say the simplifications made by #5911 close this issue?

If not, which are the most important items that we can track in separate smaller issues?

@OriolAbril
Member Author

OriolAbril commented Jul 23, 2022

It still feels less than ideal that to bump Aesara or ArviZ we need to update over 4 files.

If possible, I think it would be best to have something like env-base.yml, env-test.yml, env-docs.yml, windows-extras.yml, and then in GitHub Actions (or in the install instructions) do something like:

conda env create -f conda-envs/env-base.yml  # has pymc dependencies
conda activate pymc-dev
conda env update --file env-test.yml  # for testing, has *only* test dependencies
# instead of duplicating the dependencies and adding the test ones
conda env update --file env-docs.yml # for docs, likewise, can be concatenated

I don't know much about Windows, but it might be needed to initialize the env with windows-extras and then update it twice, with the base dependencies and with the specific test or docs dependencies.

@twiecki
Member

twiecki commented Jul 25, 2022

I like that. Although I think we can already improve the current situation without that: as all the compiler and BLAS packages are handled in Aesara now (thanks to @maresb), I think we could just have a package for test and one for docs.

@maresb
Contributor

maresb commented Jul 26, 2022

In order to move towards a single source of truth, could we generate requirements.txt from environment-dev.yml like we do with requirements-dev.txt?

I also like @OriolAbril's idea of breaking things up into .env files with independent requirements.
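The environment-dev.yml to requirements.txt generation suggested above could be sketched like this. This is a hypothetical, dependency-free stand-in (a real script would likely use PyYAML and handle the nested pip: section properly); the function name and conversion rules are assumptions, not the existing pre-commit script:

```python
# Hypothetical sketch of deriving pip requirements from a conda environment
# YAML, in the spirit of the existing requirements-dev.txt generation.
# Only handles a simple flat `dependencies:` list.
def conda_env_to_requirements(env_text):
    """Extract pip-style pins from the dependencies section of a conda
    environment file, skipping python itself and the pip: sub-section."""
    reqs = []
    in_deps = False
    for line in env_text.splitlines():
        stripped = line.strip()
        if stripped == "dependencies:":
            in_deps = True
            continue
        if in_deps:
            if not stripped.startswith("-"):
                break  # left the dependencies block
            dep = stripped.lstrip("- ").strip()
            if dep.startswith(("python", "pip")):
                continue
            # conda pins use a single "=", pip needs "=="
            if "=" in dep and not any(op in dep for op in ("==", ">=", "<=", "!=")):
                dep = dep.replace("=", "==")
            reqs.append(dep)
    return reqs
```

Running such a script from pre-commit would keep the conda file as the single source of truth and make requirements.txt a generated artifact.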

@maresb
Contributor

maresb commented Jul 26, 2022

Regarding the pip: dependencies in environment-dev.yml, polyagamma already exists as a Conda-Forge package, and sphinx-remove-toctrees will exist soon.

@twiecki
Member

twiecki commented Jul 27, 2022

Is there an env file for building docs? I can't find it.

Without that, I think we can merge all current ones into one env file.

@maresb
Contributor

maresb commented Aug 25, 2022

Regarding the pip: dependencies in environment-dev.yml, polyagamma already exists as a Conda-Forge package, and sphinx-remove-toctrees will exist soon.

Done in #6060

Is there an env file for building docs?

Yes, we're using environment-dev.yml for the docs.

@twiecki
Member

twiecki commented Aug 25, 2022

We can probably unify all our env files at this point, yes?

@maresb
Contributor

maresb commented Aug 25, 2022

Possibly yes, but it's not so clear to me yet. It would require sitting down and tracking down what's safe to use, and also leaning on the implicit dependencies of Aesara to avoid platform-specific issues. But I think it's worth a try.

To clarify, the goal would be to have a single env file which combines dev, test, and docs dependencies, and works on all platforms?

@twiecki
Member

twiecki commented Aug 25, 2022

To clarify, the goal would be to have a single env file which combines dev, test, and docs dependencies, and works on all platforms?

The goal is to simplify the env files as much as possible, but not more than that ;).

@maresb
Contributor

maresb commented Sep 10, 2022

I have been thinking about this, and I've come to the conclusion that the available tooling is very close, but not quite there yet. Rather than throw together some flaky scripts to close the gap, I think the biggest impact would be contributing to conda-lock, and the maintainer there is really responsive. That way, PyMC (plus a few other things I'm working on) would be my motivating use case.

That said, I don't have any time horizon on this, so if anyone wants to contribute some quick-wins in the meantime, then please go ahead.

@twiecki
Member

twiecki commented Sep 12, 2022

@maresb That sounds like a fine approach to me. Our current setup is totally fine and works, so there's no rush and we can just wait for the proper tools.

Having said that, I suppose I'm a bit unsure why we need OS-specific envs in the first place, as we're noarch now for PyMC.

@maresb
Contributor

maresb commented Sep 12, 2022

@twiecki, great point! That is indeed a quick-win which we could easily tackle right away.

I'm probably not the best person to implement this since I don't have Windows, but one possible way forward is to make incremental changes to windows-environment-*.yml and environment-*.yml until they have the same contents. We should also preferably add comments explaining what the individual dependencies do and why we need them. (That should make future maintenance much easier, since currently I don't know what might break if I modify something.)
