
Improve dependency management #1979

@ZanSara

Description


The current handling of dependencies is quite monolithic: users must install them all regardless of the subset of features they want to use. We should make Haystack more modular at install time.

Options

Nowadays there are several ways to properly handle dependency groups:

  • several requirements.txt files: rather old-fashioned by now and harder to manage
  • extras_require in setup.py: the "traditional" way, safe and widely used
  • pyproject.toml: the newer, standardized way (PEP 517 and PEP 660; optional dependency groups are declared under [project.optional-dependencies], as specified in PEP 621)

Proposed dependency groups

  • minimal: basic Haystack on CPU with a single document store (possibly the InMemoryDocumentStore)
  • gpu: for running Haystack on GPU
  • rest: also installs the REST API server dependencies
  • ui: installs the Streamlit UI dependencies
  • demo: rest + ui
  • ci: for GitHub runners
  • win: for Windows installs (if possible)
  • colab: to work around Colab-specific issues when necessary
  • One group for each document store
  • all_doc_stores: installs the dependencies of every document store
  • test: the test dependencies
  • docs: for building the documentation
  • code: black, linters, and any extra tools if/when we introduce them
  • all (or dev): the complete dependency list for development and contributing; includes all of the above
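As a rough sketch of how these groups could be expressed with extras_require, the composite groups (demo, all_doc_stores, all) can be derived from the smaller ones instead of being maintained by hand. Only a subset of the groups is shown, every package name is a hypothetical placeholder rather than Haystack's actual requirement list, and build_extras is a made-up name for illustration.

```python
from typing import Dict, List

# Hypothetical placeholder requirements, one group per document store.
DOC_STORE_EXTRAS: Dict[str, List[str]] = {
    "elasticsearch": ["elasticsearch"],
    "faiss": ["faiss-cpu"],
    "milvus": ["pymilvus"],
}


def _union(*groups: List[str]) -> List[str]:
    """Merge requirement lists, dropping duplicates but keeping first-seen order."""
    merged: List[str] = []
    for group in groups:
        for requirement in group:
            if requirement not in merged:
                merged.append(requirement)
    return merged


def build_extras() -> Dict[str, List[str]]:
    """Build an extras_require-style mapping of group name -> requirements."""
    extras: Dict[str, List[str]] = {
        "gpu": ["onnxruntime-gpu"],  # placeholder GPU-only dependency
        "rest": ["fastapi", "uvicorn"],
        "ui": ["streamlit"],
        "test": ["pytest"],
        "docs": ["pydoc-markdown"],
        "code": ["black", "pylint"],
    }
    extras.update(DOC_STORE_EXTRAS)
    # Composite groups are derived, so they cannot drift out of sync.
    extras["demo"] = _union(extras["rest"], extras["ui"])
    extras["all_doc_stores"] = _union(*DOC_STORE_EXTRAS.values())
    # "all" is computed last so it covers every group defined above.
    extras["all"] = _union(*extras.values())
    return extras
```

The returned dict has the shape setuptools expects for extras_require (group name mapped to a list of requirement strings), so it could be passed as setup(..., extras_require=build_extras()) and installed with, for example, pip install 'haystack[demo]'.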

We could also consider adding smaller groups for individual components with unusual dependencies, such as the crawler or OCR.

Default install

It's up for debate what the default install (pip install haystack) should look like.

The important point is that whatever the default install pulls in becomes mandatory: extras can only add dependencies on top of it, never remove any. This is the case with setup.py, where the default set is install_requires and extras_require is opt-in. As far as I can tell, pyproject.toml follows the same model ([project] dependencies are always installed, [project.optional-dependencies] are opt-in). If so, the default install should effectively be a minimal install. For example, if we include GPU dependencies in this group, they become mandatory and a pure CPU install becomes impossible.
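To make the "mandatory" point concrete, here is a simplified model of what installing a package with extras requests: the base requirements plus the union of the selected extras. It is only a sketch that ignores version specifiers and transitive dependencies; resolve_install and all package names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def resolve_install(
    install_requires: List[str],
    extras_require: Dict[str, List[str]],
    selected: Iterable[str] = (),
) -> Set[str]:
    """Return the top-level requirements requested by `pkg[selected...]`.

    Simplified model: no version specifiers, no transitive dependencies.
    Extras are purely additive, so install_requires is always included.
    """
    resolved: Set[str] = set(install_requires)
    for extra in selected:
        # pip only warns about unknown extras, so they are skipped here too.
        resolved.update(extras_require.get(extra, []))
    return resolved


# Hypothetical mistake: a GPU-only package placed in the default install.
install_requires = ["numpy", "faiss-gpu"]
extras_require = {"faiss": ["faiss-cpu"]}

cpu_install = resolve_install(install_requires, extras_require, ["faiss"])
# Selecting the CPU extra cannot remove the GPU package from the default set.
assert "faiss-gpu" in cpu_install
```

Because the result is always a superset of install_requires, the only way to offer a CPU-only install is to keep GPU packages out of the default set entirely.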

I will investigate the options and update this section with new information.

Related issues

Related to #1291, #1716, #1826, #1806

Closes #1070

Next steps

  • Learn more about what's currently possible with pyproject.toml and whether all of our dependencies actually work with it. As of last year there were still some issues with large libraries that need complex build steps.
  • Finalize dependency groups list
  • Define what a default install should look like
  • Investigate how to properly handle failed imports for unmet dependencies
  • Fix dependency-related issues (such as #1806, "Improve Colab setup experience by simplifying dependencies")
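For handling failed imports, one possible pattern is to import optional dependencies lazily and turn a missing module into an error that names the group to install. This is a sketch only: import_optional is a hypothetical helper, and the pip install 'haystack[...]' command assumes the package and group names used in this issue.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain which group provides it.

    `extra` is the name of the (hypothetical) dependency group, e.g. "ocr".
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise with an actionable message, keeping the original cause.
        raise ImportError(
            f"'{module_name}' is not installed. It is part of the optional "
            f"'{extra}' dependency group: run pip install 'haystack[{extra}]'."
        ) from exc
```

Calling such a helper inside the component that needs the dependency, rather than at module import time, would keep import haystack working on a minimal install.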
