Skip to content

feature request: exclude tests from package tarball #340

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jameslamb opened this issue May 24, 2021 · 2 comments
Closed

feature request: exclude tests from package tarball #340

jameslamb opened this issue May 24, 2021 · 2 comments

Comments

@jameslamb
Copy link
Member

Short Description

Would maintainers here be open to a pull request that modifies MANIFEST.in to exclude test files from this project's package artifacts?

Long Description

Today, all test files are included in the project's package artifacts.

python setup.py sdist > /dev/null
cat dask_kubernetes.egg-info/SOURCES.txt

LICENSE
...
dask_kubernetes/tests/config-demo.yaml
dask_kubernetes/tests/conftest.py
dask_kubernetes/tests/fake_gcp_auth.py
dask_kubernetes/tests/test_async.py
dask_kubernetes/tests/test_helm.py
dask_kubernetes/tests/test_kind.py
dask_kubernetes/tests/test_objects.py
dask_kubernetes/tests/test_sync.py
dask_kubernetes/tests/helm/values.yaml

I believe such a change could reduce the uncompressed size of the package by about 150kb.

main without tests
compressed 56k 48k
uncompressed 576k 424k
how I estimated this this

Place the following at the root of the repo, call it check-sizes.sh, and run sh check-sizes.sh

rm -rf dask-kubernetes.egg-info
rm -rf dist/
rm -rf __pycache__

echo ""
echo "building source distribution"
echo ""
python setup.py sdist > /dev/null
cp dask_kubernetes.egg-info/SOURCES.txt ~/DASK-K8S-SOURCES.txt
pushd dist/
    echo ""
    echo "sdist compressed size"
    echo ""
    du -a -h .
    tar -xf dask-kubernetes*.tar.gz
    rm dask-kubernetes*.tar.gz
    ls .
    echo ""
    echo "sdist uncompressed size"
    echo ""
    du -sh .
popd

Change MANIFEST.in to the following and run the script again.

recursive-include dask_kubernetes *.py
recursive-include dask_kubernetes *.yaml
recursive-exclude dask_kubernetes/tests *

include setup.py
include setup.cfg
include LICENSE
include README.rst
include requirements.txt
include MANIFEST.in
include versioneer.py

recursive-exclude * __pycache__
recursive-exclude * *.py[co]

why I think this change should be considered

Even though the package is already fairly small and the savings here are not huge, I still think this is worth doing unless there is a compelling reason to allow tests to be run from the published package artifact (instead of from cloning the repo).

  • reduces amount of data transfer for PyPI and conda channels serving this package
  • reduces the total size of any container images built with dask-kubernetes in them (and, therefore, data transfer when those images are pulled + storage footprint of those images)
  • reduces storage footprint of the package, which can be important in environments like AWS Lambda functions where storage is very limited (Suggestion: remove tests from the distribution pandas-dev/pandas#30741 (comment))
  • reduces any of the above for new packages created in the future that copy dask-kubernetes as a reference

references

Other projects that have adopted similar changes when I've proposed them:

Thanks for your time and consideration!

@jacobtomlinson
Copy link
Member

Excluding tests from distribution is a reasonably debated move. There is an opinion that being able to run tests on a package you installed via pip is a good way of testing a production environment. I was only discussing this with @Cadair on another project recently.

Given that the savings here are marginal I would be tempted not to do this. Although I would be interested to know what @jakirkham and @jrbourbeau think.

@jacobtomlinson
Copy link
Member

I'm going to close this out due to inactivity. Happy to reopen if folks comment back here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants