@cimnine
Collaborator

@cimnine cimnine commented Oct 5, 2021

Related Issues: #468, #539

New Behavior

This PR implements multi-arch builds for NetBox Docker. It does this by leveraging the idea and the work of @tobiasge to switch to the Debian distribution for NetBox Docker.

To achieve this, this PR introduces several changes:

  • Building with Buildx directly instead of the docker build compatibility shim to allow more advanced build options
  • Leveraging the GitHub Action Cache for inter-build caching for reduced build resource usage
  • Switching to Debian for a faster build and fewer of the dependency problems caused by musl libc.
  • Using the Buildx functionality to build images for other CPU architectures with QEMU.
  • LDAP functionalities are included in the main image. A separate -ldap image is no longer required.
  • Also push to GitHub Container Registry (ghcr.io).

Contrast to Current Behavior

  • Building the Docker image uses the legacy docker build infrastructure.
  • GitHub Actions have to build the image from scratch each time, even if nothing changed.
  • We're using Alpine Linux as the basis for our image. This results in a small image but longer build times, because precompiled Python wheels can't be used and the packages need to be compiled each time.
  • Not using Buildx would require a separate build machine for cross-compilation to arm64.
  • LDAP is currently a separate image (-ldap).

Discussion: Benefits and Drawbacks

Currently, we use a build system that is compatible with the regular docker build command. This does local caching for each build step of the Dockerfile, but it does not allow build caches to be shared between systems. And since every GitHub Action starts with a fresh system, the regular build cache cannot be leveraged. The new Buildx build system also allows us to build Docker images for linux/arm64 and not 'just' for linux/amd64.
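
For illustration, here is a minimal sketch of what such a Buildx invocation might look like, assembled as an argument list in Python. It is not the workflow from this PR: the image tag is hypothetical, and the `type=gha` cache backend is assumed to be running inside a GitHub Actions job that exposes the cache service to Buildx.

```python
import shlex


def buildx_command(image_tag, platforms, context="."):
    """Assemble a `docker buildx build` call for several target platforms.

    The GitHub Actions cache backend (`type=gha`) is an assumption here; it
    only works where the Actions cache service is reachable from Buildx.
    """
    return [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),    # one build, several CPU architectures
        "--cache-from", "type=gha",           # reuse layers from earlier CI runs
        "--cache-to", "type=gha,mode=max",    # export all layers for later runs
        "--tag", image_tag,
        "--push",                             # multi-platform results go to a registry
        context,
    ]


# Hypothetical tag, used only for this example.
cmd = buildx_command("ghcr.io/example/netbox:test", ["linux/amd64", "linux/arm64"])
print(shlex.join(cmd))
```

Running the printed command still requires a Buildx builder and, for foreign architectures, QEMU emulation to be set up on the build host.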

Switching to Debian will have more consequences for our users though. The negligible consequence, IMO, is the increased size of the Docker image by a few megabytes. The more impactful consequence is that people who build upon our image (e.g. for plugins) will need to rewrite their Dockerfiles if they rely on Alpine Linux infrastructure themselves (i.e. if they need to install further dependencies via apk).

But the switch to Debian brings us two advantages: First, the build is much faster. This makes it easier to tweak our Dockerfile and test new ideas. It also uses fewer CPU resources on the shared GitHub Actions infrastructure.
The second advantage is that it was not possible to get to a working arm64 Docker image using Alpine Linux as the basis – at least not with reasonable effort.

The change to Debian is certainly a breaking change which would demand an increase of the major version number of our project.

Changes to the Wiki

Examples that rely on Alpine Linux need to be adjusted.

Proposed Release Note Entry

NetBox Docker's base image has changed from Alpine Linux to Debian

This change results in better build performance and fewer of the incompatibilities caused by musl libc. The one drawback is a slightly larger Docker image, which we believe to be a tolerable tradeoff.

NetBox Docker is available for arm64 CPUs

Now you can run your NetBox Docker on your Raspberry Pi, your NAS or your AWS Graviton instances.

Double Check

  • I have read the comments and followed the PR template.
  • I have explained my PR according to the information in the comments.
  • My PR targets the develop branch.

@cimnine cimnine added the enhancement label Oct 5, 2021
@cimnine cimnine self-assigned this Oct 5, 2021
@cimnine cimnine force-pushed the Buildx branch 4 times, most recently from 9b70e13 to aef2e07 Compare October 5, 2021 15:11
@ryanmerolle
Contributor

could be worth exploring https://github.com/linuxserver/docker-netbox arm build

@anthr76

anthr76 commented Nov 12, 2021

What is left to require a review? What exactly isn’t possible about building this image on alpine?

@cimnine
Collaborator Author

cimnine commented Nov 12, 2021

What's left to do?

The ARM bit needs testing. And currently we don't have ARM infra to test it besides emulators.

The Debian bit must be communicated well as it's a breaking change. Wiki pages will need to be updated as well. Because it's such a big change, we will probably time it so that its release coincides with a new minor release of NetBox itself.

Why the switch to Debian?

At least on my machine, the ARM build would not succeed if done with Alpine. That is AFAICT because some of the Python modules require native code. As Alpine Linux uses musl libc, most of that native code must be recompiled. On GNU libc Linuxes (which most are), many Python modules provide pre-compiled versions of their native code. This speeds up the build process significantly and also resolves some weird bugs with musl libc and some Python modules' native code when cross-compiled for ARM.
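
To make the wheel argument concrete, here is a small sketch of how the platform tag in a wheel filename decides whether pip can use a pre-compiled wheel. It is a deliberate simplification: it only looks at the libc family (`manylinux` for glibc, `musllinux` for musl) and the CPU architecture, and ignores libc versions as well as the Python and ABI tags. The wheel filenames are made up for the example.

```python
def wheel_platform_tags(wheel_filename):
    """Return the platform tags of a wheel; they form the last dash-separated field."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    stem = wheel_filename[: -len(".whl")]
    # Several platform tags can be compressed into one field, separated by dots.
    return stem.split("-")[-1].split(".")


def wheel_fits(wheel_filename, libc, arch):
    """Simplified check whether a wheel matches a libc family and architecture."""
    for tag in wheel_platform_tags(wheel_filename):
        if tag == "any":
            return True  # pure-Python wheel, nothing to compile
        if tag.startswith("manylinux"):
            family = "glibc"
        elif tag.startswith("musllinux"):
            family = "musl"
        else:
            family = None
        if family == libc and tag.endswith("_" + arch):
            return True
    return False


# Hypothetical wheel that was built for glibc-based distributions on arm64.
glibc_wheel = "example_pkg-1.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl"
print(wheel_fits(glibc_wheel, "glibc", "aarch64"))  # Debian on arm64
print(wheel_fits(glibc_wheel, "musl", "aarch64"))   # Alpine on arm64
```

When no wheel fits, pip falls back to the source distribution, which is where the compiler, the header packages and the long build times come in.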

@anthr76

anthr76 commented Nov 12, 2021

Understood. Is there a tag available yet or should I build this myself to test?

- Pinned unit version to prevent surprises with configuration changes
- Added psql client for manage.py dbshell
- Set LANG to "C.UTF8" to correctly read configuration files
Improved compatibility with other deployment methods.
@tobiasge
Member

The failing arm64 tests seem to have this problem: psycopg/psycopg2#1360
Also the build time is strange. Build takes 45 minutes on GitHub but 3 minutes on my M1 Mac in a Debian Bullseye VM.

@cimnine
Collaborator Author

cimnine commented Nov 25, 2021

> Build takes 45 minutes on GitHub but 3 minutes on my M1 Mac in a Debian Bullseye VM.

I believe your VM is arm64 as well. It takes long on GitHub because there it's an amd64 architecture that emulates arm64 using QEMU.
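
As a rough sketch of that distinction: whether a build runs natively or under QEMU comes down to comparing the host CPU architecture with the architecture part of the target platform. The mapping table and function names below are my own for this example and only cover the two architectures discussed here.

```python
import platform

# Machine names as reported by the OS, mapped to Docker's architecture names.
# Only the two architectures from this discussion are covered.
_MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine):
    """Translate a machine name such as 'aarch64' into Docker's 'arm64'."""
    return _MACHINE_TO_DOCKER_ARCH[machine.lower()]


def needs_emulation(target_platform, host_machine=None):
    """Return True if building for target_platform requires CPU emulation."""
    host = docker_arch(host_machine or platform.machine())
    # A platform string looks like 'linux/arm64' or 'linux/arm64/v8'.
    target = target_platform.split("/")[1]
    return host != target


print(needs_emulation("linux/arm64", host_machine="x86_64"))   # amd64 CI runner
print(needs_emulation("linux/arm64", host_machine="aarch64"))  # arm64 VM
```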

@cimnine cimnine marked this pull request as ready for review November 26, 2021 12:34
@ppouliot

ppouliot commented Dec 9, 2021

If you need arm native server resources for build and test, you can get access to ARM native development resources here: https://osuosl.org/services/aarch64/request_hosting/
If you send in a request I can approve it asap, FWIW.

@xvzf xvzf mentioned this pull request Dec 14, 2021
3 tasks
@xvzf

xvzf commented Dec 14, 2021

Hey guys; I've just opened a PR with a similar goal (ARM64 docker image) but different approach, feel free to leave some feedback #664


@kkthxbye-code kkthxbye-code left a comment

Adding the build requirements (libpq-dev and libssl-dev) and adding psycopg2==2.9.3 to requirements-container.txt fixes the psycopg issue for me. Built and ran all tests on my M1 mac using target: linux/arm64/v8.

postgresql-13 \
python3-dev \
python3-pip \
python3-venv \

Suggested change
- python3-venv \
+ python3-venv \
+ libpq-dev \
+ libssl-dev \

Contributor

I'll test as well shortly. @cimnine can you please make the changes to test via CI?

Member

Did you remove the psycopg2-binary==2.9.3 line from the Netbox requirements.txt or is psycopg2==2.9.3 simply added as an additional dependency?

I didn't need to remove it from the netbox requirements.

@kkthxbye-code

Do you guys need any help with this @tobiasge / @cimnine ? Moving away from Alpine would be nice.

@cimnine cimnine closed this Jun 9, 2022
@cimnine cimnine reopened this Jun 9, 2022
@cimnine
Collaborator Author

cimnine commented Jun 9, 2022

Sorry, wrong button clicked 😔

> Do you guys need any help with this?

If you have time to spend on it, sure. This PR here is now outdated. So decide first if you want to start your work off of this one or if you just take it for reference and start again from develop.

Also I believe this PR does not yet build truly multi-arch images. It builds an image for x86 and one for arm but no combined manifest.
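
To illustrate what I mean by a combined manifest: a multi-arch tag points to an image index that lists one manifest per platform, and the client picks the entry matching its own platform. The sketch below models such an index as a plain dictionary following the OCI image index layout. The digests and sizes are placeholders, not values from any real image.

```python
# Hypothetical OCI image index; digests and sizes are placeholders.
IMAGE_INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:placeholder-for-amd64-image",
            "size": 1234,
            "platform": {"os": "linux", "architecture": "amd64"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:placeholder-for-arm64-image",
            "size": 1234,
            "platform": {"os": "linux", "architecture": "arm64"},
        },
    ],
}


def select_manifest(index, os_name, architecture):
    """Return the digest of the manifest that matches the given platform."""
    for entry in index["manifests"]:
        entry_platform = entry.get("platform", {})
        if (entry_platform.get("os") == os_name
                and entry_platform.get("architecture") == architecture):
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{architecture}")


print(select_manifest(IMAGE_INDEX, "linux", "arm64"))
```

With two separately tagged single-arch images, users have to pick the right tag themselves. With an index, the same tag works on both architectures.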

@tobiasge
Member

tobiasge commented Jun 9, 2022

The current develop branch can be built on ARM64 and all tests pass. So we don't need to switch to Debian if we want an ARM64 image.
The problem with psycopg2 happens only in the Debian based image.

@kkthxbye-code

kkthxbye-code commented Jun 9, 2022

@tobiasge - That's fair, I'm just firmly in the camp of Alpine pretty much never being worth the hassle, especially for Python projects, as musllinux wheels are still in short supply.

I don't mind giving it a shot to bring the PR up to date, but if the goal is just ARM support, I don't mind just forking the project based on the Debian parts of this PR to fit my needs.

The only other maintained image seems to be linuxserver/netbox, which is also Alpine sadly.

@tobiasge
Member

tobiasge commented Jun 9, 2022

> @tobiasge - That's fair, I'm just firmly in the camp of Alpine pretty much never being worth the hassle, especially for Python projects, as musllinux wheels are still in short supply.

Isn't that just increasing the build time? And that shouldn't have a very big impact, or do you build the image that often?

> I don't mind giving it a shot to bring the PR up to date, but if the goal is just ARM support, I don't mind just forking the project based on the Debian parts of this PR to fit my needs.
>
> The only other maintained image seems to be linuxserver/netbox, which is also Alpine sadly.

I'm not opposed to a Debian image (I did investigate it also), I just want to understand what the arguments against Alpine are.

@kkthxbye-code

> Isn't that just increasing the build time? And that shouldn't have a very big impact or do you build the image that often?

It's a small annoyance and a waste of resources.

> I'm not opposed to a Debian image (I did investigate it also), I just want to understand what the arguments against Alpine are.

Spotty wheel support and musl is just plain slower than glibc. cimnine also listed his reasons in the PR text.

It's probably just personal preference, but I value build time and performance more than a tiny bit of saved storage. Our other internal images use debian as a base also.

I think cimnine adequately explained the reasoning for the PR and again, I'm really just looking for clarification regarding the plan so I can decide if forking for my own use is necessary.

@tobiasge
Member

tobiasge commented Jun 9, 2022

Yes, I think these are valid arguments, and we still have #539 open.
We should proceed with the switch to Debian but probably split this up: First switch to Debian and then get the build to run on ARM64. Might be more manageable.

@tobiasge
Member

I have rebased my branch debian-based onto the current develop branch and added psycopg2==2.9.3 as an additional dependency as suggested by @kkthxbye-code.

With the changes on this branch I could build and successfully test the image on the following systems:

Linux debian 5.10.0-14-amd64 #1 SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux
Linux debian 5.10.0-14-arm64 #1 SMP Debian 5.10.113-1 (2022-04-29) aarch64 GNU/Linux

On my branch the CI workflow is not changed yet. I think we can do this in a second step.
@kkthxbye-code Feel free to test the changes

@kkthxbye-code

@tobiasge - Tested a bunch on x86_64 where it works great, as expected. I'll test on my M1 MacBook Air when I get the chance.

Thanks for taking the time!

@tyler-8

tyler-8 commented Jun 13, 2022

> I have rebased my branch debian-based onto the current develop branch and added psycopg2==2.9.3 as an additional dependency as suggested by @kkthxbye-code.

FWIW I built and tested this branch on a MacBook M1 Pro and it works great!

@ryanmerolle
Contributor

With tests looking good, what are our next steps?

@tobiasge tobiasge mentioned this pull request Jun 15, 2022
3 tasks
@tobiasge
Member

tobiasge commented Jun 15, 2022

I have published images based on my PR #775 in this Docker organisation for further testing.

@ryanmerolle
Contributor

Just an update. Everything works well for me on my m1 macbook. Thanks @tobiasge

@anthr76

anthr76 commented Jun 30, 2022

+1 currently running @tobiasge image in my cluster and it's working well :)

@tobiasge tobiasge mentioned this pull request Jul 13, 2022
3 tasks
@tobiasge
Member

Superseded by #797

@tobiasge tobiasge closed this Jul 15, 2022
@cimnine cimnine deleted the Buildx branch October 13, 2022 16:11

Labels

enhancement The issue describes an enhancement that we would like to implement in the future.

8 participants