feat: unbloat, reduce image size #70
@MQ37 Could you give us links to the build as well as some runs (STANDBY and STANDALONE) using it? |
Sure 👍 Build link: https://console.apify.com/actors/B2VM9FhWyxLEMb7tm/builds/1.0.25/log |
Wow, this is really cool. Reducing the image size by 1GB without reducing functionality is impressive. Thank you.
Have you checked if this makes sense to do for the Apify Docker base images? |
Since it uses a distroless image, where we need to copy in all the requirements, I don't think it makes sense or is viable to make this a base image for Actors: they may use specific libs or different versions. |
Yeah, this is great. It will speed up start significantly (when the image is not cached). I think it could make sense to use it as a base image for certain use cases, or to include it in the templates as an example. Assuming there isn’t some catch I’m not seeing right now? @metalwarrior665 can you think of any? We definitely need to test it properly before releasing it in the RAG Web Browser. |
This should definitely go through tooling/platform review as the base Dockerfiles are cached/preloaded in some way. Btw: When we dropped image size in Google Maps from 2 GB to 400 MB (browser -> cheerio), we saw no improvement in startup time. I think the caching just takes care of that. So I would measure what benefit we actually want - usually startup times and build times. |
Just found out that when I tested, I probably tested locally with Playwright, which does not use the Docker image, and when copying from the builder image I forgot to copy the browser with its requirements 🤦 When including the browser and all the libs, we can realistically save ~300 MB at most, and I don't know whether that is worth the overhead (we would have to solve loading of the dynamically linked libraries). But we can look into distroless images for other, simpler use cases for Node-only Actors. |
Taking this back: hacked it to work (it's really hacky), the image is ~600 MB and Playwright works 👍 More than 300 MB but way less than the original 1.3 GB. |
Build: https://console.apify.com/actors/B2VM9FhWyxLEMb7tm/builds/1.0.37/log Runs |
Cool! Could you still make a post in #product-dev-tools and present this achievement before we merge? We can take this Actor as PoC of the distroless approach but it should be approved by the Tooling team. There might be some assumptions we have about our base Dockerfiles |
@MQ37 If you look at the first 2 log lines of your runs, the Docker pull takes 6+ seconds, which is terrible. Usually it takes between 1-2 seconds. You can try a few more tests, but I think this is a cache miss, and unless we make your distroless approach a new standard, we have to pause this. |
Yes, it would mean that the image is not cached. If you ran it more often, then the time should improve. Since the RAG Web Browser gets 2-5k runs per day, we need to be cautious with non-standard changes. |
Did a few more tests and the pull is way faster when I hit the cache. But I noticed that the startup time, the time from "Starting Docker container." to the printed system info in logs, is slower/unstable (higher std), and that is what we want to optimize for.
|
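The startup metric described above (time from "Starting Docker container." to the printed system info) could be extracted from run logs roughly like this. This is a minimal sketch, not the benchmark used in this PR: it assumes every log line starts with an ISO 8601 timestamp, and `READY_MARKER` is a hypothetical string standing in for the first line of the system info block.

```python
from datetime import datetime
from statistics import mean, stdev

START_MARKER = "Starting Docker container."
# Hypothetical marker: adjust to whatever the system info line really contains.
READY_MARKER = "System info"


def parse_ts(line: str) -> datetime:
    """Parse the leading ISO 8601 timestamp of a log line (assumed format)."""
    stamp = line.split(" ", 1)[0]
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def startup_seconds(log: str) -> float:
    """Seconds between the container start line and the system info line."""
    start = ready = None
    for line in log.splitlines():
        if start is None and START_MARKER in line:
            start = parse_ts(line)
        elif start is not None and READY_MARKER in line:
            ready = parse_ts(line)
            break
    if start is None or ready is None:
        raise ValueError("log does not contain both markers")
    return (ready - start).total_seconds()


def summarize(logs: list[str]) -> tuple[float, float]:
    """Mean and sample standard deviation of startup time over several runs."""
    durations = [startup_seconds(log) for log in logs]
    return mean(durations), stdev(durations)
```

Comparing the standard deviation alongside the mean matters here because the concern is instability, not only a slower average.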
We need to optimize from Actor start to system info, but these parts will depend on multiple teams. The platform team has been working on optimizing the base Dockerfiles for years, so you will definitely need to talk with them. The problem is a bit of a catch-22, as they will not want to pre-cache images that are not the base for everybody. And even if we changed it for everyone, there would still be old versions and people not upgrading. But I'm just speculating; the best is to talk to them. And then the tooling team can probably influence how fast the Node.js or Python process starts. |
Talked to @jirimoravcik, and the first thing is we need more test runs (thousands) to get statistically significant results. Jirka tried a few runs, and they seem okay to him - https://apify.slack.com/archives/CD0SF6KD4/p1743774919039049?thread_ts=1743622813.549859&cid=CD0SF6KD4. I already have this on my to-do list, so I will run the tests when I have time and report the results. |
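For a rough sense of why "thousands" of runs may be needed, the standard two-sample size formula under a normal approximation can be sketched as below. The function name and the example numbers are assumptions for illustration only, not measurements from this PR.

```python
from math import ceil
from statistics import NormalDist


def runs_per_build(sigma: float, delta: float,
                   alpha: float = 0.05, power: float = 0.8) -> int:
    """Runs needed per build to detect a mean difference `delta` (two-sided test).

    Normal approximation: n = 2 * ((z_{alpha/2} + z_beta) * sigma / delta)^2,
    assuming both builds share the same standard deviation `sigma`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Illustrative assumption: startup time std of 1 s, smallest difference
# worth detecting 0.1 s.
print(runs_per_build(sigma=1.0, delta=0.1))
```

With those assumed values the formula gives about 1,570 runs per build; halving the detectable difference roughly quadruples the count.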
Wrote an Actor runtime benchmark and tested current [...]. Executed 500 + 2000 runs for each Actor (master and distroless) and measured times from [...]. Based on the results, the distroless image performs a bit better in Actor run settings:
Master build results:
Distroless build results:
|
@MQ37 Thanks for the detailed benchmark! It would be great if you could summarize the results a bit more clearly, going through all those tables takes extra effort for anyone just trying to get the main takeaway :) I don’t think we need to show all 500 runs, right? This looks like just a subset of the full 2k runs. That said, when looking at the data, the key metric seems to be
If we wanted to be rigorous, we’d run some statistical tests like a t-test, but based on the averages and standard deviations, I don’t see a compelling reason to introduce a new image and diverge from the standard Apify base image. Sorry. It would be nice to see the full distribution and p95, p99, but based on the code, it looks like you'd need to rerun the entire experiment again. I don't think it is worth it. |
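The checks mentioned above (a t-test on averages and standard deviations, plus p95/p99) could look roughly like this. It is a sketch under assumptions: the p-value uses a normal approximation to the t distribution, which is reasonable only for large run counts such as the 2000 per build here, and the percentile helper needs the raw per-run durations, not just summary tables.

```python
from math import sqrt
from statistics import NormalDist, quantiles


def welch_t(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Welch's t statistic and a two-sided p-value (normal approximation)."""
    se = sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)
    t = (mean_a - mean_b) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


def tail_percentiles(samples):
    """Return (p95, p99) of raw per-run durations."""
    # 99 cut points; cuts[i] is the (i + 1)th percentile.
    cuts = quantiles(samples, n=100)
    return cuts[94], cuts[98]
```

Welch's variant is used because the two builds showed different variability, so equal variances should not be assumed.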
The important part is to test these runs spaced an hour or more apart so you are not hitting the same Actor run worker with already cached images. Because we pre-cache the standard Dockerfiles, master will be much more cached than any non-standard builds (try to build an Actor in Rust and watch it take 20 seconds to start the container :) so my assumption is that the numbers will get significantly worse for Distroless (something we saw in earlier tests). Also, the image size does not seem to be a determining factor for real runs. When I did a test of Cheerio (± 400 MB) vs browser (± 2 GB), I didn't see any difference in startup time. So it has much more to do with how our platform optimizes these than with the images themselves. Our average start used to be like 5+ seconds (like 3 years ago) and almost all the improvement to the current <2 sec is just optimizing the platform (I think mostly caching the right things, some filesystem changes too), rather than images. I think you make a very good case to eventually use this as a base image for most Actors. At that point we would pre-cache it and the problem with cold starts I mentioned above would not exist. But that will be a long road, so you should definitely go through this with the platform & tooling teams. |
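A spaced test plan like the one suggested above could be generated as follows. This is only a sketch: `spaced_schedule` is a hypothetical helper, the build labels are placeholders, and spacing runs in time does not by itself guarantee a different worker or a cold cache.

```python
from datetime import datetime, timedelta


def spaced_schedule(start, builds, rounds, gap=timedelta(hours=1)):
    """Return (time, build) pairs, one run per slot, slots `gap` apart.

    Builds alternate so each one sees a similar mix of times of day.
    """
    schedule = []
    slot = 0
    for _ in range(rounds):
        for build in builds:
            schedule.append((start + slot * gap, build))
            slot += 1
    return schedule


# Two rounds of master vs. distroless, one run per hour.
for when, build in spaced_schedule(datetime(2025, 4, 7), ["master", "distroless"], 2):
    print(when.isoformat(), build)
```

The resulting times would then be fed to whatever scheduler starts the runs.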
After discussing with @jirispilka we decided it is not worth it now - maybe in the future. I will keep this as my personal TODO to discuss with the platform and tooling team. |
Yeah, this is just one Actor, but your approach could be a significant improvement for all of Apify, so let's not throw it out. |
Closes #72
These changes bring image size from ~1.4 GB to ~600 MB.