Significant build regressions on swift:6.0-noble compared to 5.10-noble #76555
Comments
Some notes, in no particular order (will update if I think of more):
|
To be clear:

It's very sad to see the [Linux] build situation constantly getting worse despite us asking for faster and less RAM-hungry builds for ages. Anyway, let's head to the benchmarks I did. I likely ran 500+ CI jobs in the past 48 hours ...

Tests CI

(Side note: I didn't know using more RAM could hurt?! I don't think it's a machine-type problem, since the deployment builds below show the expected behavior of some performance improvements when having access to more RAM.)

Analyzing the results (excluding the bigger
Deployment CI
The only notable change when I moved our app to the Swift 6 compiler is that we have 3 executable targets which Swift was throwing errors about, e.g. using […].

You may ask: "Didn't you say you needed an 8x larger machine to run the tests? How come this time they ran even on the same machine as you had before when using […]?" So: […]

Worth mentioning: when the build is stuck, I consistently see a sequence of logs like this, containing:

```
[6944/6948] Wrapping AST for MyLib for debugging
[6946/6950] Compiling MyExec Entrypoint.swift
[6947/6950] Emitting module MyExec
[6948/6951] Wrapping AST for MyExec for debugging
[6949/6951] Write Objects.LinkFileList
```
|
This almost starts to sound like a recurrence of the infamous linker RAM usage problem due to huge command lines with repeated libraries. @al45tair Is there any chance we're still failing to deduplicate linker command lines fully? |
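For what it's worth, one low-tech way to check for that from the user side is to count repeated `-l` flags in the final link invocation of a verbose build; a rough sketch, assuming a SwiftPM project and made-up log file names:

```bash
# rebuild verbosely and keep the full log (sketch; file names are assumptions)
swift build -v 2>&1 | tee build-verbose.log
# the final link invocation(s) reference Objects.LinkFileList; count the -l flags there.
# many repeats of the same library would hint at un-deduplicated linker command lines
grep 'Objects.LinkFileList' build-verbose.log \
  | grep -oE -- '-l[A-Za-z0-9_+.-]+' | sort | uniq -c | sort -rn | head -20
```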
Yes, I pinged you on that last month, but never got a response. A fix was just merged into […]. @jmschonfeld or @shahmishal, can that fix be prioritized to get into the next patch release? |
@MahdiBM, thanks for all the build info. Do you do any CI builds of the 6.0 snapshot toolchains before the final release? That would help find and stop build regressions like this when they happen, rather than being surprised on the final release. If you can, I'd like to know how an earlier 6.0 July 19 snapshot toolchain build for jammy does on these same CI runs of yours. That might help figure out the regression, particularly if you compare it to the next July 21 build of the 6.0 toolchain. |
@finagolfin this is an "executable" work project, not a public library, so we don't test on multiple Swift versions.
We just use Docker images in CI. To be clear, the image names above are the exact Docker image names that we use (I've added this explanation to the comment). I haven't tried or figured out manually using nightly images, although I imagine I could just use swiftly to set up the machine with a specific nightly toolchain and there should be few problems. It will make the different benchmarks diverge a bit though, in terms of environment / setup and all. |
Another visible issue is the […]. Not sure where that comes from. Any ideas? |
I would guess that even on 5.10, the |
My guess is that the difference comes from the updated glibc in |
I don't know. In my Android CI, the latest Swift release builds the same code about 70-80% faster than the development snapshot toolchains. But you'd be looking for regressions in relative build time, so those branch differences shouldn't matter.
I'd simply build with the snapshots of the next release, eg 6.1 once that's branched, and look for regressions in build time with the almost-daily snapshots.
The CI tags snapshots and provides official builds a couple times a week: I'd set it up to run whenever one of those drops.
I don't use Docker, so don't know anything about its identifiers, but I presume the 6.0 snapshot tag dates I listed should identify the right images.
Not yet. The fix was added to trunk a couple weeks ago, so you could try the Sep. 4 or latest 6.1 snapshot build with it and compare to the Aug. 29 build without it. You may also want to talk to @ktoso and the SSWG about what kind of toolchain benchmarking exists to catch these issues on linux and what needs to be done to either start or augment it. |
If you can log into the build machine and watch memory usage with |
@tbkka I can ssh into the containers, since RunsOn provides such a feature, but as you already noticed, I'm not well-versed in compiler/build-system workings. I can use top, but I'm not sure how to derive conclusions from it about where the problem is. It does appear @gwynne was on point though, about linker issues. |
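For reference, one way to get something more conclusive out of `top` is to periodically sample only the toolchain processes while the build runs; a rough sketch (the process-name filter and file name are assumptions, not a verified recipe):

```bash
# sample the biggest toolchain processes every 10 seconds while a build runs
while sleep 10; do
  date
  # RSS (memory), CPU, elapsed time, and command name, largest memory users first
  ps -eo rss,pcpu,etime,comm --sort=-rss | grep -E 'swift|clang|ld' | head -5
done | tee memory-samples.log
```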
@finagolfin I imagine the SSWG is already aware of these long-standing problems, and I expect they have already communicated their concerns in the past few years; they probably just haven't managed to get it up the priority list of the Swift team. I've seen some discussions in the public places. Even if there were no regressions, Swift's build system looks to me, purely from a user's perspective with no knowledge of the inner workings, pretty behind (dare I say, bad), and one of the biggest pain points of the language. We've just gotten used to our M-series devices building things fast enough before we get way too bored. Though I'm open to helping benchmark things. I think one problem is that we need a real, big, and messy project, like what most corporate projects are, so we can test things in real environments. |
@MahdiBM, I have submitted that linker fix for the next 6.0.1 patch release, but the branch managers would like some confirmation from you that this is fixed in trunk first. Specifically, you should compare the Aug. 29 trunk 6.1 snapshot build from last month to the Sep. 4 or subsequent trunk builds. |
Moving the thread about benchmarking the linker fix here, since it adds nothing to the review of that pull. I was off the internet for the last seven hours, so only seeing your messages now.
Maybe you can give some error info on that.
Ideally, you'd build against the same commit of your codebase as the test runs you measured above.
Hmm, this is building inside the Linux image? A quick fix might be to remove that […]. However, this seems entirely unrelated to the toolchain used: I'd first try to build the same working commit of your codebase that you used for the test runs you measured above. |
That's not possible for multiple reasons, such as the fact that normally I use Swift Docker images, but here I need to install specific toolchains, which means I need to use the ubuntu image.
Yes. That's how GitHub Actions works. (ubuntu jammy)
Not sure how that'd be helpful. The current commit is close enough though. Those tests above were made in a different environment (Swift Docker images, release images only), so while I trust that you know better than me about this stuff, I don't understand how you're going to be able to properly compare the numbers considering I had to make some adjustments. |
I figure you know it builds to completion at least, without hitting all these build errors.
There are Swift Docker images for all these snapshot toolchains too; why not use those? Basically, you can't compare snapshot build timings if you keep hitting compilation errors, so I'm saying you should try to reproduce the known-good environment where you measured the runs above, but change only one ingredient, i.e. swap the 6.0 Docker image for the snapshot Docker images. If these build errors are happening because other factors, like your Swift project, are changing too, that should fix it.

If that still doesn't build, I suggest you use the 6.0 snapshot toolchain tags given, as they will be most similar to the 6.0 release, and show any build error output for those. If you can't get anything but the final release builds to compile your codebase, you're stuck simply observing the build with some process monitor or file timestamps. If linking seems to be the problem, you could get the linker command from the verbose […].

I took a look at some Linux CI build times of swift-docc between the 6.0.0 and 6.0 branches, i.e. with and without the linker fix, and didn't see a big difference. I don't know if that's because they have a lot of RAM, unlike your baseline config that showed the most regression. |
@finagolfin but how do I figure out the hash of the exact image that relates to the specific main snapshots?

I tried a bunch to fix the NIO errors with no luck: https://github.com/MahdiBM/swift-nio/tree/mmbm-no-cniodarwin-on-linux

This is not normal, and not a fault of the project. This is not the first time I'm building the app in a Linux environment.
It also complained about CNIOWASI, as well as CNIOLinux.
|
Hmm, looking it up now, I guess you can't. As I said, I don't use Docker, so I was unaware of that. My suggestion is that you get the 6.0 Docker image and make sure it builds some known-stable commit of your codebase. Then, use that same docker image to download the 6.0 snapshots given above, like the one I linked yesterday, and after unpacking them in the Docker image, use them to build your code instead. That way, you have a known-good Docker environment and source commit, with the only difference being the Swift 6.0 toolchain build date. The Docker files almost never change, so only swapping out the toolchain used inside the 6.0 image should minimize the differences. |
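A minimal sketch of that workflow, assuming the usual swift.org snapshot tarball URL layout (the tag and platform below are just the examples from this thread):

```bash
# run inside the Swift 6.0 container; the URL layout is an assumption based on
# how swift.org publishes 6.0-branch development snapshot tarballs
TAG=swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-21-a
PLATFORM=ubuntu22.04              # adjust for the base image in use
URL="https://download.swift.org/swift-6.0-branch/${PLATFORM//./}/$TAG/$TAG-$PLATFORM.tar.gz"
curl -fsSL "$URL" -o /tmp/toolchain.tar.gz
mkdir -p "/opt/$TAG"
tar -xzf /tmp/toolchain.tar.gz -C "/opt/$TAG" --strip-components=1
export PATH="/opt/$TAG/usr/bin:$PATH"
swift --version   # should now report the snapshot instead of the release toolchain
```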
@finagolfin Good idea, didn't think of that, but still didn't work. For reference:

CI file:

```yaml
name: test build
on:
pull_request: { types: [opened, reopened, synchronize] }
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
unit-tests:
strategy:
fail-fast: false
matrix:
snapshot:
- swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-19-a
- swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-21-a
machine:
- name: "medium" # 16gb 8cpu c7i-flex
arch: amd64
- name: "large" # 32gb 16cpu c7i-flex
arch: amd64
- name: "huge-stable-arm" # 128gb 64cpu bare metal c7g
arch: arm64
runs-on:
labels:
- runs-on
- runner=${{ matrix.machine.name }}
- run-id=${{ github.run_id }}
timeout-minutes: 60
steps:
- name: Check out ${{ github.event.repository.name }}
uses: actions/checkout@v4
- name: Build Docker Image
run: |
docker build \
--network=host \
--memory=128g \
-f SwiftDockerfile \
-t custom-swift:1 . \
--build-arg DOWNLOAD_DIR="${{ matrix.snapshot }}" \
--build-arg TARGETARCH="${{ matrix.machine.arch }}"
- name: Prepare
run: |
docker run --name swift-container custom-swift:1 bash -c 'apt-get update -y && apt-get install -y libjemalloc-dev && git config --global --add url."https://${{ secrets.GH_PAT }}@github.com/".insteadOf "https://github.com/" && git clone https://github.com/${{ github.repository }} && cd ${{ github.event.repository.name }} && git checkout ${{ github.head_ref }} && swift package resolve --force-resolved-versions --skip-update'
docker commit swift-container prepared-container:1
- name: Build ${{ matrix.snapshot }}
run: |
docker run prepared-container:1 bash -c 'cd ${{ github.event.repository.name }} && swift build --build-tests'
```

Modified Dockerfile:

```dockerfile
FROM ubuntu:22.04 AS base
LABEL maintainer="Swift Infrastructure <[email protected]>"
LABEL description="Docker Container for the Swift programming language"
RUN export DEBIAN_FRONTEND=noninteractive DEBCONF_NONINTERACTIVE_SEEN=true && apt-get -q update && \
apt-get -q install -y \
binutils \
git \
gnupg2 \
libc6-dev \
libcurl4-openssl-dev \
libedit2 \
libgcc-11-dev \
libpython3-dev \
libsqlite3-0 \
libstdc++-11-dev \
libxml2-dev \
libz3-dev \
pkg-config \
tzdata \
zip \
zlib1g-dev \
&& rm -r /var/lib/apt/lists/*
# Everything up to here should cache nicely between Swift versions, assuming dev dependencies change little
# gpg --keyid-format LONG -k FAF6989E1BC16FEA
# pub rsa4096/FAF6989E1BC16FEA 2019-11-07 [SC] [expires: 2021-11-06]
# 8A7495662C3CD4AE18D95637FAF6989E1BC16FEA
# uid [ unknown] Swift Automatic Signing Key #3 <[email protected]>
ARG SWIFT_SIGNING_KEY=8A7495662C3CD4AE18D95637FAF6989E1BC16FEA
ARG SWIFT_PLATFORM=ubuntu
ARG OS_MAJOR_VER=22
ARG OS_MIN_VER=04
ARG SWIFT_WEBROOT=https://download.swift.org/development
ARG DOWNLOAD_DIR
# This is a small trick to enable if/else for arm64 and amd64.
# Because of https://bugs.swift.org/browse/SR-14872 we need adjust tar options.
FROM base AS base-amd64
ARG OS_ARCH_SUFFIX=
FROM base AS base-arm64
ARG OS_ARCH_SUFFIX=-aarch64
FROM base-$TARGETARCH AS final
ARG OS_VER=$SWIFT_PLATFORM$OS_MAJOR_VER.$OS_MIN_VER$OS_ARCH_SUFFIX
ARG PLATFORM_WEBROOT="$SWIFT_WEBROOT/$SWIFT_PLATFORM$OS_MAJOR_VER$OS_MIN_VER$OS_ARCH_SUFFIX"
RUN echo "${PLATFORM_WEBROOT}/latest-build.yml"
ARG download="$DOWNLOAD_DIR-$SWIFT_PLATFORM$OS_MAJOR_VER.$OS_MIN_VER$OS_ARCH_SUFFIX.tar.gz"
RUN echo "DOWNLOAD IS THIS: ${download} ; ${DOWNLOAD_DIR}"
RUN set -e; \
# - Grab curl here so we cache better up above
export DEBIAN_FRONTEND=noninteractive \
&& apt-get -q update && apt-get -q install -y curl && rm -rf /var/lib/apt/lists/* \
# - Latest Toolchain info
&& echo $DOWNLOAD_DIR > .swift_tag \
# - Download the GPG keys, Swift toolchain, and toolchain signature, and verify.
&& export GNUPGHOME="$(mktemp -d)" \
&& curl -fsSL ${PLATFORM_WEBROOT}/${DOWNLOAD_DIR}/${download} -o latest_toolchain.tar.gz \
${PLATFORM_WEBROOT}/${DOWNLOAD_DIR}/${download}.sig -o latest_toolchain.tar.gz.sig \
&& curl -fSsL https://swift.org/keys/all-keys.asc | gpg --import - \
&& gpg --batch --verify latest_toolchain.tar.gz.sig latest_toolchain.tar.gz \
# - Unpack the toolchain, set libs permissions, and clean up.
&& tar -xzf latest_toolchain.tar.gz --directory / --strip-components=1 \
&& chmod -R o+r /usr/lib/swift \
&& rm -rf "$GNUPGHOME" latest_toolchain.tar.gz.sig latest_toolchain.tar.gz \
&& apt-get purge --auto-remove -y curl
# Print Installed Swift Version
RUN swift --version
RUN echo "[ -n \"\${TERM:-}\" -a -r /etc/motd ] && cat /etc/motd" >> /etc/bash.bashrc; \
( \
printf "################################################################\n"; \
printf "# %-60s #\n" ""; \
printf "# %-60s #\n" "Swift Nightly Docker Image"; \
printf "# %-60s #\n" "Tag: $(cat .swift_tag)"; \
printf "# %-60s #\n" ""; \
printf "################################################################\n" \
) > /etc/motd
```
|
To be clear, by "didn't work" I mean that I'm getting exactly the same errors. |
Tried the 6.0 snapshots; they complain about usage of |
Does simply building your code with the Swift 6.0 release still work? If so, I'd try to instrument the build to figure out the bottlenecks, as I suggested before. In particular, if you're building a large executable at the end, that might be taking the most time. As I said yesterday, you could try dumping all the commands that |
Of course there is no problem on the released Swift 6 😅. This whole issue is about CI slowness. We run CI on push, PR, etc... and they've all passed. |
@finagolfin

```yaml
snapshot:
- swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-19-a
- swift-6.0-DEVELOPMENT-SNAPSHOT-2024-07-21-a
machine:
- name: "medium" # 16gb 8cpu c7i-flex
arch: amd64
- name: "large" # 32gb 16cpu c7i-flex
arch: amd64
- name: "huge-stable-arm" # 128gb 64cpu bare metal c7g
arch: arm64
```

The results are only marginally different. Ignore that it says "unit-tests"; it's only a |
I just reran the 2 most important CIs, in terms of knowing the situation, in the old repo that all the benchmarks above relate to. On a 2-CPU, 8 GB RAM machine it appears the noble images are marginally faster than the jammy images. This confirms the build regressions, at least those I reported, are gone. |
Okay, never mind, I actually forgot that there were 2 issues reported: […]. The former is fixed. The latter is not: Swift 5.10 noble compiles in 13m 36s while 6.0 noble takes 19m 6s (…). @eeckstein I don't have the permission, so please reopen the issue 🙂.
I also wonder whether what fixed the jammy vs noble discrepancy was that jammy regressed down to noble, rather than noble getting any better. |
Swift jammy 6.0.0 took 18m 46s, which is close enough to the 19m to be ignorable ... so I guess noble must have actually gotten better and this is not jammy regressing? Or maybe there's a bit of a regression, but it's mostly still noble getting better. |
Do you have a Swift repo with a CMake config that you can build and check the ninja timestamps for? That may be a way to get some actual timing data. @dschaefer2, did you ever end up adding a way to dump timestamps for each build step run by SwiftPM? That would be useful to hunt down build regressions like this. |
Or build without parallelism ( |
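If such a CMake-based repo is available, a rough sketch of pulling per-step timings out of Ninja's bookkeeping might look like this (the configure flags are generic assumptions):

```bash
# configure and build once with Ninja, then rank build edges by duration
cmake -G Ninja -B build -DCMAKE_BUILD_TYPE=Debug
ninja -C build
# .ninja_log rows: start_ms end_ms mtime output cmd_hash (the first line is a header)
awk 'NR > 1 { printf "%10d ms  %s\n", $2 - $1, $4 }' build/.ninja_log | sort -rn | head -20
```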
@finagolfin @eeckstein Can you let me know what you'd do to debug this, if you were me? |
@finagolfin about cmake, no. |
I've never tried to track down such a compiler speed regression, but what you can try is to run your build with each compiler in verbose mode with parallelism turned off
I don't know if the two build tools will stick to the same build order between versions, but you can use the verbose build log and the object file timestamps to find which commands are regressing the most. Once you have the slower compilation command, there is some doc on what flags to use to try and further track down compiler performance issues. Try that out and let us know what you find. 😸 |
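As a crude but concrete version of that, one could run a serial verbose build with each toolchain and then order the produced object files by modification time; a sketch assuming a SwiftPM layout and GNU find:

```bash
# serial, verbose build so the log order roughly matches wall-clock order
swift build -v -j 1 2>&1 | tee build-serial.log
# list object files oldest-to-newest; large gaps between consecutive mtimes point
# at the compile (or link) steps that took the longest
find .build -name '*.o' -printf '%T@ %p\n' | sort -n | awk '{ printf "%.0f %s\n", $1, $2 }'
```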
Or if, as I suspect (without evidence thus far), it's a linker time issue, the timestamp gap will show up on the final build product. |
Why do you still suspect the final linker run, considering 6.0.3 now de-duplicates most of the Swift runtime library names passed to the linker? |
TBH, because I'm always suspicious of the linker when build time is involved. As I said, I have no evidence at this time to back up that suspicion. |
Running some CIs with command
It appears the compiler regressions don't react to more CPU or RAM. It looks like the compiler is not doing multithreaded work, which makes sense I guess with […]. Will continue the investigation.
OK, some more info. I tried the same CI in vapor/penny-bot to see if I could catch the regressions in a public repo, so I could publicly share the results (I can still share the results of the work project, but privately). In Penny, things don't seem to have regressed much, though worth noting that in Penny I didn't use […].

After running the CIs in the work project, which is having the massive regressions, I saved the logs to my machine for some processing to try to work out what's going wrong, at which point I noticed that Swift 6.0 has logged ~91.5 MB of text while Swift 5.10 has only logged ~49 MB. The log size has increased by ~87%, which is in line with the time regressions. Checking the logs further, here's a table of swift-related executables being called directly, and how many times they're called in 6.0 vs 5.10 to build the project: […]
Next I'll try to figure out why exactly there are so many more calls to swift-frontend / clang / swiftc. |
Base command is […].
See this file for a diff of how many times things have been called in 5.10 vs in 6.0, in […]. It excludes the project's source files where it names the files. From the file: […]
|
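As a side note, counts like the ones above can be reproduced straight from the saved CI logs; a sketch with assumed log file names:

```bash
# build-5.10.log and build-6.0.log are the saved verbose CI logs (names assumed)
for log in build-5.10.log build-6.0.log; do
  echo "== $log =="
  # pull out toolchain binary paths, strip directories, and tally invocations
  grep -oE '[^ ]*/(swift-frontend|swiftc|swift-driver|clang|ld)\b' "$log" \
    | sed 's|.*/||' | sort | uniq -c | sort -rn
done
```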
It appears 2x the builds are being done, one with […]
This suggests that cross-compilation is being incorrectly assumed, causing a separate build to be run for the purpose of native tool execution. |
Yeah, I was just confirming that this is the case. So even if the base image is ubuntu-22 and the Swift image is jammy, it still somehow triggers […]
cc @finagolfin @eeckstein found the root issue, for whenever you have time 🙂 ⬆️ |
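One way to confirm the double-build theory from the verbose logs alone is to count how many times each module is emitted; a sketch against the same assumed log file:

```bash
# a module that shows up twice is being built twice (e.g. once more for host tools)
grep -oE 'Emitting module [A-Za-z0-9_]+' build-6.0.log | sort | uniq -c | sort -rn | head -20
# the SwiftPM build directory can also reveal whether a second build tree exists
ls .build/
```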
Sounds like a SwiftPM bug; you should search their issues to see if someone has reported this already, and file it yourself if not. If you file, you'll want to narrow it down to a small use case that demonstrates the problem. Are you using macros? I have seen regressions when cross-compiling packages that have macros with 6.1 or later (swiftlang/swift-package-manager#8078); you may be triggering the same bug in 6.0 somehow. |
cc @MaxDesiatov |
Filed this: swiftlang/swift-package-manager#8275 |
IIUC the issue should've been moved to that repository instead of creating a new one. Closing this one then. |
The issue in the SwiftPM repo is a subset of what this issue was concerned with. |
Feel free to reopen if there are repro steps that show it's not specific to SwiftPM and is actually a compiler frontend bug. |
@MaxDesiatov ah no, I believe the other issues are resolved already, mostly thanks to @finagolfin. So only the SwiftPM issue remains as far as I can tell. |
Description
~~Absolutely massive~~ Significant build regressions on Linux, worsened by swift-testing. EDIT: Please also read my next comment, which includes more info.
Environment
[…] `swift:5.10-noble` to `swift:6.0-noble`.
[…] `.build` cache. No behavior difference noticed at all.
`// swift-tools-version:6.0` in `Package.swift`.
What Happened?
[…] `c7x`-family machine with 64 CPU x 128 GB RAM (8x larger than before) runs the tests as well as they were being run before, on Swift 5.10. […] `--disable-swift-testing` […]. […] `significant` to `absolutely massive` […] `--disable-swift-testing` flag into the mix.

Reproduction
Not sure.
Expected behavior
No regressions. Preferably even faster than previous Swift versions.
Environment
Mentioned above.
Additional information
No response