Native Image Build Bundles #5473

Closed
olpaw opened this issue Nov 21, 2022 · 10 comments · Fixed by #5569 or #6095

olpaw commented Nov 21, 2022

Draft: Native Image Build Bundles

Motivation

  1. The deployment of a native image is just one step in the lifecycle of an application or service. Real-world
    applications run for years and sometimes need to be updated or patched long after deployment (security fixes). It would be great if there were an easy way to redo an image build at some point in the future as accurately as possible.

  2. Another angle is provided by the development phase. If image building fails or a malfunctioning image is created (i.e., the same application runs fine when executed on the JVM), we would like to get bug reports that allow us to reproduce the problem locally without spending hours replicating the user's setup. We want some way to bundle up what the user built (or tried to build) into a neat package that lets us instantly reproduce the problem on our side.

  3. Debugging an image created a long time ago is also sometimes needed. It would be great if a single bundle contained everything needed to perform this task.

Build Bundles

A set of options should be added to the native-image command that allows users to create so-called "build bundles" that
can be used to help with the problems described above. There shall be:

native-image --bundle-create=mybundle.nib ...other native-image arguments...

This will instruct native-image to create build bundle mybundle.nib alongside the image.

For example, after running:

native-image --bundle-create=alaunch.nib -Dlaunchermode=2 -EBUILD_ENVVAR=env42 \
 -p somewhere/on/my/drive/app.launcher.jar:/mnt/nfs/server0/mycorp.base.jar \
 -cp $HOME/ourclasses:somewhere/logging.jar:/tmp/other.jar:aux.jar \
 -m app.launcher/paw.AppLauncher alaunch

the user sees the following build results:

~/foo$ ls
alaunch.output alaunch.nib somewhere aux.jar

As we can see, an alaunch.nib file and an alaunch.output directory were created next to the existing input files. The
alaunch.nib file is the native image build bundle for the image that got built, and alaunch.output is a directory
holding the actual image and any additional files that got created as part of image building. At any later time,
provided the same version of GraalVM is used, the image can be rebuilt with:

native-image --bundle-apply=.../path/to/alaunch.nib

This will rebuild the alaunch image with the same image arguments, environment variables, system-property
settings, class-path and module-path options as the initial build.

To support the use case of image-building-as-a-service, there should also be a way to create a bundle without
performing the initial build locally. This allows users to offload image building to a cloud service specialized in
image building and the retention of build bundles. The command line for that should be:

native-image --bundle-create=mybundle.nib --dry-run ...other native-image arguments...

Build Bundle File Format

A <imagename>.nib file is a regular jar-file that contains all the information needed to redo a previous build.
For example, the alaunch.nib build bundle has the following inner structure:

alaunch.nib
├── input
│   ├── auxiliary <- Contains auxiliary files passed to native-image via arguments
│   │                (e.g. external `config-*.json` files or PGO `*.iprof`-files)
│   ├── classes <- Contains all class-path and module-path entries passed to the builder
│   │   ├── cp
│   │   │   ├── aux.jar
│   │   │   ├── logging.jar
│   │   │   ├── other.jar
│   │   │   └── ourclasses
│   │   └── p
│   │       ├── app.launcher.jar
│   │       └── mycorp.base.jar
│   └── stage
│       ├── all.env <- All environment variables used in the image build
│       ├── all.properties  <- All system properties passed to the builder
│       ├── build.cmd <- Full native-image command line (minus --bundle-create option)
│       ├── run.cmd <- Arguments to run application on java (for launcher, see below)
│       └── container <- For containerized builds this subdir holds all info about that
│           ├── Dockerfile <- Container image that was used to perform the build
│           ├── run.cmd <- Arguments passed to docker/podman to run the container
│           └── setup.json <- Info about the docker/podman setup that was used
│                             * Linux kernel version
│                             * Docker/podman version
│                             * CGroup v2 or CGroup v1
├── output
│   ├── debug
│   │   ├── alaunch.debug <- Native debuginfo for the built image.
│   │   └── sources <- Reachable sources needed for native debugging.
│   └── build
│       └── report <- Contains information about the build process.
│           │         When rebuilding, these will be compared against. 
│           ├── analysis_results.json
│           ├── build_artifacts.json
│           ├── build.log
│           ├── build_output.json
│           ├── jni_access_details.json
│           └── reflection_details.json
├── META-INF
│   ├── MANIFEST.MF <- Specifies nibundle/Launcher as main class
│   └── nibundle.properties <- Contains build bundle version info:
│                     * build bundle format version
│                     * Platform the bundle was created on (e.g. linux-amd64) 
│                     * GraalVM / Native-image version used for build
└── nibundle
    └── Launcher.class <- Launcher for running of application with `java`
                          (uses files from input directory)

As we can see, there are several components in a build bundle that we need to describe in more detail.

META-INF:

Since the bundle is also a regular jar-file, we have a META-INF subdirectory with the familiar MANIFEST.MF. The
bundle can be used like a regular jar-launcher (by running java -jar <imagename>.nib) so that the
application we build an image from is instead executed on the JVM. For that purpose the MANIFEST.MF specifies
nibundle/Launcher as the main class. This is particularly useful if you want to run the application on the JVM with the native-image agent to collect configuration data that you then integrate into the bundle as a second step.
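
For example, collecting configuration data with the agent could look like this (the config-output-dir location is
only an illustration):

java -agentlib:native-image-agent=config-output-dir=agent-config -jar alaunch.nib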

Here we also find nibundle.properties. This file is specific to build bundles. Its existence makes clear that this is no
ordinary jar-file but a native image build bundle. The file contains version information of the native image build
bundle format itself and also which GraalVM version was used to create the bundle. This can later be used to report a
warning message if a bundle gets built with a GraalVM version different from the one used to create the bundle.
This file also contains information about the platform the bundle was created on (e.g. linux-amd64 or
darwin-aarch64).
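
As a rough sketch, nibundle.properties could then contain entries along these lines (the key names are made up for
illustration; only the kinds of information listed above are taken from this proposal):

BundleFormatVersion=0.9
Platform=linux-amd64
GraalVMVersion=23.0.0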

input:

This directory contains all the information needed to redo the previous image build. The original
class-path and module-path entries are placed, as files (for jar-files) and subdirectories (for
directory-based class/module-path entries), into the input/classes/cp folder (original -cp/--class-path entries) and
the input/classes/p folder (original -p/--module-path entries). The input/stage folder contains all the information
needed to replicate the previous build context.
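
Applied to the alaunch example from above, the relocation maps the original entries as follows:

somewhere/on/my/drive/app.launcher.jar -> input/classes/p/app.launcher.jar
/mnt/nfs/server0/mycorp.base.jar       -> input/classes/p/mycorp.base.jar
$HOME/ourclasses                       -> input/classes/cp/ourclasses
somewhere/logging.jar                  -> input/classes/cp/logging.jar
/tmp/other.jar                         -> input/classes/cp/other.jar
aux.jar                                -> input/classes/cp/aux.jar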

input/stage:

Here we have build.cmd, which contains all native-image command line options used in the previous build. Note that
even the initial build that created the bundle already uses a class- and/or module-path that refers to the contents
of the input/classes folder. This way we can guarantee that a bundle build sees exactly the same relocated
class/module-path entries as the initial build. The use of run.cmd is explained later.

File all.env contains the environment variables that we allowed the builder to see during the initial build, and
all.properties contains the respective system properties.
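
For the alaunch example, build.cmd would therefore contain something along these lines (a sketch; the exact
serialization format is not specified in this proposal):

-EBUILD_ENVVAR=env42
-Dlaunchermode=2
-p input/classes/p/app.launcher.jar:input/classes/p/mycorp.base.jar
-cp input/classes/cp/ourclasses:input/classes/cp/logging.jar:input/classes/cp/other.jar:input/classes/cp/aux.jar
-m app.launcher/paw.AppLauncher
alaunch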

input/stage/container:

If the image builder runs in a container environment, this subdirectory holds all information necessary to redo the
image build later in an equivalent container environment. It contains the Dockerfile that was used to specify the
container image that executed the image builder. Next, run.cmd contains all the arguments that were passed to
docker/podman. It does not contain the arguments passed to the builder. In setup.json we save all information about
the container environment that was used (Linux kernel version, CGroup v1 or v2, Docker/podman version). For more info
see below.
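
For instance, setup.json could record the listed properties like this (the field names are made up for illustration):

{
  "linuxKernelVersion": "5.15.0",
  "containerEngine": "podman",
  "containerEngineVersion": "4.3.1",
  "cgroupVersion": 2
}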

output:

This folder contains all the output that was generated by the image build process (if the image was built as part of bundle creation). In particular, it contains the debug info needed in case we have to debug the image at some point in the future.
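
With the debug info and sources extracted from the bundle next to the image, a later debugging session could start
roughly like this (a sketch; it assumes alaunch.debug is picked up via the usual debuglink mechanism):

gdb --directory=output/debug/sources alaunch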

output/build:

This folder is used to document the build process that led to the image that was created alongside the bundle.
The report sub-folder holds build.log, which is equivalent to what would have been created if the user had appended
|& tee build.log to the original native-image command line. Additionally, we have several json-files:

  • analysis_results.json: Contains the results of the static analysis. A rerun should compare the new
    analysis_results.json file with this one and report deviations in a user-friendly way.
  • build_artifacts.json: Contains a list of the artifacts that got created during the initial build. As before,
    changes should be reported to the user.
  • build_output.json: Similar information as build.log but more structured and detailed.
  • jni_access_details.json: An overview of which methods/classes/fields have been made JNI-accessible at image run time.
  • reflection_details.json: The same kind of information for reflective access at image run time.

As already mentioned, a rebuild should compare its newly generated set of json-files against the ones in the bundle and
report deviations from the originals in a user-friendly way.
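
Since the bundle is a regular jar-file, the recorded reports can also be inspected directly with standard tooling,
for example:

unzip -p alaunch.nib output/build/report/build.log | less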

nibundle:

Contains the Launcher.class that is used when the bundle is run as a regular java launcher. The class-file is not
specific to a particular bundle. Instead, the Launcher class extracts the contents of the input directory into a
temporary subdirectory in $TEMP and uses the files from input/stage/all.* and input/stage/run.cmd to invoke
$JAVA_HOME/bin/java with the environment variables and the arguments (e.g. system properties) needed to run the
application on the JVM.
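
Conceptually, what the Launcher does is roughly equivalent to the following shell steps (a simplified sketch that
ignores quoting and cleanup):

cd $(mktemp -d)                       # temporary extraction directory
unzip /path/to/alaunch.nib 'input/*'  # extract the bundled input files
# environment from all.env, system properties from all.properties,
# class/module-path, main class and arguments from run.cmd
env $(cat input/stage/all.env) "$JAVA_HOME/bin/java" \
  $(cat input/stage/all.properties) $(cat input/stage/run.cmd)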

Enforced sanitized image building

Containerized image building on supported platforms

If available, docker/podman should be used to run the image builder inside a well-defined container image. This allows
us to prevent the builder from using the network during the image build, thus guaranteeing that the image build result
does not depend on some unknown (and therefore unreproducible) network state. Another advantage is that we can mount
input/classes and $GRAALVM_HOME read-only into the container and only allow read-write access to the mounted output
and build directories. This prevents application code that runs at image build time from messing with anything
other than those directories. All information about containerized building is recorded in the bundle subdirectory
input/stage/container.
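
A corresponding container invocation could look roughly like this (the image name and mount points are purely
illustrative):

docker run --network=none \
  -v $PWD/input/classes:/bundle/input/classes:ro \
  -v $GRAALVM_HOME:/graalvm:ro \
  -v $PWD/output:/bundle/output \
  my-native-image-builder:latest ...builder arguments...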

Fallback for systems without container support

If containerized builder execution is not possible, we can still at least have the builder run with a sanitized set
of environment variables and make sure that only those environment variables are visible that were explicitly
specified with -E<env_var_name>=<env_var_value> or -E<env_var_name> (the latter passes a variable through from the
surrounding environment).
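
For example, the following build would only see BUILD_ENVVAR with the given value and LANG passed through from the
surrounding environment:

native-image --bundle-create=mybundle.nib -EBUILD_ENVVAR=env42 -ELANG ...other native-image arguments...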

Handling of Image build errors

To ensure build bundles are suitable for the second use case described above, we have to make sure a
bundle gets successfully created even if the image build fails. In this case the output folder will most likely be
missing from the bundle. But, as usual, output/build/report/build.log will contain all the command line output that
was shown during the image build, including any error messages that led to the build failure.

olpaw commented Nov 21, 2022

This document started out in #5460 and is now maintained as a roadmap item here.

olpaw commented Nov 21, 2022

TODO: Add more info about input/auxiliary

olpaw commented Nov 24, 2022

TODO: @jerboaa mentioned (see #5460 (comment)) that the bundle should also contain info about the static libs that got linked into the image.
We can add such information to build_output.json.

olpaw commented Nov 28, 2022

TODO: Add info about input/stage/path_substitutions.json

pejovica commented:

Just a thought. Instead of introducing native-image --replay to rebuild the image from the replay bundle, we could perhaps reuse the existing -jar option, so the above example for rebuilding the alaunch image would become:

native-image -jar .../path/to/alaunch.replay.jar

I find it kind of natural since the replay bundle is also a regular jar file.

olpaw commented Dec 13, 2022

Since it turns out there might be more use cases for these bundles than just replay, we renamed them to Native Image Build Bundles and use the extension .nib.

sgammon commented Dec 20, 2022

@olpaw I encountered this issue while searching for re-usable build outputs from native-image. Is there a way today to re-use previously built outputs for a shared library, assuming architectures match?

For instance, could I ship an artifact with a library I make, which reduces the compile-time impact on downstream apps that compile with GraalVM?

If that doesn't exist today, would this feature help get closer to a reality where it does?

olpaw commented Dec 21, 2022

For instance, could I ship an artifact with a library I make, which reduces the compile-time impact on downstream apps that compile with GraalVM?

@sgammon, the only way to speed up image building we currently have is to use the quick build mode for development (-Ob). Currently we have no way of reusing information from a previous build to speed up rebuilding in the future.

The focus of this feature is not to reduce the build time of an image upon rebuilding.

Build bundles instead make it possible to rebuild an image later without keeping all the files around that were involved in building that image. That includes all the jar-files and directories that were passed via class-path and/or module-path, configuration files passed via builder options, used @-files ... you get the idea.

sgammon commented Dec 21, 2022

@olpaw totally get the idea, sorry, let me clarify. If we had such bundles, wouldn't we be closer to being able to discern whether a native dependency is or is not compatible? And wouldn't we have a starting point for a format to pre-package native dependencies? And if such a format existed, and the compiler could load and use it, couldn't it be a way to reduce build times later?

I understand this issue itself is unrelated, but I'm also wondering if this line of work gets GraalVM closer to that eventuality. Thank you, by the way, for building quickbuild mode; I'm not even upset at GraalVM build times, but it would be very interesting if GraalVM-compatible artifacts could avoid rebuilding native sections of code. Such artifacts already prepare to ship reflection metadata, for example, and if library authors are waiting for a native build to complete, the result could, in theory, be reused later (modulo architecture and OS differences, obviously).

olpaw commented May 16, 2023

Bundles are on master and are part of 23.0. The bundle documentation is part of the official documentation.
See https://github.com/oracle/graal/blob/master/docs/reference-manual/native-image/Bundles.md

olpaw closed this as completed May 16, 2023