Make it easier to test load all the .sql files during a docker build operation. #731

Closed
zero-below opened this issue May 11, 2020 · 2 comments · Fixed by #1150
Labels
Request Request for image modification or feature

Comments

zero-below commented May 11, 2020

Issue: Make it easy to test-load all files from docker-entrypoint-initdb.d during a docker build and verify that they work, without fully starting the final db instance.

I have a Dockerfile that does FROM postgis/postgis:10-3.0-alpine, which itself is built FROM postgres:10-alpine or similar. We then add a bunch of machine-generated .sql files to docker-entrypoint-initdb.d, effectively creating a read-only db artifact that can be used in our various environments. Sometimes the files don't integrate well in the final container we build (e.g., one file creates an index on a column that another file has changed), but we don't find out until we actually run the container. We used to end our Dockerfile with RUN /docker-entrypoint.sh postgres --describe-config to trigger the load at build time, but at some point the addition of the _pg_want_help function caused such options to short-circuit the db load process: the command now always exits successfully, regardless of whether the docker-entrypoint-initdb.d files are valid.

To catch this at build time, I have created a hacked-up dbtest.sh that sources docker-entrypoint.sh and then runs a limited version of the startup process, doing basically everything in _main except the final exec "$@". But I'm concerned that this will be hard to keep in sync with docker-entrypoint.sh if it changes in the future.
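
For illustration, here is a minimal sketch of what such a dbtest.sh might look like. It assumes the sourced docker-entrypoint.sh defines the helper functions that _main calls (docker_setup_env, docker_init_database_dir, docker_process_init_files and so on) and does not run _main when sourced. The function name init_db_only and its path argument are hypothetical, and the helper names and arguments may differ between image versions. It would have to run as the postgres user, since initdb refuses to run as root.

```shell
#!/usr/bin/env bash
# dbtest.sh (sketch): run the init steps of _main, but never exec the server.
# Meant to run with errexit on (set -e), so that a failing step, such as a
# broken .sql file, aborts the build instead of being silently ignored.

init_db_only() {
	# hypothetical parameter: path of the entrypoint to source
	local entrypoint="${1:-/usr/local/bin/docker-entrypoint.sh}"
	if [ "$#" -gt 0 ]; then shift; fi
	# the helpers expect "postgres" as the first argument, as in _main
	set -- postgres "$@"

	# load the helper functions; assumed not to run _main when sourced
	. "$entrypoint"

	docker_setup_env
	docker_create_db_directories

	# like _main, only initialise an empty data directory
	if [ -z "${DATABASE_ALREADY_EXISTS:-}" ]; then
		docker_verify_minimum_env
		docker_init_database_dir
		pg_setup_hba_conf "$@"

		export PGPASSWORD="${PGPASSWORD:-${POSTGRES_PASSWORD:-}}"
		docker_temp_server_start "$@"
		docker_setup_db
		docker_process_init_files /docker-entrypoint-initdb.d/*
		docker_temp_server_stop
		unset PGPASSWORD
	fi
	# deliberately no exec "$@" here
}
```

The calls are left bare rather than chained with || on purpose: inside a || list the shell suppresses errexit, which could hide a failing .sql file in the middle of the load.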

I feel like it would be best to modify docker-entrypoint.sh so that this part of the startup is split out into a separate function, or to add a magic keyword to docker-entrypoint.sh that activates a special init-db-but-don't-launch behavior. This seems generally useful, and it might also give people the option to create containers where the initdb.d files are already loaded into tables during the container build (though that specific feature might be bad, too). In our case, /var/lib/postgresql/data isn't kept in the build, so the full load of docker-entrypoint-initdb.d happens when the container starts, which happens to work well for us.

I can create a PR, but first I was wondering whether there are preferences as to the approach. I think option 3 below seems least disruptive, but also slightly harder to use for some users...

  1. Create a magic keyword, so that docker-entrypoint.sh postgres-load-only does everything except the exec "$@" at the end of _main.
  2. Create a fake option, docker-entrypoint.sh postgres --load-only, that the shell script strips out; init then runs as usual, but the exec "$@" at the end of _main is skipped.
  3. Move almost everything out of _main into a function called something like docker_init_all, so that _main would effectively be just docker_init_all "$@"; exec "$@". A user could then source docker-entrypoint.sh and call docker_init_all to get the same behavior. In a Dockerfile, it could look like RUN . /docker-entrypoint.sh ; docker_init_all
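
To make option 3 more concrete, here is a rough sketch of the resulting structure. The two init_step_* functions are hypothetical stand-ins for the initialization work that currently lives inline in _main; only docker_init_all, _main and the final exec "$@" come from the proposal itself.

```shell
#!/usr/bin/env bash
# Sketch of option 3: _main shrinks to "init, then exec".

# hypothetical stand-ins for the real initialization steps
init_step_create_db() { echo "initdb"; }
init_step_load_files() { echo "load-init-files"; }

# everything _main does today, except the final exec
docker_init_all() {
	init_step_create_db
	init_step_load_files
}

_main() {
	docker_init_all "$@"
	# replace the shell with the server process (or whatever command was given)
	exec "$@"
}
```

If the file is sourced without _main being run, a caller gets docker_init_all on its own and can stop after the init steps. Since the entrypoint is a bash script, the RUN line would presumably need to use bash rather than the default /bin/sh.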

wglambert added the Request (Request for image modification or feature) label May 12, 2020
@remil1000

We had a similar scenario: not testing the dump at build time, but building a reusable Docker image for integration testing.
The dump we want to use takes quite a long time to load from a plain-text SQL file, so it's not practical to load it at each initial start of the container. Once the data is loaded, the container takes only a second to start.

Here is an attempt at implementing this feature. It doesn't use the same keywords as the initial request, but the change is quite minimal:

diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh
index 749445d..762c7e9 100755
--- a/docker-entrypoint.sh
+++ b/docker-entrypoint.sh
@@ -222,6 +222,7 @@ docker_setup_env() {
        file_env 'POSTGRES_USER' 'postgres'
        file_env 'POSTGRES_DB' "$POSTGRES_USER"
        file_env 'POSTGRES_INITDB_ARGS'
+       file_env 'POSTGRES_EMBED_INITDB'
        : "${POSTGRES_HOST_AUTH_METHOD:=}"

        declare -g DATABASE_ALREADY_EXISTS
@@ -343,6 +344,10 @@ _main() {
                fi
        fi

+    if [ "$POSTGRES_EMBED_INITDB" ]; then
+        exit 0
+    fi
+
        exec "$@"
 }

And here is a Dockerfile that can be used to build this image:

FROM postgres AS initdb

ENV POSTGRES_PASSWORD=secret
ENV POSTGRES_DB=test

COPY testdump.sql /docker-entrypoint-initdb.d/

ENV POSTGRES_EMBED_INITDB=true
RUN /usr/local/bin/docker-entrypoint.sh postgres

FROM postgres AS final

ENV POSTGRES_PASSWORD=secret
ENV POSTGRES_DB=test
COPY --from=initdb ${PGDATA} ${PGDATA}

I can open a pull request if necessary, but I'm not sure how it should be structured, as docker-entrypoint.sh seems to be copied in a few places and differs by a single line in the Alpine version.

derhuerst added a commit to mobidata-bw/ipl-orchestration that referenced this issue Sep 8, 2023
@derhuerst

I just stumbled upon this thread while trying to implement the same thing! For lack of better ("better" depends on the priorities of course) alternatives, I want to "bake" the imported/processed SQL right into the Docker image.

I ended up with a slightly modified version of Rémi's approach:

diff --git a/docker-entrypoint.sh.orig b/docker-entrypoint.sh
old mode 100644
new mode 100755
index a383a36..62655a1
--- a/docker-entrypoint.sh
+++ b/docker-entrypoint.sh.new
@@ -343,7 +343,9 @@ _main() {
 		fi
 	fi
 
-	exec "$@"
+	if [ "${POSTGRES_INIT_THEN_EXIT:-}" != '1' ]; then
+		exec "$@"
+	fi
 }
 
 if ! _is_sourced; then
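
One difference between the two patches is how the flag is read, which the following sketch puts side by side. The function names are hypothetical; the variable names and the two tests are taken from the diffs in this thread.

```shell
#!/usr/bin/env bash
# Two ways of reading the "init, then exit" flag.

# non-empty check, as in [ "$POSTGRES_EMBED_INITDB" ]:
# any value, even "0" or "false", enables the early exit
embed_initdb_enabled() {
	[ -n "${POSTGRES_EMBED_INITDB:-}" ]
}

# exact match, as in the POSTGRES_INIT_THEN_EXIT patch:
# only the literal value 1 enables it
init_then_exit_enabled() {
	[ "${POSTGRES_INIT_THEN_EXIT:-}" = '1' ]
}
```

With the non-empty check, ENV POSTGRES_EMBED_INITDB=false would still skip the final exec, so the exact match is a bit harder to trigger by accident.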

Here is my Dockerfile, stripped ad hoc of domain-specific code:

FROM ghcr.io/public-transport/gtfs-via-postgres AS sql

WORKDIR /importer

ENV DEST_PATH=/tmp/sql/gtfs.sql
ADD import.sh ./
RUN --mount=type=cache,target=/tmp/gtfs,sharing=locked \
	./import.sh

FROM postgis/postgis:15-3.4-alpine AS import

# configure access to the container-local PostgreSQL server
ARG POSTGRES_USER
ENV POSTGRES_USER=$POSTGRES_USER
ARG POSTGRES_PASSWORD
ENV POSTGRES_PASSWORD=$POSTGRES_PASSWORD
ARG POSTGRES_DB
ENV POSTGRES_DB=$POSTGRES_DB

# replace the image's pre-populated docker-entrypoint.sh with the patched version
ADD docker-entrypoint.sh /usr/local/bin/
# tell docker-entrypoint.sh *not to* (re)start PostgreSQL after importing /docker-entrypoint-initdb.d/*
ENV POSTGRES_INIT_THEN_EXIT=1

# We prefix our GTFS SQL with `20_`, so that it gets processed *after* the pre-existing PostGIS init script (/docker-entrypoint-initdb.d/10_postgis.sh).
RUN --mount=type=bind,from=sql,source=/tmp/sql,target=/tmp/sql \
	ln -s /tmp/sql/gtfs.sql /docker-entrypoint-initdb.d/20_gtfs.sql && \
	/usr/local/bin/docker-entrypoint.sh postgres && \
	rm /docker-entrypoint-initdb.d/20_gtfs.sql

# For the final image, use an "unadulterated" PostGIS image.
FROM postgis/postgis:15-3.4-alpine

# Copy over the imported DB from the `import` stage.
COPY --from=import /var/lib/postgresql/data /var/lib/postgresql/data

4 participants