docker-compose run changes permissions on mounted PGDATA volume, prevents postgres startup #346

Closed
groves opened this issue Sep 24, 2017 · 18 comments

@groves

groves commented Sep 24, 2017

I'm using a docker-compose.yml like this in Docker version 17.06.2-ce-mac27 (19124):

version: '3'

services:
  db:
    image: postgres:9.6.4
    volumes:
      - db:/var/lib/postgresql/data

volumes:
  db:

If I docker-compose run the db service from that YAML file, it changes the permissions on the mount in the main db service container to 777:

~/d/composetest> docker-compose up -d
Creating network "composetest_default" with the default driver
Creating volume "composetest_db" with default driver
Pulling db (postgres:9.6.4)...
9.6.4: Pulling from library/postgres
Digest: sha256:586320aba4a40f7c4ffdb69534f93c844f01c0ff1211c4b9d9f05a8bddca186f
Status: Downloaded newer image for postgres:9.6.4
Creating composetest_db_1 ... 
Creating composetest_db_1 ... done
~/d/composetest> docker-compose exec db ls -l /var/lib/postgresql/
total 4
drwx------ 19 postgres postgres 4096 Sep 24 23:03 data
~/d/composetest> docker-compose run db ls -l /var/lib/postgresql/
total 4
drwxrwxrwx 19 postgres postgres 4096 Sep 24 23:03 data
~/d/composetest> docker-compose exec db ls -l /var/lib/postgresql/
total 4
drwxrwxrwx 19 postgres postgres 4096 Sep 24 23:03 data

This tripped me up because postgres checks the permissions on its data directory at startup, and if they aren't 700, it refuses to start. I wrote a script that runs docker-compose up for my db service and then uses docker-compose run to execute a script in the db service that checks whether postgres is running using psql. Since the docker-compose run happens immediately, docker-entrypoint.sh in the main container is still going through its multiple rounds of postgres startup. The next round's permissions check happens after the docker-compose run has changed the permissions, so postgres fails to start. I was able to work around it by using docker-compose exec instead of docker-compose run, but it took me a lot of puzzling to get there.

I think this is happening because the Dockerfile sets the permissions to 777 at https://github.com/docker-library/postgres/blob/master/Dockerfile-debian.template#L126 and only the entrypoint sets them back to 700 at https://github.com/docker-library/postgres/blob/master/docker-entrypoint.sh#L34. I'm not sure whether there's a way to avoid making that directory 777, but it would have saved me a fair bit of confusion.
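The refusal described above can be reproduced outside Docker. The sketch below is a hypothetical helper, pgdata_perms_ok, and it assumes GNU coreutils stat (macOS uses stat -f '%Lp' instead). It approximates the PostgreSQL 9.6 check behind the FATAL "has group or world access" message that appears in the logs later in this thread: the directory is rejected if any group or other permission bit is set. Under that check, 700 passes while 777 and 750 fail.

```shell
#!/bin/sh
# Hypothetical helper (not part of the postgres image): approximates the
# PostgreSQL 9.6 startup check that rejects a data directory with any
# group or world permission bits set.
pgdata_perms_ok() {
    dir=$1
    # GNU coreutils stat prints the octal mode, e.g. "700" or "777".
    # On macOS the equivalent is: stat -f '%Lp' "$dir"
    mode=$(stat -c '%a' "$dir") || return 2
    # Prefix with 0 so shell arithmetic reads the mode as octal, then
    # mask the group (070) and other (007) bits.
    if [ $(( 0$mode & 077 )) -ne 0 ]; then
        echo "FATAL: $dir has group or world access (mode $mode)" >&2
        return 1
    fi
    return 0
}
```

Run against /var/lib/postgresql/data inside the db container, this check would be expected to fail once the directory shows up as drwxrwxrwx, as in the ls output above.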

@yosifkit
Member

yosifkit commented Sep 26, 2017

The 777 permissions were for #253, but it is unrelated to your issue.

From what I understand, your issue is that the postgres image initially starts a localhost-only daemon for initialization, so it is not available to anything outside localhost. Once initialization is complete, that daemon is stopped and restarted to listen on all interfaces. This is working as designed and lets you know when initialization is complete: the server responds to an external connection.

Here are the snippets that start the localhost-only daemon, stop it, and start the new daemon:

pg_ctl -D "$PGDATA" \
-o "-c listen_addresses='localhost'" \
-w start

pg_ctl -D "$PGDATA" -m fast -w stop
echo
echo 'PostgreSQL init process complete; ready for start up.'
echo
fi
fi
exec "$@"

(the final exec "$@" will replace the bash startup process with the remote-available postgres)

The localhost-only init server only runs when the database files are empty. So after the first start (as long as the named volume is not deleted), the container should start directly into the regular postgres server. (See also #203 (comment).)
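The "only when the database files are empty" decision can be sketched as below. This assumes, as the 2017-era docker-entrypoint.sh did, that an initialized cluster is recognized by a non-empty PG_VERSION file in the data directory. The function names needs_init and describe_startup are hypothetical, and the real script does more work in each branch.

```shell
#!/bin/sh
# Hypothetical sketch of the entrypoint's "is this a fresh volume?" test.
# initdb writes a PG_VERSION file into the data directory, so a missing or
# empty PG_VERSION is taken to mean the cluster has not been initialized.
needs_init() {
    pgdata=$1
    [ ! -s "$pgdata/PG_VERSION" ]
}

# Print which startup path a container with this data directory would take.
describe_startup() {
    if needs_init "$1"; then
        echo "init: localhost-only server, then restart"
    else
        echo "direct: regular postgres server"
    fi
}
```

With a named volume that survives between runs, only the first container to see the volume takes the init branch. That is why the window for the race in this issue only exists right after a fresh docker-compose up.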

@groves
Author

groves commented Sep 26, 2017

Yeah, my issue is that if I docker-compose run before the final exec in the entrypoint, the regular postgres server fails to start. Would it be possible to chmod the PGDATA directory to 777 in the Dockerfile only if it doesn't already exist?

That would be changing

RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA" # this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)

to something like

RUN set -ex; \
         if [ ! -d "$PGDATA" ] ; then \
             mkdir -p "$PGDATA"; \
             chown -R postgres:postgres "$PGDATA"; \
# this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)
             chmod 777 "$PGDATA";  \
         fi

Not sure if that would screw up the arbitrary --user support, but it seems like it'd keep it from clobbering the permissions if they've been set elsewhere.

@tianon
Member

tianon commented Sep 26, 2017 via email

@groves
Author

groves commented Sep 26, 2017

Doh, forget that suggestion. docker-compose run changing the perms confused me.

@tianon
Member

tianon commented Oct 11, 2017

So do we have a simple reproducer for the issue here, or was @yosifkit's explanation sufficient? 😄

@groves
Author

groves commented Oct 12, 2017

To reproduce, use the compose file above and run docker-compose up -d && docker-compose run db ls; postgres fails to start. I can run the following locally and it fails every time:

~/d/composetest> docker-compose down --remove-orphans -v --rmi all ; docker-compose up -d ; docker-compose run db ls ; docker-compose logs | tail -5
Removing composetest_db_run_1 ... done
Removing composetest_db_1     ... done
Removing network composetest_default
Removing volume composetest_db
Removing image postgres:9.6.4
Creating network "composetest_default" with the default driver
Creating volume "composetest_db" with default driver
Pulling db (postgres:9.6.4)...
9.6.4: Pulling from library/postgres
Digest: sha256:586320aba4a40f7c4ffdb69534f93c844f01c0ff1211c4b9d9f05a8bddca186f
Status: Downloaded newer image for postgres:9.6.4
Creating composetest_db_1 ... 
Creating composetest_db_1 ... done
bin   docker-entrypoint-initdb.d  home	 media	proc  sbin  tmp
boot  docker-entrypoint.sh	  lib	 mnt	root  srv   usr
dev   etc			  lib64  opt	run   sys   var
db_1  | waiting for server to start....FATAL:  data directory "/var/lib/postgresql/data" has group or world access
db_1  | DETAIL:  Permissions should be u=rwx (0700).
db_1  |  stopped waiting
db_1  | pg_ctl: could not start server
db_1  | Examine the log output.

It mattered to me as I was trying to use docker-compose run to check that postgres was up before continuing, and it took me a while to figure out that run was keeping it from starting up.

@tianon
Member

tianon commented Oct 12, 2017

Running docker-compose run db ls will run a new container, and in that new container, instead of invoking PostgreSQL, it'll invoke ls. What you want to check is docker-compose logs db (as you've done), or docker-compose ps. The absolute best way to verify that PostgreSQL is fully operational and ready for connections (finished with initialization) is to try connecting to it (either via docker-compose exec and using psql or a second container connecting to the database).
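Following the advice to verify readiness by actually connecting, a polling loop might look like the sketch below. wait_for and the WAIT_INTERVAL variable are hypothetical names. The probe can be any command that exits 0 once PostgreSQL answers. One caveat, inferred from @yosifkit's explanation rather than tested here: the temporary init server still accepts local connections. A probe run through docker-compose exec might therefore succeed before initialization finishes. A probe from another container on the network (pg_isready -h db, say) should not.

```shell
#!/bin/sh
# Hypothetical helper: retry a probe command until it succeeds or the
# attempt budget runs out. Returns 0 on success, 1 on timeout.
#   $1    maximum number of attempts
#   $2..  probe command and its arguments
# WAIT_INTERVAL (seconds between attempts) defaults to 1.
wait_for() {
    attempts=$1
    shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "${WAIT_INTERVAL:-1}"
    done
    return 1
}

# Example probe through the running db service (not executed here):
#   wait_for 30 docker-compose exec -T db psql -U postgres -c 'SELECT 1'
```

The exec-based probe never starts a new container, so it cannot trigger the permission change reported in this issue.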

That being said, I cannot reproduce a failure to start with the exact docker-compose.yml provided, so I am at a loss: 😕

$ cat docker-compose.yml
version: '3'

services:
  db:
    image: postgres:9.6.4
    volumes:
      - db:/var/lib/postgresql/data

volumes:
  db:

$ docker-compose up -d
Creating network "tmp_default" with the default driver
Creating volume "tmp_db" with default driver
Pulling db (postgres:9.6.4)...
9.6.4: Pulling from library/postgres
Digest: sha256:586320aba4a40f7c4ffdb69534f93c844f01c0ff1211c4b9d9f05a8bddca186f
Status: Downloaded newer image for postgres:9.6.4
Creating tmp_db_1 ... 
Creating tmp_db_1 ... done

$ docker-compose logs --tail=10
Attaching to tmp_db_1
db_1  | LOG:  database system is shut down
db_1  |  done
db_1  | server stopped
db_1  | 
db_1  | PostgreSQL init process complete; ready for start up.
db_1  | 
db_1  | LOG:  database system was shut down at 2017-10-12 17:50:18 UTC
db_1  | LOG:  MultiXact member wraparound protections are now enabled
db_1  | LOG:  autovacuum launcher started
db_1  | LOG:  database system is ready to accept connections

$ docker-compose run db ls
bin   docker-entrypoint-initdb.d  home	 media	proc  sbin  tmp
boot  docker-entrypoint.sh	  lib	 mnt	root  srv   usr
dev   etc			  lib64  opt	run   sys   var

$ docker-compose ps
  Name                Command              State    Ports   
-----------------------------------------------------------
tmp_db_1   docker-entrypoint.sh postgres   Up      5432/tcp 

@groves
Author

groves commented Oct 14, 2017

Sorry, ls is just an easy example showing that a docker-compose run of any new container using that image and that mount will break startup. What caused me to encounter this was exactly what you suggested: running psql from a second container to check that postgres was up.

How long are you waiting between docker-compose up and docker-compose run? Building on what @yosifkit pointed out in his initial response, the perm change has to happen between the start of the localhost-only daemon and the start of the remote-available postgres. Since I start checking availability immediately after up, I hit it reliably. docker-compose up -d && docker-compose run db ls hits it as well, since the run starts immediately after up.

@giorgiosironi

I can reproduce this: a docker-compose run alongside an already running container changes the permissions of the volume to 777. Perhaps this is related to the volume being mounted in two different containers (same image but different process). I am working around this by using docker-compose exec instead.

@giorgiosironi

I cannot reproduce this with a different image (bash), which suggests it may have to do with the postgres image instead.

@giorgiosironi

If I add command: sleep 3600 to the original postgres service, it appears the directory immediately has 777 permissions. That may mean there is something in the image that brings the volume from 777 to 700, and it is not executed when using another docker-compose run.

@giorgiosironi

Right, the default 777 likely comes from:

RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA" # this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)

Then at runtime it is set to 700:
chmod 700 "$PGDATA"

but this isn't executed when docker-compose run substitutes it with a custom command, so I guess the volume remains as 777.
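The gating described here can be sketched as follows. This is a simplified, hypothetical model, not the image's actual docker-entrypoint.sh. It assumes the permission fix-up only runs when the container's command is postgres, so a docker-compose run db ls never resets the 777 mode inherited from the image.

```shell
#!/bin/sh
# Hypothetical model of the entrypoint gating discussed above: the data
# directory is only reset to 700 when the container is starting postgres.
# Any other command (ls, psql, sleep, ...) runs as-is and leaves the
# directory with whatever mode it already had (777 from the image).
entrypoint_sketch() {
    pgdata=$1
    shift
    if [ "$1" = 'postgres' ]; then
        mkdir -p "$pgdata"
        chmod 700 "$pgdata"
    fi
    # The real entrypoint ends with exec "$@"; a plain call keeps this testable.
    "$@"
}
```

This matches the sleep 3600 observation: with a custom command, the branch that tightens the mode is never taken.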

@cmardonespino

cmardonespino commented Apr 10, 2018

@giorgiosironi I tried docker-compose run db, but it still creates the postgres folder with permission... Any clue? I'm a novice :'(

@giorgiosironi

AFAIK docker-compose run db (where db is the service name of the postgres image) should correctly change the permissions of the folder to 700, since it executes the code mentioned earlier. It may be related to some leftover state, so I'd start from a docker-compose down -v to delete all volume data.

@cmardonespino

I did that, but the folder still has those permissions :'(

@alexandernst

hit the same issue

@tianon
Member

tianon commented Jul 23, 2018

I think I understand what's happening here. I think there's a race to initialize the same volume on Docker's side -- when you do docker-compose run, it's running a new container attached to the same volume, and Docker is likely applying some logic to determine whether it should adjust the permissions of the volume it provides to match what exists in the image.

I would recommend running your secondary container without the volume (which you can't do directly via docker-compose run xxx that I can see -- it'd have to be a separate entry or even just a direct docker run xxx). You probably don't want your second container to have direct access to the PostgreSQL files anyhow (since that's going to be a stability issue if anything accidentally changes in your second container, like say, the permissions).
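A second container without the volume can be started with a plain docker run on the network Compose created. The sketch below assumes the default network name composetest_default from the logs above and the service hostname db. check_db is a hypothetical name. pg_isready is the readiness client that ships with PostgreSQL. Because it connects over the network rather than through the volume or a local socket, it should only succeed once the final, non-localhost server is up.

```shell
#!/bin/sh
# Hypothetical helper: probe the db service from a throwaway container that
# joins the Compose network but mounts NO volume, so it cannot touch PGDATA.
#   $1  Compose network name (default: composetest_default)
check_db() {
    network=${1:-composetest_default}
    docker run --rm --network "$network" postgres:9.6.4 \
        pg_isready -h db -p 5432
}
```

Looping on check_db until it exits 0 gives a readiness wait that never mounts the data directory, which sidesteps the permission change entirely.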

@tianon
Member

tianon commented Oct 2, 2018

Closing, since this appears to be a (resolved) issue with usage, not an issue we could actually fix in the image. See my comment above (#346 (comment)) for the suggested workaround.

@tianon tianon closed this as completed Oct 2, 2018