
Excessive startup time with large number of databases #162


Closed

shimaore opened this issue Jan 8, 2020 · 5 comments



shimaore commented Jan 8, 2020

Expected Behavior

Startup should take a few seconds, not a few minutes.

Current Behavior

Currently, the find command in docker-entrypoint.sh might run for a few minutes if there is a large number (e.g. thousands) of databases in a cluster.
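
For reference, the ownership fix in docker-entrypoint.sh looks roughly like this (a sketch of the conditional form; exact paths and predicates may differ between image versions):

    # Conditionally chown only files not already owned by couchdb
    find /opt/couchdb \! \( -user couchdb -group couchdb \) \
      -exec chown -f couchdb:couchdb '{}' +

Even when nothing needs fixing, find still has to stat every file, which is what can take minutes at this scale.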

Possible Solution

Run the find command in the background instead of the foreground.
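
A minimal sketch of that change, assuming the conditional find shown above (the trade-off: CouchDB may briefly serve requests against files whose ownership hasn't been fixed yet):

    # Hypothetical: push the ownership fix off the critical startup path
    find /opt/couchdb \! \( -user couchdb -group couchdb \) \
      -exec chown -f couchdb:couchdb '{}' + &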

Steps to Reproduce (for bugs)

I guess this is fairly obvious, but you could reproduce the issue with:

  1. Start a new CouchDB instance with no databases — startup is fast
  2. Create a thousand databases (for example with the curl loop sketched below)
  3. Restart the instance — startup is slower
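
For step 2, something along these lines would do (a sketch; the admin credentials, host, and port are placeholders):

    # Create a thousand empty databases against a local node
    for i in $(seq 1 1000); do
      curl -s -X PUT "http://admin:password@127.0.0.1:5984/db-$i" > /dev/null
    done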

Context

Running a large number of databases in production, CouchDB takes a long time to restart after a crash.

Your Environment

  • Version used: 2.3.1
  • Browser Name and version: n/a
  • Operating System and version (desktop or mobile): n/a
  • Link to your project: n/a

kocolosk (Member) commented Jan 8, 2020

Are you finding that the startup time is slow even when the permissions on those databases are already correct? We've optimized the startup time in the past by switching to a conditional fix of the permissions, see #109 for some of the discussion there.

If so, I suppose we could push the find command into the background. A user with misconfigured permissions would get some weird errors for the first few seconds, but if it's nothing more serious than that I would be supportive of a change.

shimaore (Author) commented Jan 8, 2020

I'm probably understating it by saying "one thousand" (find /opt/couchdb/data | wc -l returns 489481). On the other hand, I just tested (with a "hot" running container rather than a "cold" one at startup), and each individual (conditional) find runs in ~10s real time when I run them from the CLI (inside Docker), so the total shouldn't amount to what I'm seeing when the container starts.
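
Roughly what I'm timing (a sketch; the container name is a placeholder):

    # Count files under the data directory
    docker exec couchdb sh -c 'find /opt/couchdb/data | wc -l'

    # Time the conditional ownership pass on its own (run from the host)
    time docker exec couchdb \
      find /opt/couchdb/data ! \( -user couchdb -group couchdb \) -exec chown -f couchdb:couchdb '{}' +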

I'll force a restart tonight to try to better understand where the limitation might be.

wohali (Member) commented Jan 8, 2020

Hi @shimaore ,

We've previously seen problems caused by people using the wrong storage driver for Docker.

Do you know which storage driver you're using? If it's aufs or devicemapper, you may want to upgrade to overlay2. (Note of course that you'll have to transfer your data to the new backing storage driver accordingly.)
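
You can check with, for example:

    # Prints the storage driver in use (e.g. overlay2, aufs, devicemapper)
    docker info --format '{{.Driver}}'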

shimaore (Author) commented Jan 8, 2020

Hi @wohali ,

Sorry, I should have mentioned it. This is a local drive mount and the container uses the overlay2 driver. I'm getting roughly the same performance for the find whether I run it inside the container or directly on the machine running Docker.

In my test tonight I wasn't able to reproduce the issue to the same extent. The system spent some time (about 30 seconds) running:

chown -f couchdb:couchdb /opt/couchdb /opt/couchdb/bin /opt/couchdb/bin/couchdb /opt/couchdb/bin/remsh /opt/couchdb/bin/couchup /opt/couchdb/bin/couchjs /opt/couchdb/bin/couchdb.cmd /opt/couchdb/erts-8.3.5 /opt/couchdb/erts-8.3.5/include /opt/couchdb/erts-8.3.5/include/erl_driver.h /opt/couchdb/erts-8.3.5/include/internal /opt/couchdb/erts-8.3.5/include/internal/ethr_atomics.h /opt/couchdb/erts-8.3.5/include/internal/ethr_mutex.h /opt/couchdb/erts-8.3.5/include/internal/ethread.mk /opt/couchdb/erts-8.3.5/include/internal/ethread.h /opt/couchdb/erts-8.3.5/include/internal/gcc /opt/couchdb/erts-8.3.5/include/internal/gcc/ethread.h /opt/couchdb/erts-8.3.5/include/internal/gcc/ethr_atomic.h /opt/couchdb/erts-8.3.5/include/internal/gcc/ethr_dw_atomic.h /opt/couchdb/erts-8.3.5/include/internal/gcc/ethr_membar.h /opt/couchdb/erts-8.3.5/include/internal/ppc32 /opt/couchdb/erts-8.3.5/include/internal/ppc32/ethread.h /opt/couchdb/erts-8.3.5/include/internal/ppc32/rwlock.h /opt/couchdb/erts-8.3.5/include/internal/ppc32/ethr_membar.h /opt/couchdb/erts-8.3.5/include/internal/ppc32/spinlock.h /opt/couchdb/erts-8.3.5/include/internal/ppc32/atomic.h /opt/couchdb/erts-8.3.5/include/internal/i386 /opt/couchdb/erts-8.3.5/include/internal/i386/ethread.h /opt/couchdb/erts-8.3.5/include/internal/i386/rwlock.h /opt/couchdb/erts-8.3.5/include/internal/i386/ethr_dw_atomic.h /opt/couchdb/erts-8.3.5/include/internal/i386/ethr_membar.h /opt/couchdb/erts-8.3.5/include/internal/i386/spinlock.h /opt/couchdb/erts-8.3.5/include/internal/i386/atomic.h /opt/couchdb/erts-8.3.5/include/internal/ethread_header_config.h /opt/couchdb/erts-8.3.5/include/internal/win /opt/couchdb/erts-8.3.5/include/internal/win/ethread.h /opt/couchdb/erts-8.3.5/include/internal/win/ethr_event.h /opt/couchdb/erts-8.3.5/include/internal/win/ethr_atomic.h /opt/couchdb/erts-8.3.5/include/internal/win/ethr_dw_atomic.h /opt/couchdb/erts-8.3.5/include/internal/win/ethr_membar.h /opt/couchdb/erts-8.3.5/include/internal/README /opt/couchdb/erts-8.3.5/include/

This raises two remarks:

  • This is surprising because the Dockerfile should take care of most of these.
  • Maybe more importantly, when I run the same chown -f command manually (using docker exec …) once the container has finished starting and is running CouchDB, it doesn't lag for 30s (it clocks in at less than 10ms).

So there's definitely some slowness tied to Docker starting the container (maybe caching-related or some such). In that case it might be better to delay the find rather than simply push it into the background.
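
A sketch of what that could look like in the entrypoint, assuming the conditional find (the grace period is an arbitrary placeholder):

    # Hypothetical: defer the ownership fix until container-creation I/O
    # has settled, and keep it off the critical startup path
    (
      sleep 60  # arbitrary delay; tune per environment
      find /opt/couchdb \! \( -user couchdb -group couchdb \) \
        -exec chown -f couchdb:couchdb '{}' +
    ) &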

shimaore closed this as completed Jan 8, 2020
shimaore reopened this Jan 8, 2020

shimaore (Author) commented Jan 8, 2020

The slowness is (based on iostat) due to a large burst of disk writes during the creation of the new container; the chown operation completes when those writes complete, but is probably not the root cause. Nor does this seem related, in any obvious manner, to the large number of databases as I claimed.
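
(For anyone retracing this: I was watching the host's disks with something like the following while the container started.)

    # Per-device utilization, refreshed every second (sysstat package)
    iostat -x 1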

Since this might be specific to my environment, I might be better off at this point building a custom container based on the official one with a modified entrypoint, and doing more detailed performance analysis.

I'll close for now and open a new issue if I figure something out. Thank you for your time @kocolosk and @wohali.

shimaore closed this as completed Jan 8, 2020