Include database status on the /health endpoint or a new endpoint like /db-health #6588


Open
sunshineo opened this issue Apr 9, 2020 · 13 comments
Labels
type:feature New feature or improvement of existing feature

Comments

@sunshineo
Contributor

Is your feature request related to a problem? Please describe.
I deployed Parse Server on Kubernetes and used /parse/health for both the readinessProbe and the livenessProbe. When a new deployment broke the database connection, /parse/health still returned 200 OK, so the old container was killed and the new, broken one was brought online.

Describe the solution you'd like
Only return 200 OK once the database is connected, either on /parse/health or on a new /parse/db-health endpoint.

Describe alternatives you've considered
I tried to create a cloud function that checks the database connection, but calling it requires a POST request with the application id in the header, which a Kubernetes probe cannot do.

Additional context
Related to #4575

@davimacedo added the type:feature (New feature or improvement of existing feature) and good first issue labels Apr 14, 2020
@davimacedo
Member

It sounds reasonable to me. Are you willing to tackle this one?

@mtrezza
Member

mtrezza commented Apr 14, 2020

I think it's a good PR, but this is definitely a breaking change and should be noted clearly in the release notes.

Currently, /parse/health is an endpoint that may already be in use to tell whether Parse Server is reachable via a given domain, e.g. for connection string failover. The information derived from the endpoint's response changes slightly once it also depends on the health of the database connection.

@sunshineo
Contributor Author

New info I learned: in Kubernetes there is a liveness probe and a readiness probe. The liveness probe can stay the current /parse/health; it only indicates whether the container is responsive, and if it is not, Kubernetes restarts it. The readiness probe, however, should also check that there is a working database connection, because only then should Kubernetes send traffic to the container.

So we should have a separate URL like /parse/db-health.

@mtrezza
Member

mtrezza commented Apr 15, 2020

Good point. Is there a scenario in which the DB is not ready immediately and only becomes ready after some time? Does Parse Server support a DB connection that is not ready on start-up?

What do you think about adding a general /parse/ready endpoint? I am thinking of a cache system or other modules that may need to become ready before Parse Server can function. All these modules could be summarized in a single ready endpoint, which is easily extendable.
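For illustration, such an endpoint could look roughly like the sketch below (Express-style; the per-module check functions are placeholders, not existing Parse Server APIs):

    // Hypothetical registry of readiness checks, one per module.
    const readinessChecks = {
      database: checkDatabase, // placeholder: e.g. ping the configured database adapter
      cache: checkCache,       // placeholder: e.g. ping the cache adapter, if one is set up
    };

    app.get('/parse/ready', async (req, res) => {
      const results = {};
      let ready = true;
      for (const [name, check] of Object.entries(readinessChecks)) {
        try {
          await check();
          results[name] = 'ok';
        } catch (e) {
          results[name] = 'error';
          ready = false;
        }
      }
      res.status(ready ? 200 : 503).json({ ready, checks: results });
    });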

@sunshineo
Contributor Author

Thank you @mtrezza. I'll try to create a ready endpoint, starting with just the database check, and we can add more later. I plan to simply do a query on the _User collection with count=0. I think it will work, but let me know if there is a better way.
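Roughly, the check could be a small sketch like this (assuming it runs server-side with the master key available):

    // Sketch: consider the database "ready" if a count query on _User succeeds.
    async function checkDatabase() {
      const query = new Parse.Query(Parse.User);
      await query.count({ useMasterKey: true }); // rejects if the database is unreachable
    }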

@mtrezza
Member

mtrezza commented Apr 15, 2020

It depends on how you define “ready” for the DB, which itself exposes its own status information.

For example:

The driver has an isConnected method to determine whether a connection between the driver and the DB is established.

https://mongodb.github.io/node-mongodb-native/3.5/api/MongoClient.html#isConnected

The server status, which may be a good middle ground for your ready endpoint.

https://mongodb.github.io/node-mongodb-native/3.5/api/Admin.html#serverStatus

The replica set status, where you determine the status of the individual nodes. Just because you can execute a query doesn’t mean the replica set is fully up and running and ready for what you consider your modus operandi. Maybe a separate PR in the future, if someone wants to get into these details.

https://mongodb.github.io/node-mongodb-native/3.5/api/Admin.html#replSetGetStatus
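Roughly, the three checks could look like this with the 3.5 Node driver (connection handling and the returned fields here are just assumptions, not a worked-out proposal):

    const { MongoClient } = require('mongodb');

    async function checkMongo(uri) { // uri is a placeholder connection string
      const client = await MongoClient.connect(uri);
      try {
        // 1. Driver-level check: is a connection established?
        const connected = client.isConnected();

        // 2. Server status: uptime, connections, etc.
        const status = await client.db('admin').admin().serverStatus();

        // 3. Replica set status (throws on a standalone deployment)
        const rs = await client.db('admin').admin().replSetGetStatus();

        return { connected, uptime: status.uptime, replicaSetState: rs.myState };
      } finally {
        await client.close();
      }
    }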


@davimacedo
Member

> Thank you @mtrezza. I'll try to create a ready endpoint, starting with just the database check, and we can add more later. I plan to simply do a query on the _User collection with count=0. I think it will work, but let me know if there is a better way.

I'd do a query in the schema table.
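Something along these lines, assuming direct access to the MongoDB connection and the default _SCHEMA collection name Parse Server uses on MongoDB:

    // Sketch: probe readiness by reading from the schema collection instead of _User.
    // `db` is assumed to be an already-connected mongodb Db instance.
    async function checkSchema(db) {
      await db.collection('_SCHEMA').findOne({}, { projection: { _id: 1 } });
    }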

@mtrezza
Member

mtrezza commented Apr 16, 2020

I did some research and came across two concepts for these endpoints. Maybe it can give you some inspiration.

  1. Accumulative endpoints

/health/live - The application is up and running.
/health/ready - The application is ready to serve requests.
/health - Accumulating all health check procedures in the application.

https://quarkus.io/guides/microprofile-health

  2. Separate endpoints

GET /-/health
Only checks whether the application server is running. It does not verify that the database or other services are running. A successful response will return a 200 status code.

GET /-/readiness
Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don’t want to kill the application, but you don’t want to send it requests either.

GET /-/liveness
Many applications running for long periods of time eventually transition to broken states, and cannot recover except by being restarted.

GET /-/metrics
Node process metrics
Infrastructure to collect internal metrics (like southbound APIs)
API to retrieve the collected metrics and to reset them.

https://www.npmjs.com/package/@ozawa/express-health-check-middleware
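For illustration, a minimal Express sketch of the endpoint layout in option 1 (checkDatabase is a placeholder for whatever readiness checks we decide on):

    // Liveness: the process is up and able to answer.
    app.get('/health/live', (req, res) => res.sendStatus(200));

    // Readiness: dependencies (e.g. the database) are reachable.
    app.get('/health/ready', async (req, res) => {
      try {
        await checkDatabase();
        res.sendStatus(200);
      } catch (e) {
        res.sendStatus(503);
      }
    });

    // Accumulated: summarizes all checks in one response.
    app.get('/health', async (req, res) => {
      try {
        await checkDatabase();
        res.status(200).json({ status: 'UP' });
      } catch (e) {
        res.status(503).json({ status: 'DOWN' });
      }
    });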

@davimacedo
Member

I believe option 1 is good enough for now but option 2 would be really great. I have no idea how we could ensure all those states in option 2 though.

@mtrezza
Member

mtrezza commented Apr 26, 2020

I think we should decide on a long-term option now, because if we want to change this in a few months, we may face a breaking change and things get more complex.

  • Option 1 takes /health as a summary endpoint (everything is running and ready).
  • Option 2 takes /health as a single-aspect endpoint (Parse Server is running, but maybe not ready).

Which approach is more common in third-party systems, to ensure compatibility and practicality for this endpoint?

@mman
Contributor

mman commented Apr 28, 2020

Looking around my k8s cluster, this is what prometheus uses:

    Liveness:     http-get http://:web/-/healthy delay=0s timeout=3s period=5s #success=1 #failure=6
    Readiness:    http-get http://:web/-/ready delay=0s timeout=3s period=5s #success=1 #failure=120

and this is what ambassador uses:

    Liveness:   http-get http://:8877/ambassador/v0/check_alive delay=30s timeout=1s period=3s #success=1 #failure=3
    Readiness:  http-get http://:8877/ambassador/v0/check_ready delay=30s timeout=1s period=3s #success=1 #failure=3

So IMHO focusing on ready vs. live, where the live probe avoids accidental restarts and the ready probe serves as a signal to start routing traffic, sounds better than focusing on overall health vs. db health.

This is a great resource on where I think we should aim to go: https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-setting-up-health-checks-with-readiness-and-liveness-probes

@mtrezza
Member

mtrezza commented Apr 28, 2020

@mman Great research, makes absolute sense. If I understand it correctly, it’s option 2, right?


@mman
Contributor

mman commented Apr 28, 2020

Yes, option 2, but also potentially changing the names of the endpoints to better indicate their purpose. /healthy + /ready, as Prometheus uses, sounds logical to me, perhaps keeping the legacy /health around as an alias for /healthy for compatibility reasons.
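The compatibility alias could be as simple as this sketch (handler names are hypothetical):

    // Expose the new endpoints and keep the legacy path as an alias.
    app.get('/parse/healthy', livenessHandler);
    app.get('/parse/ready', readinessHandler);
    app.get('/parse/health', livenessHandler); // legacy, kept for compatibility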
