Description
Summary
ASP.NET Core applications using Hangfire with PostgreSQL crash fatally during startup if the database is temporarily unreachable. This prevents the application from starting even when the API itself could function independently, for example in a degraded state.
Environment
- Hangfire Version: 1.8.22
- Storage: Hangfire.PostgreSql 1.20.12
- Framework: ASP.NET Core on .NET 8
- Database: PostgreSQL 18.0 (Npgsql.EntityFrameworkCore.PostgreSQL 9.0.4)
Current Behavior
Both AddHangfire() (during service registration) and MapHangfireDashboard() (during middleware configuration) attempt immediate database connections without retry logic. If PostgreSQL is unreachable during startup:
services.AddHangfire((provider, config) =>
{
    ...
    config.UsePostgreSqlStorage(c =>
    {
        c.UseNpgsqlConnection(connectionString);
    })
    ...
});
- Application throws Npgsql.NpgsqlException: Failed to connect to...
- Application startup fails completely
- No graceful degradation or retry mechanism
Expected Behavior
Hangfire should provide startup resilience options:
- Retry with exponential backoff during initialization
- Graceful degradation: allow the app to start with Hangfire unavailable and let the developer decide how to handle it.
- Once the DB is available, Hangfire initializes and the dashboard becomes functional.
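To make the first bullet concrete, here is a minimal sketch of retry with exponential backoff in plain C#. It is not a Hangfire or Hangfire.PostgreSql API: `StartupRetry`, `BackoffDelay`, and `ExecuteAsync` are hypothetical names made up for illustration, and the operation being retried is whatever initialization step the caller passes in.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Hangfire: retries an async operation
// with exponentially growing, capped delays between attempts.
public static class StartupRetry
{
    // Delay to wait after failed attempt number `attempt` (1-based):
    // baseDelay * 2^(attempt - 1), capped at maxDelay.
    public static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        // Cap the exponent so the multiplication can never overflow to infinity.
        double factor = Math.Pow(2, Math.Min(attempt - 1, 30));
        double ms = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(ms);
    }

    // Runs `operation` until it succeeds or maxAttempts is reached.
    // Returns the number of attempts used; rethrows the last failure
    // once the attempts are exhausted. Cancellation is never retried.
    public static async Task<int> ExecuteAsync(
        Func<CancellationToken, Task> operation,
        int maxAttempts,
        TimeSpan baseDelay,
        TimeSpan maxDelay,
        CancellationToken cancellationToken = default)
    {
        if (operation is null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await operation(cancellationToken).ConfigureAwait(false);
                return attempt;
            }
            catch (Exception ex) when (attempt < maxAttempts && ex is not OperationCanceledException)
            {
                // Wait before the next attempt; the delay doubles each time.
                TimeSpan delay = BackoffDelay(attempt, baseDelay, maxDelay);
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
            }
        }
    }
}
```

With a base delay of 1 second and a cap of 30 seconds, the waits would be 1 s, 2 s, 4 s, 8 s, 16 s, then 30 s for every later attempt. A caller could pass a database connectivity probe as the operation, but where such a retry could hook into Hangfire's own initialization is exactly the open question of this issue.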
Real-World Scenario
I am aware that in a containerized environment the container will simply keep restarting until the DB is available, and that an unreachable DB is certainly a critical situation for the app in general. Still, I believe there are scenarios where the app needs to start gracefully even if the DB is down. I think we can do better.
Attempted Workarounds
I tried deferring the Hangfire registration/configuration to an ASP.NET Core BackgroundService with try/catch logic, but that won't work with MapHangfireDashboard() since it maps an endpoint.
I don't think setting PrepareSchemaIfNecessary = false is the solution here, since it introduces the risk of a mismatched schema and requires managing schema migrations manually.
I cannot find a solution to this issue on my side. IMHO startup resilience should be a built-in, first-class feature rather than something that requires custom workarounds. Thoughts?
PS: Coming from HangfireIO/Hangfire#2560