
Application crashes on startup when the job storage (PostgreSQL) is unreachable - Resilient Initialization #414

@Cfun1


Summary

ASP.NET Core applications using Hangfire with PostgreSQL crash fatally during startup if the database is temporarily unreachable. This prevents the application from starting even when the API itself could function independently, for example in a degraded state.

Environment

  • Hangfire Version: 1.8.22
  • Storage: Hangfire.PostgreSql 1.20.12
  • Framework: ASP.NET Core on .NET 8
  • Database: PostgreSQL 18.0, Npgsql.EntityFrameworkCore.PostgreSQL 9.0.4

Current Behavior

Both AddHangfire() (during service registration) and MapHangfireDashboard() (during middleware configuration) attempt immediate database connections without retry logic. If PostgreSQL is unreachable during startup:

services.AddHangfire((provider, config) =>
{
...
   config.UsePostgreSqlStorage(c =>
   {
       c.UseNpgsqlConnection(connectionString);
   })
...
});
  1. Application throws Npgsql.NpgsqlException: Failed to connect to...
  2. Application startup fails completely
  3. No graceful degradation or retry mechanism

Expected Behavior

Hangfire should provide startup resilience options:

  1. Retry with exponential backoff during initialization
  2. Graceful degradation: allow the app to start with Hangfire unavailable and let the developer decide how to handle it.
  3. Deferred initialization: once the DB is available, Hangfire initializes and the dashboard becomes functional.
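
To make option 1 concrete, here is a minimal sketch of retry with exponential backoff, written as a generic helper. `StartupRetry` and `TryWithBackoffAsync` are hypothetical names used only for illustration; they are not part of Hangfire or Hangfire.PostgreSql, and this is not a claim about how either library would implement it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not a Hangfire API.
public static class StartupRetry
{
    // Runs `action`, retrying after any exception with a delay that doubles
    // on each failure (capped at maxDelay). Returns true on success and
    // false once maxAttempts attempts have all failed.
    public static async Task<bool> TryWithBackoffAsync(
        Func<Task> action,
        int maxAttempts,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        CancellationToken cancellationToken = default)
    {
        var delay = initialDelay;

        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await action();
                return true;
            }
            catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
            {
                // Shutdown was requested: propagate instead of retrying.
                throw;
            }
            catch (Exception)
            {
                if (attempt == maxAttempts)
                {
                    return false; // give up; the caller decides how to degrade
                }

                await Task.Delay(delay, cancellationToken);

                // Double the delay for the next attempt, capped at maxDelay.
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }

        return false; // only reached when maxAttempts <= 0
    }
}
```

The `action` passed in could be any connectivity probe, for example opening and closing an `NpgsqlConnection` against the job storage connection string. Returning `false` instead of throwing leaves the degradation decision (option 2) to the caller.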

Real-World Scenario

I am aware that in a containerized environment the container will simply keep restarting until the DB is available, and that a down or unreachable DB is certainly a critical situation for the app in general. Still, I believe there are scenarios where the app needs to start gracefully even if the DB is down. I think we can do better.

Attempted Workarounds

I tried deferring the Hangfire registration/configuration to an ASP.NET Core BackgroundService wrapped in a try/catch flow, but that does not work with MapHangfireDashboard(), since it maps an endpoint.
I don't think setting PrepareSchemaIfNecessary = false is the solution here, since it introduces the risk of a schema mismatch and requires managing schema migrations manually.

I cannot find a solution to this issue on my side. IMHO startup resilience should be a built-in, first-class feature rather than something that requires custom workarounds. Thoughts?

PS: Coming from HangfireIO/Hangfire#2560
