Commit f73b7a7

Auto merge of #1803 - sgrif:sg-environment-variable-for-background-timeout, r=jtgeibel
Configure the background job timeout via an environment variable

An incident was caused by #1798. There is a description below if you're interested, but this PR does not fix the problem. However, the band-aid fix to get things running again is to increase the timeout for the job runner. When responding to an incident, waiting for a full rebuild to change this is not acceptable. This replaces the hard-coded value with an environment variable so we can quickly change it on the fly in the future.

Description of the actual problem that this does not fix
--

The problem was that the `update_downloads` job takes longer than the timeout we had set for jobs to begin running. So swirl would start the `update_downloads` job, try to spawn another worker, and then time out waiting to hear from that worker whether it got a job or not. So we would crash the process, the job would be incomplete, and we'd just start the whole thing over again.

There are several real fixes for this, and I will open a PR that is some combination of all of them. Ultimately, each of these fixes just increases the number of slow concurrent jobs that can run before we hit the timeout and the problem reappears, but that's fundamentally always going to be the case. If we are getting more jobs than we can process, we do need to get paged so we can remedy the situation. Still, any or all of these will be the "real" fix:

- Increasing the number of concurrent jobs
- Increasing the timeout
- Re-building the runner before crashing
  - The reason this would fix the issue is that by not crashing the process, we give the spawned threads a chance to finish. We do still want to *eventually* crash the process, as there might be something inherent to this process or machine preventing the jobs from running, but starting with a new thread/connection pool a few times gives things a better chance to recover on their own.
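
The "re-building the runner before crashing" idea above could look roughly like the sketch below. This is not the follow-up PR and does not use swirl's API: `StartTimeout` and `run_with_rebuilds` are hypothetical names, and the closure stands in for "build a fresh runner with a new thread/connection pool and run all pending jobs".

```rust
/// Hypothetical stand-in for the error reported when a spawned worker does not
/// confirm within the timeout whether it picked up a job.
#[derive(Debug, PartialEq)]
struct StartTimeout;

/// Calls `build_and_run` and, on a start timeout, tries again with a fresh
/// runner up to `max_rebuilds` times before giving up.
///
/// Returns the number of rebuilds that were needed, or the final error so the
/// caller can still crash the process once the retries are exhausted.
fn run_with_rebuilds<F>(max_rebuilds: u32, mut build_and_run: F) -> Result<u32, StartTimeout>
where
    F: FnMut() -> Result<(), StartTimeout>,
{
    let mut rebuilds = 0;
    loop {
        match build_and_run() {
            Ok(()) => return Ok(rebuilds),
            // The process stays alive, so threads spawned by the previous
            // attempt get a chance to finish their slow job.
            Err(StartTimeout) if rebuilds < max_rebuilds => rebuilds += 1,
            // Retries exhausted: hand the error back so the caller can exit.
            Err(e) => return Err(e),
        }
    }
}
```

A caller would pass a small `max_rebuilds` (say 3, an arbitrary choice) and exit with an error only when `run_with_rebuilds` returns `Err`, which keeps the "eventually crash" behavior described above.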
2 parents 952b031 + 7abd57f commit f73b7a7

1 file changed, +6 -1 lines changed

src/bin/background-worker.rs

+6 -1

@@ -31,6 +31,11 @@ fn main() {
         _ => None,
     };
 
+    let job_start_timeout = dotenv::var("BACKGROUND_JOB_TIMEOUT")
+        .unwrap_or_else(|_| "10".into())
+        .parse()
+        .expect("Invalid value for `BACKGROUND_JOB_TIMEOUT`");
+
     println!("Cloning index");
 
     let repository = Repository::open(&config.index_location).expect("Failed to clone index");
@@ -45,7 +50,7 @@ fn main() {
 
     let runner = swirl::Runner::builder(db_pool, environment)
         .thread_count(1)
-        .job_start_timeout(Duration::from_secs(10))
+        .job_start_timeout(Duration::from_secs(job_start_timeout))
         .build();
 
     println!("Runner booted, running jobs");
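
For readers who want to try the parsing logic in isolation, here is a minimal standalone sketch of the same "environment variable with a default of 10 seconds" behavior. It is an illustration rather than the code in this commit: it uses `std::env::var` instead of `dotenv::var`, returns an error instead of panicking, and the helper names `job_start_timeout` and `job_start_timeout_from_env` are made up for this example.

```rust
use std::env;
use std::time::Duration;

/// Default used when the variable is unset; matches the previously
/// hard-coded value in the diff above.
const DEFAULT_TIMEOUT_SECS: u64 = 10;

/// Converts the raw value of `BACKGROUND_JOB_TIMEOUT` (`None` if unset) into
/// a `Duration` in whole seconds. Hypothetical helper, not part of the commit.
fn job_start_timeout(raw: Option<&str>) -> Result<Duration, String> {
    match raw {
        None => Ok(Duration::from_secs(DEFAULT_TIMEOUT_SECS)),
        Some(value) => value
            .parse::<u64>()
            .map(Duration::from_secs)
            .map_err(|e| format!("Invalid value for `BACKGROUND_JOB_TIMEOUT` ({:?}): {}", value, e)),
    }
}

/// Reads the variable from the process environment. Like the commit's
/// `unwrap_or_else`, any read error (unset or not valid Unicode) falls back
/// to the default.
fn job_start_timeout_from_env() -> Result<Duration, String> {
    job_start_timeout(env::var("BACKGROUND_JOB_TIMEOUT").ok().as_deref())
}
```

Because the value is parsed as a `u64`, negative numbers, fractions, and values with surrounding whitespace are rejected, which is also true of the `.parse()` call in the diff once its type is inferred from `Duration::from_secs`.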
