restart notification and error posting #37

PaulJKathmann · 2025-08-25T22:08:55Z

Before this PR

If posting the job result fails 5 times, an exception is raised, but the forwarder is never told about it because it never receives a message from the user code container.
If the user code crashes, the forwarder keeps treating the job as running until the user code posts a result for that jobId. After a restart, however, the user code has no record of the interrupted job, so it never reports on it. As a result, a node/module can stay blocked from receiving new requests after restarting:

https://github.palantir.build/foundry/interactive-infra/issues/9237

After this PR

If posting the job result fails 5 times (e.g. because the result is too large), the client makes up to 5 more attempts to post a short error message instead.
We notify the forwarder whenever a node starts up, so it can remove every job for that node that it still considers running.
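The client-side fallback can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `ResultPoster`, `Transport`, and `MAX_ATTEMPTS` are hypothetical names, attempts are retried immediately (no backoff), and a thrown exception is counted as a failed attempt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ResultPoster, Transport and MAX_ATTEMPTS are illustrative names,
// not the classes introduced by this PR.
final class ResultPoster {
    /** Hypothetical transport to the forwarder; returns true if the post was accepted. */
    interface Transport {
        boolean post(String jobId, String payload);
    }

    static final int MAX_ATTEMPTS = 5; // "5 times", as described in the PR

    private final Transport transport;

    ResultPoster(Transport transport) {
        this.transport = transport;
    }

    /**
     * Tries to deliver the full result; if every attempt fails, tries to deliver a short
     * error message instead so the forwarder still hears back about the job.
     * Returns true if either payload was delivered.
     */
    boolean postResult(String jobId, String result) {
        if (tryPost(jobId, result)) {
            return true;
        }
        String error = "posting result for job " + jobId + " failed";
        return tryPost(jobId, error);
    }

    private boolean tryPost(String jobId, String payload) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                if (transport.post(jobId, payload)) {
                    return true;
                }
            } catch (RuntimeException e) {
                // A thrown exception counts as a failed attempt; keep retrying.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        // Simulated forwarder that rejects payloads longer than 64 characters
        Transport transport = (jobId, payload) -> {
            if (payload.length() > 64) {
                return false;
            }
            delivered.add(payload);
            return true;
        };
        ResultPoster poster = new ResultPoster(transport);
        boolean ok = poster.postResult("job-1", "x".repeat(1_000));
        System.out.println(ok + " " + delivered);
    }
}
```

With the simulated forwarder in `main`, the 1,000-character result fails all five attempts and the short error message is accepted on the first fallback attempt, so the job is reported back instead of silently disappearing.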
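The forwarder-side cleanup on node startup can be sketched in the same hedged way. Assumptions: `RunningJobRegistry` and its methods are hypothetical names, the forwarder is assumed to track the running jobIds per node, and a node is assumed to accept new requests only when no job is recorded for it (one simple reading of "blocked from receiving new requests").

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the forwarder's bookkeeping; names are illustrative only.
final class RunningJobRegistry {
    // nodeId -> jobIds the forwarder believes are still running on that node.
    // Invariant: empty sets are removed, so a missing key means "idle".
    private final Map<String, Set<String>> running = new HashMap<>();

    /** Record that a job was dispatched to a node. */
    synchronized void jobStarted(String nodeId, String jobId) {
        running.computeIfAbsent(nodeId, k -> new HashSet<>()).add(jobId);
    }

    /** Called when the node posts a result (or a fallback error) for a job. */
    synchronized void jobFinished(String nodeId, String jobId) {
        Set<String> jobs = running.get(nodeId);
        if (jobs != null) {
            jobs.remove(jobId);
            if (jobs.isEmpty()) {
                running.remove(nodeId);
            }
        }
    }

    /** Assumption: a node only receives new requests while nothing is recorded for it. */
    synchronized boolean canAccept(String nodeId) {
        return !running.containsKey(nodeId);
    }

    /**
     * Called when a node reports that it has (re)started. A fresh process cannot still be
     * running a job from before the restart, so all jobs recorded for it are dropped.
     * Returning the dropped jobIds is a hypothetical extra, e.g. for logging.
     */
    synchronized Set<String> nodeStarted(String nodeId) {
        Set<String> stale = running.remove(nodeId);
        return stale == null ? Set.of() : stale;
    }

    public static void main(String[] args) {
        RunningJobRegistry registry = new RunningJobRegistry();
        registry.jobStarted("node-a", "job-1");
        // The user code crashes and never posts a result for job-1
        System.out.println("before restart, can accept: " + registry.canAccept("node-a"));
        Set<String> dropped = registry.nodeStarted("node-a");
        System.out.println("dropped " + dropped + ", can accept: " + registry.canAccept("node-a"));
    }
}
```

Without the `nodeStarted` call, `node-a` would report `canAccept == false` indefinitely, which is the blocked-node situation described under "Before this PR".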

Possible downsides?

Are Docs needed?

PaulJKathmann added 7 commits August 25, 2025 17:41

restart notification and error posting

de79457

format

8ae9d19

feature flag

fad7eff

format

ec59e8a

fix errors

6a8a16a

fix raw type warning

b3983a9

format

a42de6c
