Member

@cgillum cgillum commented Dec 15, 2023

Overview

This is a fairly "chunky" PR that refactors how we do orchestration state management. The motivation for this change is to work around the state store transaction limits that affect Dapr Workflow.

With this change, the durable task engine now supports saving orchestration history in multiple chunks rather than all at once. This makes it compatible with state stores that have:

  • limits on the number of keys that can be saved in a single transaction
  • limits on the number of bytes in a single transaction

The changes are designed so that machine failures occurring between saves of individual "chunks" can be recovered from automatically, with no data loss.
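To illustrate the chunking idea described above, here is a minimal sketch of splitting a list of history events into batches that respect both a key-count limit and a byte-size limit. It is not the code in this PR: `historyEvent`, `chunkEvents`, `maxKeys`, and `maxBytes` are hypothetical names chosen for the example, and the real logic lives in OrchestrationRuntimeState.

```go
package main

import "fmt"

// historyEvent is a hypothetical stand-in for one serialized history event
// that would be written to the state store under its own key.
type historyEvent struct {
	Key  string
	Data []byte
}

// chunkEvents splits events into batches that hold at most maxKeys events and
// at most maxBytes of payload each. An event larger than maxBytes is placed in
// a chunk of its own rather than being dropped.
func chunkEvents(events []historyEvent, maxKeys, maxBytes int) [][]historyEvent {
	var chunks [][]historyEvent
	var current []historyEvent
	size := 0
	for _, e := range events {
		full := len(current) >= maxKeys || size+len(e.Data) > maxBytes
		if len(current) > 0 && full {
			// Close the current chunk before it would exceed a limit.
			chunks = append(chunks, current)
			current, size = nil, 0
		}
		current = append(current, e)
		size += len(e.Data)
	}
	if len(current) > 0 {
		chunks = append(chunks, current)
	}
	return chunks
}

func main() {
	events := []historyEvent{
		{Key: "history-0", Data: make([]byte, 4)},
		{Key: "history-1", Data: make([]byte, 4)},
		{Key: "history-2", Data: make([]byte, 4)},
	}
	// Each chunk would be saved in its own state store transaction.
	for i, chunk := range chunkEvents(events, 2, 10) {
		fmt.Printf("chunk %d: %d event(s)\n", i, len(chunk))
	}
}
```

In a design like this, each chunk maps to one state store transaction, so recovery after a crash only needs to know which chunks were already committed. How this PR tracks that is not shown here.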

Summary of all changes

  • Major refactoring of OrchestrationRuntimeState to support history chunking
  • Refactor worker options to support orchestration worker-specific settings
  • Added payload size limit configuration
  • Several debug logging improvements
  • Improved handling of fatal errors
  • More filtering of duplicate and unnecessary events
  • Added new parameter to Backend.CompleteOrchestrationWorkItem

@cgillum cgillum requested a review from kaibocai December 15, 2023 15:15
kaibocai previously approved these changes Dec 18, 2023
Member

@kaibocai kaibocai left a comment

Looks good!

Member Author

cgillum commented Dec 18, 2023

I will merge this after I validate that it works correctly with the Dapr state stores (like Cosmos DB, which is one of the main motivations for this change).

// mustEmbedUnimplementedTaskHubSidecarServiceServer implements protos.TaskHubSidecarServiceServer
//
//lint:ignore U1000 because this is a required gRPC method
func (grpcExecutor) mustEmbedUnimplementedTaskHubSidecarServiceServer() {
Contributor

What gRPC wants us to do here is change the grpcExecutor struct type so that it embeds unimplementedTaskHubSidecarServiceServer. We shouldn't have to implement this method ourselves.

Member Author

I'm not sure what this means, unfortunately. Can you provide an example of the code change I can make that will allow me to delete this method?

Contributor

I think removing this method should be enough.

grpcExecutor already embeds unimplementedTaskHubSidecarServiceServer:

protos.UnimplementedTaskHubSidecarServiceServer

I tried doing it locally, but right now this PR isn't building (at least not for me?) because of some other really weird issues.
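For readers unfamiliar with the pattern discussed in this thread, here is a minimal, self-contained sketch of why embedding the generated "Unimplemented" struct is enough to satisfy the interface. The types below (`sidecarServer`, `UnimplementedSidecarServer`, `executor`) are hypothetical stand-ins, not the real protos package. In real generated gRPC code the unexported method lives in a different package, so embedding is the only way to satisfy it; this example keeps everything in one package only so that it compiles on its own.

```go
package main

import "fmt"

// sidecarServer is a hypothetical stand-in for a generated gRPC service
// interface. Generated interfaces include an unexported mustEmbed... method
// so that implementations are forced to embed the Unimplemented struct.
type sidecarServer interface {
	Hello(name string) string
	mustEmbedUnimplementedSidecarServer()
}

// UnimplementedSidecarServer mimics the struct that the gRPC code generator
// emits: it provides default behavior for every method, including the
// unexported marker method.
type UnimplementedSidecarServer struct{}

func (UnimplementedSidecarServer) Hello(string) string { return "unimplemented" }

func (UnimplementedSidecarServer) mustEmbedUnimplementedSidecarServer() {}

// executor embeds the Unimplemented struct, so the marker method is promoted
// onto executor and does not need to be written by hand.
type executor struct {
	UnimplementedSidecarServer
}

// Hello overrides the default from the embedded struct.
func (executor) Hello(name string) string { return "hello " + name }

// Compile-time check that executor satisfies the interface.
var _ sidecarServer = executor{}

func main() {
	var s sidecarServer = executor{}
	fmt.Println(s.Hello("dapr"))
}
```

Because the marker method is promoted from the embedded struct, deleting a hand-written copy of it from the outer type should leave the interface satisfied, which matches the suggestion above.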


3 participants