Skip to content

feat(idempotency): Remove deadlock after lambda handler timeout #1198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

michaelbrewer
Copy link
Contributor

@michaelbrewer michaelbrewer commented May 1, 2022

Issue number:

Summary

Currently if a lambda times-out the idempotency dynamodb item is left in the 'INPROGRESS' status forever AND subsequent lambda retries fail with IdempotencyAlreadyInProgressError. And thus blocking the transaction from ever completing. As a customer you would need to manually delete this pending record to retry.

NOTE: This PR solves this for a AWS Lambda function handler with a idempotent decorator, but not yet for idempotent_function as we don't have a direct handle to the Lambda context.

Changes

Please provide a summary of what's being changed

Initial draft on an option to clean up on function timeout

Changes:

  • add config flag called function_timeout_clean_up which will cleanup in complete record in the previous invoke times out
  • save_inprogress sets a function_timeout if function_timeout_clean_up is true
  • _put_record now includes an OR #function_timeout < :now to allow for overriding after function timeout
  • _update_record resets function_timeout back to None after completing a function
  • Update tests to include changes in behavior

User experience

Please share what the user experience looks like before and after this change

Add an optional flag function_timeout_clean_up to clean up on retries if the function timed out

Example usage of a lambda that times out before completing ie: time.sleep(3)

import time
from os import environ
from uuid import uuid4

from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
    idempotent,
)

DYNAMODB_TABLE = environ["DYNAMODB_TABLE"]
config = IdempotencyConfig(function_timeout_clean_up=True)
persistence_layer = DynamoDBPersistenceLayer(table_name=DYNAMODB_TABLE)


@idempotent(persistence_store=persistence_layer, config=config)
def handler(event, context):
    time.sleep(3)
    uuid_value: str = str(uuid4())
    return {"message": uuid_value}

Example sam template with this PRs powertools as a layer and a timeout of 2 seconds

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Testing idempotency timeouts
Globals:
    Function:
        Timeout: 2

Resources:
  IdempotencyTable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        -   AttributeName: id
            AttributeType: S
      KeySchema:
        -   AttributeName: id
            KeyType: HASH
      TimeToLiveSpecification:
        AttributeName: expiration
        Enabled: true
      BillingMode: PAY_PER_REQUEST

  IdempotencyFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: IdempotencyFunction
      CodeUri: src/
      Handler: app.handler
      Runtime: python3.9
      MemorySize: 512
      Layers: 
        - !Ref PowertoolsLayer
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref IdempotencyTable
      Environment:
        Variables:
          LOG_LEVEL: INFO
          POWERTOOLS_SERVICE_NAME: example
          DYNAMODB_TABLE: !Ref IdempotencyTable

  PowertoolsLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: MyLambdaLayer
      ContentUri: layer/
      CompatibleRuntimes:
        - python3.9

First call will timeout writing the function_timeout record of when to expire
Second call would override / cleaning up the previous call

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

Changes:
- Initial draft on an option to clean up on function timeout

close aws-powertools#1038
@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 1, 2022
@github-actions github-actions bot added the feature New feature or functionality label May 1, 2022
@codecov-commenter
Copy link

codecov-commenter commented May 1, 2022

Codecov Report

Merging #1198 (0928a6d) into develop (2302099) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff            @@
##           develop    #1198   +/-   ##
========================================
  Coverage    99.88%   99.88%           
========================================
  Files          119      119           
  Lines         5423     5436   +13     
  Branches       618      619    +1     
========================================
+ Hits          5417     5430   +13     
  Misses           2        2           
  Partials         4        4           
Impacted Files Coverage Δ
...ws_lambda_powertools/utilities/idempotency/base.py 100.00% <100.00%> (ø)
..._lambda_powertools/utilities/idempotency/config.py 100.00% <100.00%> (ø)
...wertools/utilities/idempotency/persistence/base.py 99.39% <100.00%> (+0.02%) ⬆️
...ools/utilities/idempotency/persistence/dynamodb.py 98.71% <100.00%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2bd8722...0928a6d. Read the comment docs.

@michaelbrewer michaelbrewer marked this pull request as ready for review May 4, 2022 19:32
@michaelbrewer michaelbrewer changed the title feat(idempotency): Clean up on lambda timeout feat(idempotency): Clean up after lambda timeout May 4, 2022
@michaelbrewer
Copy link
Contributor Author

@heitorlessa - i added a proof of concept repo for this repo

https://github.com/michaelbrewer/idem-tmp

@michaelbrewer michaelbrewer changed the title feat(idempotency): Clean up after lambda timeout feat(idempotency): Clean up after lambda handler timeout May 5, 2022
@sthulb
Copy link
Contributor

sthulb commented May 12, 2022

In an effort to prevent you from wasting time, you should wait for issues to prioritised, this won't be merged for a long time until we're ready to add new features.

@michaelbrewer
Copy link
Contributor Author

michaelbrewer commented May 12, 2022

In an effort to prevent you from wasting time, you should wait for issues to prioritised, this won't be merged for a long time until we're ready to add new features.

See issue #1038 and comment #1038 (comment)

Screen Shot 2022-05-13 at 4 00 36 PM

@michaelbrewer
Copy link
Contributor Author

michaelbrewer commented May 13, 2022

Alternative names for function_timeout_clean_up params could be:

  • clean_up_after_function_timeout: As the actual clean up the in progress record happens after the function timeout during the retry attempt.

@michaelbrewer michaelbrewer changed the title feat(idempotency): Clean up after lambda handler timeout feat(idempotency): Remove deadlock after lambda handler timeout Jun 13, 2022
@sthulb sthulb closed this Jun 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or functionality size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tests utilities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants