Traditional methods of performing address enrichment on geospatial datasets can be expensive and time-consuming.
Using Amazon Location Service with AWS Step Functions for orchestration and with Amazon DynamoDB for caching in a serverless data processing pipeline, you may achieve significant performance improvements and cost savings on address enrichment jobs that use geospatial data.
This sample is an evolution of an existing sample that uses only Lambda functions.
Some of the improvements in this project include:
- Using AWS Step Functions for Orchestration
- Using Amazon DynamoDB as a naive cache to store location results, which helps improve performance and optimize costs
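The caching idea above can be sketched as a cache-aside lookup: check DynamoDB first, and call Amazon Location Service only on a miss. This is a minimal illustration and not the repository's code. The key and attribute names (`address`, `result`), the `index_name` argument, and the function name `geocode_with_cache` are all hypothetical, and the clients are passed in so the function can be exercised with stand-ins.

```python
import json


def geocode_with_cache(address, table, location_client, index_name):
    """Return (result, was_cached) for a single address.

    `table` is expected to behave like a boto3 DynamoDB Table resource and
    `location_client` like boto3.client("location"). The key and attribute
    names ("address", "result") are assumptions made for this sketch.
    """
    # Normalise so trivially different spellings share one cache entry.
    key = address.strip().lower()

    cached = table.get_item(Key={"address": key}).get("Item")
    if cached is not None:
        return json.loads(cached["result"]), True

    # Cache miss: ask the Places API for the best match.
    response = location_client.search_place_index_for_text(
        IndexName=index_name, Text=address, MaxResults=1
    )
    results = response.get("Results", [])
    if not results:
        return None, False

    place = results[0]["Place"]
    result = {
        "label": place.get("Label"),
        "longitude": place["Geometry"]["Point"][0],  # Point is [lon, lat]
        "latitude": place["Geometry"]["Point"][1],
    }
    # Stored as a JSON string because the boto3 DynamoDB resource
    # does not accept Python floats directly.
    table.put_item(Item={"address": key, "result": json.dumps(result)})
    return result, False
```

A naive cache like this never expires entries, so repeated addresses across jobs cost one Places API call in total. Whether that trade-off suits your data depends on how often the underlying addresses change.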
The repository contains a SAM template for deploying a Serverless Address Enrichment pipeline using:
- Amazon S3 (for object storage),
- AWS Lambda (for serverless compute),
- AWS Step Functions (for Orchestration),
- Amazon DynamoDB (for Caching)
- Amazon Location Service (for Geocoding/Reverse Geocoding)
It also uses sample data sourced from publicly available datasets that you can deploy and use to test the application.
This project addresses a common customer concern: how to improve application performance while also optimizing costs.
- The Scatter Lambda function takes a dataset from the S3 bucket labeled input and breaks it into equal-sized shards.
- The Process Lambda function takes each shard from the pre-processed bucket and performs Address Enrichment in parallel calling the Amazon Location Service Places API and storing
- The Gather Lambda function takes each shard from the post-processed bucket and appends them into a complete dataset with additional address information.
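The scatter and gather steps above can be illustrated locally on plain CSV text. The sketch below is not the repository's handler code: `scatter`, `gather`, and the `shard_size` parameter are hypothetical names, and it works on in-memory strings rather than S3 objects.

```python
import csv
import io


def scatter(csv_text, shard_size):
    """Split CSV text into shards of at most `shard_size` data rows.

    Every shard repeats the header row so it can be processed on its own.
    The last shard may be smaller than the others.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    shards = []
    for start in range(0, len(data), shard_size):
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(data[start:start + shard_size])
        shards.append(buf.getvalue())
    return shards


def gather(shards):
    """Append processed shards into one CSV, keeping only the first header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for i, shard in enumerate(shards):
        rows = list(csv.reader(io.StringIO(shard)))
        writer.writerows(rows if i == 0 else rows[1:])
    return buf.getvalue()
```

In the deployed pipeline each shard would be enriched between these two steps, so the gathered file would carry extra address columns. Here the round trip simply reproduces the input.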
To use the SAM CLI, you need the following tools:
- AWS account
- AWS SAM CLI - Install the SAM CLI
- Python 3.9 or later - download the latest version of Python
- An AWS Identity and Access Management role with appropriate access
- template.yaml: Contains the AWS SAM template that defines your application's AWS resources, which includes a Place Index for Amazon Location Service
- statemachine/location_service_scatter_gather.asl.yaml: Contains the Step Functions ASL definition
- functions/scatter/: Contains the Lambda handler logic behind the scatter function and its requirements
- functions/process/: Contains the Lambda handler logic for the processor function which calls the Amazon Location Service Places API to perform address enrichment
- functions/gather/: Contains the Lambda handler logic for the gather function which appends all of the processed data into a complete dataset
- tests/: TBD - Needs to contain test cases (Unit and Integration Tests)
- Use
git clone https://github.com/aws-samples/address-enrichment-and-caching-using-stepfunctions
to clone the repository to your environment where AWS SAM and Python are installed.
- Use
cd address-enrichment-and-caching-using-stepfunctions
to change into the project directory containing the template.yaml file SAM uses to build your application.
- If you have Docker installed, build your application with
sam build --use-container
Otherwise, use
sam build
You should see:
Build Succeeded
Built Artifacts : .aws-sam/build
Built Template : .aws-sam/build/template.yaml
Commands you can use next
=========================
[*] Invoke Function: sam local invoke
[*] Test Function in the Cloud: sam sync --stack-name {stack-name} --watch
[*] Deploy: sam deploy --guided
- Use
sam deploy --guided
to deploy the application to your AWS account. Enter responses based on your environment:
Configuring SAM deploy
======================
Looking for config file [samconfig.toml] : Not found
Setting default arguments for 'sam deploy'
=========================================
Stack Name [sam-app]: address-enrichment
AWS Region [us-west-2]: us-east-1
#Shows you resources changes to be deployed and require a 'Y' to initiate deploy
Confirm changes before deploy [y/N]: Y
#SAM needs permission to be able to create roles to connect to the resources in your template
Allow SAM CLI IAM role creation [Y/n]: Y
#Preserves the state of previously provisioned resources when an operation fails
Disable rollback [y/N]: N
Save arguments to configuration file [Y/n]: Y
SAM configuration file [samconfig.toml]:
SAM configuration environment [default]:
Download the samples below, unzip the files, and upload the CSV to your input S3 bucket to trigger the address enrichment pipeline.
Geocoding: City of Hartford, CT Business Listing Dataset
Reverse Geocoding: Miami Housing Dataset
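If you prefer to upload the CSV from a script rather than the console, a minimal sketch with boto3 could look like the following. The helper name `upload_sample`, the file name, and the bucket name in the comment are placeholders rather than values taken from the repository. The client is passed in so the helper does not depend on AWS credentials by itself.

```python
import os


def upload_sample(s3_client, csv_path, bucket):
    """Upload a local CSV to the given bucket and return the object key used.

    `s3_client` is expected to behave like boto3.client("s3"). The object key
    is simply the file name, which is an assumption made for this sketch.
    """
    key = os.path.basename(csv_path)
    s3_client.upload_file(csv_path, bucket, key)
    return key


# Example call (requires boto3 and AWS credentials; all names are placeholders):
#   import boto3
#   upload_sample(boto3.client("s3"), "sample.csv", "your-input-bucket-name")
```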
To avoid incurring charges, clean up the AWS resources that were created while following this sample.
Make sure you empty the following S3 buckets before deleting the CloudFormation stack (the deletion will fail for non-empty buckets):
- input-<stack-name>-<aws-region>-<aws-accountnumber>
- raw-<stack-name>-<aws-region>-<aws-accountnumber>
- processed-<stack-name>-<aws-region>-<aws-accountnumber>
- destination-<stack-name>-<aws-region>-<aws-accountnumber>
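Emptying the four buckets can also be scripted. The sketch below builds the bucket names from the naming pattern listed above and deletes every object in each one. The function names are hypothetical, and the example values in the comment are placeholders. It assumes the buckets are not versioned; versioned buckets would also need their object versions removed.

```python
BUCKET_PREFIXES = ("input", "raw", "processed", "destination")


def sample_bucket_names(stack_name, region, account_id):
    """Build the four bucket names from the prefix-stack-region-account pattern."""
    return [
        f"{prefix}-{stack_name}-{region}-{account_id}"
        for prefix in BUCKET_PREFIXES
    ]


def empty_buckets(s3_resource, bucket_names):
    """Delete every object in each bucket.

    `s3_resource` is expected to behave like boto3.resource("s3").
    This is destructive and cannot be undone.
    """
    for name in bucket_names:
        s3_resource.Bucket(name).objects.all().delete()


# Example call (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   names = sample_bucket_names("address-enrichment", "us-east-1", "123456789012")
#   empty_buckets(boto3.resource("s3"), names)
```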
To delete the resources you created as part of this sample, you can run sam delete:
sam delete
Are you sure you want to delete the stack address-enrichment in the region us-east-1 ? [y/N]: y
Are you sure you want to delete the folder address-enrichment in S3 which contains the artifacts? [y/N]: y
- Deleting S3 object with key address-enrichment/c2710045fb8c4c4d77e47fba2f9754e4
- Deleting S3 object with key address-enrichment/c5ca75d7c52419e4077a3c030d76d812
- Deleting S3 object with key address-enrichment/04c2cdceeee06f8998eccf77fc6ffb9b
- Deleting S3 object with key address-enrichment/f1e2091b2a434fd87f023b603e23fe10
- Deleting S3 object with key address-enrichment/5a46e427cf72552a09e714f3a5c16461.template
- Deleting Cloudformation stack address-enrichment
Deleted successfully
Alternatively, you can delete the AWS CloudFormation stack by logging in to the AWS Console and navigating to the AWS CloudFormation service. Select Stacks, select the stack you want to delete, and then click the Delete button at the top right.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.