aws s3 sync downloading unchanged files. #7228

@MrJoy

Describe the bug

I have a maintenance script I run to keep a local copy of billing & usage data for my personal AWS account. On every run, aws s3 sync identifies almost every file as changed, even though most of the files haven't been modified in years.

Expected Behavior

Only changed files -- in this case, files representing the current billing period -- should be downloaded.

Current Behavior

Of the 6,279 files that do not represent the current billing period, it consistently re-downloads 5,831. The downloaded files are byte-for-byte identical to the existing ones. I spot-checked one of the files, and aws s3 ls reports exactly the same size and timestamp as ls does.

Reported by aws s3 sync:

download: s3://mrjoy-billing-data//cur//billing_and_usage/20210101-20210201/20210122T235314Z/billing_and_usage-00001.csv.gz to ../personal/Finance/AWS_Billing_Data/cur/billing_and_usage/20210101-20210201/20210122T235314Z/billing_and_usage-00001.csv.gz

Reported by aws s3 ls:

% aws-vault exec mrjoy -- aws s3 ls s3://mrjoy-billing-data//cur//billing_and_usage/20210101-20210201/20210122T235314Z/billing_and_usage-00001.csv.gz
2021-01-22 15:53:24     296522 billing_and_usage-00001.csv.gz

Reported by ls:

% ls -laD "%Y-%m-%d %H:%M:%S" ~/personal/Finance/AWS_Billing_Data/cur/billing_and_usage/20210101-20210201/20210122T235314Z/billing_and_usage-00001.csv.gz
-rw-r--r--  1 jonathonfrisby  staff  296522 2021-01-22 15:53:24 /Users/jonathonfrisby/personal/Finance/AWS_Billing_Data/cur/billing_and_usage/20210101-20210201/20210122T235314Z/billing_and_usage-00001.csv.gz
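The size half of the spot check above could be scripted roughly as follows. This is only a sketch: the function names (s3_ls_size, local_size, sizes_match) are made up for illustration and are not part of the AWS CLI, and it assumes the `aws s3 ls` line has the date, time, size, name layout shown above. It does not compare timestamps.

```shell
# Sketch: compare the size column of an `aws s3 ls` line with a local file.
# Function names are illustrative, not part of the AWS CLI.

# Print the size (third whitespace-separated field) of one `aws s3 ls` line,
# e.g. "2021-01-22 15:53:24     296522 billing_and_usage-00001.csv.gz".
s3_ls_size() {
  printf '%s\n' "$1" | awk '{ print $3 }'
}

# Print the size of a local file in bytes (works on both macOS and Linux).
local_size() {
  wc -c <"$1" | tr -d '[:space:]'
}

# Succeed when the listed size equals the local file's size.
# $1: one line of `aws s3 ls` output, $2: path to the local file.
sizes_match() {
  [ "$(s3_ls_size "$1")" = "$(local_size "$2")" ]
}
```

For example, `sizes_match "$(aws s3 ls s3://bucket/key)" /path/to/local/file && echo same` would print "same" only when the two sizes agree; the bucket, key, and path here are placeholders.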

The post-fetch commit in all cases shows diffs for the files in the current billing period (as would be expected), and no changes to any of the other files that aws s3 sync reports as being downloaded.
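Independently of git, the "no changes" observation could be cross-checked by snapshotting checksums of the local tree before and after a sync. A minimal sketch, assuming POSIX find, cksum, and sort are available; snapshot_checksums is a hypothetical helper name:

```shell
# Sketch: record a checksum for every file under a directory so that two
# snapshots (before and after a sync) can be compared as plain text.

# Print "CRC SIZE PATH" for each regular file under $1, sorted by path.
snapshot_checksums() {
  find "$1" -type f -exec cksum {} + | sort -k 3
}

# Example use (not run here):
#   before=$(snapshot_checksums ~/personal/Finance/AWS_Billing_Data)
#   ... run aws s3 sync ...
#   after=$(snapshot_checksums ~/personal/Finance/AWS_Billing_Data)
#   [ "$before" = "$after" ] && echo "no file contents changed"
```

If the two snapshots are equal, every file that was re-downloaded had the same content and size as the copy it replaced.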

All told, aws s3 sync appears to download around 700MB of files on each run that it shouldn't.
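To count how many files a run would fetch without actually transferring them, the sync can be run with the CLI's --dryrun flag and its "download:" lines tallied. A sketch under that assumption; count_downloads is a hypothetical helper, and it simply counts matching lines on standard input:

```shell
# Sketch: count the "download:" lines in `aws s3 sync` output read from stdin.
# With --dryrun the lines are prefixed with "(dryrun) ", which still matches.
count_downloads() {
  # grep -c prints 0 but exits non-zero when nothing matches; `|| true`
  # keeps that from aborting a script that uses `set -e`.
  grep -c 'download: ' || true
}

# Example use (not run here):
#   aws-vault exec mrjoy -- aws s3 sync --dryrun \
#     s3://mrjoy-billing-data/ ~/personal/Finance/AWS_Billing_Data/ | count_downloads
```

Running this twice in a row with no changes in the bucket should show whether the count stays near 5,831 between runs.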

Reproduction Steps

The relevant portion of my script is:

#!/bin/bash
IFS=$'\n\t'
set -euo pipefail

(
  cd ~/personal
  git add .
  git commit --all --allow-empty -m "AWS bill snapshot, pre-fetch..."
  aws-vault exec mrjoy -- aws s3 sync s3://mrjoy-billing-data/ ~/personal/Finance/AWS_Billing_Data/
  git add .
  git commit --all --allow-empty -m "AWS bill snapshot, post-fetch..."
)

The data in the bucket is written by AWS itself.

Possible Solution

No response

Additional Information/Context

No response

CLI version used

2.7.26

Environment details (OS name and version, etc.)

macOS 12.5.1

Labels: p2 (this is a standard priority issue), s3, s3sync