[Spec Resync] 06-25-2025 #2407

Open: wants to merge 1 commit into base: master
20 changes: 20 additions & 0 deletions .evergreen/config.yml
@@ -42,3 +42,23 @@ post:
  - func: "upload mo artifacts"
  - func: "upload test results"
  - func: "cleanup"

tasks:
  - name: resync_specs
    commands:
      - command: subprocess.exec
        params:
          binary: bash
          include_expansions_in_env: [AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN]
          args:
            - .evergreen/scripts/resync-all-specs.sh
          working_dir: src

buildvariants:
  - name: resync_specs
    display_name: "Resync Specs"
    run_on: rhel80-small
    cron: '0 9 * * MON'
    patchable: true
    tasks:
      - name: resync_specs
9 changes: 6 additions & 3 deletions .evergreen/resync-specs.sh
@@ -45,9 +45,12 @@ then
fi

# Ensure the JSON files are up to date.
-cd $SPECS/source
-make
-cd -
+if ! [ -n "${CI:-}" ]
+then
+  cd $SPECS/source
+  make
+  cd -
+fi
# cpjson unified-test-format/tests/invalid unified-test-format/invalid
# * param1: Path to spec tests dir in specifications repo
# * param2: Path to where the corresponding tests live in Python.
63 changes: 63 additions & 0 deletions .evergreen/scripts/create-pr.sh
@@ -0,0 +1,63 @@
#!/usr/bin/env bash

tools="../drivers-evergreen-tools"
git clone https://github.com/mongodb-labs/drivers-evergreen-tools.git $tools
body="$(cat "$1")"

pushd $tools/.evergreen/github_app

owner="mongodb"
repo="mongo-python-driver"

# Bootstrap the app.
echo "bootstrapping"
source utils.sh
bootstrap drivers/comment-bot

# Run the app.
source ./secrets-export.sh

# Get a github access token for the git checkout.
echo "Getting github token..."

token=$(bash ./get-access-token.sh $repo $owner)
if [ -z "${token}" ]; then
    echo "Failed to get github access token!"
    popd
    exit 1
fi
echo "Getting github token... done."
popd

# Make the git checkout and create a new branch.
echo "Creating the git checkout..."
branch="spec-resync-"$(date '+%m-%d-%Y')

#git config user.email "167856002+mongodb-dbx-release-bot[bot]@users.noreply.github.com"
#git config user.name "mongodb-dbx-release-bot[bot]"
git remote set-url origin https://x-access-token:${token}@github.com/$owner/$repo.git
git checkout -b $branch "origin/master"
git add ./test
git apply -R .evergreen/specs.patch
git commit -am "resyncing specs test?"
echo "Creating the git checkout... done."

echo "THIS IS THE BODY"
echo "$body"
git push origin $branch
echo "{\"title\":\"[Spec Resync] $(date '+%m-%d-%Y')\",\"body\":\"$(cat "$1")\",\"head\":\"${branch}\",\"base\":\"master\"}"
resp=$(curl -L \
    -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $token" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    -d "{\"title\":\"[Spec Resync] $(date '+%m-%d-%Y')\",\"body\":\"$(cat "$1")\",\"head\":\"${branch}\",\"base\":\"master\"}" \
    --url https://api.github.com/repos/$owner/$repo/pulls)
echo $resp
echo $resp | jq '.html_url'
echo "Creating the PR... done."

rm -rf $tools

# use file names or reg-ex patterns
# or automate which version of the spec we support (like schema version)
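
The request body in create-pr.sh is assembled by interpolating the summary file straight into a JSON string, so a double quote or backslash in the summary would produce invalid JSON. As a rough sketch (not what this PR does), the same payload could be produced with Python's `json.dumps`, which handles the escaping. The `build_pr_payload` helper and its parameters are hypothetical names used only for illustration, and the sketch assumes the summary is passed in as raw text rather than in the pre-escaped form that resync-all-specs.py writes.

```python
import datetime
import json


def build_pr_payload(body: str, today: datetime.date, base: str = "master") -> str:
    """Return a JSON request body with the fields create-pr.sh sends.

    Hypothetical helper: the field names (title, body, head, base) mirror
    the curl call in the diff; everything else is an assumption.
    """
    stamp = today.strftime("%m-%d-%Y")  # same format as `date '+%m-%d-%Y'`
    payload = {
        "title": f"[Spec Resync] {stamp}",
        "body": body,
        "head": f"spec-resync-{stamp}",
        "base": base,
    }
    # json.dumps escapes quotes, backslashes, newlines and tabs in one step.
    return json.dumps(payload)
```

The resulting string could be written to a file and handed to curl with `-d @<file>` instead of the inline `-d "{...}"` argument.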
59 changes: 59 additions & 0 deletions .evergreen/scripts/resync-all-specs.py
@@ -0,0 +1,59 @@
import os
import pathlib
import subprocess
import argparse


def resync_specs(directory: pathlib.Path, succeeded: list[str], errored: dict[str, str]) -> None:
    for entry in os.scandir(directory):
        if not entry.is_dir():
            continue

        print(entry.path)
        spec_name = entry.path.split("/")[-1]
        if spec_name in ["asynchronous"]:
            continue
        process = subprocess.run(
            ["bash", "./.evergreen/resync-specs.sh", spec_name],
            capture_output=True,
            text=True)
        print(process.returncode)
        if process.returncode == 0:
            succeeded.append(spec_name)
        else:
            errored[spec_name] = process.stdout
            print(process.stderr)

def write_summary(succeeded: list[str], errored: dict[str, str]) -> None:
    pr_body = ""
    if len(succeeded) > 0:
        pr_body += "The following specs were changed:\n- "
        process = subprocess.run(
            ["git diff --name-only | awk -F'/' '{print $2}' | sort | uniq"],
            shell=True,

Semgrep identified an issue in your code:
Found 'subprocess' function 'run' with 'shell=True'. This is dangerous because this call will spawn the command using a shell process. Doing so propagates current shell settings and variables, which makes it much easier for a malicious actor to execute commands. Use 'shell=False' instead.

To resolve this comment:

💡 Follow autofix suggestion

Suggested change
- shell=True,
+ shell=False,
View step-by-step instructions
  1. Change the subprocess.run call to use shell=False (which is the default), and provide the command as a list rather than a string.
  2. Split the long shell command (git diff --name-only | awk -F'/' '{print $2}' | sort | uniq) into individual arguments so you can pass it as a list, or use Python modules like subprocess.PIPE and multiple subprocess calls to replicate the shell pipeline.
  3. For your example, you can achieve similar results in Python by using multiple subprocess runs:
    • Run git diff --name-only
    • Pass the output to awk via subprocess, or process it in Python
    • Sort and deduplicate results in Python
  4. Replace the vulnerable code:
    process = subprocess.run(
        ["git diff --name-only | awk -F'/' '{print $2}' | sort | uniq"],
        shell=True,
        capture_output=True,
        text=True)
    
    With safer code like:
    process1 = subprocess.run(["git", "diff", "--name-only"], capture_output=True, text=True)
    lines = [line.strip().split('/')[1] for line in process1.stdout.strip().splitlines() if '/' in line]
    unique_sorted = sorted(set(lines))
    process_stdout = '\n'.join(unique_sorted) + '\n' if unique_sorted else ''
    pr_body += process_stdout.replace("\n", "\n- ")
    

Alternatively, if you must use shell pipelines, make absolutely sure the arguments are constant and trusted, but avoid shell=True unless absolutely necessary. Using direct argument lists is safer and avoids shell injection risks.
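
The pipeline replacement described in the steps above can also be kept as a small pure function, so the parsing is testable without a git checkout. This is only a sketch, assuming changed paths look like `test/<spec>/...`; `changed_spec_names` and `git_changed_specs` are hypothetical names and are not part of the PR or of the Semgrep suggestion.

```python
import subprocess


def changed_spec_names(diff_output: str) -> list[str]:
    """Return the sorted, de-duplicated second path component of each line.

    Similar in spirit to: awk -F'/' '{print $2}' | sort | uniq,
    except that paths without a '/' are skipped instead of yielding blanks.
    """
    names = set()
    for line in diff_output.splitlines():
        parts = line.strip().split("/")
        if len(parts) > 1 and parts[1]:
            names.add(parts[1])
    return sorted(names)


def git_changed_specs() -> list[str]:
    """Run `git diff --name-only` without a shell and parse its output."""
    result = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    )
    return changed_spec_names(result.stdout)
```

Because the command is passed as an argument list, no shell is spawned and `shell=True` is not needed.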

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

  • /fp <comment> for false positive
  • /ar <comment> for acceptable risk
  • /other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by subprocess-shell-true.

You can view more details about this finding in the Semgrep AppSec Platform.

            capture_output=True,
            text=True)
        pr_body += process.stdout.strip().replace("\n", "\n- ")
        pr_body += "\n"
    if len(errored) > 0:
        pr_body += "\n\nThe following spec syncs encountered errors:"
        for k, v in errored.items():
            pr_body += f"\n- {k}\n```{v}\n```"

    if pr_body != "":
        with open("spec_sync.txt", "w") as f:
            # Escape newlines and tabs so the text can be embedded in a JSON string.
            f.write(pr_body.replace("\n", "\\n").replace("\t", "\\t"))

def main():
    directory = pathlib.Path("./test")
    succeeded: list[str] = []
    errored: dict[str, str] = {}
    resync_specs(directory, succeeded, errored)
    write_summary(succeeded, errored)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Python Script to resync all specs and generate summary for PR.")
    parser.add_argument("filename", help="Name of file for the summary to be written into.")
    args = parser.parse_args()
    main()
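
Note that `main()` never reads `args.filename`: `write_summary` always writes to `spec_sync.txt`, which works only because the wrapper script passes that same name. A minimal sketch of threading the argument through is given here, assuming the summary wording stays as in the diff. The `build_summary` helper, its `changed` parameter, and the two-argument `write_summary` are hypothetical and not part of this PR; the sketch also writes raw text and leaves JSON escaping to whatever consumes the file, which is a design assumption rather than the PR's behavior.

```python
import argparse
import pathlib
from typing import Optional

FENCE = "`" * 3  # Markdown code fence, built indirectly to keep this listing readable


def build_summary(changed: list[str], errored: dict[str, str]) -> str:
    """Build the PR body text; returns "" when there is nothing to report."""
    sections: list[str] = []
    if changed:
        sections.append("The following specs were changed:\n- " + "\n- ".join(changed))
    if errored:
        lines = ["The following spec syncs encountered errors:"]
        for name, output in errored.items():
            lines.append(f"- {name}\n{FENCE}\n{output}\n{FENCE}")
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


def write_summary(path: pathlib.Path, body: str) -> bool:
    """Write the body to `path`; create no file at all when the body is empty."""
    if not body:
        return False
    path.write_text(body)
    return True


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Parse the single positional argument naming the summary file."""
    parser = argparse.ArgumentParser(description="Resync all specs and write a PR summary.")
    parser.add_argument("filename", help="File the summary is written to.")
    return parser.parse_args(argv)
```

With this shape, the entry point would call `write_summary(pathlib.Path(args.filename), build_summary(...))`, and the wrapper's `[[ -f $PR_DESC ]]` check keeps working because no file is created for an empty summary.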
40 changes: 40 additions & 0 deletions .evergreen/scripts/resync-all-specs.sh
@@ -0,0 +1,40 @@
#!/usr/bin/env bash
# Run spec syncing script and create PR

# SETUP
SPEC_DEST="$(realpath -s "./test")"
SRC_URL="https://github.com/mongodb/specifications.git"
# needs to be set for resync-specs.sh
SPEC_SRC="$(realpath -s "../specifications")"
SCRIPT="$(realpath -s "./.evergreen/resync-specs.sh")"
BRANCH_NAME="spec-resync-"$(date '+%m-%d-%Y')

# Clone the spec repo if the directory does not exist
if [[ ! -d $SPEC_SRC ]]; then
    git clone $SRC_URL $SPEC_SRC
    if [[ $? -ne 0 ]]; then
        echo "Error: Failed to clone repository."
        exit 1
    fi
fi

# Set environment variable to the cloned spec repo for resync-specs.sh
export MDB_SPECS="$SPEC_SRC"

# Check that resync-specs.sh exists and is executable
if [[ ! -x $SCRIPT ]]; then
echo "Error: $SCRIPT not found or is not executable."
exit 1
fi

PR_DESC="spec_sync.txt"

# run python script that actually does all the resyncing
/opt/devtools/bin/python3.11 ./.evergreen/scripts/resync-all-specs.py "$PR_DESC"


if [[ -f $PR_DESC ]]; then
    # changes were made -> call script to create PR for us
    .evergreen/scripts/create-pr.sh "$PR_DESC"
    rm "$PR_DESC"
fi