-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Fix azpysdk Sphinx Check and Use in CI #44144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes the Sphinx documentation check in the Azure SDK for Python by refactoring how source distributions are handled during documentation generation. The changes move from working directly with installed packages to extracting and working with the sdist archive contents in an "unzipped" directory structure.
Key changes:
- Adds extraction utilities to unzip sdist archives (
.zipor.tar.gz) before running Sphinx - Updates path resolution throughout the Sphinx workflow to use the extracted "unzipped" directory structure
- Migrates CI pipeline from
dispatch_tox.pywith--toxenv=sphinxtodispatch_checks.pywith--checks="sphinx"
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| eng/tools/azure-sdk-tools/azpysdk/sphinx.py | Adds archive extraction functions and updates Sphinx build workflow to work with extracted sdist contents instead of installed packages |
| eng/pipelines/templates/steps/build-extended-artifacts.yml | Updates CI to use new dispatch_checks.py script with --checks parameter instead of tox-based approach |
| generate_mgmt_script = os.path.join(REPO_ROOT, "doc/sphinx/generate_doc.py") | ||
|
|
||
|
|
||
| def unzip_sdist_to_directory(containing_folder: str) -> str: |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring for function unzip_sdist_to_directory. This function is a key helper that handles extraction of distribution archives. Add a docstring describing its purpose, parameters, return value, and potential exceptions (e.g., IndexError if no archives are found).
| def unzip_sdist_to_directory(containing_folder: str) -> str: | |
| def unzip_sdist_to_directory(containing_folder: str) -> str: | |
| """ | |
| Extracts a source distribution archive (.zip or .tar.gz) from the specified folder. | |
| Searches for a .zip file first; if none is found, searches for a .tar.gz file. | |
| Extracts the first archive found into the containing folder and returns the path to the extracted directory. | |
| Parameters | |
| ---------- | |
| containing_folder : str | |
| The path to the folder containing the source distribution archive. | |
| Returns | |
| ------- | |
| str | |
| The path to the directory where the archive was extracted. | |
| Raises | |
| ------ | |
| IndexError | |
| If no .zip or .tar.gz archive is found in the specified folder. | |
| """ |
| return unzip_file_to_directory(tars[0], containing_folder) | ||
|
|
||
|
|
||
| def unzip_file_to_directory(path_to_zip_file: str, extract_location: str) -> str: |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring for function unzip_file_to_directory. This function handles both zip and tar.gz extraction with different logic paths. Add a docstring describing its purpose, parameters, return value, and the different handling for zip vs tar.gz files.
| def unzip_file_to_directory(path_to_zip_file: str, extract_location: str) -> str: | |
| def unzip_file_to_directory(path_to_zip_file: str, extract_location: str) -> str: | |
| """ | |
| Extracts an archive file (either .zip or .tar.gz) to the specified directory. | |
| If the file is a .zip archive, it is extracted using the zipfile module. | |
| If the file is a .tar.gz archive, it is extracted using the tarfile module. | |
| Parameters | |
| ---------- | |
| path_to_zip_file : str | |
| The path to the archive file to extract. Supported formats: .zip, .tar.gz. | |
| extract_location : str | |
| The directory where the archive contents will be extracted. | |
| Returns | |
| ------- | |
| str | |
| The path to the directory containing the extracted files. For .zip files, | |
| this is the base name of the zip file (without extension) inside the extract location. | |
| For .tar.gz files, this is the base name of the tar.gz file (without extension) inside the extract location. | |
| """ |
| return os.path.join(extract_location, extracted_dir) | ||
|
|
||
|
|
||
| def move_and_rename(source_location: str) -> str: |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docstring for function move_and_rename. This function performs a destructive operation (removes existing directory if present). Add a docstring describing its purpose, parameters, return value, and the side effect of removing existing directories.
| def move_and_rename(source_location: str) -> str: | |
| def move_and_rename(source_location: str) -> str: | |
| """ | |
| Moves and renames the directory at `source_location` to a new directory named 'unzipped' | |
| in the same parent directory. If a directory named 'unzipped' already exists at the target | |
| location, it will be removed along with all its contents before the move operation. | |
| Parameters | |
| ---------- | |
| source_location : str | |
| The path to the directory to be moved and renamed. | |
| Returns | |
| ------- | |
| str | |
| The path to the new location of the moved directory. | |
| Side Effects | |
| ------------ | |
| If a directory named 'unzipped' already exists at the target location, it will be deleted | |
| along with all its contents before the move operation. This is a destructive operation. | |
| """ |
| os.path.join(target_dir, "unzipped/test*"), # This argument and below are "exclude" directory arguments | ||
| os.path.join(target_dir, "unzipped/example*"), | ||
| os.path.join(target_dir, "unzipped/sample*"), | ||
| os.path.join(target_dir, "unzipped/setup.py"), |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] conftest.py was removed from the exclusion list. If conftest.py files exist in packages and should not be included in Sphinx documentation, this could cause them to be processed. Verify this change is intentional and that conftest.py files don't need to be excluded.
| os.path.join(target_dir, "unzipped/setup.py"), | |
| os.path.join(target_dir, "unzipped/setup.py"), | |
| os.path.join(target_dir, "unzipped/conftest.py"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely add this back in if you didn't remove it for a reason.
| return unzip_file_to_directory(zips[0], containing_folder) | ||
| else: | ||
| tars = glob.glob(os.path.join(containing_folder, "*.tar.gz")) | ||
| return unzip_file_to_directory(tars[0], containing_folder) |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing error handling for empty tar list. If no .zip or .tar.gz files are found in containing_folder, this function will raise an IndexError when trying to access tars[0]. Consider adding validation to check if files exist before attempting to access them.
| return unzip_file_to_directory(tars[0], containing_folder) | |
| if tars: | |
| return unzip_file_to_directory(tars[0], containing_folder) | |
| else: | |
| raise FileNotFoundError( | |
| f"No .zip or .tar.gz files found in directory: {containing_folder}" | |
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this, it SHOULD NOT HAPPEN but sometimes it does due to reasons as-yet-unknown and it would be better to get the nice error.
| extracted_dir = os.path.basename(os.path.splitext(path_to_zip_file)[0]) | ||
| return os.path.join(extract_location, extracted_dir) | ||
| else: | ||
| with tarfile.open(path_to_zip_file) as tar_ref: |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential path traversal security issue with tarfile extraction. When extracting tar archives without validation, malicious archives can write files outside the intended directory. Use tarfile.data_filter (Python 3.12+) or manually validate member paths to prevent directory traversal attacks. See PEP 706 for details.
| with tarfile.open(path_to_zip_file) as tar_ref: | |
| with tarfile.open(path_to_zip_file) as tar_ref: | |
| # Secure extraction to prevent path traversal | |
| for member in tar_ref.getmembers(): | |
| member_path = os.path.join(extract_location, member.name) | |
| abs_extract_location = os.path.abspath(extract_location) | |
| abs_member_path = os.path.abspath(member_path) | |
| if not abs_member_path.startswith(abs_extract_location + os.sep): | |
| raise Exception(f"Attempted Path Traversal in Tar File: {member.name}") |
|
|
||
|
|
||
| def move_and_rename(source_location: str) -> str: | ||
| new_location = os.path.join(os.path.dirname(source_location), "unzipped") |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The string literal "unzipped" is repeated 11 times across the file. Consider defining it as a module-level constant (e.g., UNZIPPED_DIR_NAME = "unzipped") to improve maintainability and reduce the risk of typos when the directory name needs to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely agree with this.
| "$(TargetingString)" | ||
| --service="${{ parameters.ServiceDirectory }}" | ||
| --toxenv=sphinx | ||
| --checks="sphinx" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We gotta add the env: for TOX_PIP_IMPL: 'uv' So that this is screaming fast.
| site_folder = os.path.join(package_dir, "website") | ||
| doc_folder = os.path.join(staging_directory, "docgen") | ||
|
|
||
| if should_build_docs(package_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's move this should_build_docs to a precheck above this entire section of code.
We check multiple places whether or not we should build docs, and if we shouldn't we keep having to check this. I recognize you probably pulled this from previous code state, but if we're here we might as well improve it.
| logger.info("Skipping sphinx prep for {}".format(package_name)) | ||
|
|
||
| # run apidoc | ||
| output_dir = os.path.join(staging_directory, "unzipped/docgen") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it that we run mgmt_apidoc against the output_dir, but sphinx_apidoc against the staging_directory?
What's the key difference?
scbedd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to have the full comments that copilot is adding, but I do appreciate the doc-comments that it's suggesting.
Feel free to maintain the simpler :param: documentation over the gigantic suggestion that it's offering. See the rest of the repo for examples.
sphinx code changes:
tox_helper_tasks, which is a file that can probably be deleted after the migration is complete)azure-core,azure-storage-blob,azure-mgmt-storageCI: