Skip to content

Conversation

@JennyPng JennyPng marked this pull request as ready for review November 24, 2025 22:31
Copilot AI review requested due to automatic review settings November 24, 2025 22:31
@JennyPng JennyPng requested a review from benbp as a code owner November 24, 2025 22:31
Copilot finished reviewing on behalf of JennyPng November 24, 2025 22:34
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes the Sphinx documentation check in the Azure SDK for Python by refactoring how source distributions are handled during documentation generation. The changes move from working directly with installed packages to extracting and working with the sdist archive contents in an "unzipped" directory structure.

Key changes:

  • Adds extraction utilities to unzip sdist archives (.zip or .tar.gz) before running Sphinx
  • Updates path resolution throughout the Sphinx workflow to use the extracted "unzipped" directory structure
  • Migrates CI pipeline from dispatch_tox.py with --toxenv=sphinx to dispatch_checks.py with --checks="sphinx"

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
eng/tools/azure-sdk-tools/azpysdk/sphinx.py Adds archive extraction functions and updates Sphinx build workflow to work with extracted sdist contents instead of installed packages
eng/pipelines/templates/steps/build-extended-artifacts.yml Updates CI to use new dispatch_checks.py script with --checks parameter instead of tox-based approach

generate_mgmt_script = os.path.join(REPO_ROOT, "doc/sphinx/generate_doc.py")


def unzip_sdist_to_directory(containing_folder: str) -> str:
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for function unzip_sdist_to_directory. This function is a key helper that handles extraction of distribution archives. Add a docstring describing its purpose, parameters, return value, and potential exceptions (e.g., IndexError if no archives are found).

Suggested change
def unzip_sdist_to_directory(containing_folder: str) -> str:
def unzip_sdist_to_directory(containing_folder: str) -> str:
"""
Extracts a source distribution archive (.zip or .tar.gz) from the specified folder.
Searches for a .zip file first; if none is found, searches for a .tar.gz file.
Extracts the first archive found into the containing folder and returns the path to the extracted directory.
Parameters
----------
containing_folder : str
The path to the folder containing the source distribution archive.
Returns
-------
str
The path to the directory where the archive was extracted.
Raises
------
IndexError
If no .zip or .tar.gz archive is found in the specified folder.
"""

Copilot uses AI. Check for mistakes.
return unzip_file_to_directory(tars[0], containing_folder)


def unzip_file_to_directory(path_to_zip_file: str, extract_location: str) -> str:
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for function unzip_file_to_directory. This function handles both zip and tar.gz extraction with different logic paths. Add a docstring describing its purpose, parameters, return value, and the different handling for zip vs tar.gz files.

Suggested change
def unzip_file_to_directory(path_to_zip_file: str, extract_location: str) -> str:
def unzip_file_to_directory(path_to_zip_file: str, extract_location: str) -> str:
"""
Extracts an archive file (either .zip or .tar.gz) to the specified directory.
If the file is a .zip archive, it is extracted using the zipfile module.
If the file is a .tar.gz archive, it is extracted using the tarfile module.
Parameters
----------
path_to_zip_file : str
The path to the archive file to extract. Supported formats: .zip, .tar.gz.
extract_location : str
The directory where the archive contents will be extracted.
Returns
-------
str
The path to the directory containing the extracted files. For .zip files,
this is the base name of the zip file (without extension) inside the extract location.
For .tar.gz files, this is the base name of the tar.gz file (without extension) inside the extract location.
"""

Copilot uses AI. Check for mistakes.
return os.path.join(extract_location, extracted_dir)


def move_and_rename(source_location: str) -> str:
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docstring for function move_and_rename. This function performs a destructive operation (removes existing directory if present). Add a docstring describing its purpose, parameters, return value, and the side effect of removing existing directories.

Suggested change
def move_and_rename(source_location: str) -> str:
def move_and_rename(source_location: str) -> str:
"""
Moves and renames the directory at `source_location` to a new directory named 'unzipped'
in the same parent directory. If a directory named 'unzipped' already exists at the target
location, it will be removed along with all its contents before the move operation.
Parameters
----------
source_location : str
The path to the directory to be moved and renamed.
Returns
-------
str
The path to the new location of the moved directory.
Side Effects
------------
If a directory named 'unzipped' already exists at the target location, it will be deleted
along with all its contents before the move operation. This is a destructive operation.
"""

Copilot uses AI. Check for mistakes.
os.path.join(target_dir, "unzipped/test*"), # This argument and below are "exclude" directory arguments
os.path.join(target_dir, "unzipped/example*"),
os.path.join(target_dir, "unzipped/sample*"),
os.path.join(target_dir, "unzipped/setup.py"),
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] conftest.py was removed from the exclusion list. If conftest.py files exist in packages and should not be included in Sphinx documentation, this could cause them to be processed. Verify this change is intentional and that conftest.py files don't need to be excluded.

Suggested change
os.path.join(target_dir, "unzipped/setup.py"),
os.path.join(target_dir, "unzipped/setup.py"),
os.path.join(target_dir, "unzipped/conftest.py"),

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely add this back in if you didn't remove it for a reason.

return unzip_file_to_directory(zips[0], containing_folder)
else:
tars = glob.glob(os.path.join(containing_folder, "*.tar.gz"))
return unzip_file_to_directory(tars[0], containing_folder)
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling for empty tar list. If no .zip or .tar.gz files are found in containing_folder, this function will raise an IndexError when trying to access tars[0]. Consider adding validation to check if files exist before attempting to access them.

Suggested change
return unzip_file_to_directory(tars[0], containing_folder)
if tars:
return unzip_file_to_directory(tars[0], containing_folder)
else:
raise FileNotFoundError(
f"No .zip or .tar.gz files found in directory: {containing_folder}"
)

Copilot uses AI. Check for mistakes.
Copy link
Member

@scbedd scbedd Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this, it SHOULD NOT HAPPEN but sometimes it does due to reasons as-yet-unknown and it would be better to get the nice error.

extracted_dir = os.path.basename(os.path.splitext(path_to_zip_file)[0])
return os.path.join(extract_location, extracted_dir)
else:
with tarfile.open(path_to_zip_file) as tar_ref:
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential path traversal security issue with tarfile extraction. When extracting tar archives without validation, malicious archives can write files outside the intended directory. Use tarfile.data_filter (Python 3.12+) or manually validate member paths to prevent directory traversal attacks. See PEP 706 for details.

Suggested change
with tarfile.open(path_to_zip_file) as tar_ref:
with tarfile.open(path_to_zip_file) as tar_ref:
# Secure extraction to prevent path traversal
for member in tar_ref.getmembers():
member_path = os.path.join(extract_location, member.name)
abs_extract_location = os.path.abspath(extract_location)
abs_member_path = os.path.abspath(member_path)
if not abs_member_path.startswith(abs_extract_location + os.sep):
raise Exception(f"Attempted Path Traversal in Tar File: {member.name}")

Copilot uses AI. Check for mistakes.


def move_and_rename(source_location: str) -> str:
new_location = os.path.join(os.path.dirname(source_location), "unzipped")
Copy link

Copilot AI Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The string literal "unzipped" is repeated 11 times across the file. Consider defining it as a module-level constant (e.g., UNZIPPED_DIR_NAME = "unzipped") to improve maintainability and reduce the risk of typos when the directory name needs to be changed.

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely agree with this.

"$(TargetingString)"
--service="${{ parameters.ServiceDirectory }}"
--toxenv=sphinx
--checks="sphinx"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We gotta add the env: for TOX_PIP_IMPL: 'uv' So that this is screaming fast.

site_folder = os.path.join(package_dir, "website")
doc_folder = os.path.join(staging_directory, "docgen")

if should_build_docs(package_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move this should_build_docs to a precheck above this entire section of code.

We check multiple places whether or not we should build docs, and if we shouldn't we keep having to check this. I recognize you probably pulled this from previous code state, but if we're here we might as well improve it.

logger.info("Skipping sphinx prep for {}".format(package_name))

# run apidoc
output_dir = os.path.join(staging_directory, "unzipped/docgen")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it that we run mgmt_apidoc against the output_dir, but sphinx_apidoc against the staging_directory?

What's the key difference?

Copy link
Member

@scbedd scbedd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to have the full comments that copilot is adding, but I do appreciate the doc-comments that it's suggesting.

Feel free to maintain the simpler :param: documentation over the gigantic suggestion that it's offering. See the rest of the repo for examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants