@chinyeungli
Contributor

Issue: #1763

Create a pipeline for Maven package

@chinyeungli chinyeungli requested a review from tdruez November 13, 2025 10:40
Signed-off-by: Chin Yeung Li <[email protected]>
…1763

- Update package's license if missing while the same package has license detected in RESOURCES

Signed-off-by: Chin Yeung Li <[email protected]>
Contributor

@tdruez tdruez left a comment

  • Create a new maven pipe module instead of using resolve.
  • Opening and loading a large file several times, in various steps, just to edit it is not great.
  • To be discussed: do we need a dedicated pipeline for just one extra step? Shouldn't the original scan_single_package detect that it's a Maven package and apply the necessary steps? Is there any reason to keep this new logic separate?

Comment on lines 57 to 74
        with open(self.scan_output_location) as file:
            data = json.load(file)
            # Return and do nothing if data has pom.xml
            for file in data["files"]:
                if "pom.xml" in file["path"]:
                    return
            packages = data.get("packages", [])

        pom_url_list = get_pom_url_list(self.project.input_sources[0], packages)
        pom_file_list = download_pom_files(pom_url_list)
        scanned_pom_packages, scanned_dependencies = scan_pom_files(pom_file_list)

        updated_packages = packages + scanned_pom_packages
        # Replace/Update the package and dependencies section
        data["packages"] = updated_packages
        data["dependencies"] = scanned_dependencies
        with open(self.scan_output_location, "w") as file:
            json.dump(data, file, indent=2)

Code logic should not live in the pipeline itself but in dedicated, easily testable pipe functions.
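
To illustrate the suggestion, here is a minimal sketch of how the dictionary handling from the step above could move into standalone functions that are testable without a pipeline or a project. The function names `has_pom_file` and `merge_pom_scan_results` are hypothetical and not part of the PR; the substring check on "pom.xml" simply mirrors the one in the reviewed code.

```python
def has_pom_file(scan_data):
    """Return True if any scanned file path contains "pom.xml".

    Hypothetical helper: mirrors the substring check used in the reviewed step.
    """
    return any(
        "pom.xml" in entry.get("path", "")
        for entry in scan_data.get("files", [])
    )


def merge_pom_scan_results(scan_data, pom_packages, pom_dependencies):
    """Return a shallow copy of scan_data with POM results merged in.

    Hypothetical helper: packages are appended and dependencies are replaced,
    as in the reviewed step. The input dictionary is left untouched.
    """
    updated = dict(scan_data)
    updated["packages"] = list(scan_data.get("packages", [])) + list(pom_packages)
    updated["dependencies"] = list(pom_dependencies)
    return updated
```

With helpers along these lines, the pipeline step would only load the scan output once, call the functions, and write the result back, while the logic itself can be covered by plain unit tests.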

cls.extract_input_to_codebase_directory,
cls.extract_archives,
cls.run_scan,
cls.update_package_license_from_resource_if_missing,

This may have quite an impact on the default ScanSinglePackage results. We should probably handle this one separately from the Maven context.

        if not packages or not resources:
            return

updated_packages = update_package_license_from_resource_if_missing(

We should use database queries instead of manipulating complex dictionaries.
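
As a rough illustration of the query-based approach, the sketch below fills in missing package licenses with a single UPDATE statement instead of looping over dictionaries. It is not the project's code: the real implementation would presumably go through the project's own database models, whereas this self-contained example uses the standard-library `sqlite3` module with hypothetical `package` and `resource` tables and a hypothetical `fill_missing_package_licenses` function.

```python
import sqlite3


def fill_missing_package_licenses(conn):
    """Set the license of packages that have none, using their resources.

    Assumed (hypothetical) schema:
      package(id, declared_license)
      resource(id, package_id, detected_license)
    Returns the number of packages updated.
    """
    cursor = conn.execute(
        """
        UPDATE package
        SET declared_license = (
            SELECT r.detected_license
            FROM resource AS r
            WHERE r.package_id = package.id
              AND r.detected_license <> ''
            ORDER BY r.id
            LIMIT 1
        )
        WHERE declared_license = ''
          AND EXISTS (
            SELECT 1
            FROM resource AS r
            WHERE r.package_id = package.id
              AND r.detected_license <> ''
          )
        """
    )
    conn.commit()
    return cursor.rowcount
```

The database does the matching between packages and resources, so nothing has to be loaded into memory and reshaped by hand. Taking the first resource license by id is an arbitrary choice made for this sketch.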

return pom_file_list


def scan_pom_files(pom_file_list):

This is too complex; it needs to be refactored into smaller functions.

return scanned_pom_packages, scanned_pom_deps


def update_package_license_from_resource_if_missing(packages, resources):

This should be query-based.

- Create a new maven pipe module
- Use database queries for update_package_license_from_resource_if_missing()
- Add tests

Signed-off-by: Chin Yeung Li <[email protected]>


3 participants