Include unlicensed files in scanner results #9435

@kikofernandez

Description

What is the existing functionality and how should it be enhanced?

Currently, the scanner results do not include files in which no license was detected.

Problems

  • developers may not notice those files (unless they know that this is the default behaviour)
  • developers cannot curate those files, because curation rules are only applied to files that have a detected license finding.

Improvement
To make sure that all files are considered in the scanning phase and can potentially be curated, I propose:

  • add an option (include-unlicensed: boolean) to the scan phase that adds to the object scan_results.summary.licenses all files that were found NOT to have a license. Since this happens during the scan phase, these results should be recorded in the scan-result.json file.
    • The benefit is that ORT would then allow curation of files with license NONE (or whatever the default is for an unknown license)

What is the use-case for your enhancement?

Source SBOMs may need to include all files in a repo. At the moment, SBOM generation also includes the files without a license, but there is no way to curate files that should have a specific license. With the flag include-unlicensed: true, the scanner would include unlicensed files in the ORT scan result and give developers the possibility to curate those files, if needed.

As an example, a project with a single license at the top level could enable this option to include all files with the NONE license, and then apply a curation to all files that should have the MIT license.

curations:
    license_findings:
      - path: "**/*.exs"
        reason: "INCORRECT"
        comment: "Apply license to all unknown files"
        detected_license: "NONE"
        concluded_license: "MIT"

      - path: "**/*.ex"
        reason: "INCORRECT"
        comment: "Apply license to all unknown files"
        detected_license: "NONE"
        concluded_license: "MIT"
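
To make the intended effect concrete, here is a rough Python sketch of how curations like the ones above would rewrite NONE findings. It is not ORT code: the function name `apply_curations` and the flat `path`/`license` dictionaries are hypothetical, and `fnmatch` only approximates glob matching (for instance, `**/*.ex` does not match a file at the repository root here, which may differ from ORT's path matcher).

```python
from fnmatch import fnmatchcase


def apply_curations(findings, curations):
    """Return findings with the license replaced by the first matching curation.

    Both arguments are lists of plain dicts; this layout is an assumption
    made for illustration, not ORT's actual data model.
    """
    curated = []
    for finding in findings:
        license_id = finding["license"]
        for curation in curations:
            path_matches = fnmatchcase(finding["path"], curation["path"])
            if path_matches and license_id == curation["detected_license"]:
                license_id = curation["concluded_license"]
                break  # the first matching curation wins in this sketch
        curated.append({"path": finding["path"], "license": license_id})
    return curated


curations = [
    {"path": "**/*.exs", "detected_license": "NONE", "concluded_license": "MIT"},
    {"path": "**/*.ex", "detected_license": "NONE", "concluded_license": "MIT"},
]
findings = [
    {"path": "lib/mix/test.exs", "license": "NONE"},
    {"path": "lib/a.ex", "license": "NONE"},
    {"path": "lib/c.erl", "license": "NONE"},
]
curated = apply_curations(findings, curations)
```

Without NONE findings in the scan result, the list passed to such a step would simply not contain the unlicensed files, so no curation could ever match them.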

I believe this is quite a common case. Examples include the Elixir programming language (https://github.com/elixir-lang/elixir), Gleam (https://github.com/gleam-lang/gleam), the Django web framework (which has files without a license, so no license is applied to them, AFAIK), and the Rails web framework, where each folder contains the license that is expected to apply.

Alternatives you have considered

I have a script that parses the ORT scanner result to collect the files with licenses and all files with a SHA1. It takes the set difference and adds the missing files to the corresponding scanner field with license NONE. This works, but I am not sure how maintainable it will be in the future. It also means I need to run the ORT analyzer and scanner, then run a custom script, and then run the evaluator to get some results and apply curations.
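
The set-difference step of that workaround can be sketched roughly as follows. This is a minimal illustration, not the actual script: the function name `add_unlicensed_findings` is hypothetical, and the shape of a finding (a `license` plus a `location` with a `path`) is an assumption about the scan result rather than a verified schema.

```python
def add_unlicensed_findings(all_paths, license_findings, placeholder="NONE"):
    """Return the findings plus one placeholder entry per file without a finding.

    all_paths: every file path known to the scan (e.g. the files with a SHA1).
    license_findings: dicts shaped like {"license": ..., "location": {"path": ...}};
    this shape is assumed for illustration only.
    """
    licensed_paths = {f["location"]["path"] for f in license_findings}
    # Files that were scanned but produced no license finding at all.
    missing = sorted(set(all_paths) - licensed_paths)
    extra = [{"license": placeholder, "location": {"path": p}} for p in missing]
    return list(license_findings) + extra


all_paths = ["lib/a.ex", "lib/b.ex", "LICENSE"]
findings = [{"license": "MIT", "location": {"path": "LICENSE"}}]
result = add_unlicensed_findings(all_paths, findings)
```

Having the scanner do this itself would remove the extra step between the scanner and the evaluator.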

Additional context

--

Labels: configuration, scanner