Skip to content

Conversation

@nnobelis
Copy link
Member

Please have a look at the individual commit messages for the details.

Example report (with how-to-fix hints):
AsciiDoc_fossid_snippets.zip

@nnobelis nnobelis requested a review from a team as a code owner June 14, 2023 07:26
@codecov
Copy link

codecov bot commented Jun 14, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (a4c2bd7) 64.27% compared to head (9559abf) 64.27%.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #7127   +/-   ##
=========================================
  Coverage     64.27%   64.27%           
  Complexity     1970     1970           
=========================================
  Files           333      333           
  Lines         16667    16667           
  Branches       2389     2389           
=========================================
  Hits          10713    10713           
  Misses         4912     4912           
  Partials       1042     1042           
Flag Coverage Δ
funTest-docker 69.24% <ø> (ø)
funTest-non-docker 28.54% <ø> (ø)
test 40.05% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@nnobelis nnobelis force-pushed the nnobelis/snippet_report2 branch from e3bc159 to c377a4c Compare June 14, 2023 07:48
@nnobelis nnobelis marked this pull request as draft June 14, 2023 07:55
@nnobelis nnobelis force-pushed the nnobelis/snippet_report2 branch from c377a4c to 9c4569b Compare June 14, 2023 08:08
@nnobelis nnobelis marked this pull request as ready for review June 14, 2023 08:10
* not support support VCS path, they are most likely duplicates of other results.
*/
@Suppress("UNUSED") // This function is used in the templates.
fun filterResultsByVCS(scanResults: Map<Identifier, List<ScanResult>>): Map<Identifier, List<ScanResult>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this filtering already happen in the FossID scanner implementation, so such results do not show up in the first place?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I will try to better explain: the project I am scanning contains two packages with the same provenance, differing in the VCS path.
The current scanner logic in ORT sees that these packages have the same provenance, and triggers a scan for one of them in FossID ("Which one" could be influenced thanks to #7092).

Then, after the scan, there is a scanner logic that de duplicates the scan results for packages with the same provenance. Therefore the scan results contains the FossID results two times, one for each package.

Therefore there is no filtering to be done in FossID : the scanner scans already only one package.
I think the whole duplication behavior will be fixed when FossID will scan by provenance and that scan results are stored by provenance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a scanner logic that de duplicates the scan results for packages with the same provenance

Did you mean "duplicates" instead of "de duplicates"? (I believe duplicating is what it actually does)

Therefore there is no filtering to be done in FossID : the scanner scans already only one package.

I don't understand how the amount of packages being scanned is related to filtering findings by path.
To me these topics seem unrelated, maybe there is a mis-understanding here.

I believe it'd be worth looking at the functions ScanSummary.filterByPath() and ScanSummary.filterByIgnorePatterns(). These do not adhere to snippet findings yet. I believe they need to be extended, which IIUC would address @sschuberth 's ask.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean "duplicates" instead of "de duplicates"? (I believe duplicating is what it actually does)

Yes duplicate, sorry.

I don't understand how the amount of packages being scanned is related to filtering findings by path. To me these topics seem unrelated, maybe there is a mis-understanding here.

As I explained, the scan result of FossID are duplicated in the scan-results of ORT, because I have two packages with the same VCS root.

  scan_results:
    'pkg1':
    - provenance:
        vcs_info:
          type: "Git"
          path: "apath"
      summary:
          [...]
     'pkg2':
    - provenance:
        vcs_info:
          type: "Git"
          path: ""
      summary:
          [...]

The two summaries are exactly the same, because they have been duplicated. Hence, I cannot just pass the scan_results to the reporter, otherwise I will see duplicated packages.
Therefore I need to filter the scan results to keep only one. I do it by VCS path, because I know FossID doesn't support it, but it could be any other criteria.

I believe it'd be worth looking at the functions ScanSummary.filterByPath() and ScanSummary.filterByIgnorePatterns(). These do not adhere to snippet findings yet. I believe they need to be extended, which IIUC would address @sschuberth 's ask.

Yes, but filterByPath filters the results to keep the ones under path. But it will not work here, because I have two packages with the same provenance.
Same for filterByIgnorePatterns I wouldn't know which ignore pattern to put, because once again these are two packages with the same provenance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two summaries are exactly the same, because they have been duplicated. Hence, I cannot just pass the scan_results to the reporter, otherwise I will see duplicated packages.

I believe I still don't understand the case. Am I right that the ORT File / analyzer result contains two packages with identical provenance? ...and for both packages the same snippet is found?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Analyzer result, there are two packages with nearly the same provenance: it only differs by vcs_info/path.

When the scanner is called, one package is chosen (=reference package) and scanned with FossID. The other package is NOT scanned. I believe this logic scans the reference package:

if (packagesWithIncompleteScanResult.size > 1) {
logger.info {
val packageIds = packagesWithIncompleteScanResult.drop(1)
.joinToString("\n") { "\t${it.id.toCoordinates()}" }
"Scanning package '${referencePackage.id.toCoordinates()}' as reference for these packages " +
"with the same provenance:\n$packageIds"
}

Then after the scan, the ScanSummary of the reference package is duplicated to the other package.
At the end, the scan-results contains the two packages having each identical summaries.

The reporter cannot be called on this as the two duplicated summaries are the same. Therefore, this function filters the packages in the scan-results to keep only one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reporter cannot be called on this as the two duplicated summaries are the same. Therefore, this function filters the packages in the scan-results to keep only one.

Shouldn't each package have the snippet findings associated with it, no matter if there is another package which includes the same snippets? If both packages are built from same VCS, then adjusting these filterBy functions I mentioned earlier would take care of filtering out the snippets which are not relevant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't want to filter out individual snippet findings: I want to have in the report only ONE set of snippet findings.
Therefore, I need to do filtering at package level to keep only one package.

How is the webapp reporter dealing with duplicated license findings with packages having the same VCS ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the webapp reporter dealing with duplicated license findings with packages having the same VCS ?

AFAIK, the WebApp / static HTML report do report the licenses per package. If two packages are built from same VCS, they still report per package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following this discussion, I realized this change was controversial so I removed the related commit to not hinder the current PR. I will rework it and add it to a future separate PR.

@sschuberth
Copy link
Member

sschuberth commented Jun 14, 2023

Example report (with how-to-fix hints):

Ugh, this contains results for files like .git/hooks/applypatch-msg.sample. Should we filter out snippet matches for VCS paths before adding them to the results? (Really weird that FossID does not seem to ignore these by itself.)

@sschuberth
Copy link
Member

Example report (with how-to-fix hints):

Some more UI-things:

  • Would be nice if this was more ORT-branded, similar to like done here.
  • Instead of "Click here" to expand, can we say "Show How-to-Fix-Hint" or similar?
  • Should we display the score as a percentage so it's more clear what the maximum value is?

@sschuberth sschuberth requested a review from a team June 14, 2023 09:03
@nnobelis
Copy link
Member Author

Example report (with how-to-fix hints):

Ugh, this contains results for files like .git/hooks/applypatch-msg.sample. Should we filter out snippet matches for VCS paths before adding them to the results? (Really weird that FossID doe snot seem to ignore these by itself.)

Please don't be shocked by these results: when one triggers a scan with FossID, the scan gets automatically an ignore rule for .git. Here I generated the report for a scan made with latest version from FossID. It seems there is a (FossId) bug that the ignore rule is missing.

@nnobelis
Copy link
Member Author

nnobelis commented Jun 14, 2023

Some more UI-things:

* Would be nice if this was more ORT-branded, similar to like done [here](https://github.com/oss-review-toolkit/ort/pull/6290#issuecomment-1367564130).

Yes this would be nice but I noticed the work you mentioned has been done for the PdfTemplateReporter.kt and I am not sure it applies to the HTML reporter (cf. OPTION_PDF_THEME_FILE). Would it be ok to add it in a future commit ?

* Instead of "Click here" to expand, can we say "Show How-to-Fix-Hint" or similar?

Yes this is already the case in my development branch. This PR does not contain the how-to-fix block will be added later. The attached file was just an example to justify the need for HTML.

* Should we display the score as a percentage so it's more clear what the maximum value is?

Well I have not idea what the score unit is for FossID as it doesn't show it in the UI and it is undocumented.
And when I see in their documentation that they have for components such properties:
"score": 373.25446,
I would not make the assumption it's a percentage.

@nnobelis nnobelis force-pushed the nnobelis/snippet_report2 branch 4 times, most recently from 8fb435c to ff57fa2 Compare June 14, 2023 11:21
@nnobelis nnobelis requested review from fviernau and sschuberth June 14, 2023 12:27
@nnobelis nnobelis force-pushed the nnobelis/snippet_report2 branch from ff57fa2 to d1f2a26 Compare June 14, 2023 12:38
@nnobelis nnobelis force-pushed the nnobelis/snippet_report2 branch 2 times, most recently from 7ae3d20 to 396f741 Compare June 19, 2023 12:58
@nnobelis nnobelis requested a review from mnonnenmacher June 19, 2023 13:03
@nnobelis nnobelis force-pushed the nnobelis/snippet_report2 branch from 396f741 to d41b2a6 Compare June 20, 2023 06:10
@nnobelis nnobelis force-pushed the nnobelis/snippet_report2 branch from d41b2a6 to 28525cd Compare June 20, 2023 07:30
@nnobelis nnobelis requested a review from sschuberth June 20, 2023 07:31
@nnobelis nnobelis force-pushed the nnobelis/snippet_report2 branch from 28525cd to dd39637 Compare June 20, 2023 08:08
Copy link
Member

@sschuberth sschuberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, but I'd prefer @MarcelBochtler or @mnonnenmacher to approve as they have more insights about the use of the report.

nnobelis added 2 commits June 20, 2023 16:57
This is an HTML report to present the snippet findings.
HTML format was chosen to be able to use AsciiDoc collapsible blocks [1]
for how-to-fix hints. Such blocks will be added in a future commit.

[1]: https://docs.asciidoctor.org/asciidoc/latest/blocks/collapsible/

Signed-off-by: Nicolas Nobelis <[email protected]>
@mnonnenmacher mnonnenmacher force-pushed the nnobelis/snippet_report2 branch from dd39637 to 9559abf Compare June 20, 2023 14:57
@mnonnenmacher
Copy link
Member

@nnobelis I have rebase the PR to get rid of the failing tests which were fixed in another PR.

@MarcelBochtler MarcelBochtler merged commit 5ef2a0d into oss-review-toolkit:main Jun 21, 2023
@MarcelBochtler MarcelBochtler deleted the nnobelis/snippet_report2 branch June 21, 2023 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants