Exclude binary license files to prevent reporter hang #10109
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #10109 +/- ##
============================================
+ Coverage 56.65% 56.66% +0.01%
- Complexity 1615 1619 +4
============================================
Files 332 332
Lines 12277 12281 +4
Branches 1138 1139 +1
============================================
+ Hits 6955 6959 +4
Misses 4877 4877
Partials 445 445
d1a62bf to ad3e8b6
To allow having a feature toggle for the filtering, for the reasons I outlined above. And also if there is a bug in the
But why implement a feature toggle to begin with? Why ever maintain binary files as part of the license file archive?
I would also prefer not to include binary license files in the archives at all, even if there is a certain risk that the detection might fail in some cases. That's also missing in this PR: IIUC, it currently only throws an exception if an archive already contains a binary file, which I guess means that the report will not be created at all. But there is no way to recover from that, because even when the archive file is deleted, it will be recreated from the same input and contain a binary file again.
ad3e8b6 to 36d89b8
Outcomes from today's core dev meeting:
36d89b8 to 8c7ac6b
a064c99 to 735bbb0
735bbb0 to 5488b2e
My points have been addressed, but I would leave the approval to those who have been more deeply involved in this topic.
I'm basically ok with introducing Tika as a dependency now, as I've checked that the library is not huge, and we mostly already use its transitive dependencies elsewhere.
5488b2e to 59cb538
a94cd09 to 03173e5
03173e5 to a9b856c
a9b856c to 58b3d3f
Just two final nits.
Detect and exclude binary license files using Apache Tika. When a non-text file is found during the license info creation process, a warning is printed, and it is excluded from the final report. This prevents the inclusion of binary files that previously caused the reporter to enter an endless loop during report generation.

Signed-off-by: Julian Olderdissen <[email protected]>
58b3d3f to bbcceb9
    "include utf8 file with japanese chars" {
        createFile("License") { writeText("ぁあぃいぅうぇえぉおかが") }

        val archiver = FileArchiver.createDefault()
        archiver.archive(workingDir, PROVENANCE)
        val result = archiver.unarchive(targetDir, PROVENANCE)

        result shouldBe true
        targetDir should containFile("License")
    }
Looks like this does not work as expected. Can you have a look at that thread in Slack, @Juli0q?
Binary license files are now detected and excluded during license info resolution. Previously, such files caused the reporter to enter an endless loop during report generation, as the archiver attempted to include the binary content in the report.
To verify this behavior, a test file named LICENSE-BIN is added to archive.zip. It contains 4 arbitrary bytes created with a hex editor to simulate a non-text license file.
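A fixture like the one described above could also be generated programmatically instead of with a hex editor. The following is only a minimal sketch: the four byte values, `writeBinaryFixture`, and `isStrictUtf8` are hypothetical and not taken from the PR, and the strict UTF-8 check is a plain JDK sanity check on the fixture, not the detection mechanism the PR uses.

```kotlin
import java.io.File
import java.nio.ByteBuffer
import java.nio.charset.CharacterCodingException
import java.nio.charset.CodingErrorAction

// Arbitrary placeholder bytes; the PR does not state which four bytes it used.
val SAMPLE_BINARY_BYTES = byteArrayOf(0x00, 0xFF.toByte(), 0xFE.toByte(), 0x01)

// Hypothetical helper: writes the placeholder bytes to a file in the given directory.
fun writeBinaryFixture(dir: File, name: String = "LICENSE-BIN"): File =
    dir.resolve(name).apply { writeBytes(SAMPLE_BINARY_BYTES) }

// Hypothetical sanity check: returns true only if the bytes decode as UTF-8
// without any malformed or unmappable sequence.
fun isStrictUtf8(bytes: ByteArray): Boolean =
    try {
        Charsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(bytes))
        true
    } catch (e: CharacterCodingException) {
        false
    }
```

Because `0xFF` can never occur in valid UTF-8, the placeholder bytes fail the strict decode, which confirms the fixture is not readable text. A strict decode alone would still accept control characters such as NUL, so it is not a substitute for real content-type detection.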
Apache Tika is introduced to detect MIME types and distinguish between text and non-text files. This ensures that only valid, readable license files are processed and included in the final report.
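The MIME-type check might look roughly like the sketch below. It assumes `tika-core` is on the classpath and uses the `Tika` facade's `detect(File)` method, which returns the detected MIME type as a string. The helper names `isTextFile` and `keepTextLicenseFiles`, the `text/` prefix rule, and the use of `System.err` for the warning are assumptions for illustration; the PR's actual implementation may differ.

```kotlin
import java.io.File
import org.apache.tika.Tika

// Shared facade instance so that detectors are not re-created for every file.
private val tika = Tika()

// Hypothetical helper: treats a file as text if Tika reports a "text/*" MIME type.
fun isTextFile(file: File): Boolean = tika.detect(file).startsWith("text/")

// Hypothetical filter: keeps text files and prints a warning for every file it drops.
// System.err stands in for whatever logger the project actually uses.
fun keepTextLicenseFiles(files: List<File>): List<File> =
    files.filter { file ->
        isTextFile(file).also { isText ->
            if (!isText) System.err.println("Excluding non-text license file '${file.name}'.")
        }
    }
```

Filtering before archiving, rather than failing when a binary file is encountered, matches the behavior described in the commit message: the file is skipped with a warning and report generation continues.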