@kiendang
Contributor

@kiendang kiendang commented Mar 7, 2025

Vendored code

Following the discussion in #356, this adds support for permissions and symlinks in os.zip/unzip using vendored zip source code from Apache Ant.

The vendored source code is generated by the os.zip.apacheAntZipSource task and put in os/zip. It's shaded with the package renamed from org.apache.tools.zip to os.shaded_org_apache_tools_zip.

scala-steward.conf was added and configured to run os.zip.apacheAntZipSource on org.apache.ant:ant updates.

Features

This brings support for permissions and symlinks to zip (for creating new zips, not modifying existing ones), zip.stream and unzip. As for modifying existing zips, we would still have to rely on jdk.zipfs which does not support symlinks.

file permissions symlinks
os.zip.open if Java Runtime Version >= 14
os.zip (create new)
os.zip (modify existing) if Java Runtime Version >= 14
os.zip.stream
os.unzip
os.unzip.stream

TODO

  • (Advice needed) make sure we comply with Apache Ant's license to include the code here. Would appreciate opinions on this as I'm not an expert.
  • (Advice needed) make ZipOps JVM only
  • tests
  • make sure things don't break on Windows
  • add permission support to modifying existing zips with jdk.zipfs like what @sake92 did in Preserve zip files permissions #371

@lihaoyi
Member

lihaoyi commented Mar 7, 2025

At first glance, how about we change the package name to os.shaded_org_apache_tool_zip and keep the class names unchanged? That would reduce the diff we have to apply to the source files and keep them closer to their upstream equivalents.

build.mill Outdated
}
}

trait ApacheAntZipVendor extends Module {
Member

This can probably be a normal object rather than a trait.

It also deserves a comment summarizing the reasons:

  • Why we chose to include a third party implementation (because the JDK version doesn't support symlinks)
  • Why we chose the Ant implementation over apache-commons (to avoid a dependency on apache-commons-io, and anyway it's where the commons-io implementation came from)
  • Why we are shading it (to ensure it doesn't cause classpath conflicts)
  • Why we are building it from source (because we only want a subset of the codebase, not the entire thing)

Member

Let's also compile the shaded Apache zip code in an os.zip submodule in os/zip/src/, separate from the main OS-Lib codebase in os/src/. It's generally best to avoid mixing generated and hand-written code.

@lihaoyi
Member

lihaoyi commented Mar 7, 2025

Needs some small unit tests exercising the permissions/symlink preservation functionality, and a manual test on the cosmocc zip to reproduce the problems of the old implementation and verify that they are fixed with the new one

@kiendang
Contributor Author

kiendang commented Mar 7, 2025

At first glance, how about we change the package name to os.shaded_org_apache_tool_zip and keep the class names unchanged? That would reduce the diff we have to apply to the source files and keep them closer to their upstream equivalents.

I would like that too, but Java's package-private access is not as flexible as Scala's access modifiers. You can't put the code in package os.shaded_org_apache_tool_zip and make it private[os]; the closest Java offers is the equivalent of private[os.shaded_org_apache_tool_zip] (i.e. no modifier).

How about moving ZipOps.scala into a trait, then moving both the trait and the vendored code to an os.zip package, making the vendored code private[os.zip] and the ZipOps trait private[os], and mixing the ZipOps trait into the os package object? We probably need to turn ZipOps.scala into a trait anyway to make it JVM only.

@lihaoyi
Member

lihaoyi commented Mar 7, 2025

I think not privatizing it is fine. We can just document it as unstable/internal and turn off MIMA enforcement for it.

@lihaoyi
Member

lihaoyi commented Mar 7, 2025

We could also put a shim Scala file in os.shaded_org_apache_tool_zip that has access to the rest of the Java code, and give the Scala file private[os] so it can then be accessed by the rest of os-lib

@kiendang
Contributor Author

We could also put a shim Scala file in os.shaded_org_apache_tool_zip that has access to the rest of the Java code, and give the Scala file private[os] so it can then be accessed by the rest of os-lib

I tried it (05ee56b) but it doesn't work. It looks like we can't bypass the Java access modifier that way.

<error class ZipOutputStream cannot be accessed as a member of (os.shaded_org_apache_tools_zip : shaded_org_apache_tools_zip) from module class zip$.>

We have to either keep it public or put os.zip in the same package as the vendored code. Currently doing the former.

@kiendang kiendang force-pushed the ant branch 4 times, most recently from 35a2238 to c079d8b Compare March 14, 2025 02:24
@kiendang
Contributor Author

Just found out you can make symlinks on Windows. Wanna take a look at this first before finalizing.

@kiendang
Contributor Author

kiendang commented Mar 28, 2025

@lihaoyi Need some help deciding how zipping/unzipping symlinks should work on Windows here. You can create symlinks on Windows, but only (1) when running as Administrator (by default) and (2) on NTFS.

Currently in this PR on Windows a symlink is

  • always zipped as the referenced files (followLinks = false has no effect)
  • unzipped as is, i.e. as a normal file with the name of the referenced file as its content, so a broken symlink.

This is similar to how PowerShell's own Compress-Archive and Expand-Archive work.

Also tried Info-ZIP unzip, or more precisely the Cosmo implementation of it. There symlinks are

  • zipped as symlinks or as the referenced files, i.e. following followLinks
  • always unzipped as symlinks. If the symlink cannot be created an error is thrown and the file is not unzipped at all.

One option is to have the followLinks option in unzip as well and respect both zip and unzip followLinks on Windows.

@lihaoyi
Member

lihaoyi commented Mar 28, 2025

I think having followLinks sounds reasonable. We could fail with a nice error message if it is set to true on Windows, and ask the user to set it to false?

@kiendang
Contributor Author

Went on to implement followLinks for unzip and it turned out to be more complicated to do right than I thought. The simple way is to unzip all the regular files and directories first, then for symlinks just copy the unzipped targets. However, this gets complicated when there are symlinks to symlinks, mutually referencing symlinks, self-referencing symlinks, ... We should be able to handle this with a loop/recursion, but it would be more complex.
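
A minimal sketch of the loop mentioned above, assuming the symlink entries have already been collected into a map from entry name to target entry name. `LinkResolver`, `resolve` and the map layout are hypothetical names for illustration, not code from this PR, and normalizing relative targets against the entry's directory is left out.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class LinkResolver {
    /**
     * Follows a chain of links until it reaches a name that is not itself a link.
     * Returns Optional.empty() when the chain loops back on itself, which covers
     * both self-referencing and mutually referencing symlinks.
     */
    static Optional<String> resolve(String name, Map<String, String> links) {
        Set<String> seen = new HashSet<>();
        String current = name;
        while (links.containsKey(current)) {
            if (!seen.add(current)) {
                return Optional.empty(); // already visited: this is a cycle
            }
            current = links.get(current);
        }
        return Optional.of(current);
    }
}
```

The returned name is only the end of the chain; a caller would still need to check that it exists in the archive, since a symlink may point at a missing entry.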

For now, on Windows I just try unzipping symlinks as symlinks. If that doesn't work, I unzip the symlink as a file containing the target path (like Expand-Archive does) and print a message.
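
The fallback described above could look roughly like the following, using only `java.nio.file`. This is a sketch under assumptions: `SymlinkFallback` and `extractLink` are hypothetical names, and the actual implementation in the PR may differ in which failures it catches and what it prints.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SymlinkFallback {
    /**
     * Tries to create dest as a symlink pointing at target.
     * If the platform refuses (e.g. missing privilege on Windows), writes a
     * regular file whose content is the target path instead.
     * Returns true when a real symlink was created.
     */
    static boolean extractLink(Path dest, String target) throws IOException {
        try {
            Files.createSymbolicLink(dest, Paths.get(target));
            return true;
        } catch (IOException | UnsupportedOperationException | SecurityException e) {
            // Fallback: keep the link target as plain text so no information is lost.
            Files.write(dest, target.getBytes(StandardCharsets.UTF_8));
            System.err.println("Could not create symlink " + dest + "; wrote target path as file content instead");
            return false;
        }
    }
}
```

Returning a boolean lets the caller decide whether to warn once per archive rather than once per entry.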

@kiendang kiendang marked this pull request as ready for review March 28, 2025 09:23
@lihaoyi
Member

lihaoyi commented Apr 10, 2025

Sorry for not getting to this @kiendang, will take a look now

build.mill Outdated
* To avoid classpath conflicts, the dependency is shaded and compiled from source. Only the `org.apache.tools.zip`
* package, not the entire Ant codebase, is needed. This only adds < 100kb to Os-Lib jar size.
*/
def apacheAntZipSource: T[PathRef] = Task {
Member

Can we make this a def generatedSources folder? That way we won't need to commit all the vendored code to the repo, and can rely on the build tool regenerating it on demand as necessary.

Contributor Author

Yes

): os.Path = {
stream(os.read.stream(source), dest, excludePatterns, includePatterns)
checker.value.onWrite(dest)

Member

Could you remind me of the reason why we are unable to support permissions and symlinks in unzip.stream? I see that the upstream library has a version of ZipOutputStream in addition to ZipFile. Could we use that to get support for permissions and symlinks?

Contributor Author

I've tried it and it doesn't work. It could be because information on permissions and symlinks is stored as external attributes, which live in the zip central directory at the end of the zip file, not together with each zip entry. The Apache Commons Compress doc mentions this (in the ZipArchiveInputStream vs ZipFile section).

Contributor Author

ZipFile is able to read the central directory first and provide correct and complete information on any ZIP archive.
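
For illustration, here is a sketch of how a Unix mode is packed into those external attributes under the Info-ZIP convention for Unix-made archives: the mode occupies the high 16 bits of the 32-bit external file attributes field in each central directory record. The class and method names are hypothetical, not part of the vendored Ant code.

```java
final class ZipUnixMode {
    static final int S_IFMT = 0170000;  // mask for the file-type bits of a Unix mode
    static final int S_IFLNK = 0120000; // file type: symbolic link
    static final int S_IFREG = 0100000; // file type: regular file

    /** Packs a Unix mode into the high 16 bits of the 32-bit external attributes. */
    static long toExternalAttributes(int unixMode) {
        return ((long) unixMode << 16) & 0xFFFFFFFFL;
    }

    /** Extracts the Unix mode (file type plus permission bits) again. */
    static int unixModeOf(long externalAttributes) {
        return (int) ((externalAttributes >> 16) & 0xFFFF);
    }

    /** True when the file-type bits mark the entry as a symlink. */
    static boolean isSymlink(long externalAttributes) {
        return (unixModeOf(externalAttributes) & S_IFMT) == S_IFLNK;
    }

    /** Permission bits only, e.g. 0644 or 0755. */
    static int permissionBits(long externalAttributes) {
        return unixModeOf(externalAttributes) & 07777;
    }
}
```

A reader that only sees the local entry headers while streaming never gets this field, which is consistent with the limitation described above.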

Member

@lihaoyi lihaoyi Apr 11, 2025

Got it, that's fine but we just need to document this limitation somewhere. Maybe in the scaladoc for the unzip.stream class? We can describe the limitation and link to the upstream docs.

@kiendang kiendang force-pushed the ant branch 3 times, most recently from 68795dd to eacd945 Compare April 12, 2025 11:26
@kiendang
Contributor Author

@lihaoyi Done.

I made an additional change to preserve permissions for zipped directories, not just files. When zipping, zip entries for directories (not just empty ones) are added. E.g. for a/b/c/d.txt, the following entries are added: a/b/, a/b/c/ and a/b/c/d.txt. This is similar to the Unix zip command.

I made the shaded code generatedSources. Since it's all Java code, I made os.zip a JavaModule.
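
As a rough sketch of the idea, the helper below lists the directory entries that precede a file entry. `DirEntries.withParentDirs` is a hypothetical name, and it assumes every ancestor directory in the entry name gets its own entry; which ancestors the PR actually emits depends on how the zip sources are rooted.

```java
import java.util.ArrayList;
import java.util.List;

final class DirEntries {
    /**
     * Returns the entry names to write for one archive path: each ancestor
     * directory (with a trailing slash) followed by the entry itself.
     */
    static List<String> withParentDirs(String entryName) {
        List<String> out = new ArrayList<>();
        int idx = entryName.indexOf('/');
        // Stop before a trailing slash so a directory entry is not added twice.
        while (idx >= 0 && idx < entryName.length() - 1) {
            out.add(entryName.substring(0, idx + 1));
            idx = entryName.indexOf('/', idx + 1);
        }
        out.add(entryName);
        return out;
    }
}
```

When several files share ancestors, a caller would deduplicate the directory names (for example with a `LinkedHashSet`) so each directory entry is written once, carrying that directory's permissions.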

Adding a followLinks option to os.unzip to support unzipping symlinks as the referenced files is actually not as complicated as I previously thought. Do you still want it?

During zipping, add zip entries for all directories (not just empty ones), with permissions set.
During unzipping, unzipping enclosing directories before their contents.
for adding to existing zips
@lihaoyi
Member

lihaoyi commented Apr 18, 2025

I think this looks good, thanks @kiendang! I will transfer the bounty using your previous bank details

@lihaoyi lihaoyi merged commit 57700dd into com-lihaoyi:main Apr 18, 2025
7 of 8 checks passed
@lefou lefou added this to the 0.11.5 milestone Apr 18, 2025
@kiendang kiendang deleted the ant branch April 21, 2025 03:53