Note, Feb 2 2022: The current proposal is in #50102 (comment).
Abstract
The archive/tar function FileInfoHeader performs uid -> uname and gid -> gname lookups,
which are not always necessary and can sometimes be problematic. A new function,
FileInfoHeaderNoNames, is proposed to address these issues.
Background
Change https://go-review.googlesource.com/59531
(which shipped in Go 1.10) implemented
user/group name lookups in archive/tar's FileInfoHeader.
It fills in the tar header fields Uname and Gname
by looking up user and group names (from Uid and Gid)
via the os/user functions LookupId and LookupGroupId.
Doing that is not always desirable, and is sometimes problematic:
1. In a chrooted environment, /etc/passwd and /etc/group may be
   absent, or their contents may be entirely different from those of the host.
2. Failed name lookups are not currently cached, which may result in a
   considerable performance regression, caused by re-parsing
   /etc/passwd and /etc/group for every file entry added to the tar.
3. In case of static linking against glibc, the latter wants to dlopen
   some libraries that might either be unavailable (which results in
   a panic/crash) or (in case of an untrusted chroot) be replaced with
   a malicious library by a bad actor.
4. There may be a need to create a tarball without any user/group names
   (only with numeric uids/gids), akin to GNU tar's --numeric-owner
   option.
5. There may be a need to use custom uid -> name and gid -> name
   lookup functions.
Now, problem 2 can be mitigated by using (indirectly, via os/user Lookup{,Group}Id)
a good C library that does caching, or by caching failed lookups as well.
Problem 3 can be solved by using the osusergo
build tag, but the tag applies to the whole build,
meaning it also affects every other use of os/user, not just archive/tar.
Yet it seems impossible to solve both 2 and 3 at the same time.
As far as I know, there are no easy solutions for problems 1 and 5.
In particular, this affects Docker, which performs image unpacking by re-executing
the main binary (dockerd) in the container context (essentially a chroot). As a workaround,
Docker maintains a fork of archive/tar with commit 0564e30 partially reverted
(see moby/moby#42402).
Proposal
Add a function similar to FileInfoHeader, which does not perform any id -> name lookups,
leaving it to a user. The proposed name is FileInfoHeaderNoNames
(can also be *NoLookup, *Num, etc).
Rationale
Adding a new function seems to be the simplest and most elegant approach: it requires very little new code, yet solves all the issues raised above.
Alternatives are:
- (for users) to maintain a fork of archive/tar
- (for archive/tar) to implement a build tag which disables lookups (e.g. archivetarnolookups or archivetarnumeric)
- (for archive/tar) to add a way to provide a custom name lookup function
- (for archive/tar) to add another way to disable lookups (so a user can do tar.NameLookup = false or tar.NameLookup(false))
- to unconditionally remove id -> name lookups from archive/tar (might bring compatibility issues)
Compatibility
Since this is a new API, and the existing functionality of FileInfoHeader is left intact,
there are no compatibility issues.
Implementation
See https://go-review.googlesource.com/c/go/+/371054 for the example code.