
archive/tar: add FileInfoNames interface #50102

Closed
@kolyshkin

Description


Note, Feb 2 2022: The current proposal is in #50102 (comment).


Abstract

The archive/tar function FileInfoHeader performs uid -> uname and gid -> gname lookups,
which are not always necessary and can sometimes be problematic. A new function,
FileInfoHeaderNoNames, is proposed to address these issues.

Background

Change https://go-review.googlesource.com/59531
(which made its way into Go 1.10) implemented
user/group name lookups in archive/tar's FileInfoHeader.
It fills in the tar header fields Uname and Gname,
looking up user and group names (from Uid and Gid)
via the os/user functions LookupId and LookupGroupId.
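Roughly, this lookup step amounts to the simplified sketch below. It is modeled on the behavior described in this issue (best-effort lookups via os/user; successful results are cached, failed ones are not, see problem 2), and the names lookupNames, userCache and groupCache are illustrative rather than archive/tar's actual internals.

```go
package main

import (
	"fmt"
	"os/user"
	"strconv"
	"sync"
)

// Caches of successful lookups only: a failed lookup is not stored,
// so the same id is looked up again for the next file entry.
var (
	userCache  sync.Map // int (uid) -> string (user name)
	groupCache sync.Map // int (gid) -> string (group name)
)

// lookupNames is a simplified, best-effort uid/gid -> name resolution.
// If a lookup fails, the corresponding name is simply left empty.
func lookupNames(uid, gid int) (uname, gname string) {
	if v, ok := userCache.Load(uid); ok {
		uname = v.(string)
	} else if u, err := user.LookupId(strconv.Itoa(uid)); err == nil {
		uname = u.Username
		userCache.Store(uid, uname)
	}
	if v, ok := groupCache.Load(gid); ok {
		gname = v.(string)
	} else if g, err := user.LookupGroupId(strconv.Itoa(gid)); err == nil {
		gname = g.Name
		groupCache.Store(gid, gname)
	}
	return uname, gname
}

func main() {
	uname, gname := lookupNames(0, 0)
	fmt.Printf("uid 0 -> %q, gid 0 -> %q\n", uname, gname)
}
```

Each call that misses the cache goes through os/user, which is where the chroot, performance and static-glibc issues listed next come from.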

Doing that is not always desirable, and is sometimes problematic:

  1. In a chrooted environment, /etc/passwd and /etc/group may be
    absent, or their contents may be entirely different from those of the host.

  2. Failed name lookups are not currently cached, which may result in a
    considerable performance regression, since /etc/passwd and /etc/group
    are re-parsed for every file entry added to the tarball.

  3. When the binary is statically linked against glibc, glibc tries to dlopen
    shared libraries (its NSS modules) at run time. These may be unavailable
    (resulting in a panic or crash), or, in an untrusted chroot, a bad actor
    may substitute a malicious library.

  4. There may be a need to create a tarball without any user/group names
    (only with numeric uids/gids), akin to GNU tar's --numeric-owner option.

  5. There may be a need to use custom uid -> name and gid -> name
    lookup functions.

Problem 2 can be mitigated by relying (indirectly, via os/user's LookupId and
LookupGroupId) on a good C library that does caching, or by making archive/tar
cache failed lookups as well. Problem 3 can be solved with the osusergo build tag,
but it applies to the whole binary, meaning it also affects all other os/user uses,
not just archive/tar. Yet it seems impossible to solve both 2 and 3 at the same time:
the caching C library that helps with 2 is exactly what osusergo bypasses.

As far as I know, there are no easy solutions for problems 1 and 5.

In particular, this affects Docker, which performs image unpacking by re-executing
the main binary (dockerd) in the container context (essentially a chroot). As a workaround,
Docker maintains a fork of archive/tar with commit 0564e30 partially reverted
(see moby/moby#42402).

Proposal

Add a function similar to FileInfoHeader that does not perform any id -> name lookups,
leaving them to the caller. The proposed name is FileInfoHeaderNoNames (alternatives
include FileInfoHeaderNoLookup, FileInfoHeaderNum, etc.).

Rationale

Adding a new function seems to be the simplest and most elegant approach: it requires very little code, yet solves all the issues raised above.

Alternatives are:

  • (for users) to maintain a fork of archive/tar
  • (for archive/tar) to implement a build tag that disables lookups (e.g. archivetarnolookups or archivetarnumeric)
  • (for archive/tar) to add a way to provide one's own name lookup functions
  • (for archive/tar) to add another way to disable lookups (so a user can do tar.NameLookup = false or tar.NameLookup(false))
  • to unconditionally remove id -> name lookups from archive/tar (which might bring compatibility issues)

Compatibility

Since this is a new API, and the existing functionality of FileInfoHeader is left intact,
there are no compatibility issues.

Implementation

See https://go-review.googlesource.com/c/go/+/371054 for an example implementation.
