
updater: abstract out the network IO #1213

@jku


This might be relevant to the Updater redesign (#1135) and, if accepted, would make #1142 and PR #1171 obsolete.

Joshua, Martin, Teodora and I have been talking about moving some of the client functionality out of the Updater itself. From my perspective the biggest issue is network IO. Teodora has already opened a PR that lets the application download targets, but there still seem to be issues with TUF handling the metadata downloads itself.

Why is this needed?

  • Real-world applications already use a network stack and will keep using it after integrating TUF: we should not force a second one on them.
  • Even if the application and TUF use the same network stack, the fact that they use different sessions and configurations is not great.
  • Complex applications have legitimate needs to configure many things we don't want to provide APIs for: user agent, proxies, basic authentication, custom request headers. This applies to both metadata and targets.
  • Complex applications have legitimate needs to control the download process (e.g. progress information, cancelling).
  • Complex applications have (legitimate?) needs to poke at low-level details like timeouts.

Potential solutions

We identified two main solutions to this:

  1. Make a new event-based, non-blocking client API. This would be the most flexible option, but also more complex for TUF maintainers to maintain and for application developers to customize.
  2. Keep the current API but add a new Fetcher interface that applications can optionally implement. This is likely fairly easy and non-invasive to implement, but the API remains blocking.

I'm proposing option 2, but for reference please also see the draft of option 1.

Proposal

Add a Fetcher interface that applications can implement. Provide a default implementation of Fetcher. Add a new method to Updater that Fetcher can use to provide the data it fetches.

The Updater processes (refresh(), get_one_valid_targetinfo() and download_target()) will now look like this:

  • Whenever a remote file (metadata or target) is needed:
    • set up a temporary file to write results to
    • call Fetcher.fetch()
      • the fetcher calls Updater.provide_fetched_data() zero or more times to provide chunks of data; the Updater writes these chunks into the file
    • when the fetcher returns without raising an exception, the download is complete and has been written to the file
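The download flow above can be sketched roughly as follows. This is a minimal, hypothetical illustration of the control flow only: `SketchUpdater`, its `download()` method and `InMemoryFetcher` are names invented for this example, and `length` is treated as an upper bound, which is an assumption.

```python
import abc
import tempfile
from typing import BinaryIO, Dict, Optional


class Fetcher(metaclass=abc.ABCMeta):
    """Minimal copy of the proposed Fetcher interface."""

    updater: "SketchUpdater"

    @abc.abstractmethod
    def fetch(self, url: str, length: int) -> None:
        ...

    def set_updater(self, updater: "SketchUpdater") -> None:
        self.updater = updater


class SketchUpdater:
    """Hypothetical updater showing only the download control flow."""

    def __init__(self, fetcher: Fetcher):
        self._fetcher = fetcher
        self._fetcher.set_updater(self)
        # Non-None only while this updater is inside Fetcher.fetch()
        self._current_file: Optional[BinaryIO] = None

    def provide_fetched_data(self, data: bytes) -> None:
        if self._current_file is None:
            raise RuntimeError("provide_fetched_data() called outside fetch()")
        self._current_file.write(data)

    def download(self, url: str, length: int) -> BinaryIO:
        # 1. set up a temporary file to write results to
        temp_file = tempfile.TemporaryFile()
        self._current_file = temp_file
        try:
            # 2. the fetcher pushes chunks through provide_fetched_data()
            self._fetcher.fetch(url, length)
        except Exception:
            temp_file.close()
            raise
        finally:
            self._current_file = None
        # 3. fetch() returned without raising: the download is complete
        temp_file.seek(0)
        return temp_file


class InMemoryFetcher(Fetcher):
    """Test double that serves bytes from a dict instead of the network."""

    def __init__(self, files: Dict[str, bytes], chunk_size: int = 4):
        self._files = files
        self._chunk_size = chunk_size

    def fetch(self, url: str, length: int) -> None:
        data = self._files[url][:length]  # never push more than length bytes
        for i in range(0, len(data), self._chunk_size):
            self.updater.provide_fetched_data(data[i:i + self._chunk_size])
```

Because the temporary file belongs to the Updater, a Fetcher can only hand over bytes; it never decides where they are stored or when a download counts as finished.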

This is similar to the go-tuf RemoteStore abstraction, with two differences:

  1. Python has no reasonable stream abstraction like Go's io.ReadCloser that all of the network stacks would actually implement, so fetch() cannot return something like that. Instead, our fetch() blocks and pushes data into the Updater through the provide_fetched_data() callback.
  2. Metadata and target fetching are not separated. This way the Fetcher needs no understanding of TUF or of the server structure: it is just a dumb downloader.
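To make both differences concrete, here is a sketch of a "dumb" Fetcher built on the standard library's urllib.request. It only receives a URL and a length (assumed here to be an upper bound), has no notion of TUF roles or repository layout, and pushes chunks into the callback instead of returning a stream. `UrllibFetcher` and `CHUNK_SIZE` are hypothetical; this is not the default implementation in tuf.

```python
import urllib.request


class UrllibFetcher:
    """Hypothetical 'dumb downloader': the same code path serves metadata
    and target URLs, and it never inspects what it downloads."""

    CHUNK_SIZE = 8192  # assumed chunk size, not taken from tuf

    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout
        self.updater = None  # set via set_updater(), as in the proposal

    def set_updater(self, updater) -> None:
        self.updater = updater

    def fetch(self, url: str, length: int) -> None:
        remaining = length
        with urllib.request.urlopen(url, timeout=self.timeout) as response:
            # Never read more than the caller-provided upper bound
            while remaining > 0:
                chunk = response.read(min(self.CHUNK_SIZE, remaining))
                if not chunk:
                    break  # stream ended: download finished
                remaining -= len(chunk)
                # Push model: no stream object is handed back to the caller
                self.updater.provide_fetched_data(chunk)
```

Mirror selection, file paths and all verification stay in the Updater; the Fetcher is only asked "give me the bytes at this URL".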

```python
from __future__ import annotations

import abc
from typing import Optional


# Only new/changed methods are shown for Updater
class Updater(object):
    # __init__ now accepts an optional fetcher argument
    def __init__(self, repository_name, repository_mirrors,
                 fetcher: Optional[Fetcher] = None):
        ...

    # Accepts the content of the URL that is currently being fetched.
    # Can only be called from within the Fetcher.fetch() call made by
    # this Updater.
    def provide_fetched_data(self, data: bytes) -> None:
        ...


# New interface for applications to implement
class Fetcher(metaclass=abc.ABCMeta):
    # Fetches the contents of an HTTP/HTTPS URL from a remote server. Calls
    # self.updater.provide_fetched_data() to forward sequential chunks of
    # bytes to the updater. Returns when the download is complete and all
    # bytes have been fed to the updater.
    @abc.abstractmethod
    def fetch(self, url: str, length: int) -> None:
        pass

    # Called by Updater.__init__()
    def set_updater(self, updater: Updater) -> None:
        self.updater = updater
```
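With this interface, several of the needs listed earlier (progress information, cancelling, reusing the application's own network stack) become plain application code. A minimal sketch, assuming the application's HTTP stack can be wrapped as a callable that yields byte chunks for a URL; `ProgressFetcher`, `open_chunks`, `on_progress` and `DownloadCancelled` are hypothetical names.

```python
import threading
from typing import Callable, Iterable, Optional


class DownloadCancelled(Exception):
    """Raised by the fetcher when the application cancels a download."""


class ProgressFetcher:
    """Hypothetical application-side Fetcher that reports progress and
    honours cancellation."""

    def __init__(self,
                 open_chunks: Callable[[str], Iterable[bytes]],
                 on_progress: Optional[Callable[[str, int, int], None]] = None):
        # open_chunks stands in for the application's own HTTP stack
        self._open_chunks = open_chunks
        self._on_progress = on_progress
        self._cancel = threading.Event()
        self.updater = None

    def set_updater(self, updater) -> None:
        self.updater = updater

    def cancel(self) -> None:
        # Safe to call from another thread, e.g. a UI "Cancel" button
        self._cancel.set()

    def fetch(self, url: str, length: int) -> None:
        received = 0
        for chunk in self._open_chunks(url):
            if self._cancel.is_set():
                # Raising makes the updater treat the download as failed
                raise DownloadCancelled(url)
            chunk = chunk[: length - received]  # respect the length bound
            if not chunk:
                break
            self.updater.provide_fetched_data(chunk)
            received += len(chunk)
            if self._on_progress is not None:
                self._on_progress(url, received, length)
```

In this sketch cancellation is sticky: once cancel() is called, every later fetch() fails too. A real application would probably reset the flag per download.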

I think this is fairly straightforward to implement even without a client redesign, and it will be backwards-compatible. download.py is split into two parts: the tempfile handling and _check_downloaded_length(), which are used by the updater itself; and the rest of download.py, which forms the default Fetcher implementation.
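The updater-side half of that split could look roughly like this. It is a hypothetical sketch: `DownloadSink` and `DownloadLengthMismatch` are invented names, and the strict (exact length) versus non-strict (upper bound) modes reflect our understanding of how the current download code treats targets and metadata, not a copy of _check_downloaded_length().

```python
import tempfile
from typing import BinaryIO


class DownloadLengthMismatch(Exception):
    """Hypothetical error type for length violations."""


class DownloadSink:
    """Hypothetical updater-side half of download.py: owns the temp file
    and the length checks, so no Fetcher can bypass them."""

    def __init__(self, required_length: int, strict: bool):
        self.required_length = required_length
        # strict=True: exact length required; False: length is an upper bound
        self.strict = strict
        self.total = 0
        self.file: BinaryIO = tempfile.TemporaryFile()

    def write(self, data: bytes) -> None:
        # Would be called from Updater.provide_fetched_data();
        # rejects overlong data as soon as it arrives
        if self.total + len(data) > self.required_length:
            raise DownloadLengthMismatch(
                f"received more than {self.required_length} bytes")
        self.file.write(data)
        self.total += len(data)

    def finish(self) -> BinaryIO:
        # Would be called after Fetcher.fetch() returns without raising
        if self.strict and self.total != self.required_length:
            self.file.close()
            raise DownloadLengthMismatch(
                f"expected {self.required_length} bytes, got {self.total}")
        self.file.seek(0)
        return self.file
```

Keeping these checks in the Updater means a custom Fetcher can be as careless as it likes about lengths without weakening TUF's protection against endless-data attacks.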

Labels: client (related to the client (updater) implementation), enhancement
