updater: abstract out the network IO

This might be relevant to the Updater redesign (#1135) and, if accepted, would deprecate #1142 and PR #1171.

We (Joshua, Martin, Teodora and I) have been talking about abstracting some of the client functionality out of the Updater itself. The biggest issue from my perspective is network IO. Teodora already made a PR to let the application download targets, but it seems like there are still issues with TUF handling metadata downloads.

## Why is this needed?

* In the real world, applications already use a network stack and will keep using it after integrating TUF: we should not force another one on them
* Even if the application and TUF use the same network stack, the fact that they use different sessions and configurations is not great
* Complex applications have legitimate needs to configure a lot of things we don't want to provide an API for: user agent, proxies, basic authentication, custom request headers. This applies to both metadata and targets
* Complex applications have legitimate needs to control the download process (e.g. progress information, canceling)
* Complex applications have (legitimate?) needs to poke at low level details like timeouts

## Potential solutions

We identified two main solutions to this:
1. Make a new event-based non-blocking client API. This would be the most flexible option but also more complex for TUF maintainers to maintain and for application developers to customize
1. Keep the current API but add a new Fetcher interface that applications can optionally implement. This is likely fairly easy and non-invasive to implement, but the API remains blocking

I'm proposing option 2 but for reference please see the [draft](https://gist.github.com/jku/e100bc2a676867e56c8e830a92bed751) of option 1 as well.

### Proposal

Add a Fetcher interface that applications can implement. Provide a default implementation of Fetcher. Add a new method to Updater that Fetcher can use to provide the data it fetches. 

The Updater processes (`refresh()`, `get_one_valid_targetinfo()` and `download_target()`) will now look like this:
* Whenever a remote file (metadata or target) is needed:
  * set up a temporary file to write results to
  * call `Fetcher.fetch()`
    * the fetcher calls `Updater.provide_fetched_data()` zero or more times to provide chunks of data. Updater writes these chunks into the file
  * when the fetcher returns without exceptions, the download is finished and written to the file

This is like the go-tuf [RemoteStore](https://github.com/theupdateframework/go-tuf/blob/master/client/client.go#L36) abstraction, with two differences:
1. Python does not have a reasonable stream abstraction like Go's `io.ReadCloser` (one that any of the network stacks would actually implement), so we cannot return something like that. Instead, our implementation blocks and adds a `provide_fetched_data()` callback to Updater.
1. Metadata and target fetching are not separated. This way the Fetcher does not need any understanding of TUF or server structure: it's just a dumb downloader.
 
```python
# Postponed annotation evaluation lets Updater reference Fetcher before it is defined
from __future__ import annotations

import abc


# Only new/changed methods are shown for Updater; bodies are elided
class Updater(object):
    # __init__ now accepts an optional fetcher argument
    def __init__(self, repository_name, repository_mirrors, fetcher: Fetcher = None):
        ...

    # Accepts content of the URL that is currently being fetched.
    # May only be called from within a Fetcher.fetch() call made by this Updater.
    def provide_fetched_data(self, data: bytes):
        ...


# New interface for applications to implement
class Fetcher(metaclass=abc.ABCMeta):
    # Fetches the contents of an HTTP/HTTPS URL from a remote server. Calls
    # self.updater.provide_fetched_data() to forward sequential chunks of
    # bytes to the updater. Returns when the download is complete and all
    # bytes have been fed to the updater.
    @abc.abstractmethod
    def fetch(self, url: str, length: int):
        pass

    # Called by Updater.__init__
    def set_updater(self, updater: Updater):
        self.updater = updater
```
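
To make the call flow concrete, here is a minimal, self-contained sketch of how the Updater side could drive a Fetcher. It is only an illustration under assumptions: `SketchUpdater`, `_download()` and `InMemoryFetcher` are hypothetical names that do not exist in TUF, the `Fetcher` class is a reduced copy of the interface above, and no verification of the downloaded data is shown.

```python
import abc
import tempfile


class Fetcher(metaclass=abc.ABCMeta):
    """Reduced copy of the proposed interface: a downloader with no TUF knowledge."""

    def set_updater(self, updater):
        self.updater = updater

    @abc.abstractmethod
    def fetch(self, url, length):
        ...


class InMemoryFetcher(Fetcher):
    """Hypothetical stand-in for a network stack: serves bytes from a dict."""

    def __init__(self, files, chunk_size=4):
        self.files = files
        self.chunk_size = chunk_size

    def fetch(self, url, length):
        data = self.files[url][:length]
        # Hand the content to the updater in sequential chunks.
        for start in range(0, len(data), self.chunk_size):
            self.updater.provide_fetched_data(data[start:start + self.chunk_size])


class SketchUpdater:
    """Hypothetical, heavily reduced Updater: only the download flow."""

    def __init__(self, fetcher):
        self._fetcher = fetcher
        self._current_file = None
        fetcher.set_updater(self)

    def provide_fetched_data(self, data):
        # Only valid while a fetch() started by this updater is running.
        if self._current_file is None:
            raise RuntimeError("no fetch in progress")
        self._current_file.write(data)

    def _download(self, url, length):
        # 1. Set up a temporary file to write results to.
        temp_file = tempfile.TemporaryFile()
        self._current_file = temp_file
        try:
            # 2. The fetcher blocks and calls provide_fetched_data() 0..n times.
            self._fetcher.fetch(url, length)
        except Exception:
            temp_file.close()
            raise
        finally:
            self._current_file = None
        # 3. fetch() returned without exceptions: the download is complete.
        temp_file.seek(0)
        return temp_file
```

In the real Updater, the length check and metadata or target verification would happen on the temporary file after `fetch()` returns; the sketch leaves that out.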

I think this is fairly straightforward to implement even without a client redesign (and will be backwards-compatible). download.py would be split into two parts: the Tempfile handling bits and `_check_downloaded_length()` stay with the updater itself, and the rest of download.py forms the default Fetcher implementation.
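
For illustration, an application-side Fetcher could look roughly like the sketch below. It uses the standard library's `urllib.request` purely as a stand-in for whatever network stack the application already has. `UrllibFetcher`, its constructor parameters and the `on_progress` callback are hypothetical, and treating `length` as an upper bound on the bytes to read is an assumption, since the proposal does not pin that down.

```python
import abc
import urllib.request


class Fetcher(metaclass=abc.ABCMeta):
    """Reduced copy of the proposed interface."""

    def set_updater(self, updater):
        self.updater = updater

    @abc.abstractmethod
    def fetch(self, url, length):
        ...


class UrllibFetcher(Fetcher):
    """Hypothetical application fetcher with its own headers, timeout and progress."""

    def __init__(self, headers=None, timeout=30, chunk_size=8192, on_progress=None):
        self.headers = headers or {}
        self.timeout = timeout
        self.chunk_size = chunk_size
        self.on_progress = on_progress

    def fetch(self, url, length):
        # The application controls request details such as the user agent.
        request = urllib.request.Request(url, headers=self.headers)
        received = 0
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            # Assumption: never read more than `length` bytes.
            while received < length:
                chunk = response.read(min(self.chunk_size, length - received))
                if not chunk:
                    break
                received += len(chunk)
                self.updater.provide_fetched_data(chunk)
                if self.on_progress is not None:
                    self.on_progress(received, length)
```

An application would pass an instance to the Updater, for example `Updater(repository_name, repository_mirrors, fetcher=UrllibFetcher(headers={"User-Agent": "my-app/1.0"}))`. Cancelling could be done by raising an exception from `on_progress`, which would propagate out of `fetch()`.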
