Commit 9de6d4c

Merge pull request #2 from gunan/revNplus1

Relax API requirements.

2 parents: b42798f + 1d07d97


rfcs/20190305-modular-tensorflow.md

Lines changed: 25 additions & 41 deletions
@@ -4,14 +4,14 @@
| :-------------- | :---------------------------------------------------- |
| **Author(s)** | Gunhan Gulsoy ([email protected]) |
| **Sponsor** | Martin Wicke ([email protected]) |
-| **Updated** | 2019-03-06 |
+| **Updated** | 2019-11-25 |

## Motivation

TensorFlow is a very successful open source project. Since it was open sourced, [1800+ contributors](https://github.com/tensorflow/tensorflow) have submitted code to TF from outside Google. However, as more and more developers contribute, it becomes more and more difficult to manage contributions in a single repository.

-This project aims to split the TensorFlow codebase into **smaller, more focused** repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs** that guarantee backwards compatibility. Thanks to the module APIs, these modules are now **managed/owned/released independently**.
+This project aims to split the TensorFlow codebase into **smaller, more focused** repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs**. Thanks to the module APIs, these modules are now **managed/owned/released independently**.

### Problems addressed

@@ -55,20 +55,30 @@ Having a monolithic repository means we need to rebuild all of our code for all
## Overview

-This project aims to split the TensorFlow codebase into **smaller, more focused** repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs** that guarantee backwards compatibility. Thanks to these APIs, these modules will be **managed/owned/released independently**. There will be different strategies to break apart pieces based on the languages, but the approach for C++ and Python is summarized below:
+This project aims to split the TensorFlow codebase into **smaller, more focused** repositories that can be released and managed separately. These modules will talk to each other using **well defined APIs** that will evolve over time. Thanks to these APIs, these modules will be **managed/owned/released independently**. There will be different strategies to break apart pieces based on the languages, but the approach for C++ and Python is summarized below:

![alt_text](20190305-modular-tensorflow/big_picture.png "Overview of modular TensorFlow")

A summary of the above is:

-
-
* Core TF functionality will be implemented in C++
* Core TF functionality can be extended using shared objects.
* On top of the core C++ libraries, we will have the language bindings (using the C API)
* There can be more functionality built on top of the core TF bindings in different languages, which can be maintained and distributed separately.
-* All different pieces need to use stable public APIs with backwards compatibility guarantees.
+* All different pieces need to use well defined public APIs.
+
+A few important points to clarify the above:
+
+* We will try our best to keep the APIs as close as possible to the current APIs.
+* We are aiming to avoid the need to change most existing custom op and kernel code.
+* The APIs will evolve over time. We will modify the APIs based on our and our users' needs. These modifications are expected to follow the versioning guidelines [described here](https://github.com/tensorflow/community/blob/592221e839eb9629a9ff4c73d46ee44ccb832d97/rfcs/20190816-tf-project-versioning.md).
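The versioning guidelines referenced in the last added bullet are broadly semantic-versioning style. As a hypothetical illustration (not TensorFlow code, and only one plausible reading of those guidelines: an API symbol available since some major.minor keeps working in any later minor of the same major), a compatibility check could be sketched as:

```python
# Hypothetical illustration, not TensorFlow code: a semantic-versioning style
# compatibility check matching one reading of the guidelines above.

def is_compatible(required: str, available: str) -> bool:
    """True if version `available` can satisfy a dependency on `required`."""
    req_major, req_minor = (int(x) for x in required.split(".")[:2])
    av_major, av_minor = (int(x) for x in available.split(".")[:2])
    # Same major version, and at least the minor version that introduced
    # the symbols we need.
    return av_major == req_major and av_minor >= req_minor

print(is_compatible("1.2", "1.5.0"))   # newer minor, same major
print(is_compatible("1.2", "2.0.0"))   # major bump: no guarantee
```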

### Definitions
@@ -90,7 +100,7 @@ This project aims to implement similar plugin architectures for multiple components
1. Networking module, with verbs, gdr plugins initially
1. Filesystems module, with GCP, AWS and HDFS support
-1. Kernels module,
+1. Kernels module,
1. Optimizers/Graph rewrite module,
1. Accelerator backends module

@@ -285,24 +295,14 @@ This section will describe the key design points for modular Python packages for

Contains the base Python API, and "Core TF" C++ shared objects.

-This package will be a subset of the current "tensorflow" pip package. It will include all of the core TF API except the high level API modules we will split up. It will define a public API for everything except for the required add-on packages. This API is required to have backwards compatibility guarantees for minor version changes. With this guarantee, we expect the following:
-
-_"Given that TF-base 1.n and addon package 1.m work together, TF-base 1.(n+k) and addon package 1.m should always work together."_
-
-If we discover a violation of this guarantee, it will be treated as a P1 bug, and it will require a patch release for the base package 1.(n+k).
+This package will be a subset of the current "tensorflow" pip package. It will include all of the core TF API except the high level API modules we will split up. It will define a public API for everything except for the required add-on packages.

### Required tensorflow addons

-These packages are planned to contain high level TF functionality that can be safely split off from TF. Examples are tensorboard, estimator and keras. Together with the base TF package, these packages will contain the full Python code of TF, except for top level API wiring.
-
-These packages have two constraints:
+These packages are planned to contain high level TF functionality that can be safely split off from TF. Examples are tensorboard, estimator and keras. Together with the base TF package, these packages will contain the full Python code of TF, except for top level API wiring. Like any addons, these are only allowed to use the public APIs exposed by their dependencies. These packages have two constraints:

-1. They are only allowed to use public APIs exposed by their dependencies.
-1. They are required to provide backwards compatible public APIs.
+1. They are only allowed to use public APIs exposed by their dependencies.
+1. They are required to provide backwards compatible public APIs.

With the backwards compatible public APIs, we expect addons to be able to release independently as long as the features they depend on are released in their dependencies.

@@ -342,19 +342,7 @@ TENSORFLOW_DEPENDENCIES= [

### TF Public APIs

-As a part of the modularization, to be able to decouple development and releases for each of these packages, each package is required to expose a **public API with backwards compatibility guarantees**. This means that no symbol in the public API can be changed in a backwards incompatible way, syntactically or semantically, between any minor versions. Below is a toy example of two packages explaining the guarantees we expect:
-
-![alt_text](20190305-modular-tensorflow/simple_package_deps.png "Just two example packages.")
-
-* P1 depends on P2
-* P2 is expected to provide a public API
-* All API symbols exposed by P2 version M.N are expected to work at version M.(N+K) for any non-negative integer K.
-* P2 is allowed to make breaking changes to its API between major releases (M to M+1)
-* If P1 version X.Y works with P2 version M.N, it should also work the same way with P2 version M.(N+K). However, there are no guarantees for it to work with P2 version (M+K).
-* When P1 releases a new version, it should check which API symbols it needs from P2, and fix the minimum version requirement for P2 in its pip package accordingly.
+As a part of the modularization, to be able to decouple development and releases for each of these packages, each package is required to expose a **well defined, well documented public API**.

### Optional TF packages
@@ -363,21 +351,17 @@ Mostly expected to contain the C++ plugins defined in the previous section. These

These shared objects will be automatically loaded by TF core if:

-
-
* They correctly define the compatibility strings using `TF_PLATFORM_STRINGS`
* They are compatible with the system TF core is running on
* They have been properly built and signed (unless running in developer mode)
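The three loading criteria above can be sketched as a small predicate. This is a hypothetical Python illustration only: the real loader lives in TF core's C++, and `platform_string` and `should_load` are invented names, not TensorFlow APIs.

```python
# Hypothetical sketch of the plugin auto-loading criteria above. The real
# mechanism is C++ inside TF core; all names here are invented.
import platform

def platform_string() -> str:
    """Compatibility string for the running system, e.g. 'linux-x86_64'."""
    return f"{platform.system().lower()}-{platform.machine().lower()}"

def should_load(declared_platforms, signed, developer_mode=False) -> bool:
    """Apply the three criteria: declared strings, compatibility, signature."""
    if not declared_platforms:            # no compatibility strings declared
        return False
    if platform_string() not in declared_platforms:
        return False                      # built for a different system
    return signed or developer_mode       # unsigned allowed only in dev mode

# A signed plugin that declares the current system's string loads.
print(should_load({platform_string()}, signed=True))
```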

## Alternatives / Potential Issues

-
-
* **Why do we not use C++ APIs instead of C?** Compilers make no guarantees about the ABI generated for C++ code. Any C++ API would require each shared object to be compiled with the same compiler, using the same version of the compiler, with the same compiler flags ([see github issue 23561](https://github.com/tensorflow/tensorflow/issues/23561)).
* **Why do we not statically link everything into a single shared object?** No one outside Google has access to the massively parallel build system we use internally. This causes prohibitive build times, a major pain point for open source developers. There are many more issues, but in summary: while this is a great solution for Google, outside Google it is simply infeasible.
* **TF will become a suite of multiple packages, built by multiple authorities. What if bugs get blamed on the TF team?** With the modular model, we expect testing of 3rd party code to become easier. This can also be mitigated with better error messages that clearly point out which module an issue stems from. Finally, we can create an apple-swift-like testing model, where we run a Jenkins setup that people can donate their machines to, and we can run continuous integration tests on their plugins.
-* **Why not have APIs but still have a monolithic repository: **When everything is in a single repository, this enables developers to bypass the APIs, and depend on internals. Moreover, we cannot grant full control over different folders on our repository to our partners in a single repository. As long as they are in a single repository, they are still constrained by our build system and license. Finally, in a single repository we do not provide the option of closed source plugins for contributors.
+* **Why not have APIs but still have a monolithic repository?** When everything is in a single repository, developers can bypass the APIs and depend on internals. Moreover, we cannot grant our partners full control over different folders of our repository. As long as they are in a single repository, they are still constrained by our build system and license. Finally, a single repository does not allow closed source plugins from contributors.
* **Why not go with the OSS federation solutions?** OSS federation requires all dependencies to be in the federation before a repository can be added. This is simply not possible for TensorFlow, as eigen, llvm and many other dependencies will never be part of the federation.
* **Documentation: how/where do we document everything?** With multiple repositories, the structure of the documentation will need to be rethought, based on what is part of "TensorFlow proper" and what is an optional feature.

@@ -399,7 +383,7 @@ We propose the following principles to be followed for testing in a modular world
In the current setup, we need to test all of the above packages for different Python versions, operating systems, accelerators (CPU, GPU), compilers, and more variants combined. In the modularized world, each of these packages only needs to be unit tested for the following:

-* tensorflow-base: Operating systems, compiler versions and Python versions, only with CPU
+* tensorflow-base: Operating systems, compiler versions and Python versions, only with CPU
* tf-gpu: With GPU only, for different operating systems.
* tf-estimator: Only for different Python versions
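The per-package test matrices above can be written down as plain data. The package names come from the list itself, but the concrete OS, compiler and Python values below are invented for this sketch, not a committed support list:

```python
# Illustrative only: matrix values are invented, not TF's actual test config.
TEST_MATRIX = {
    "tensorflow-base": {                  # OS x compiler x Python, CPU only
        "os": ["linux", "macos", "windows"],
        "compiler": ["gcc", "clang", "msvc"],
        "python": ["3.6", "3.7"],
        "accelerator": ["cpu"],
    },
    "tf-gpu": {                           # GPU only, across operating systems
        "os": ["linux", "windows"],
        "accelerator": ["gpu"],
    },
    "tf-estimator": {                     # Python versions only
        "python": ["3.6", "3.7"],
    },
}

def num_configurations(matrix: dict) -> int:
    """Number of unit-test configurations implied by one package's matrix."""
    total = 1
    for values in matrix.values():
        total *= len(values)
    return total

for package, matrix in TEST_MATRIX.items():
    print(package, num_configurations(matrix))
```

The point of the sketch: each package tests a small cross product of only the axes it cares about, instead of every package testing the full combined matrix.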

@@ -439,7 +423,7 @@ To summarize the above timeline:

* Different packages set their own release cadences
* Each package will set version boundaries for each of their dependencies.
-* Each package is responsible for ensuring that all of their public APIs are working without any changes until the next major release
+* Each package is responsible for ensuring that all of their public APIs are working as promised.
* Packages do not need to modify their minimum version requirements unless they start using newly introduced public API symbols.
* TF metapackage releases may choose to hold back individual packages in favor of faster releases, but dependency requirements have to be respected when doing so.
* Major releases still need to be coordinated.
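The bullet about minimum version requirements can be made concrete with a small sketch: a package records which public API symbols of a dependency it uses, and derives its pip version bound from the version that introduced each. The package name, symbols, and the symbol-to-version mapping below are all invented for illustration:

```python
# Hypothetical sketch: deriving a pip version bound from the public API
# symbols a package actually uses. All names and version data are invented.

# Version (major, minor) in which each public API symbol first appeared.
SYMBOL_INTRODUCED_IN = {
    "load_dataset": (1, 0),
    "fast_matmul": (1, 3),
    "sparse_lookup": (1, 5),
}

def pip_requirement(package: str, symbols_used) -> str:
    """Pin the dependency to the newest minor any used symbol requires,
    staying within the same major version."""
    major, minor = max(SYMBOL_INTRODUCED_IN[s] for s in symbols_used)
    return f"{package} >= {major}.{minor}, < {major + 1}.0"

print(pip_requirement("tf-base", ["load_dataset", "fast_matmul"]))
# tf-base >= 1.3, < 2.0
```

Under this rule, the bound only moves when the package starts using a newly introduced symbol, matching the bullet above.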
