TensorFlow is a very successful open source project. Since it was open sourced, [1800+ contributors](https://github.com/tensorflow/tensorflow) outside Google have submitted code to TF. However, as more and more developers contribute, it becomes increasingly difficult to manage contributions in a single repository.
This project aims to split the TensorFlow codebase into **smaller, more focused** repositories that can be released and managed separately. These modules will talk to each other using **well-defined APIs**. Thanks to the module APIs, these modules can then be **managed/owned/released independently**.
### Problems addressed
## Overview
This project aims to split the TensorFlow codebase into **smaller, more focused** repositories that can be released and managed separately. These modules will talk to each other using **well-defined APIs** that will evolve over time. Thanks to these APIs, the modules will be **managed/owned/released independently**. There will be different strategies for breaking apart pieces depending on the language, but the following summarizes the approach for C++ and Python:
A summary of the above is:
* Core TF functionality will be implemented in C++
* Core TF functionality can be extended using shared objects.
* On top of the core C++ libraries, we will have the language bindings (using the C API).
* There can be more functionality built on top of the core TF bindings in different languages, which can be maintained and distributed separately.
* All different pieces need to use well-defined public APIs.
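The layering above can be sketched end-to-end: a language binding reaches core TF functionality only through the C API exported by a shared object. Below is a minimal, illustrative Python `ctypes` sketch; `TF_Version` is a real C API symbol, but the library lookup and fallback behavior here are assumptions for illustration, not TF's actual binding code:

```python
import ctypes
import ctypes.util


def get_tf_version(lib_name="tensorflow"):
    """Load the core TF shared object and query its version via the C API.

    Returns the version string, or None when the shared object is not
    available on this system (e.g. in a pure-Python environment).
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # TF_Version() returns a const char* such as "1.13.1".
    lib.TF_Version.restype = ctypes.c_char_p
    return lib.TF_Version().decode("utf-8")


# With a nonsense library name, the lookup fails and the helper returns None.
print(get_tf_version("definitely-not-a-real-library-xyz"))
```

When no core shared object is present the helper simply returns `None`, which keeps the sketch runnable anywhere; the point is that the binding depends only on exported C symbols, never on C++ internals.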
A few important points to clarify above are:
* We will try our best to make sure the APIs will stay as close as possible to the current APIs.
* We are aiming to avoid needing to change most existing custom op and kernel code.
* The APIs will evolve over time. We will modify the APIs based on our and users' needs, and these modifications are expected to follow versioning guidelines.
This project aims to implement similar plugin architectures for multiple components:
1. Networking module, with verbs and GDR plugins initially
1. Filesystems module, with GCP, AWS and HDFS support
1. Kernels module
1. Optimizers/Graph rewrite module
1. Accelerator backends module
This section will describe the key design points for modular Python packages for TensorFlow.
Contains the base Python API and "Core TF" C++ shared objects.
This package will be a subset of the current "tensorflow" pip package. It will include all of the core TF API except the high-level API modules we will split out. It will define a public API for everything except the required add-on packages.
### Required tensorflow addons
These packages are planned to contain high-level TF functionality that can be safely split out of TF. Examples are tensorboard, estimator and keras. Together with the base TF package, these packages will contain the full Python code of TF, except for top-level API wiring. Like any add-ons, they are only allowed to use public APIs exposed by their dependencies. These packages have two constraints:
1. They are only allowed to use public APIs exposed by their dependencies.
1. They are required to provide backwards compatible public APIs.
With the backwards compatible public APIs, we expect addons to be able to release independently as long as features they depend on are released in their dependencies.
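In practice, this release rule amounts to each add-on declaring a compatible version range for its dependencies, with the base package never breaking that range within a major version. A toy sketch of such a range check follows, using a simplified dot-separated version scheme; the package roles are only illustrative, and real packages would rely on pip's requirement specifiers instead:

```python
def parse_version(v):
    """Turn a version string like '1.13.2' into a comparable tuple (1, 13, 2)."""
    return tuple(int(part) for part in v.split("."))


def satisfies(installed, minimum, upper):
    """True when minimum <= installed < upper (a simplified pip-style range)."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(upper)


# Hypothetical add-on requirement on the base package: >= 1.13, < 2.0.
# Any 1.(13+k) base release keeps the add-on working; a 2.x release may not.
print(satisfies("1.14.0", "1.13.0", "2.0.0"))  # True
print(satisfies("2.0.0", "1.13.0", "2.0.0"))   # False
```

The add-on only needs to bump its minimum when it starts using newly introduced API symbols, which is exactly the independence property the paragraph above describes.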
### TF Public APIs
As part of the modularization, and to decouple development and releases of each of these packages, each package is required to expose a **well-defined, well-documented public API**.
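One lightweight way to keep a public API well defined is to snapshot its symbol list and verify that nothing disappears between minor releases. The symbol names below are purely hypothetical; a real check would generate the snapshots from the package's actual top-level exports:

```python
# Hypothetical snapshots of a package's public API surface at two
# minor releases; a real check would derive these from the package itself.
api_v1_13 = {"Session", "constant", "matmul"}
api_v1_14 = {"Session", "constant", "matmul", "einsum"}  # only additions

removed = api_v1_13 - api_v1_14   # symbols a 1.13 user would lose
added = api_v1_14 - api_v1_13     # new symbols, always allowed

print(sorted(removed), sorted(added))  # [] ['einsum']
```

An empty `removed` set means code written against the older release keeps importing the same names from the newer one.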
### Optional TF packages
Mostly expected to contain the C++ plugins defined in the previous section.
These shared objects will be automatically loaded by TF core if:
* They correctly define the compatibility strings using `TF_PLATFORM_STRINGS`
* They are compatible with the system TF core is running on
* They have been properly built and signed (unless running in developer mode)
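The three loading conditions above can be sketched as a simple gate. Everything below is hypothetical illustration: the record fields and function name are invented, and real compatibility strings would come from the `TF_PLATFORM_STRINGS` data embedded in the shared object:

```python
def should_load_plugin(plugin, current_platform, dev_mode=False):
    """Decide whether TF core should auto-load a plugin shared object.

    `plugin` is a hypothetical record holding the metadata TF core would
    extract from the shared object before loading it.
    """
    if not plugin.get("platform_strings"):        # must declare compatibility
        return False
    if current_platform not in plugin["platform_strings"]:
        return False                              # incompatible system
    if not plugin.get("signed") and not dev_mode:
        return False                              # unsigned outside dev mode
    return True


plugin = {"platform_strings": ["linux_x86_64"], "signed": False}
print(should_load_plugin(plugin, "linux_x86_64"))                 # False
print(should_load_plugin(plugin, "linux_x86_64", dev_mode=True))  # True
```

An unsigned plugin is rejected unless TF core runs in developer mode; a signed plugin that matches the current platform would load unconditionally.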
## Alternatives / Potential Issues
**Why do we not use C++ APIs instead of C?**: Compilers provide no ABI guarantees for C++ code. Any C++ API would require each shared object to be compiled with the same compiler, the same compiler version, and the same compiler flags ([see GitHub issue 23561](https://github.com/tensorflow/tensorflow/issues/23561)).
**Why do we not statically link everything into a single shared object?**: No one outside Google has access to the massively parallel build system we use internally, so a monolithic build causes prohibitive build times and major pain for open source developers. There are many more issues, but in summary: while this is a great solution for Google, outside Google it is simply infeasible.
**TF will become a suite of multiple packages, built by multiple authorities. What if bugs get blamed on the TF team?**: With the modular model, we expect testing of third-party code to become easier. This can also be mitigated with better error messages that clearly point out which module an issue stems from. Finally, we can create an Apple Swift-like testing model, where we run a Jenkins setup that people can donate their machines to, and we run continuous integration tests on their plugins.
**Why not have APIs but still keep a monolithic repository?**: When everything is in a single repository, developers can bypass the APIs and depend on internals. Moreover, we cannot grant partners full control over individual folders of a single repository. As long as they are in a single repository, they are also constrained by our build system and license. Finally, a single repository does not allow closed source plugins from contributors.
**Why not go with the OSS federation solutions?**: OSS federation requires all dependencies to be in the federation before adding a repository. This is simply not possible for TensorFlow, as Eigen, LLVM and many other dependencies will never be a part of the federation.
**Documentation: how/where do we document everything?**: With multiple repositories, the structure of the documentation will need to be rethought, based on what is part of "TensorFlow proper" and what is an optional feature.
We propose the following principles to be followed for testing in a modular world.
In the current setup, we need to test all of the above packages for different Python versions, operating systems, accelerators (CPU, GPU), compilers, and more variants combined. In the modularized world, each of these packages only needs to be unit tested for the following:
* tensorflow-base: Operating systems, compiler versions and Python versions, with CPU only
* tf-gpu: With GPU only, for different operating systems.
* tf-estimator: Only for different Python versions
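With illustrative (made-up) counts for each test dimension, the gap between the monolithic matrix and the per-package matrices is easy to quantify:

```python
from itertools import product

# Illustrative, made-up counts for each test dimension.
pythons, oses, compilers, accelerators = 3, 3, 2, 2

# Monolithic TF: every combination of every dimension must be tested.
monolithic = len(list(product(range(pythons), range(oses),
                              range(compilers), range(accelerators))))

# Modular TF: each package only covers the dimensions it actually varies in.
tensorflow_base = pythons * oses * compilers  # CPU only
tf_gpu = oses                                 # GPU only, per OS
tf_estimator = pythons                        # Python versions only
modular = tensorflow_base + tf_gpu + tf_estimator

print(monolithic, modular)  # 36 24
```

The combinations multiply in the monolithic case but merely add across packages in the modular case, and the gap widens quickly as dimensions grow.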
To summarize the above timeline:
* Different packages set their own release cadences
* Each package will set version boundaries for each of its dependencies.
* Each package is responsible for ensuring that all of its public APIs work as promised.
* Packages do not need to modify the minimum version requirements unless they start using newly introduced public API symbols.
* TF metapackage releases may choose to hold back individual packages in favor of faster releases, but dependency requirements have to be respected when doing so.