
Custom ops in Addons don't work on tpu #1553

Closed

fsx950223 opened this issue Apr 3, 2020 · 4 comments
Labels: bug (Something isn't working), custom-ops

Comments

fsx950223 (Member)

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow version and how it was installed (source or binary):
  • TensorFlow-Addons version and how it was installed (source or binary):
  • Python version:
  • Is GPU used? (yes/no):

Describe the bug
Addons only registers its kernels in the client process, so the TPU server raises an error:
NotFoundError: Op type not registered 'Addons>ImageProjectiveTransformV2' in binary running on n-cf493c1a-w-0. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
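
For context: importing tensorflow_addons registers the op in the local client process, which is why the failure only surfaces on the remote TPU worker. A quick way to confirm the local registration (this uses TensorFlow's internal op_def_registry module, so treat it as an illustrative check only):

import tensorflow_addons as tfa  # importing Addons registers its custom ops locally
from tensorflow.python.framework import op_def_registry

# Returns an OpDef in the client process; the binary running on the TPU
# worker has no such registration, hence the NotFoundError above.
print(op_def_registry.get("Addons>ImageProjectiveTransformV2"))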
Code to reproduce the issue

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_addons as tfa
import os

TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
EPOCHS = 5


def create_model(input_shape):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(28, kernel_size=3, activation=tf.nn.relu, input_shape=input_shape))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
    model.add(tf.keras.layers.Dropout(0.2))
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
    return model


def _parser(example):
    image = tf.image.convert_image_dtype(example["image"], tf.float32)
    # Pinning the op to the CPU does not help: dataset.map executes in the
    # TPU worker's host process, where the Addons kernel is not registered.
    with tf.device('/device:cpu:0'):
        image = tfa.image.translate(image, [1, 1])
    label = example["label"]
    return tf.cast(image, tf.bfloat16), label


def get_tpu_strategy(tpu_address):
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu_address)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)
    return strategy


tpu_strategy = get_tpu_strategy(TPU_WORKER)

dataset = tfds.load('mnist', split='train', shuffle_files=True, try_gcs=True)
dataset = dataset.map(_parser).shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE).repeat(EPOCHS)

test_dataset = tfds.load('mnist', split='test', try_gcs=True)
test_dataset = test_dataset.map(_parser).batch(32).prefetch(tf.data.experimental.AUTOTUNE)

with tpu_strategy.scope():
    optimizer = tf.keras.optimizers.Adam()
    # Note: the model's final layer already applies softmax, so the loss
    # must take probabilities rather than logits.
    loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False, reduction=tf.keras.losses.Reduction.NONE)
    metrics = [
        tf.keras.metrics.Mean('loss', dtype=tf.float32),
        tf.keras.metrics.SparseCategoricalAccuracy('accuracy', dtype=tf.float32)
    ]

    model = create_model([28, 28, 1])


    @tf.function(autograph=False)
    def train_step(x, y):
        with tf.GradientTape() as tape:
            predict = model(x)
            loss = loss_obj(y, predict)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        metrics[0].update_state(loss)
        metrics[1].update_state(y, predict)


    for i in range(EPOCHS):
        for x, y in dataset:
            tpu_strategy.run(train_step, args=[x, y])

        for metric in metrics:
            print(metric.name, metric.result())
            metric.reset_states()
    print('start test')
    for x, y in test_dataset:
        predict = model(x)
        loss = loss_obj(y, predict)
        metrics[0].update_state(loss)
        metrics[1].update_state(y, predict)

    for metric in metrics:
        print(metric.name, metric.result())
        metric.reset_states()
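
A possible workaround (an untested sketch, reusing the imports and EPOCHS from the repro above) is to run the Addons op eagerly on the client, where the kernel is registered, and only then build the dataset from plain tensors. Note that batch_size=-1 materializes the whole split in memory:

# Hypothetical workaround: apply the custom op eagerly in the client process,
# then ship ordinary tensors to the TPU-side input pipeline.
examples = tfds.load('mnist', split='train', batch_size=-1, try_gcs=True)
images = tf.image.convert_image_dtype(examples['image'], tf.float32)
images = tfa.image.translate(images, [1, 1])  # runs on the client, where the kernel exists
labels = examples['label']

dataset = tf.data.Dataset.from_tensor_slices((tf.cast(images, tf.bfloat16), labels))
dataset = dataset.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE).repeat(EPOCHS)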


Other info / logs
MultiWorkerMirroredStrategy should have the same problem.


gabrieldemarmiesse changed the title from "Addons don't support tpu" to "Custom ops in Addons don't work on tpu" on Apr 3, 2020
gabrieldemarmiesse added the bug (Something isn't working) and custom-ops labels on Apr 3, 2020
bhack (Contributor) commented on Apr 4, 2020

I've also requested that the official custom-op guide repository be extended to cover TPU: tensorflow/custom-op#53

bhack (Contributor) commented on Apr 21, 2020

We have some updates on custom C++ ops and TPU at tensorflow/custom-op#53 (comment).

Mistobaan commented

Am I right in deducing that tensorflow_addons image transformations do not work on TPU, @bhack?

seanpmorgan (Member) commented

Hi @Mistobaan. Yes, it's currently not supported for us to have XLA HLO ops registered outside of the central TF build. We do have plans to convert as many custom ops as possible to Python composite ops, so at least those will be compatible. See #1752
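
For illustration, here is roughly what a composite rewrite could look like for integer-pixel shifts, built only from core TF ops so it lowers to XLA with no custom kernel. This is a hypothetical sketch under that assumption, not the actual Addons implementation (which also handles fractional translations via interpolation):

import tensorflow as tf

def translate_composite(image, dx, dy):
    # Shift an HWC image by integer (dx, dy) pixels, zero-padding the border.
    # Uses only tf.pad/tf.slice, so it compiles with XLA on TPU without any
    # custom kernel registration. dx and dy are Python ints here.
    h, w = tf.shape(image)[0], tf.shape(image)[1]
    padded = tf.pad(image, [[abs(dy), abs(dy)], [abs(dx), abs(dx)], [0, 0]])
    # Positive shifts move content right/down: output(r, c) = input(r - dy, c - dx).
    return tf.slice(padded, [abs(dy) - dy, abs(dx) - dx, 0], [h, w, -1])

Fractional shifts would need a gather- or interpolation-based approach, which is the harder part of the composite rewrite tracked in #1752.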
