
Custom ops in Addons don't work on tpu #1553

Closed

fsx950223 opened this issue Apr 3, 2020 · 4 comments
Labels: bug (Something isn't working), custom-ops

Comments

fsx950223 (Member)

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow version and how it was installed (source or binary):
  • TensorFlow-Addons version and how it was installed (source or binary):
  • Python version:
  • Is GPU used? (yes/no):

Describe the bug
Addons only registers its kernels in the client process, so the TPU server raises an error:
NotFoundError: Op type not registered 'Addons>ImageProjectiveTransformV2' in binary running on n-cf493c1a-w-0. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) tf.contrib.resampler should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
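
For context: importing tensorflow_addons registers the op in the local client process, which is why the failure only surfaces on the remote TPU worker. A quick way to confirm the local registration (this uses TensorFlow's internal op_def_registry module, so treat it as an illustrative check only):

import tensorflow_addons as tfa  # importing Addons registers its custom ops locally
from tensorflow.python.framework import op_def_registry

# Returns an OpDef in the client process; the binary running on the TPU
# worker has no such registration, hence the NotFoundError above.
print(op_def_registry.get("Addons>ImageProjectiveTransformV2"))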
Code to reproduce the issue

import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_addons as tfa
import os

TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
EPOCHS = 5


def create_model(input_shape):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Conv2D(28, kernel_size=3, activation=tf.nn.relu, input_shape=input_shape))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=2))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
    model.add(tf.keras.layers.Dropout(0.2))
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
    return model


def _parser(example):
    image = tf.image.convert_image_dtype(example["image"], tf.float32)
    # Pinning the op to the CPU does not help: dataset.map executes in the
    # TPU worker's host process, where the Addons kernel is not registered.
    with tf.device('/device:cpu:0'):
        image = tfa.image.translate(image, [1, 1])
    label = example["label"]
    return tf.cast(image, tf.bfloat16), label


def get_tpu_strategy(tpu_address):
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu_address)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)
    return strategy


tpu_strategy = get_tpu_strategy(TPU_WORKER)

dataset = tfds.load('mnist', split='train', shuffle_files=True, try_gcs=True)
dataset = dataset.map(_parser).shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE).repeat(EPOCHS)

test_dataset = tfds.load('mnist', split='test', try_gcs=True)
test_dataset = test_dataset.map(_parser).batch(32).prefetch(tf.data.experimental.AUTOTUNE)

with tpu_strategy.scope():
    optimizer = tf.keras.optimizers.Adam()
    # Note: the model's final layer already applies softmax, so the loss
    # must take probabilities rather than logits.
    loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False, reduction=tf.keras.losses.Reduction.NONE)
    metrics = [
        tf.keras.metrics.Mean('loss', dtype=tf.float32),
        tf.keras.metrics.SparseCategoricalAccuracy('accuracy', dtype=tf.float32)
    ]

    model = create_model([28, 28, 1])


    @tf.function(autograph=False)
    def train_step(x, y):
        with tf.GradientTape() as tape:
            predict = model(x)
            loss = loss_obj(y, predict)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        metrics[0].update_state(loss)
        metrics[1].update_state(y, predict)


    for i in range(EPOCHS):
        for x, y in dataset:
            tpu_strategy.run(train_step, args=[x, y])

        for metric in metrics:
            print(metric.name, metric.result())
            metric.reset_states()
    print('start test')
    for x, y in test_dataset:
        predict = model(x)
        loss = loss_obj(y, predict)
        metrics[0].update_state(loss)
        metrics[1].update_state(y, predict)

    for metric in metrics:
        print(metric.name, metric.result())
        metric.reset_states()
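
A possible workaround (an untested sketch, reusing the imports and EPOCHS from the repro above) is to run the Addons op eagerly on the client, where the kernel is registered, and only then build the dataset from plain tensors. Note that batch_size=-1 materializes the whole split in memory:

# Hypothetical workaround: apply the custom op eagerly in the client process,
# then ship ordinary tensors to the TPU-side input pipeline.
examples = tfds.load('mnist', split='train', batch_size=-1, try_gcs=True)
images = tf.image.convert_image_dtype(examples['image'], tf.float32)
images = tfa.image.translate(images, [1, 1])  # runs on the client, where the kernel exists
labels = examples['label']

dataset = tf.data.Dataset.from_tensor_slices((tf.cast(images, tf.bfloat16), labels))
dataset = dataset.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE).repeat(EPOCHS)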


Other info / logs
MultiWorkerMirroredStrategy should have the same problem.


gabrieldemarmiesse changed the title from "Addons don't support tpu" to "Custom ops in Addons don't work on tpu" on Apr 3, 2020
gabrieldemarmiesse added the bug (Something isn't working) and custom-ops labels on Apr 3, 2020
bhack (Contributor) commented on Apr 4, 2020

I've also requested that the official custom-op guide repository be extended to cover TPU: tensorflow/custom-op#53

bhack (Contributor) commented on Apr 21, 2020

We have some updates on custom C++ ops and TPU at tensorflow/custom-op#53 (comment).

Mistobaan commented

Am I right in deducing that tensorflow_addons image transformations do not work on TPU, @bhack?

seanpmorgan (Member) commented

Hi @Mistobaan. Yes, it's currently not supported for us to have XLA HLO ops registered outside of the central TF build. We do have plans to convert as many custom ops as possible to Python composite ops, so at least those will be compatible. See #1752
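
For illustration, here is roughly what a composite rewrite could look like for integer-pixel shifts, built only from core TF ops so it lowers to XLA with no custom kernel. This is a hypothetical sketch under that assumption, not the actual Addons implementation (which also handles fractional translations via interpolation):

import tensorflow as tf

def translate_composite(image, dx, dy):
    # Shift an HWC image by integer (dx, dy) pixels, zero-padding the border.
    # Uses only tf.pad/tf.slice, so it compiles with XLA on TPU without any
    # custom kernel registration. dx and dy are Python ints here.
    h, w = tf.shape(image)[0], tf.shape(image)[1]
    padded = tf.pad(image, [[abs(dy), abs(dy)], [abs(dx), abs(dx)], [0, 0]])
    # Positive shifts move content right/down: output(r, c) = input(r - dy, c - dx).
    return tf.slice(padded, [abs(dy) - dy, abs(dx) - dx, 0], [h, w, -1])

Fractional shifts would need a gather- or interpolation-based approach, which is the harder part of the composite rewrite tracked in #1752.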
