
Add convert_to_tensor to tensorflow #11292


Merged: 4 commits merged into python:main on Jan 26, 2024

Conversation

hoel-bagard (Contributor)

Add the convert_to_tensor TensorFlow function.

The actual return type of the function is Union[EagerTensor, SymbolicTensor] (cf. the TensorFlow function), but those are not present in the stubs yet; should they be added?
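As a hedged illustration of the question being asked (the class names mirror TensorFlow's, but the toy implementation below is invented purely to make the relationship concrete), the choice is between exposing the two concrete subclasses in the annotation or collapsing them into the Tensor base class they share:

```python
# Hypothetical sketch, NOT TensorFlow's real implementation: at runtime
# tf.convert_to_tensor returns either an EagerTensor or a SymbolicTensor,
# both of which are Tensors, so a stub can annotate just the base class.
class Tensor: ...
class EagerTensor(Tensor): ...
class SymbolicTensor(Tensor): ...

def convert_to_tensor(value: object) -> Tensor:
    # The stub's annotation only promises the base class; either concrete
    # subclass satisfies it. (Toy logic for illustration only.)
    return value if isinstance(value, Tensor) else EagerTensor()
```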

I would like to help complete the TensorFlow stubs, so I would be grateful if you could tell me anything I need to know in order to contribute.


@hmc-cs-mdrissi (Contributor) left a comment

One small comment. Otherwise, if you'd like to add tf stubs, I'm happy to review, with one caveat: it may help if I give you the remaining tf stubs I have.

The current typeshed tf stubs are a subset of the ones I use, but I haven't spent much time in open source recently, and each time I add stubs to typeshed I also do a cleanup pass, so some discrepancies have accumulated. Still, the stubs I have contain many more files and missing classes, and can be a good starting point.

@hmc-cs-mdrissi (Contributor)

I've placed all of the TensorFlow internal stubs I have in this public repository: https://github.com/hmc-cs-mdrissi/tensorflow_stubs. I'd be happy to see them incorporated into typeshed, and some of the classes you want, like Model, do have stubs there.

These stubs have diverged a little from the final typeshed ones, but they are very similar.


According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@hoel-bagard (Contributor, Author)

hoel-bagard commented Jan 26, 2024

@hmc-cs-mdrissi Thank you for the review, and thank you for sharing your stubs!

I don't have a huge codebase to test against, so the stubs I made are incomplete and too restrictive. I'll start by completing/fixing what I already have here using your stubs, if that's OK with you. Or if there's a way I can help get your stubs incorporated into typeshed, please let me know!

On a bit of an unrelated note, I have two questions about how you're using your stubs (or the ones in typeshed):

  • When using the shape of a tensor, TensorShape's __getitem__ and __iter__ return int | None, even though in practice the return type is almost always int. I find that checking the type of each dimension is quite verbose and annoying. Do you have a good way to go about it?
  • When creating custom Layers or Models in my codebase, I would like to be able to annotate them with [_InputT, _OutputT]. However, this is not possible in the actual Python code, so I resorted to creating a stub file next to each of my layer/model definition files. But this doubles the number of files and seems like it will get hard to maintain. Is there a better way to do it?

I just had to switch from PyTorch to TensorFlow, so any advice on how to type a TensorFlow codebase would be extremely welcome.

@hmc-cs-mdrissi
Copy link
Contributor

When using the shape of a tensor, TensorShape's __getitem__ and __iter__ return int | None, even though in practice the return type is almost always int. I find that checking the type of each dimension is quite verbose and annoying. Do you have a good way to go about it?

Concrete tensors always return int. Symbolic tensors may return None for dynamic dimensions like the batch size. My experience is that if you are writing a layer, the first dimension (batch size) can commonly be None, while the other dimensions are rarely None. In practice you normally know whether you have a symbolic or a concrete tensor, but I don't see an easy way in the type system to distinguish the two, especially as which one you get can vary with the TensorFlow execution mode (eager vs. graph vs. TF1). So I normally just add an assert ... is not None as needed. In PyTorch, tensors are almost always concrete, and if you stick to TF2 eager execution you may not notice the Nones much.
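That narrowing pattern can be sketched without TensorFlow installed; the helper name and the example shape below are invented for illustration, with a plain tuple standing in for a TensorShape:

```python
from typing import Optional

def static_dim(dim: Optional[int]) -> int:
    # Narrow a shape entry from int | None to int; fails loudly on a
    # dynamic (None) dimension such as a symbolic batch size.
    assert dim is not None, "dynamic dimension (e.g. symbolic batch size)"
    return dim

# A symbolic-style shape: dynamic batch dimension, static spatial dims.
shape = (None, 224, 224, 3)
spatial = [static_dim(d) for d in shape[1:]]
print(spatial)  # [224, 224, 3]
```

Wrapping the assert in one small helper keeps the narrowing in a single place instead of scattering type checks across the codebase.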

When creating custom Layers or Models in my codebase, I would like to be able to annotate them with [_InputT, _OutputT]. However, this is not possible in the actual Python code, so I resorted to creating a stub file next to each of my layer/model definition files. But this doubles the number of files and seems like it will get hard to maintain. Is there a better way to do it?

Ideally the runtime classes would become generic. While that's not the case, the way I handle it for layers/models is like this:

# Imports assumed; the original snippet left the origin of _KerasLayer
# and Model implicit.
from typing import TYPE_CHECKING, Generic, TypeVar

import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Layer as _KerasLayer

InputT = TypeVar("InputT")
OutputT = TypeVar("OutputT")

if TYPE_CHECKING:
    # Under the type checker, the stubs already make these classes
    # generic, so plain aliases are enough.
    _KerasLayerBase = _KerasLayer
    _ModelBase = Model
else:
    # At runtime the real classes are not generic, so create
    # subscriptable subclasses instead.
    class _KerasLayerBase(_KerasLayer, Generic[InputT, OutputT]):
        ...

    class _ModelBase(Model, Generic[InputT, OutputT]):
        ...


class KerasLayerGeneric(_KerasLayerBase[InputT, OutputT], Generic[InputT, OutputT]):
    ...


class KerasModelGeneric(_ModelBase[InputT, OutputT], Generic[InputT, OutputT]):
    ...


class KerasLayer(KerasLayerGeneric[tf.Tensor, tf.Tensor]):
    ...


class KerasModel(KerasModelGeneric[tf.Tensor, tf.Tensor]):
    ...

and then I always use these types at runtime.
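The same TYPE_CHECKING trick can be demonstrated without TensorFlow installed; here Base is a hypothetical stand-in for a runtime class (like Layer) that the stubs would make generic but the runtime does not:

```python
from typing import TYPE_CHECKING, Generic, TypeVar

InputT = TypeVar("InputT")
OutputT = TypeVar("OutputT")

# Hypothetical stand-in for a third-party class that is not generic at
# runtime (plays the role of tf.keras.layers.Layer in the pattern above).
class Base:
    def call(self, x):
        return x

if TYPE_CHECKING:
    # Under the type checker we assume stubs make Base generic already,
    # so a plain alias suffices.
    _GenericBase = Base
else:
    # At runtime, build a subscriptable subclass instead.
    class _GenericBase(Base, Generic[InputT, OutputT]):
        pass

class IntIdentity(_GenericBase[int, int]):
    pass

layer = IntIdentity()
print(layer.call(3))
print(isinstance(layer, Base))
```

Because the Generic mixin exists only at runtime and the alias only under the checker, the subclass is both subscriptable in real code and correctly typed in stubs.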

I don't have a huge codebase to test against, so the stubs I made are incomplete and too restrictive. I'll start by completing/fixing what I already have in #11306 using your stubs, if that's OK with you. Or if there's a way I can help get your stubs incorporated into typeshed, please let me know!

My recommendation is to do it a few files at a time and keep the PR size to a couple hundred lines (at most 1k) for reviewability. You can pick any files in my stubs and prepare a PR for them here. The main steps I've followed for adding to typeshed:

  • Pick some files to work on.
  • Work through stubtest/typeshed's CI. While my stubs are used with a medium-sized codebase, stubtest and primer often detect interesting issues to fix. One common issue is that the stubs were written mostly for tf 2.8/2.9, while typeshed stubs target a newer tf, so stubtest may detect missing new parameters.
  • Work through PR review comments, so far from the typeshed maintainers; I'd be happy to review any tensorflow PRs.

@JelleZijlstra JelleZijlstra merged commit 4fa759f into python:main Jan 26, 2024
@hoel-bagard (Contributor, Author)

Thanks for sharing the snippet to make layers/models into generics, that worked really nicely!

My recommendation is to do it a few files at a time and keep the PR size to a couple hundred lines (at most 1k) for reviewability. You can pick any files in my stubs and prepare a PR for them here. The main steps I've followed for adding to typeshed:

  • Pick some files to work on.
  • Work through stubtest/typeshed's CI. While my stubs are used with a medium-sized codebase, stubtest and primer often detect interesting issues to fix. One common issue is that the stubs were written mostly for tf 2.8/2.9, while typeshed stubs target a newer tf, so stubtest may detect missing new parameters.
  • Work through PR review comments, so far from the typeshed maintainers; I'd be happy to review any tensorflow PRs.

I'll do that, thanks.
