
Is it possible to convert the ONNX model to an fp16 model? #489

Description

@yuananf

The torch example takes the parameter revision="fp16"; can the ONNX model be optimized the same way? Currently, ONNX inference (using CUDAExecutionProvider) is slower than the torch version and uses more GPU memory (12 GB vs. 4 GB).
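For reference, a minimal sketch of one common way to do this: converting the saved ONNX graph to fp16 with the onnxconverter-common package (this is an assumption about the intended workflow, not a confirmed answer from the maintainers; the file names are placeholders).

```python
# Sketch: convert a float32 ONNX model to float16 using onnxconverter-common
# (pip install onnx onnxconverter-common). File names are hypothetical.
import onnx
from onnxconverter_common import float16

model = onnx.load("model.onnx")

# Cast float32 initializers and tensor types in the graph to float16.
# keep_io_types=True leaves the model inputs/outputs as float32, so
# callers can keep feeding float32 arrays without changing their code.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)

onnx.save(model_fp16, "model_fp16.onnx")
```

The converted model can then be loaded as usual, e.g. `onnxruntime.InferenceSession("model_fp16.onnx", providers=["CUDAExecutionProvider"])`, which should roughly halve the weight memory; whether it closes the speed gap with torch depends on the model and kernels involved.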

Labels

stale (Issues that haven't received updates)
