As the title says, please cover this part as well. The tutorial currently only shows how to use a custom CUDA kernel in the forward pass.