To get the average validation loss you divide by len(valid_loader), but for the average test loss you divide by len(test_loader.dataset). This does not seem correct: the first is an average over batches, while the second is an average over samples, so the two numbers are not directly comparable. Also, you multiply the loss by batch_size in the test loop, but you don't do it for validation. Why?
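A minimal sketch of why the two conventions disagree, using made-up per-batch numbers rather than the actual script. It assumes the criterion returns the *mean* loss over each batch (PyTorch's default `reduction='mean'`). Summing those means and dividing by the number of batches gives every batch equal weight. Multiplying each mean by the batch's real size and dividing by the number of samples gives every sample equal weight. The two only agree when all batches are the same size, and the last batch often is not:

```python
# Hypothetical per-batch values: (mean loss over the batch, number of samples in the batch).
# The last batch is smaller, as happens when the dataset size is not a multiple of batch_size.
batches = [(1.0, 4), (2.0, 4), (4.0, 2)]

def avg_over_batches(batches):
    """Validation-style: sum of per-batch means divided by the number of batches."""
    return sum(loss for loss, _ in batches) / len(batches)

def avg_over_samples(batches):
    """Test-style: weight each batch mean by its actual size, divide by the number of samples."""
    total = sum(loss * n for loss, n in batches)
    num_samples = sum(n for _, n in batches)
    return total / num_samples

print(avg_over_batches(batches))  # 7/3, about 2.333
print(avg_over_samples(batches))  # 20/10 = 2.0
```

If the per-sample average is what you want for both loops, it is safer to multiply by the actual size of each batch (e.g. `data.size(0)` in a typical PyTorch loop) than by a fixed `batch_size`, which over-weights a smaller final batch.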