The cache_dir keyword argument does not work as intended in PreTrainedModel.from_pretrained(...). It is popped from kwargs, and then PretrainedConfig.from_pretrained(...) is called, which expects to find cache_dir in kwargs, but by that point it is no longer there. The config is therefore loaded from the default cache location as a fallback, which leads to strange behaviour if that default location doesn't exist or isn't writable (as was the case for me).
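
To make the failure mode concrete, here is a minimal, library-free sketch of the pattern described above. It does not reproduce the actual transformers source: `load_config`, `load_model_buggy`, `load_model_fixed`, and `DEFAULT_CACHE` are hypothetical stand-ins, and the "fixed" variant is only one possible way to forward the argument, not necessarily the change the library should make.

```python
# Hypothetical stand-in for the library's default cache location.
DEFAULT_CACHE = "/default/cache"


def load_config(name, **kwargs):
    """Stand-in for PretrainedConfig.from_pretrained: reads cache_dir from kwargs."""
    cache_dir = kwargs.pop("cache_dir", None) or DEFAULT_CACHE
    return {"name": name, "cache_dir": cache_dir}


def load_model_buggy(name, **kwargs):
    """Stand-in for PreTrainedModel.from_pretrained showing the reported pattern."""
    cache_dir = kwargs.pop("cache_dir", None)  # cache_dir is removed from kwargs here
    config = load_config(name, **kwargs)       # so the config loader never sees it
    return {"config": config, "cache_dir": cache_dir}


def load_model_fixed(name, **kwargs):
    """One possible fix (an assumption): forward cache_dir explicitly."""
    cache_dir = kwargs.pop("cache_dir", None)
    config = load_config(name, cache_dir=cache_dir, **kwargs)
    return {"config": config, "cache_dir": cache_dir}


buggy = load_model_buggy("some-model", cache_dir="/tmp/my_cache")
fixed = load_model_fixed("some-model", cache_dir="/tmp/my_cache")
print(buggy["config"]["cache_dir"])  # falls back to DEFAULT_CACHE
print(fixed["config"]["cache_dir"])  # uses the caller's cache_dir
```

In the buggy variant the model loader itself still receives the caller's cache_dir, but the config loader silently falls back to the default, which matches the behaviour I observed.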