Description
Here is an analysis from a colleague:
Quote
The speed-up for us seems to be coming from the fact that pickling modules takes a long time:

    In [25]: %timeit cloudpickle.dumps(numpy, -1)
    100 loops, best of 3: 3.03 ms per loop

It looks like _find_module() will use imp.find_module() which traverses sys.path to look for things that look like numpy. In our environment, sys.path tends to be long and our filesystems tend to be slow, hence the 3.03 ms.
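To make the cost argument concrete, here is a deliberately simplified model of a path search. It is not the actual `imp.find_module()` implementation; `probe_search_path`, `hypothetical_mod` and the `site0`..`site5` directories are made-up names for illustration. The point is only that each entry on the search path costs at least one filesystem check before the hit (or the final miss), which gets expensive when the path is long and the filesystem is slow.

```python
import os
import tempfile


def probe_search_path(mod_name, search_path):
    """Return (directory_where_found_or_None, number_of_directories_probed).

    Simplified model: a real import system also checks extension modules,
    zip archives, path hooks, caches, and so on.
    """
    probes = 0
    for directory in search_path:
        probes += 1
        # Each candidate check is at least one filesystem call.
        package_init = os.path.join(directory, mod_name, "__init__.py")
        plain_module = os.path.join(directory, mod_name + ".py")
        if os.path.isfile(package_init) or os.path.isfile(plain_module):
            return directory, probes
    return None, probes


with tempfile.TemporaryDirectory() as root:
    # Six fake path entries; only the last one contains the module.
    dirs = []
    for i in range(6):
        d = os.path.join(root, "site%d" % i)
        os.mkdir(d)
        dirs.append(d)
    with open(os.path.join(dirs[-1], "hypothetical_mod.py"), "w") as fh:
        fh.write("")

    found_in, probes = probe_search_path("hypothetical_mod", dirs)
    missing_in, missing_probes = probe_search_path("not_there", dirs)
```

In this model a module sitting at the end of the path, or one that is missing entirely, touches every directory on the path.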

    def save_module(self, obj):
        """
        Save a module as an import
        """
        mod_name = obj.__name__
        # If module is successfully found then it is not a dynamically created module
        try:
            _find_module(mod_name)  # EXPENSIVE!!!!!
            is_dynamic = False
        except ImportError:
            is_dynamic = True

        self.modules.add(obj)
        if is_dynamic:
            self.save_reduce(dynamic_subimport, (obj.__name__, vars(obj)), obj=obj)
        else:
            self.save_reduce(subimport, (obj.__name__,), obj=obj)
    dispatch[types.ModuleType] = save_module

So it looks like cloudpickle is trying to allow for "dynamically created modules". If it didn't try to be this flexible, then the entire function should just be

    self.save_reduce(subimport, (obj.__name__,), obj=obj)

So the danger is if people are using "dynamically created modules", which we don't tend to do.
Maybe an easy way out is to check if obj.__file__ exists (the attribute, not the file). If it does, then immediately assume that is_dynamic=False.
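A minimal sketch of that idea, assuming a module-level `__file__` attribute is a good enough signal that the module came from a normal import. The helper name `is_dynamic_module` is hypothetical, and the slow path uses `importlib.util.find_spec()` as a stand-in for cloudpickle's `_find_module()`; this is not a patch against cloudpickle itself.

```python
import importlib.util
import json
import sys
import types


def is_dynamic_module(module):
    """Guess whether `module` was created at runtime rather than imported."""
    # Fast path: only checks that the attribute exists, no filesystem access.
    if hasattr(module, "__file__"):
        return False
    # Slow path (assumption: stands in for _find_module): ask the import
    # system whether it can locate a module with this name.
    try:
        spec = importlib.util.find_spec(module.__name__)
    except (ImportError, ValueError):
        return True
    return spec is None


# A module built by hand has no __file__ and cannot be found by name.
dynamic = types.ModuleType("hypothetical_dynamic_module_xyz")

json_is_dynamic = is_dynamic_module(json)        # has __file__, fast path
sys_is_dynamic = is_dynamic_module(sys)          # built-in, no __file__, slow path
made_up_is_dynamic = is_dynamic_module(dynamic)  # not importable
```

One caveat with this shortcut: nothing stops code from setting `__file__` on a dynamically created module by hand, in which case it would be treated as importable.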
Fwiw, I think we're pickling numpy because we're pickling functions that refer to numpy. Not positive though.
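For what it's worth, here is a rough way to see which modules a function refers to through its globals, which is the situation where pickling the function by value would also have to pickle the module. This is only a sketch of the idea, not cloudpickle's actual logic; `referenced_modules` and `hypotenuse` are hypothetical names, and `math` stands in for numpy so the example needs nothing beyond the standard library.

```python
import math
import types


def referenced_modules(func):
    """Map global names used by `func` to the module objects they refer to."""
    found = {}
    # co_names lists the global and attribute names the bytecode refers to.
    for name in func.__code__.co_names:
        value = func.__globals__.get(name)
        if isinstance(value, types.ModuleType):
            found[name] = value
    return found


def hypotenuse(a, b):
    # Refers to the module-level name `math`.
    return math.sqrt(a * a + b * b)


modules = referenced_modules(hypotenuse)
```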