Serializing modules can be slow #84

Here is an analysis from a colleague:

### Quote

The speed-up for us seems to be coming from the fact that pickling modules takes a long time:
```
In [25]: %timeit cloudpickle.dumps(numpy, -1)
100 loops, best of 3: 3.03 ms per loop
```
It looks like ``_find_module()`` uses ``imp.find_module()``, which traverses ``sys.path`` looking for anything that could be numpy. In our environment, ``sys.path`` tends to be long and our filesystems tend to be slow, hence the 3.03 ms.
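To gauge how much a path search costs in a given environment, a rough sketch along these lines may help. It uses ``importlib.machinery.PathFinder`` as a stand-in for ``imp.find_module()`` (an assumption, since ``imp`` is gone from recent Python versions), and ``time_path_lookup`` is a hypothetical helper name. Directory listings are cached by the import system, so the figure it prints is probably a lower bound on what a cold lookup costs.

```python
import sys
import time
from importlib.machinery import PathFinder


def time_path_lookup(mod_name, repeats=100):
    """Return (spec, mean seconds per lookup) for a sys.path search of mod_name."""
    spec = None
    start = time.perf_counter()
    for _ in range(repeats):
        # Walks the entries of sys.path, asking each path finder for mod_name.
        spec = PathFinder.find_spec(mod_name, sys.path)
    elapsed = time.perf_counter() - start
    return spec, elapsed / repeats


# "json" is used here only because it ships with the standard library.
spec, seconds = time_path_lookup("json")
print(f"{len(sys.path)} sys.path entries, {seconds * 1e6:.1f} us per lookup")
```

Swapping ``"json"`` for ``"numpy"`` would reproduce the case above, assuming numpy is installed.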

```python
    def save_module(self, obj):
        """
        Save a module as an import
        """
        mod_name = obj.__name__
        # If module is successfully found then it is not a dynamically created module
        try:
            _find_module(mod_name)     # EXPENSIVE!!!!!
            is_dynamic = False
        except ImportError:
            is_dynamic = True

        self.modules.add(obj)
        if is_dynamic:
            self.save_reduce(dynamic_subimport, (obj.__name__, vars(obj)), obj=obj)
        else:
            self.save_reduce(subimport, (obj.__name__,), obj=obj)
    dispatch[types.ModuleType] = save_module
```
So it looks like cloudpickle is trying to allow for "dynamically created modules". If it didn't try to be this flexible, the entire function could just be
```python
self.save_reduce(subimport, (obj.__name__,), obj=obj)
```
So the only danger is for people who use "dynamically created modules", which we don't tend to do.

Maybe an easy way out is to check whether ``obj.__file__`` exists (the attribute, not the file). If it does, immediately assume ``is_dynamic = False``.
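A minimal sketch of that shortcut, assuming it is acceptable to treat any module carrying a ``__file__`` attribute as importable by name. ``looks_dynamic`` is a hypothetical helper, not part of cloudpickle, and the slow path uses ``importlib.util.find_spec`` instead of ``_find_module()``:

```python
import importlib.util
import types


def looks_dynamic(module):
    """Guess whether `module` was created at runtime rather than imported."""
    # Fast path: file-backed modules carry a __file__ attribute, so no
    # sys.path search is needed for them.
    if hasattr(module, "__file__"):
        return False
    # Slow path: ask the import system whether the name can be resolved.
    try:
        return importlib.util.find_spec(module.__name__) is None
    except (ImportError, ValueError):
        # Unresolvable parent package or a module without a usable spec.
        return True


# A module built at runtime has no __file__ and cannot be found by name.
dynamic_mod = types.ModuleType("hypothetical_dynamic_module")
print(looks_dynamic(dynamic_mod))
```

One caveat with this approach: a dynamically created module that sets ``__file__`` by hand would be misclassified, so the check trades some of the current flexibility for speed.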

FWIW, I think we're pickling ``numpy`` because we're pickling functions that refer to ``numpy``. Not positive, though.


