Description
Ref #21940
In most parallel operations I've come across, there is at least some data that moves to the processor. Because it appears that pmap sends function arguments to workers for every block (determined by block_size, default=1), this means that pmaps default behavior results in large inefficiencies when functions within the pmap block reference large data structures.
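A minimal sketch of the problem (the array size and worker count here are illustrative, not from the issue):

```julia
using Distributed
addprocs(4)

big = rand(10_000_000)  # ~80 MB, captured by the closure below

# With the default batch_size=1, the closure — and `big` along with it —
# is serialized and shipped to a worker for every element of 1:100,
# so roughly 100 copies of `big` cross the wire.
results = pmap(i -> sum(big) + i, 1:100)
```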
@amitmurthy has suggested, and testing has confirmed, that the use of Distributed.CachingPool will reduce the amount of data transferred to functions called within a pmap block.
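The CachingPool workaround looks roughly like this: the pool caches the serialized closure on each worker, so large captured data is transferred once per worker rather than once per call.

```julia
using Distributed
addprocs(4)

big = rand(10_000_000)

wp = CachingPool(workers())
# `big` is now shipped at most once to each worker, regardless of
# how many elements 1:100 that worker processes.
results = pmap(i -> sum(big) + i, wp, 1:100)

clear!(wp)  # release the cached closures on the workers when done
```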
The use of caching pools incurs a small amount of overhead, which is more apparent when pmap doesn't result in data transfers of appreciable size. This is less common in real-world situations, though.
This proposal is to modify pmap so that, by default, it creates a caching pool spanning all workers, with an optional override for when pmap is used without significant data transfer. This provides the expected performance gains from pmap under most real-world conditions out-of-the-box, with the flexibility to tune things in the less-frequent cases.
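One possible shape for the proposed behavior, written as a standalone helper rather than the actual Base change (the function name and the use_caching_pool keyword are hypothetical, purely to illustrate the default-plus-override idea):

```julia
using Distributed

# Hypothetical sketch: a pmap variant that uses a CachingPool by default,
# falling back to a plain WorkerPool when the caller opts out.
function pmap_cached(f, c; use_caching_pool::Bool = true)
    pool = use_caching_pool ? CachingPool(workers()) : WorkerPool(workers())
    try
        return pmap(f, pool, c)
    finally
        # Drop the cached closures so large captured data doesn't linger.
        use_caching_pool && clear!(pool)
    end
end
```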
Right now, there is a proposal to deprecate @parallel for loops. @parallel does not suffer from this issue, and therefore currently outperforms pmap when workers process shared data. This proposal would ensure that pmap is at least as performant as @parallel once the latter is deprecated.