Fix OSError: [Errno 24] Too many open files in multi-copy benchmark (#5083)
Summary:
Pull Request resolved: #5083
X-link: https://github.com/facebookresearch/FBGEMM/pull/2089
When running benchmarks with a large number of copies, the process may raise:
OSError: [Errno 24] Too many open files.
Example command:
(fbgemm_gpu_env)$ ulimit -n 1048576
(fbgemm_gpu_env)$ python ./bench/tbe/tbe_inference_benchmark.py nbit-cpu \
--num-embeddings=40000000 --bag-size=2 --embedding-dim=96 \
--batch-size=162 --num-tables=8 --weights-precision=int4 \
--output-dtype=fp32 --copies=96 --iters=30000
PyTorch multiprocessing provides two shared-memory strategies:
1. file_descriptor (default)
2. file_system
The default file_descriptor strategy uses file descriptors as shared-memory handles, so sharing many tensors across processes can leave a large number of FDs open at once.
If the total number of open FDs exceeds the system limit and cannot be raised, the file_system strategy should be used instead.
This patch allows switching to the file_system strategy by setting:
export PYTORCH_SHARE_STRATEGY='file_system'
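The switch can be sketched roughly as follows. This is a minimal illustration, not the patch itself; the helper name resolve_sharing_strategy is hypothetical, and in the actual benchmark the resolved value would be passed to the real torch.multiprocessing.set_sharing_strategy API.

```python
import os

def resolve_sharing_strategy(default="file_descriptor"):
    # Read the strategy from the PYTORCH_SHARE_STRATEGY environment
    # variable, falling back to PyTorch's default file_descriptor.
    strategy = os.environ.get("PYTORCH_SHARE_STRATEGY", default)
    if strategy not in ("file_descriptor", "file_system"):
        raise ValueError(f"unknown sharing strategy: {strategy}")
    return strategy

# With the variable exported as above, the benchmark would select
# file_system and hand it to torch.multiprocessing.set_sharing_strategy().
os.environ["PYTORCH_SHARE_STRATEGY"] = "file_system"
print(resolve_sharing_strategy())
```

With file_system, shared memory is backed by named files rather than per-tensor file descriptors, so the open-FD count no longer grows with the number of shared tensors.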
Reference:
https://pytorch.org/docs/stable/multiprocessing.html#sharing-strategies
Pull Request resolved: #5037
Reviewed By: spcyppt
Differential Revision: D86135817
Pulled By: q10
fbshipit-source-id: 15f6fe7e1de5e9fef828f5a1496dc1cf9b41c293