Description
We have a global variable that counts malloc'd bytes and is updated on every malloc call. When multiple threads call malloc concurrently, they contend on this variable, which adds measurable overhead.
The following was measured with Julia GCBenchmarks, using the multithreaded benchmarks with 8 mutator threads. For a fair comparison, both builds return 0 from vm_live_bytes(); the no-malloc-counter build additionally omits the malloc counter update. The results show measurable overhead for some benchmarks, e.g. a 2% slowdown for mergesort_parallel.
MMTK_MIN_HSIZE=31650 MMTK_MAX_HSIZE=31650 /home/yilin/Code/julia_workspace/julia/julia-mmtk-immix-release-no-malloc-counter/usr/bin/julia --project=/home/yilin/Code/julia_workspace/GCBenchmarks /home/yilin/Code/julia_workspace/GCBenchmarks/run_benchmarks.jl multithreaded mergesort_parallel mergesort_parallel -n 1 --threads=8
| | total time | gc time | mutator time | total time error |
|---|---|---|---|---|
| ('multithreaded-big_arrays-issue-52937', 'julia-mmtk-immix(6.0x minheap,.multithreaded-8)') | 7328.7 | 0 | 7328.7 | 3.26144 |
| ('multithreaded-big_arrays-issue-52937', 'julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8)') | 7345.78 | 0 | 7345.78 | 2.8509 |
| ('multithreaded-big_arrays-objarray', 'julia-mmtk-immix(6.0x minheap,.multithreaded-8)') | 7279.05 | 0 | 7279.05 | 7.97443 |
| ('multithreaded-big_arrays-objarray', 'julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8)') | 7288.47 | 0 | 7288.47 | 6.95254 |
| ('multithreaded-binary_tree-tree_immutable', 'julia-mmtk-immix(6.0x minheap,.multithreaded-8)') | 2233.35 | 360.83 | 1872.52 | 3.61634 |
| ('multithreaded-binary_tree-tree_immutable', 'julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8)') | 2231.79 | 360.56 | 1871.23 | 3.18454 |
| ('multithreaded-binary_tree-tree_mutable', 'julia-mmtk-immix(6.0x minheap,.multithreaded-8)') | 3130.31 | 640.23 | 2490.08 | 6.81284 |
| ('multithreaded-binary_tree-tree_mutable', 'julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8)') | 3132.71 | 641.74 | 2490.97 | 6.62351 |
| ('multithreaded-mergesort_parallel-mergesort_parallel', 'julia-mmtk-immix(6.0x minheap,.multithreaded-8)') | 20202.5 | 0 | 20202.5 | 811.654 |
| ('multithreaded-mergesort_parallel-mergesort_parallel', 'julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8)') | 20648 | 0 | 20648 | 608.926 |
| ('multithreaded-mm_divide_and_conquer-mm_divide_and_conquer', 'julia-mmtk-immix(6.0x minheap,.multithreaded-8)') | 791.47 | 0 | 791.47 | 1.83954 |
| ('multithreaded-mm_divide_and_conquer-mm_divide_and_conquer', 'julia-mmtk-immix-no-malloc-counter(6.0x minheap,.multithreaded-8)') | 797.59 | 0 | 797.59 | 1.93677 |
One way to mitigate this issue is to reduce how often the global counter is updated. Each thread could keep a local counter of malloc'd bytes and only add it to the global counter once every X bytes allocated (X could be 16K or similar).
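A minimal sketch of this idea in Rust, not the actual implementation in the binding or in Julia. All names here (`FLUSH_THRESHOLD`, `GLOBAL_MALLOC_BYTES`, `LOCAL_MALLOC_BYTES`, `count_malloc`, `flush_local`, `global_malloc_bytes`) are hypothetical, and the sketch only covers allocations; a real counter would presumably handle frees in a similar buffered way.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical flush threshold (the "X" above); 16 KiB is the suggested value.
const FLUSH_THRESHOLD: usize = 16 * 1024;

/// Global counter of malloc'd bytes, shared by all threads (hypothetical name).
static GLOBAL_MALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    /// Bytes counted on this thread that have not been published globally yet.
    static LOCAL_MALLOC_BYTES: Cell<usize> = Cell::new(0);
}

/// Record an allocation of `size` bytes.
/// The shared atomic is only touched once the local total reaches the threshold.
pub fn count_malloc(size: usize) {
    LOCAL_MALLOC_BYTES.with(|local| {
        let pending = local.get() + size;
        if pending >= FLUSH_THRESHOLD {
            // One contended update per ~X bytes instead of one per malloc.
            GLOBAL_MALLOC_BYTES.fetch_add(pending, Ordering::Relaxed);
            local.set(0);
        } else {
            local.set(pending);
        }
    });
}

/// Publish whatever this thread still holds locally,
/// e.g. before the thread exits or when an exact total is needed.
pub fn flush_local() {
    LOCAL_MALLOC_BYTES.with(|local| {
        let pending = local.replace(0);
        if pending > 0 {
            GLOBAL_MALLOC_BYTES.fetch_add(pending, Ordering::Relaxed);
        }
    });
}

/// Read the global counter; it may lag by up to X bytes per thread.
pub fn global_malloc_bytes() -> usize {
    GLOBAL_MALLOC_BYTES.load(Ordering::Relaxed)
}
```

The trade-off is accuracy: with N threads, the global value can under-report by up to roughly N × X bytes until each thread flushes, so any heuristic that reads the counter would need to tolerate that slack.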