I haven't done much work on sorting lately, but figured I'd share some findings.
I looked into unstable sorting networks this week and haven't been able to reproduce the suggested performance gain. I suspect there's some cache pollution due to the large code size of sorting networks when they're used inside a quicksort.
So far my best results have come from using piposort with a threshold of 96, with unrolled 4, 8, and 16 element parity merges and twice-unguarded insertion to fill the gaps.
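For readers unfamiliar with the term, here is a minimal sketch of a parity merge, assuming two sorted runs of equal length stored back to back. The name `parity_merge` and its signature are made up for illustration; this is not the actual piposort code, and it is a loop rather than the unrolled 4, 8, and 16 element variants mentioned above. The idea is that merging n elements from the front and n from the back at the same time needs no bounds checks.

```c
#include <stddef.h>

/* Hypothetical helper: merge two sorted runs of equal length n,
   stored back to back in src[0..2n), into dst[0..2n).
   Each step writes the next smallest element at the front and the
   next largest at the back, so after n steps all 2n slots are filled
   and neither side can run past its own run. */
static void parity_merge(int *dst, const int *src, size_t n)
{
    size_t lh = 0, rh = n;       /* head indices of left and right run */
    size_t lt = n, rt = 2 * n;   /* one past the tail of each run */
    size_t dh = 0, dt = 2 * n;   /* output head, one past output tail */

    for (size_t i = 0; i < n; i++) {
        /* front: take the smaller head, left run wins ties */
        dst[dh++] = (src[lh] <= src[rh]) ? src[lh++] : src[rh++];
        /* back: take the larger tail, right run wins ties */
        dst[--dt] = (src[lt - 1] > src[rt - 1]) ? src[--lt] : src[--rt];
    }
}
```

Calling `parity_merge(dst, src, 4)` on an 8 element buffer holding two sorted runs of 4 would be the looped equivalent of an 8 element merge step.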
As for the high performance reported for Rust sorts, I suspect it's primarily due to Rust compiling ? : style ternary operations as branchless code. This makes the benchmarks quite misleading, since gcc offers no equivalent.
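To make the ternary point concrete, here is a small sketch of the same compare-and-swap written two ways. The function names are hypothetical and not taken from any of the sorts discussed. Whether the ternary form actually becomes a conditional move rather than a branch depends on the compiler, its version, and the optimization flags, so it's worth checking the assembly output (for example with `-S`) instead of assuming.

```c
/* Branch-based compare-and-swap: typically compiled to a conditional
   jump, which is costly when the comparison is unpredictable. */
static void swap_branchy(int *a, int *b)
{
    if (*a > *b) {
        int t = *a;
        *a = *b;
        *b = t;
    }
}

/* Ternary-based compare-and-swap: both stores always happen, so a
   compiler may (but is not required to) emit conditional moves. */
static void swap_ternary(int *a, int *b)
{
    int x = *a, y = *b;

    *a = (x > y) ? y : x;   /* smaller value goes first */
    *b = (x > y) ? x : y;   /* larger value goes second */
}
```

Both functions produce the same result; any performance difference comes purely from the code the compiler chooses to generate.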
When comparing crumsort compiled with clang to pdqsort compiled with g++, pdqsort is nearly twice as slow as crumsort for 10000 elements.