As the author of dua, I noticed that on the M1 chip, IO performance gets worse if the high-efficiency cores are counted when sizing thread pools.
So for now I hardcode the thread count for optimal performance, knowing that it might break sometime later this year.
Do you think it's in scope to add such a capability to num_cpus?
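
To make the request concrete, here is a rough sketch of the kind of query I have in mind. It is not an existing num_cpus API: `performance_core_count`, `parse_core_count`, and `io_thread_count` are hypothetical names. It also assumes that the `hw.perflevel0.physicalcpu` sysctl key reports the performance-core count on Apple Silicon, which as far as I know holds only on recent macOS versions. Everywhere else it falls back to the standard library's `available_parallelism`.

```rust
use std::process::Command;
use std::thread;

/// Parse the text printed by `sysctl -n <key>` into a positive core count.
fn parse_core_count(raw: &str) -> Option<usize> {
    raw.trim().parse::<usize>().ok().filter(|&n| n > 0)
}

/// Hypothetical helper: number of performance cores, if the OS reports it.
/// Assumption: on Apple Silicon, `hw.perflevel0.physicalcpu` is the
/// performance-core count. Returns None on other platforms or on any failure.
fn performance_core_count() -> Option<usize> {
    if !cfg!(target_os = "macos") {
        return None;
    }
    let output = Command::new("sysctl")
        .args(["-n", "hw.perflevel0.physicalcpu"])
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    parse_core_count(std::str::from_utf8(&output.stdout).ok()?)
}

/// Thread count for an IO-bound pool: performance cores when known,
/// otherwise every core the standard library reports, and at least 1.
fn io_thread_count() -> usize {
    performance_core_count()
        .or_else(|| thread::available_parallelism().ok().map(|n| n.get()))
        .unwrap_or(1)
}
```

With something like this, the thread pool could be sized with `io_thread_count()` instead of a hardcoded constant. Shelling out to `sysctl` only keeps the sketch dependency-free; a real implementation would more likely call `sysctlbyname` directly.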