selftests/bpf: retry bpf_map_update_elem() when E2BIG is returned #10257
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Executing the test_maps binary on platforms with extremely high core counts may cause intermittent assertion failures in test_update_delete() (called via test_map_parallel()). This can occur because bpf_map_update_elem() under some circumstances (specifically in this case while performing bpf_map_update_elem() with BPF_NOEXIST on a BPF_MAP_TYPE_HASH with its map_flags set to BPF_F_NO_PREALLOC) can return an E2BIG error code i.e.
error -7 7
tools/testing/selftests/bpf/test_maps.c:#: void test_update_delete(unsigned int, void *): Assertion
err == 0' failed. tools/testing/selftests/bpf/test_maps.c:#: void __run_parallel(unsigned int, void (*)(unsigned int, void *), void *): Assertionstatus == 0' failed.As it turns out, is_map_full() which is called from alloc_htab_elem() can take on a conservative approach when htab->use_percpu_counter is true (which is the case here because the percpu_counter is used when a BPF_MAP_TYPE_HASH is created with its map_flags set to BPF_F_NO_PREALLOC). This conservative approach approach prioritizes preventing over-allocation and potential issues that could arise from possibly exceeding htab->map.max_entries in highly concurrent environments, even if it means slightly under-utilizing the htab map's capacity.
Given that bpf_map_update_elem() from test_update_delete() can return E2BIG, update can_retry() such that it also accounts for the E2BIG error code (specifically only when running with map_flags being set to BPF_F_NO_PREALLOC). The retry loop will allow the global count belonging to the percpu_counter to become synchronized and better reflect the current htab map's capacity.