
[clang] On a fixed-size loop, clang generates an individual copy of the loop body when certain optimizations are enabled #73456

Open
@rilysh

Description


Hello,
As the title implies, for a loop with a small fixed iteration count, clang generates a separate copy of the entire loop body for every iteration.

For example:

#include <stdio.h>

int main(void)
{
	unsigned int i;

	for (i = 0; i < 29; i++)
		fprintf(stdout, "i: %u\n", i);	/* %u matches unsigned int */
}

With optimizations disabled (-O0), clang generates:

movl    $0, -4(%rbp)
movl    $0, -8(%rbp)
.LBB0_1:
cmpl    $29, -8(%rbp)
jae     .LBB0_4
movq    stdout@GOTPCREL(%rip), %rax
movq    (%rax), %rdi
movl    -8(%rbp), %edx
leaq    .L.str(%rip), %rsi
movb    $0, %al
callq   fprintf@PLT
movl    -8(%rbp), %eax
addl    $1, %eax
movl    %eax, -8(%rbp)
jmp .LBB0_1

(Other labels omitted.)

The generated assembly first sets the index to zero and then compares it with 29. If the index is 29 or greater, it leaves the loop (jae .LBB0_4); otherwise it calls fprintf(), increments the index by one (addl $1) and jumps back to the comparison.
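To make that control flow explicit, the -O0 listing can be read as the C below, written with gotos. This is only a sketch: `loop_as_o0` and its `seen` array are hypothetical, and each iteration records the index instead of calling fprintf() so the sequence of values can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the -O0 control flow with explicit gotos.
 * Each iteration stores the index in seen[] where the real code calls
 * fprintf(); the return value is the number of iterations executed. */
static size_t loop_as_o0(unsigned int *seen, size_t cap)
{
	size_t n = 0;
	unsigned int i;

	i = 0;                  /* movl $0, -8(%rbp) */
cond:                           /* .LBB0_1 */
	if (i >= 29)            /* cmpl $29, -8(%rbp); jae .LBB0_4 (unsigned compare) */
		goto end;
	assert(n < cap);
	seen[n++] = i;          /* stands in for callq fprintf@PLT */
	i = i + 1;              /* movl -8(%rbp), %eax; addl $1, %eax; movl %eax, -8(%rbp) */
	goto cond;              /* jmp .LBB0_1 */
end:                            /* .LBB0_4 */
	return n;
}
```

Called with a 29-element array, it returns 29 and fills the array with 0 through 28.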

However, with performance-focused (rather than size-focused) optimization levels, e.g. -O2, -O3 or -Ofast, clang generates:

movq	stdout@GOTPCREL(%rip), %r14
movq	(%r14), %rdi
leaq	.L.str(%rip), %rbx
movq	%rbx, %rsi
xorl	%edx, %edx
xorl	%eax, %eax
callq	fprintf@PLT
movq	(%r14), %rdi
movq	%rbx, %rsi
movl	$1, %edx
xorl	%eax, %eax
callq	fprintf@PLT
movq	(%r14), %rdi
movq	%rbx, %rsi
movl	$2, %edx
xorl	%eax, %eax

[ ... similar copies with different index value ... ]

movq	(%r14), %rdi
movq	%rbx, %rsi
movl	$28, %edx
xorl	%eax, %eax
callq	fprintf@PLT
xorl	%eax, %eax

In this assembly output, clang emits a separate copy of the loop body (the fprintf() call) for every iteration, passing the index as a constant each time (0, 1, 2, ..., 28, i.e. the value the index would have after each increment). Note that this behavior only happens for small loops: in my tests, only when the final index value is > 0 and <= 28 (here it is 28). In comparison, GCC generates the following assembly output (with -O2, -Ofast, etc.):

xorl	%ebx, %ebx
.L2:
movq	stdout(%rip), %rdi
movl	%ebx, %edx
movq	%rbp, %rsi
xorl	%eax, %eax
addl	$1, %ebx
call	fprintf@PLT
cmpl	$29, %ebx
jne	.L2
addq	$8, %rsp
xorl	%eax, %eax

This is a regular loop with a single call to fprintf(), structurally similar to Clang's -O0 output. With -Os and -Ofast, GCC generates nearly identical assembly with only minor differences.
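Side by side, the two code shapes discussed above (Clang's per-iteration copies and GCC's single loop) can be sketched in C as follows. Assumptions: the bound is shrunk from 29 to a hypothetical `BOUND` of 4 so the unrolled version stays short, and the function names are made up; each "call" records the index into `seen` instead of calling fprintf().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bound: 4 instead of the reproducer's 29, to keep the
 * unrolled version short. */
#define BOUND 4u

/* GCC-style rolled loop: the trip count is known to be non-zero, so the
 * exit test sits at the bottom (do-while). As in GCC's listing, the counter
 * is incremented before the call while the old value is passed as the
 * argument, and the exit test is an equality compare (cmpl $29, %ebx; jne .L2). */
static size_t rolled_like_gcc(unsigned int *seen)
{
	size_t n = 0;
	unsigned int i = 0;            /* xorl %ebx, %ebx */

	do {                           /* .L2 */
		unsigned int arg = i;  /* movl %ebx, %edx */
		i += 1;                /* addl $1, %ebx */
		seen[n++] = arg;       /* stands in for call fprintf@PLT */
	} while (i != BOUND);          /* cmpl $29, %ebx; jne .L2 */
	return n;
}

/* Clang-style fully unrolled form: one copy of the body per iteration, each
 * with the index as a constant (xorl %edx, %edx / movl $1, %edx / ...). */
static size_t unrolled_like_clang(unsigned int *seen)
{
	size_t n = 0;

	seen[n++] = 0;
	seen[n++] = 1;
	seen[n++] = 2;
	seen[n++] = 3;
	assert(n == BOUND);
	return n;
}
```

Both functions record the same sequence 0, 1, 2, 3; the unrolled one just trades the counter and branch for one copy of the body per iteration.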

I've tested it with Clang 17.0.1 and GCC 13.2. Here's the Godbolt link: https://godbolt.org/z/1xq4qfGzd
I've only tested this on x86-64 and 64-bit RISC-V (although I don't expect it to vary by platform, since it only happens when performance-oriented optimizations are enabled).
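The -O2 output has the shape of full loop unrolling. One way to check that, sketched below but not tested as part of this report, is to disable unrolling for this one loop with Clang's `#pragma clang loop unroll(disable)` and compare the -O2 output on Godbolt; if unrolling is the cause, Clang should then emit a single fprintf() call inside a loop, much like GCC. The function name `print_indices` is hypothetical; it returns the total number of characters fprintf() reports writing so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Same loop as the reproducer, wrapped in a function that returns the total
 * number of characters fprintf() reported writing. The Clang-specific pragma
 * asks the optimizer not to unroll this particular loop; GCC does not know
 * it and ignores it (warning under -Wunknown-pragmas, which -Wall enables). */
static int print_indices(FILE *out)
{
	int total = 0;
	unsigned int i;

#pragma clang loop unroll(disable)
	for (i = 0; i < 29; i++) {
		int r = fprintf(out, "i: %u\n", i);
		assert(r > 0);
		total += r;
	}
	return total;
}
```

Printing to stdout, it writes "i: 0" through "i: 28" and returns 164 (10 five-character lines plus 19 six-character lines).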
