[x86][MC] Over-decode invalid instruction with mutual exclusive prefix and unmatch opcode #117306

@venkyqz

Work environment

| Questions | Answers |
| --- | --- |
| OS/arch/bits | x86_64 Ubuntu 20.04 |
| Architecture | x86_64 |
| Source of Capstone | git clone, default on master branch. |
| Version/git commit | llvm-20git, f08278 |

Minimal PoC disassembler

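#include <ctype.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <llvm-c/Disassembler.h>
#include <llvm-c/Target.h>

#define MAX_OUTPUT_LENGTH 256 /* assumed value; not shown in the original PoC */
#define MAX_INPUT_BYTES 64    /* assumed value; not shown in the original PoC */

int main(int argc, char *argv[]) {
    /*
       Input sanity check of the hex string from argv. The original check was
       elided; this minimal stand-in accepts an even-length string of hex digits.
    */
    if (argc != 2 || strlen(argv[1]) == 0 || strlen(argv[1]) % 2 != 0 ||
        strlen(argv[1]) / 2 > MAX_INPUT_BYTES) {
        errx(1, "Usage: %s <hex string>", argv[0]);
    }
    uint8_t raw_bytes[MAX_INPUT_BYTES];
    size_t bytes_len = strlen(argv[1]) / 2;
    for (size_t i = 0; i < bytes_len; i++) {
        char pair[3] = {argv[1][2 * i], argv[1][2 * i + 1], '\0'};
        if (!isxdigit((unsigned char)pair[0]) || !isxdigit((unsigned char)pair[1])) {
            errx(1, "Error: invalid hex string.");
        }
        raw_bytes[i] = (uint8_t)strtoul(pair, NULL, 16);
    }

    // Initialize LLVM after input validation
    LLVMInitializeAllTargetInfos();
    LLVMInitializeAllTargets();
    LLVMInitializeAllTargetMCs();
    LLVMInitializeAllDisassemblers();

    LLVMDisasmContextRef disasm = LLVMCreateDisasm("x86_64", NULL, 0, NULL, NULL);
    if (!disasm) {
        errx(1, "Error: LLVMCreateDisasm() failed.");
    }

    // Set disassembler options: print immediates as hex, use Intel syntax
    if (!LLVMSetDisasmOptions(disasm, LLVMDisassembler_Option_PrintImmHex |
                                        LLVMDisassembler_Option_AsmPrinterVariant)) {
        errx(1, "Error: LLVMSetDisasmOptions() failed.");
    }

    char output_string[MAX_OUTPUT_LENGTH];
    uint64_t address = 0;
    // Returns the number of bytes consumed, or 0 if the bytes cannot be decoded
    size_t instr_len = LLVMDisasmInstruction(disasm, raw_bytes, bytes_len, address,
                                             output_string, sizeof(output_string));

    if (instr_len > 0) {
        printf("%s\n", output_string);
    } else {
        printf("Error: Unable to disassemble the input bytes.\n");
    }

    LLVMDisasmDispose(disasm);
    return 0;
}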

Instruction bytes giving faulty results

f2 f0 41 0f b7 d6

Expected results

It should be:

Error: Unable to disassemble the input bytes.

Actual results

$ ./min_llvm_disassembler "f2f0410fb7d6"
        xacquire

Additional Logs, screenshots, source code, configuration dump, ...

This is similar to a verified bug in the Capstone engine. The bytes "f2f0410fb7d6" cannot be decoded into a valid x86 instruction because of the mutually exclusive prefixes f2 and f0, and because the LOCK prefix is applied to a register operation. LLVM MC nevertheless accepts the sequence and decodes it as xacquire. Other instruction decoders, such as Capstone, Zydis, and XED, reject this byte sequence. I am not sure whether the workaround in this pull request can fix this.
