Wrong mb_detect_encoding() result since PHP 8.1 for very simple UTF-8 strings #10481

@cristicotet

Description

The following code:

https://3v4l.org/ehb6U#veol

<?php
$str1 = '14';
$str2 = 'DQ';
$encodings1 = ['ASCII', 'UTF-8', 'UTF-16', 'UTF-16LE'];
$encodings2 = ['UTF-8', 'UTF-16', 'UTF-16LE'];
// Third argument true = strict mode: candidates the string is not valid in are rejected.
echo bin2hex($str1) . ' - ' . mb_detect_encoding($str1, $encodings1, true) . "\n";
echo bin2hex($str1) . ' - ' . mb_detect_encoding($str1, $encodings2, true) . "\n";
echo bin2hex($str2) . ' - ' . mb_detect_encoding($str2, $encodings1, true) . "\n";
echo bin2hex($str2) . ' - ' . mb_detect_encoding($str2, $encodings2, true) . "\n";

Resulted in this output:

# version 8.1.0 - 8.2.2 - wrong output
3134 - UTF-16
3134 - UTF-16
4451 - UTF-16LE
4451 - UTF-16LE
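
For context, every candidate in both lists is byte-valid for these inputs: each two-byte string is also a single, well-formed UTF-16 code unit, so the result depends on how mb_detect_encoding() chooses among valid candidates rather than on validity alone. The sketch below is an illustration added for clarity, not part of the original report. It assumes that BOM-less 'UTF-16' is read as big-endian and therefore names 'UTF-16BE' explicitly; it does not model the PHP 8.1 scoring itself.

```php
<?php
// Each pair: test string and a UTF-16 variant it decodes under.
// Assumption: BOM-less 'UTF-16' is interpreted as big-endian (UTF-16BE).
$cases = [
    ['14', 'UTF-16BE'],
    ['DQ', 'UTF-16LE'],
];

foreach ($cases as [$str, $utf16]) {
    // Decode the two bytes as one UTF-16 code unit and re-encode as UTF-8.
    $asUtf8 = mb_convert_encoding($str, 'UTF-8', $utf16);
    printf(
        "%s: ASCII valid=%s, %s valid=%s, decodes to U+%04X\n",
        bin2hex($str),
        var_export(mb_check_encoding($str, 'ASCII'), true),
        $utf16,
        var_export(mb_check_encoding($str, $utf16), true),
        mb_ord($asUtf8, 'UTF-8') // code point of the single decoded character
    );
}
```

Both strings pass the ASCII check and the UTF-16 check, which is why a strict detector has to break the tie somehow.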

But I expected this output instead:

# version 5.4.0 - 8.0.27 - correct output
3134 - ASCII
3134 - UTF-8
4451 - ASCII
4451 - UTF-8
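
If code depends on the first-match ordering seen on 5.4.0 - 8.0.27, one possible workaround is to test the candidates in order with mb_check_encoding() and return the first one that validates. This is a sketch under that assumption: detect_first_valid() is a hypothetical helper, not an mbstring function, and it only approximates the old strict-mode behavior rather than reproducing the pre-8.1 detector exactly.

```php
<?php
// Hypothetical helper (not part of mbstring): return the first candidate
// encoding in which $str is valid, in list order, or false if none is.
function detect_first_valid(string $str, array $encodings): string|false
{
    foreach ($encodings as $encoding) {
        // mb_check_encoding() only validates bytes; it does not rank candidates.
        if (mb_check_encoding($str, $encoding)) {
            return $encoding;
        }
    }
    return false;
}

$encodings1 = ['ASCII', 'UTF-8', 'UTF-16', 'UTF-16LE'];
$encodings2 = ['UTF-8', 'UTF-16', 'UTF-16LE'];

foreach (['14', 'DQ'] as $str) {
    echo bin2hex($str) . ' - ' . detect_first_valid($str, $encodings1) . "\n";
    echo bin2hex($str) . ' - ' . detect_first_valid($str, $encodings2) . "\n";
}
```

Because the list order decides the winner, the caller controls precedence explicitly (for example, putting 'ASCII' and 'UTF-8' before the UTF-16 variants), at the cost of losing any heuristic ranking.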

PHP Version

PHP 8.2.2

Operating System

Red Hat Enterprise Linux 9.1 / CentOS Linux 7
