
mb_detect_encoding() detects UTF-8 emoji byte sequence as ISO-8859-1 since PHP 8.1 #7871

Closed
@filecage

Description

I have a piece of code that tries to normalise the encoding of incoming strings using mb_detect_encoding(). While upgrading to PHP 8.1, I noticed that a test which ensures that a URL-encoded UTF-8 sequence (the party hat emoji, 🥳) is detected as UTF-8 now fails. It all comes down to a behaviour change in mb_detect_encoding() in PHP 8.1 when passing UTF-8 and ISO-8859-1 (in whichever order) as $encodings.
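For reference, the failing test can be reduced to roughly the following script. It is a hypothetical reconstruction (the variable names and intermediate checks are illustrative, not the original test); only the emoji, its URL encoding and the $encodings list come from this report. It shows that the decoded input is four bytes of valid UTF-8 before mb_detect_encoding() is even called.

```php
<?php
// Hypothetical reconstruction of the failing test.
// The party hat emoji (U+1F973) is URL-encoded as its four UTF-8 bytes F0 9F A5 B3.
$input = urldecode('%F0%9F%A5%B3');

echo strlen($input), PHP_EOL;                                          // 4
echo bin2hex($input), PHP_EOL;                                         // f09fa5b3
echo var_export(mb_check_encoding($input, 'UTF-8'), true), PHP_EOL;    // true

// The expectation that broke on upgrade: "UTF-8" is expected,
// but according to this report PHP 8.1.1 returns "ISO-8859-1".
$detected = mb_detect_encoding($input, ['UTF-8', 'ISO-8859-1']);
echo $detected, PHP_EOL;
```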

I know that the way this function works (or encoding detection in general) cannot be 100% reliable, so I would also accept this not being classified as a bug. However, it is an undocumented behaviour change introduced in 8.1 that can break existing code, as it did mine.

I assume that this change was introduced in commit 28b346b.
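One possible workaround, sketched under the assumption that incoming strings are only ever UTF-8 or ISO-8859-1 (the two encodings passed as $encodings in this report), is to avoid the detection heuristic entirely: validate UTF-8 strictly with mb_check_encoding() and convert from ISO-8859-1 only when that check fails. The helper name normaliseToUtf8() is hypothetical.

```php
<?php
// Hypothetical helper: normalise a string that is either UTF-8 or ISO-8859-1 to UTF-8.
// Assumes those are the only two possible source encodings.
function normaliseToUtf8(string $input): string
{
    // mb_check_encoding() is a strict validity check rather than a heuristic,
    // so valid UTF-8 (including 4-byte emoji) is always returned unchanged.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }

    // Every byte sequence is valid ISO-8859-1, so this conversion always succeeds.
    return mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
}

echo normaliseToUtf8("\u{1F973}"), PHP_EOL;          // the party hat emoji, unchanged
echo bin2hex(normaliseToUtf8("caf\xE9")), PHP_EOL;   // 636166c3a9 ("café" as UTF-8)
```

The trade-off is that an ISO-8859-1 string whose bytes happen to form valid UTF-8 is kept as UTF-8; that ambiguity is inherent to choosing between these two encodings.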

Example

See https://3v4l.org/RgdfE

<?php
echo mb_detect_encoding('🥳', ['UTF-8', 'ISO-8859-1']);

Resulted in this output:

ISO-8859-1

But I expected this output instead:

UTF-8

PHP Version

8.1.1

Operating System

No response
