mb_detect_encoding recognizes all letters in Hungarian alphabet #8629

Merged May 25, 2022 (1 commit).

ext/mbstring/common_codepoints.txt (2 additions, 0 deletions)
@@ -8,10 +8,12 @@
 0x0118 0x011B # Polish, Czech
 0x0141 0x0144 # Polish
 0x0147 0x0148 # Czech
+0x0150 0x0151 # Hungarian
 0x0158 0x015B # Czech, Polish
 0x0160 0x0161 # Used in Slavic names
 0x0164 0x0165 # Czech
 0x016E 0x016F # Czech
+0x0170 0x0171 # Hungarian
 0x0179 0x017E # Polish, Czech, other Slavic languages
 0x0300 0x030A # Diacritical marks
 0x0370 0x0377 # Greek

ext/mbstring/rare_cp_bitvec.h (1 addition, 1 deletion)
@@ -11,7 +11,7 @@
 
 static uint32_t rare_codepoint_bitvec[] = {
 0xffffd9ff, 0x00000000, 0x00000000, 0x80000000, 0xffffffff, 0x00002001, 0x00000000, 0x00000000,
-0xf0ff0f0f, 0xffffffff, 0xf0fffe61, 0x81ff3fcc, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
+0xf0ff0f0f, 0xffffffff, 0xf0fcfe61, 0x81fc3fcc, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
 0xfffff800, 0xffffffff, 0xffffffff, 0x0300ffff, 0x0000280f, 0x00000004, 0x00000000, 0x00000000,
 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000,

ext/mbstring/tests/mb_detect_encoding.phpt (3 additions, 0 deletions)
@@ -15,6 +15,7 @@ $euc_jp = "\xC6\xFC\xCB\xDC\xB8\xEC\xA5\xC6\xA5\xAD\xA5\xB9\xA5\xC8\xA4\xC7\xA4\
 // UTF-8
 $polish1 = "Zażółć gęślą jaźń.";
 $polish2 = "Wół poszedł spać bardzo wcześnie. A to zdanie bez ogonka.";
+$hungarian = "Árvíztűrő tükörfúrógép";
 
 echo "== BASIC TEST ==\n";
 
@@ -309,6 +310,8 @@ $czechEncodings = [
 ];
 test($czechStrings, $czechEncodings);
 
+test([$hungarian], ['UTF-8', 'UTF-16', 'Windows-1252']);
+
 echo "Done!\n";
 
 ?>