Major overhaul of mbstring (part 1) #6052

Closed
wants to merge 123 commits into from
af8d6bf
Fix mbstring support for Shift-JIS
alexdowad Oct 19, 2020
ffb01c9
Add test suite for Shift-JIS encoding
alexdowad Oct 19, 2020
b96b1f4
Add identify filter for Shift-JIS-2004
alexdowad Sep 7, 2020
714a7d1
SJIS-2004 encoding conversion: handle invalid (or truncated) 2nd byte…
alexdowad Sep 8, 2020
9ece2b3
Don't mangle non-Japanese chars which appear after a 'combining' kana…
alexdowad Sep 17, 2020
4bba852
Add test suite for SJIS-2004 encoding
alexdowad Sep 8, 2020
0c0a4b8
Add identify filter for MacJapanese (variant of Shift-JIS)
alexdowad Sep 7, 2020
55dcf81
SJIS-mac encoding conversion: handle invalid (or truncated) 2nd byte …
alexdowad Sep 9, 2020
4feb701
Convert Unicode halfwidth Yen sign to MacJapanese halfwidth Yen sign
alexdowad Sep 19, 2020
6cd472a
SJIS-mac encoding conversion: Stop the carnage of innocent Unicode co…
alexdowad Sep 19, 2020
0a6e568
Add test suite for SJIS-mac encoding
alexdowad Sep 9, 2020
86e927a
Enhance mbstring support for UCS-4 text
alexdowad Sep 6, 2020
431c4b9
Leading BOM is stripped for UTF-32
alexdowad Sep 4, 2020
a38a4a0
Enhance mbstring support for UCS-2 text
alexdowad Sep 4, 2020
675a311
Consolidate all single-byte encodings in one source file
alexdowad Nov 3, 2020
f6e834f
Fix identify filter for UTF-7
alexdowad Sep 27, 2020
02fc58e
Minor code cleanup in mbfilter_utf7.c
alexdowad Sep 5, 2020
0879484
Catch and handle errors in UTF-7 text conversion
alexdowad Oct 15, 2020
a825801
Add test suite for UTF-{7,8,16,32}
alexdowad Oct 13, 2020
2c0098d
Add identify filter for modified UTF-7 (for IMAP protocol)
alexdowad Sep 7, 2020
7cbe62b
Add test suite for mUTF-7 (UTF7-IMAP) encoding
alexdowad Sep 11, 2020
cbfc02f
Add identify filter for CP50220-raw
alexdowad Sep 7, 2020
5d6ecd3
Add identify filter for 'HTML entities' encoding
alexdowad Sep 9, 2020
8e9d740
Add identify filter for 'byte2be' and 'byte2le' encodings
alexdowad Sep 12, 2020
f3e730c
Add identify filter for 'byte4be' and 'byte4le' encodings
alexdowad Sep 12, 2020
fb9a1d9
Add identify filter for uuencode 'character encoding'
alexdowad Sep 25, 2020
08a064b
Add identify filter for QPrint 'character encoding'
alexdowad Sep 25, 2020
5550c71
Add identify filter for Base64 'character encoding'
alexdowad Sep 25, 2020
32bb947
Stricter identification of valid strings in ISO-2022-JP-2004 encoding
alexdowad Sep 12, 2020
b374321
ISO-2022-JP-2004 conversion: handle invalid characters correctly
alexdowad Sep 13, 2020
3dfc88f
Add test suite for ISO-2022-JP-2004 encoding
alexdowad Sep 13, 2020
2e06352
Add comment explaining why ISO-2022-JP-2004, etc strings end with ESC…
alexdowad Sep 19, 2020
f3e3ee5
Stricter identification of valid strings in JIS7/JIS8 and ISO-2022-JP…
alexdowad Sep 13, 2020
1d10932
JIS7/8 encoding: handle invalid 2nd byte for Kanji correctly
alexdowad Sep 14, 2020
5321788
Add test suite for JIS7/JIS8 and ISO-2022-JP encodings
alexdowad Sep 14, 2020
92dd38f
Update 'East Asian Width' table to comply with Unicode 13.0
alexdowad Sep 24, 2020
360187e
Correct wrong flags for 'byte2be', 'byte2le', 'byte4be', and 'byte4le…
alexdowad Sep 26, 2020
938b517
Enhance handling of EUC-JP text encoding
alexdowad Oct 1, 2020
eaca866
Add test suite for EUC-JP encoding
alexdowad Sep 30, 2020
ae0f229
Add identify filter for text encoding 'wchar'
alexdowad Sep 25, 2020
215c146
All mbstring encodings have identify filter now
alexdowad Sep 6, 2020
934a2d8
Tell compiler which way jumps will usually go in mbfl_memory_device.c
alexdowad Aug 29, 2020
2c6db2c
Simplify code for handling mbstring language aliases
alexdowad Aug 30, 2020
7874786
Major refactoring and optimization of mbfilter.c
alexdowad Jul 26, 2020
c0eeafd
Remove useless struct: mbfl_filt_tl_jisx0201_jisx0208_param
alexdowad Jul 17, 2020
54e189f
Minor code cleanup in php_unicode.h
alexdowad Jul 17, 2020
35a2cec
Remove unneeded function mbfl_convert_filter_copy from mbstring
alexdowad Jul 23, 2020
163815a
Optimize mbstring upper/lowercasing: use fast path in more cases
alexdowad Jul 26, 2020
9930b0c
Fix buggy support for negative 'start' argument to mb_strimwidth
alexdowad Jul 28, 2020
809351c
Remove broken test for validity of 'start' argument to mb_strimwidth
alexdowad Jul 28, 2020
05ad087
Remove unneeded function mbfl_filt_tl_jisx0201_jisx0208_init
alexdowad Aug 1, 2020
fab2151
Remove unneeded struct: mbfl_filt_conv_wchar_cp50220_ctx
alexdowad Aug 1, 2020
20de301
mbfl_wchar_device_init accepts initial buffer size argument
alexdowad Aug 4, 2020
a5e5430
Remove unused COMPAT2 constants from mbstring
alexdowad Aug 5, 2020
db69a19
Rename 'COMPAT1' constants for mbstring to 'SPECIAL' (more meaningful…
alexdowad Aug 7, 2020
48e390f
Remove unused MBFL_FILT_TL_{HAN2ZEN,ZEN2HAN}_MASK constants
alexdowad Aug 7, 2020
fd03267
Rename MBFL_FILT_TL_ZEN2HAN_{HIRA2KANA,KANA2HIRA} to accurately refle…
alexdowad Aug 7, 2020
6a74bb4
Disallow nonsensical combinations of conversion flags to mb_convert_kana
alexdowad Aug 7, 2020
e612f38
Return values of 'filter flush' functions are never used
alexdowad Aug 8, 2020
7cf60e0
Canonicalize flags received by mb_convert_kana
alexdowad Aug 8, 2020
d2d0c1c
Don't redundantly flush filter chain twice in mbfl_convert_filter_flush
alexdowad Aug 8, 2020
1ab4f68
Remove unneeded 'filter_ctor' field from filter structs
alexdowad Aug 8, 2020
d88df51
Initialization of encoding identification filters is more concise
alexdowad Aug 8, 2020
a67cb8f
Remove unused 'score' field from mbfl_identify_filter struct
alexdowad Aug 8, 2020
e48660d
Rewrite UTF-8 -> wchar conversion to use function pointer as state va…
alexdowad Aug 12, 2020
3c510fa
Optimize wchar -> JIS conversion (mbfl_filt_conv_wchar_jis)
alexdowad Aug 15, 2020
4b53585
Optimize encoding of HTML decimal numeric entities
alexdowad Aug 15, 2020
951001a
Offset and mask values in mb_encode_numericentity convmap do not affe…
alexdowad Aug 24, 2020
8836e3c
Test strimwidth with UTF-16LE text
alexdowad Aug 25, 2020
3a5c5e7
Test mb_strcut more thoroughly on UTF-16LE text
alexdowad Aug 25, 2020
4269462
mb_convert_kana throws an error on unrecognized flags
alexdowad Aug 25, 2020
4c19d61
Use mbfl_memory_device_output for emitting newlines in mb_send_mail
alexdowad Aug 29, 2020
60483da
Code cleanup in mbfl_language.c
alexdowad Aug 30, 2020
f6447ea
Code cleanup in mb_str_split
alexdowad Aug 31, 2020
4a74877
[WIP] Move meat of mb_str_split to mbfl_str_split (in mbfilter.c)
alexdowad Aug 31, 2020
e19f9b2
Minor cleanup in mbfilter_byte2.c
alexdowad Sep 3, 2020
6223925
Minor cleanup in mbfilter_byte4.c
alexdowad Sep 3, 2020
93158db
Minor cleanup in mbfilter_base64.c
alexdowad Sep 3, 2020
0e01d8a
Minor cleanup in mbfilter_uuencode.c
alexdowad Sep 4, 2020
e9d4694
Code cleanup in mbfilter_htmlent.c
alexdowad Sep 4, 2020
bb5fb71
Code cleanup in mbfilter_qprint.c
alexdowad Sep 4, 2020
e39473d
Combine MBFL_ENCTYPE_WCS{2,4}{BE,LE} constants
alexdowad Sep 20, 2020
03b57af
Nothing uses return values of identity filter functions
alexdowad Sep 21, 2020
b472672
Identify filter functions should take unsigned char, not int
alexdowad Sep 21, 2020
aeca9e6
Encoding conversion functions don't need to return anything
alexdowad Sep 21, 2020
fe8ad10
Optimize calls to mb_substr which return entire input string
alexdowad Sep 24, 2020
1ce3045
Minor code tweaks in mbfl_convert.c
alexdowad Sep 27, 2020
cef8d9e
Add comment on assertion in mbstring.c which can fail
alexdowad Sep 27, 2020
9274e2a
Add 'Windows-932' alias for CP932 text encoding
alexdowad Oct 4, 2020
b2a7ad4
Enhance handling of CP932 text encoding
alexdowad Oct 4, 2020
2cee875
Add test suite for CP932 text encoding
alexdowad Oct 3, 2020
490baf6
Don't pass invalid JIS X 0208 characters through
alexdowad Oct 7, 2020
7100498
Don't pass invalid JIS X 0212 and Windows-CP932 characters through
alexdowad Oct 7, 2020
5f0226a
Remove useless mbstring encoding 'CP50220-raw'
alexdowad Oct 7, 2020
24267b4
Fix identify filter for CP50220 text encoding
alexdowad Oct 8, 2020
99320df
Stricter handling of erroneous input when converting CP5022{0,1,2} te…
alexdowad Oct 8, 2020
246bba7
CP5022{0,1,2}: convert characters in ku 0x2D (13th row) correctly
alexdowad Oct 11, 2020
2393da2
CP5022{0,1,2}: convert Unicode codepoints in 'user' area (0xE000-E757…
alexdowad Oct 11, 2020
cfd3b56
Add test suite for CP5022{0,1,2}
alexdowad Oct 13, 2020
73cac6a
Combine MBFL_ENCTYPE_MWC2{BE,LE} constants
alexdowad Oct 13, 2020
9f5fee9
Remove useless mbstring encoding 'JIS-ms'
alexdowad Oct 17, 2020
61981ad
Remove unused macros from mbfilter_cp51932.c, mbfilter_iso2022jp_mobi…
alexdowad Oct 18, 2020
1bda08d
Enhance handling of CP51932 encoding
alexdowad Oct 18, 2020
b222b4d
Add test suite for CP51932 encoding
alexdowad Oct 18, 2020
2540809
Remove unused function php_mb_safe_strrchr
alexdowad Oct 18, 2020
8ef5543
Remove useless constant MBFL_ENCTYPE_MBCS
alexdowad Oct 18, 2020
7e18f2f
Simplify php_mb_zend_encoding_converter
alexdowad Oct 18, 2020
c48d07c
Don't check for impossible error condition in mb_strlen
alexdowad Oct 18, 2020
c4cbaa7
Remove useless constant MBFL_ERROR_ENCODING
alexdowad Oct 18, 2020
a8d947c
Don't check for impossible error condition in mb_substr_count
alexdowad Oct 18, 2020
951bc0d
Don't check for impossible error condition in mb_strwidth
alexdowad Oct 18, 2020
0488bbb
Use stack-allocated buffer in php_mb_chr
alexdowad Oct 18, 2020
2c53090
Remove unused function php_mb_mbchar_bytes
alexdowad Oct 18, 2020
5234d92
No need to null-terminate buffer in php_mb_chr
alexdowad Oct 18, 2020
befbb46
Fix mbstring support for SJIS-Mobile (DoCoMo, KDDI, and Softbank vari…
alexdowad Oct 20, 2020
347e14c
Add test suite for SJIS-Mobile
alexdowad Oct 21, 2020
8da76e5
CP932: treat truncated multi-byte characters as an error
alexdowad Oct 22, 2020
0534a1d
Remove duplicate implementation of CP932 from mbstring
alexdowad Oct 22, 2020
83653b2
Fix mbstring support for ISO-2022-JP-KDDI encoding
alexdowad Oct 23, 2020
6cef434
Add test suite for ISO-2022-JP-KDDI encoding
alexdowad Oct 25, 2020
3534564
Fix mbstring support for ISO-2022-JP-MS encoding
alexdowad Oct 25, 2020
dadfc70
Add test suite for ISO-2022-JP-MS encoding
alexdowad Oct 25, 2020
25ac652
Refactoring of UTF-8 with mobile vendor extensions (DoCoMo, KDDI, Sof…
alexdowad Oct 26, 2020
26 changes: 1 addition & 25 deletions ext/mbstring/config.m4
@@ -94,18 +94,12 @@ AC_DEFUN([PHP_MBSTRING_SETUP_LIBMBFL], [
 PHP_MBSTRING_ADD_SOURCES([
 libmbfl/filters/html_entities.c
 libmbfl/filters/mbfilter_7bit.c
-libmbfl/filters/mbfilter_ascii.c
 libmbfl/filters/mbfilter_base64.c
 libmbfl/filters/mbfilter_big5.c
 libmbfl/filters/mbfilter_byte2.c
 libmbfl/filters/mbfilter_byte4.c
-libmbfl/filters/mbfilter_cp1251.c
-libmbfl/filters/mbfilter_cp1252.c
-libmbfl/filters/mbfilter_cp1254.c
 libmbfl/filters/mbfilter_cp5022x.c
 libmbfl/filters/mbfilter_cp51932.c
-libmbfl/filters/mbfilter_cp850.c
-libmbfl/filters/mbfilter_cp866.c
 libmbfl/filters/mbfilter_cp932.c
 libmbfl/filters/mbfilter_cp936.c
 libmbfl/filters/mbfilter_gb18030.c
@@ -121,26 +115,9 @@ AC_DEFUN([PHP_MBSTRING_SETUP_LIBMBFL], [
 libmbfl/filters/mbfilter_iso2022jp_2004.c
 libmbfl/filters/mbfilter_iso2022jp_mobile.c
 libmbfl/filters/mbfilter_iso2022_kr.c
-libmbfl/filters/mbfilter_iso8859_1.c
-libmbfl/filters/mbfilter_iso8859_10.c
-libmbfl/filters/mbfilter_iso8859_13.c
-libmbfl/filters/mbfilter_iso8859_14.c
-libmbfl/filters/mbfilter_iso8859_15.c
-libmbfl/filters/mbfilter_iso8859_16.c
-libmbfl/filters/mbfilter_iso8859_2.c
-libmbfl/filters/mbfilter_iso8859_3.c
-libmbfl/filters/mbfilter_iso8859_4.c
-libmbfl/filters/mbfilter_iso8859_5.c
-libmbfl/filters/mbfilter_iso8859_6.c
-libmbfl/filters/mbfilter_iso8859_7.c
-libmbfl/filters/mbfilter_iso8859_8.c
-libmbfl/filters/mbfilter_iso8859_9.c
 libmbfl/filters/mbfilter_jis.c
-libmbfl/filters/mbfilter_koi8r.c
-libmbfl/filters/mbfilter_armscii8.c
 libmbfl/filters/mbfilter_qprint.c
 libmbfl/filters/mbfilter_sjis.c
-libmbfl/filters/mbfilter_sjis_open.c
 libmbfl/filters/mbfilter_sjis_mobile.c
 libmbfl/filters/mbfilter_sjis_mac.c
 libmbfl/filters/mbfilter_sjis_2004.c
@@ -155,7 +132,6 @@ AC_DEFUN([PHP_MBSTRING_SETUP_LIBMBFL], [
 libmbfl/filters/mbfilter_utf8.c
 libmbfl/filters/mbfilter_utf8_mobile.c
 libmbfl/filters/mbfilter_uuencode.c
-libmbfl/filters/mbfilter_koi8u.c
 libmbfl/mbfl/mbfilter.c
 libmbfl/mbfl/mbfilter_8bit.c
 libmbfl/mbfl/mbfilter_pass.c
@@ -201,7 +177,7 @@ PHP_ARG_ENABLE([mbregex],
 if test "$PHP_MBSTRING" != "no"; then
 AC_DEFINE([HAVE_MBSTRING],1,[whether to have multibyte string support])

-PHP_MBSTRING_ADD_BASE_SOURCES([mbstring.c php_unicode.c mb_gpc.c])
+PHP_MBSTRING_ADD_BASE_SOURCES([mbstring.c php_unicode.c mb_gpc.c mbstring_singlebyte.c])

 if test "$PHP_MBREGEX" != "no"; then
 PHP_MBSTRING_SETUP_MBREGEX
26 changes: 9 additions & 17 deletions ext/mbstring/config.w32
@@ -17,25 +17,17 @@ if (PHP_MBSTRING != "no") {
 "ext\\mbstring\\libmbfl\\config.h", true);

 ADD_SOURCES("ext/mbstring/libmbfl/filters", "html_entities.c \
-mbfilter_7bit.c mbfilter_ascii.c mbfilter_base64.c mbfilter_big5.c \
-mbfilter_byte2.c mbfilter_byte4.c mbfilter_cp1251.c mbfilter_cp1252.c \
-mbfilter_cp866.c mbfilter_cp932.c mbfilter_cp936.c mbfilter_cp51932.c \
-mbfilter_euc_cn.c mbfilter_euc_jp.c mbfilter_euc_jp_win.c mbfilter_euc_kr.c \
+mbfilter_7bit.c mbfilter_base64.c mbfilter_big5.c \
+mbfilter_byte2.c mbfilter_byte4.c mbfilter_cp932.c \
+mbfilter_cp936.c mbfilter_cp51932.c mbfilter_euc_cn.c \
+mbfilter_euc_jp.c mbfilter_euc_jp_win.c mbfilter_euc_kr.c \
 mbfilter_euc_tw.c mbfilter_htmlent.c mbfilter_hz.c mbfilter_iso2022_kr.c \
-mbfilter_iso8859_1.c mbfilter_iso8859_10.c mbfilter_iso8859_13.c \
-mbfilter_iso8859_14.c mbfilter_iso8859_15.c mbfilter_iso8859_16.c \
-mbfilter_iso8859_2.c mbfilter_iso8859_3.c mbfilter_iso8859_4.c \
-mbfilter_iso8859_5.c mbfilter_iso8859_6.c mbfilter_iso8859_7.c \
-mbfilter_iso8859_8.c mbfilter_iso8859_9.c mbfilter_jis.c \
-mbfilter_iso2022_jp_ms.c mbfilter_gb18030.c mbfilter_sjis_2004.c \
-mbfilter_koi8r.c mbfilter_qprint.c mbfilter_sjis.c mbfilter_ucs2.c \
+mbfilter_jis.c mbfilter_iso2022_jp_ms.c mbfilter_gb18030.c \
+mbfilter_sjis_2004.c mbfilter_qprint.c mbfilter_sjis.c mbfilter_ucs2.c \
 mbfilter_ucs4.c mbfilter_uhc.c mbfilter_utf16.c mbfilter_utf32.c \
-mbfilter_utf7.c mbfilter_utf7imap.c mbfilter_utf8.c mbfilter_utf8_mobile.c \
-mbfilter_koi8u.c mbfilter_cp1254.c mbfilter_euc_jp_2004.c \
-mbfilter_uuencode.c mbfilter_armscii8.c mbfilter_cp850.c \
-mbfilter_cp5022x.c mbfilter_sjis_open.c mbfilter_sjis_mobile.c \
-mbfilter_sjis_mac.c \
-mbfilter_iso2022jp_2004.c mbfilter_iso2022jp_mobile.c \
+mbfilter_utf7.c mbfilter_utf7imap.c mbfilter_utf8.c mbfilter_utf8_mobile.c mbfilter_euc_jp_2004.c mbfilter_uuencode.c \
+mbfilter_cp5022x.c mbfilter_sjis_mobile.c \
+mbfilter_sjis_mac.c mbfilter_iso2022jp_2004.c mbfilter_iso2022jp_mobile.c \
 mbfilter_tl_jisx0201_jisx0208.c", "mbstring");

 ADD_SOURCES("ext/mbstring/libmbfl/mbfl", "mbfilter.c mbfilter_8bit.c \