Major overhaul of mbstring (part 1) #6052
Closed
123 commits
af8d6bf
Fix mbstring support for Shift-JIS
alexdowad ffb01c9
Add test suite for Shift-JIS encoding
alexdowad b96b1f4
Add identify filter for Shift-JIS-2004
alexdowad 714a7d1
SJIS-2004 encoding conversion: handle invalid (or truncated) 2nd byte…
alexdowad 9ece2b3
Don't mangle non-Japanese chars which appear after a 'combining' kana…
alexdowad 4bba852
Add test suite for SJIS-2004 encoding
alexdowad 0c0a4b8
Add identify filter for MacJapanese (variant of Shift-JIS)
alexdowad 55dcf81
SJIS-mac encoding conversion: handle invalid (or truncated) 2nd byte …
alexdowad 4feb701
Convert Unicode halfwidth Yen sign to MacJapanese halfwidth Yen sign
alexdowad 6cd472a
SJIS-mac encoding conversion: Stop the carnage of innocent Unicode co…
alexdowad 0a6e568
Add test suite for SJIS-mac encoding
alexdowad 86e927a
Enhance mbstring support for UCS-4 text
alexdowad 431c4b9
Leading BOM is stripped for UTF-32
alexdowad a38a4a0
Enhance mbstring support for UCS-2 text
alexdowad 675a311
Consolidate all single-byte encodings in one source file
alexdowad f6e834f
Fix identify filter for UTF-7
alexdowad 02fc58e
Minor code cleanup in mbfilter_utf7.c
alexdowad 0879484
Catch and handle errors in UTF-7 text conversion
alexdowad a825801
Add test suite for UTF-{7,8,16,32}
alexdowad 2c0098d
Add identify filter for modified UTF-7 (for IMAP protocol)
alexdowad 7cbe62b
Add test suite for mUTF-7 (UTF7-IMAP) encoding
alexdowad cbfc02f
Add identify filter for CP50220-raw
alexdowad 5d6ecd3
Add identify filter for 'HTML entities' encoding
alexdowad 8e9d740
Add identify filter for 'byte2be' and 'byte2le' encodings
alexdowad f3e730c
Add identify filter for 'byte4be' and 'byte4le' encodings
alexdowad fb9a1d9
Add identify filter for uuencode 'character encoding'
alexdowad 08a064b
Add identify filter for QPrint 'character encoding'
alexdowad 5550c71
Add identify filter for Base64 'character encoding'
alexdowad 32bb947
Stricter identification of valid strings in ISO-2022-JP-2004 encoding
alexdowad b374321
ISO-2022-JP-2004 conversion: handle invalid characters correctly
alexdowad 3dfc88f
Add test suite for ISO-2022-JP-2004 encoding
alexdowad 2e06352
Add comment explaining why ISO-2022-JP-2004, etc strings end with ESC…
alexdowad f3e3ee5
Stricter identification of valid strings in JIS7/JIS8 and ISO-2022-JP…
alexdowad 1d10932
JIS7/8 encoding: handle invalid 2nd byte for Kanji correctly
alexdowad 5321788
Add test suite for JIS7/JIS8 and ISO-2022-JP encodings
alexdowad 92dd38f
Update 'East Asian Width' table to comply with Unicode 13.0
alexdowad 360187e
Correct wrong flags for 'byte2be', 'byte2le', 'byte4be', and 'byte4le…
alexdowad 938b517
Enhance handling of EUC-JP text encoding
alexdowad eaca866
Add test suite for EUC-JP encoding
alexdowad ae0f229
Add identify filter for text encoding 'wchar'
alexdowad 215c146
All mbstring encodings have identify filter now
alexdowad 934a2d8
Tell compiler which way jumps will usually go in mbfl_memory_device.c
alexdowad 2c6db2c
Simplify code for handling mbstring language aliases
alexdowad 7874786
Major refactoring and optimization of mbfilter.c
alexdowad c0eeafd
Remove useless struct: mbfl_filt_tl_jisx0201_jisx0208_param
alexdowad 54e189f
Minor code cleanup in php_unicode.h
alexdowad 35a2cec
Remove unneeded function mbfl_convert_filter_copy from mbstring
alexdowad 163815a
Optimize mbstring upper/lowercasing: use fast path in more cases
alexdowad 9930b0c
Fix buggy support for negative 'start' argument to mb_strimwidth
alexdowad 809351c
Remove broken test for validity of 'start' argument to mb_strimwidth
alexdowad 05ad087
Remove unneeded function mbfl_filt_tl_jisx0201_jisx0208_init
alexdowad fab2151
Remove unneeded struct: mbfl_filt_conv_wchar_cp50220_ctx
alexdowad 20de301
mbfl_wchar_device_init accepts initial buffer size argument
alexdowad a5e5430
Remove unused COMPAT2 constants from mbstring
alexdowad db69a19
Rename 'COMPAT1' constants for mbstring to 'SPECIAL' (more meaningful…
alexdowad 48e390f
Remove unused MBFL_FILT_TL_{HAN2ZEN,ZEN2HAN}_MASK constants
alexdowad fd03267
Rename MBFL_FILT_TL_ZEN2HAN_{HIRA2KANA,KANA2HIRA} to accurately refle…
alexdowad 6a74bb4
Disallow nonsensical combinations of conversion flags to mb_convert_kana
alexdowad e612f38
Return values of 'filter flush' functions are never used
alexdowad 7cf60e0
Canonicalize flags received by mb_convert_kana
alexdowad d2d0c1c
Don't redundantly flush filter chain twice in mbfl_convert_filter_flush
alexdowad 1ab4f68
Remove unneeded 'filter_ctor' field from filter structs
alexdowad d88df51
Initialization of encoding identification filters is more concise
alexdowad a67cb8f
Remove unused 'score' field from mbfl_identify_filter struct
alexdowad e48660d
Rewrite UTF-8 -> wchar conversion to use function pointer as state va…
alexdowad 3c510fa
Optimize wchar -> JIS conversion (mbfl_filt_conv_wchar_jis)
alexdowad 4b53585
Optimize encoding of HTML decimal numeric entities
alexdowad 951001a
Offset and mask values in mb_encode_numericentity convmap do not affe…
alexdowad 8836e3c
Test strimwidth with UTF-16LE text
alexdowad 3a5c5e7
Test mb_strcut more thoroughly on UTF-16LE text
alexdowad 4269462
mb_convert_kana throws an error on unrecognized flags
alexdowad 4c19d61
Use mbfl_memory_device_output for emitting newlines in mb_send_mail
alexdowad 60483da
Code cleanup in mbfl_language.c
alexdowad f6447ea
Code cleanup in mb_str_split
alexdowad 4a74877
[WIP] Move meat of mb_str_split to mbfl_str_split (in mbfilter.c)
alexdowad e19f9b2
Minor cleanup in mbfilter_byte2.c
alexdowad 6223925
Minor cleanup in mbfilter_byte4.c
alexdowad 93158db
Minor cleanup in mbfilter_base64.c
alexdowad 0e01d8a
Minor cleanup in mbfilter_uuencode.c
alexdowad e9d4694
Code cleanup in mbfilter_htmlent.c
alexdowad bb5fb71
Code cleanup in mbfilter_qprint.c
alexdowad e39473d
Combine MBFL_ENCTYPE_WCS{2,4}{BE,LE} constants
alexdowad 03b57af
Nothing uses return values of identity filter functions
alexdowad b472672
Identify filter functions should take unsigned char, not int
alexdowad aeca9e6
Encoding conversion functions don't need to return anything
alexdowad fe8ad10
Optimize calls to mb_substr which return entire input string
alexdowad 1ce3045
Minor code tweaks in mbfl_convert.c
alexdowad cef8d9e
Add comment on assertion in mbstring.c which can fail
alexdowad 9274e2a
Add 'Windows-932' alias for CP932 text encoding
alexdowad b2a7ad4
Enhance handling of CP932 text encoding
alexdowad 2cee875
Add test suite for CP932 text encoding
alexdowad 490baf6
Don't pass invalid JIS X 0208 characters through
alexdowad 7100498
Don't pass invalid JIS X 0212 and Windows-CP932 characters through
alexdowad 5f0226a
Remove useless mbstring encoding 'CP50220-raw'
alexdowad 24267b4
Fix identify filter for CP50220 text encoding
alexdowad 99320df
Stricter handling of erroneous input when converting CP5022{0,1,2} te…
alexdowad 246bba7
CP5022{0,1,2}: convert characters in ku 0x2D (13th row) correctly
alexdowad 2393da2
CP5022{0,1,2}: convert Unicode codepoints in 'user' area (0xE000-E757…
alexdowad cfd3b56
Add test suite for CP5022{0,1,2}
alexdowad 73cac6a
Combine MBFL_ENCTYPE_MWC2{BE,LE} constants
alexdowad 9f5fee9
Remove useless mbstring encoding 'JIS-ms'
alexdowad 61981ad
Remove unused macros from mbfilter_cp51932.c, mbfilter_iso2022jp_mobi…
alexdowad 1bda08d
Enhance handling of CP51932 encoding
alexdowad b222b4d
Add test suite for CP51932 encoding
alexdowad 2540809
Remove unused function php_mb_safe_strrchr
alexdowad 8ef5543
Remove useless constant MBFL_ENCTYPE_MBCS
alexdowad 7e18f2f
Simplify php_mb_zend_encoding_converter
alexdowad c48d07c
Don't check for impossible error condition in mb_strlen
alexdowad c4cbaa7
Remove useless constant MBFL_ERROR_ENCODING
alexdowad a8d947c
Don't check for impossible error condition in mb_substr_count
alexdowad 951bc0d
Don't check for impossible error condition in mb_strwidth
alexdowad 0488bbb
Use stack-allocated buffer in php_mb_chr
alexdowad 2c53090
Remove unused function php_mb_mbchar_bytes
alexdowad 5234d92
No need to null-terminate buffer in php_mb_chr
alexdowad befbb46
Fix mbstring support for SJIS-Mobile (DoCoMo, KDDI, and Softbank vari…
alexdowad 347e14c
Add test suite for SJIS-Mobile
alexdowad 8da76e5
CP932: treat truncated multi-byte characters as an error
alexdowad 0534a1d
Remove duplicate implementation of CP932 from mbstring
alexdowad 83653b2
Fix mbstring support for ISO-2022-JP-KDDI encoding
alexdowad 6cef434
Add test suite for ISO-2022-JP-KDDI encoding
alexdowad 3534564
Fix mbstring support for ISO-2022-JP-MS encoding
alexdowad dadfc70
Add test suite for ISO-2022-JP-MS encoding
alexdowad 25ac652
Refactoring of UTF-8 with mobile vendor extensions (DoCoMo, KDDI, Sof…
alexdowad