Properly Handle UTF-8 Encoded Characters #25
Closed
Based on the PHPWord CodePlex discussion boards from 2011, there is an issue with the handling of UTF-8 encoded strings. It seems to arise when a string that is already UTF-8 encoded is passed through `utf8_encode`, which encodes it a second time.

Rather than simply removing the `utf8_encode` call, as proposed on the discussion board, my patch checks whether the string is already UTF-8 encoded with `mb_detect_encoding($text) === 'UTF-8'`. If it is, no additional steps are taken.

As far as I can test/tell, it is safe to remove the function call altogether. However, there could be some edge case I cannot see where it may be needed.
Also included is an example case that outputs proper UTF-8 encoded strings.
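
For readers who want to see the double-encoding problem in isolation, here is a minimal sketch. It is not the example file from this patch and not PHPWord's actual code. The helper name `ensureUtf8` is hypothetical, and it makes two substitutions that are assumptions on my part: `mb_check_encoding()` stands in for the `mb_detect_encoding()` comparison, and `mb_convert_encoding()` from ISO-8859-1 stands in for `utf8_encode` (it performs the same conversion). It requires the mbstring extension.

```php
<?php
// Hypothetical helper for illustration only: convert to UTF-8 only when needed.
function ensureUtf8(string $text): string
{
    // Assumption: validate the bytes with mb_check_encoding() rather than
    // comparing the result of mb_detect_encoding() as the patch does.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8 (plain ASCII also passes)
    }

    // Same conversion that utf8_encode() performs: ISO-8859-1 -> UTF-8.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$utf8   = "caf\xC3\xA9"; // "café" as UTF-8 bytes
$latin1 = "caf\xE9";     // "café" as ISO-8859-1 bytes

// Converting an already UTF-8 string a second time garbles it:
// each byte of the two-byte "é" is treated as its own Latin-1 character.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($double), "\n";             // double-encoded bytes
echo bin2hex(ensureUtf8($utf8)), "\n";   // left untouched
echo bin2hex(ensureUtf8($latin1)), "\n"; // converted once
```

With the guard in place, both inputs end up as the same UTF-8 byte sequence, while the unguarded conversion turns the two-byte `é` into four bytes.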