Properly Handle UTF-8 Encoded Characters #25

Closed

RLovelett
Contributor

Based on the PHPWord CodePlex discussion boards from 2011, there is an issue with the proper handling of UTF-8 encoded strings. The issue seems to arise when a string that is already UTF-8 encoded is passed through the utf8_encode function.

Rather than simply removing the utf8_encode call, as proposed on the discussion board, my patch checks whether the string is already UTF-8 encoded using mb_detect_encoding($text) === 'UTF-8'. If the string is already properly encoded, no additional steps are taken.
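To illustrate the idea, here is a minimal sketch of the detect-then-encode approach. It is not the actual patch: the helper name toUtf8 is hypothetical, the check uses strict mode with a UTF-8-only candidate list rather than the bare mb_detect_encoding($text) call described above, and mb_convert_encoding from ISO-8859-1 stands in for utf8_encode (which performs the same Latin-1 to UTF-8 conversion but is deprecated as of PHP 8.2). It assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper, not the actual PHPWord patch: encode only when needed.
function toUtf8(string $text): string
{
    // Strict detection against a UTF-8-only candidate list returns 'UTF-8'
    // for valid UTF-8 (including plain ASCII) and false otherwise.
    if (mb_detect_encoding($text, 'UTF-8', true) === 'UTF-8') {
        return $text; // already UTF-8: leave untouched
    }
    // Assumption: any other input is ISO-8859-1, as utf8_encode also assumes.
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$alreadyUtf8 = "caf\xC3\xA9"; // "café" as UTF-8 bytes
$latin1      = "caf\xE9";     // "café" as ISO-8859-1 bytes

// The bug being avoided: treating UTF-8 bytes as Latin-1 and encoding again.
$doubleEncoded = mb_convert_encoding($alreadyUtf8, 'UTF-8', 'ISO-8859-1');
```

With these inputs, toUtf8 returns the same UTF-8 bytes for both $alreadyUtf8 and $latin1, while $doubleEncoded holds the mangled form that displays as "cafÃ©". One known limitation of any detection-based check: a Latin-1 string whose bytes happen to form valid UTF-8 will be passed through unchanged.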

As far as I can test and tell, it is safe to remove the function call altogether. However, there could be some edge case I cannot see where it may be needed.

Also included is an example case that outputs proper UTF-8 encoded strings.

RLovelett and others added 2 commits October 22, 2012 14:45
Modified implementation from discussion on PHPWord codeplex discussion boards.
See http://phpword.codeplex.com/discussions/261365 for more information.
@RLovelett
Contributor Author

Oops, looks like this is a duplicate of pull request #13 (which is much more complete).

@RLovelett RLovelett closed this Jul 16, 2013