It appears that the UTF-8 encoding implementation does not properly support UTF-16 surrogate pairs. Was this intentional? If you pass a surrogate pair to the API now, you get 6 bytes back (apparently each surrogate encoded on its own as a 3-byte sequence) instead of the 4 bytes UTF-8 defines for a code point above U+FFFF.
The output decodes correctly with the corresponding decoding implementation, but standard UTF-8 decoders will not be able to decode it properly.
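
To illustrate the difference, here is a minimal Python sketch. It is not the library's code: `encode_code_point`, `utf16_units_to_utf8`, and `naive_units_to_utf8` are hypothetical names, and the "naive" variant is only my guess at what the current implementation does. The sample input is U+1F600, whose UTF-16 form is the surrogate pair D83D DE00.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one integer value using the UTF-8 bit layout (no validation)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_units_to_utf8(units):
    """Combine surrogate pairs into one code point before encoding."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High + low surrogate -> supplementary code point (4 UTF-8 bytes).
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError("unpaired surrogate at index %d" % i)
        else:
            cp = u
            i += 1
        out += encode_code_point(cp)
    return bytes(out)


def naive_units_to_utf8(units):
    """Assumed current behavior: each 16-bit unit is encoded on its own."""
    return b"".join(encode_code_point(u) for u in units)


pair = [0xD83D, 0xDE00]                 # UTF-16 code units for U+1F600
expected = utf16_units_to_utf8(pair)    # 4 bytes: F0 9F 98 80
observed = naive_units_to_utf8(pair)    # 6 bytes: ED A0 BD ED B8 80
```

Python's own strict decoder accepts the 4-byte form (`expected.decode("utf-8")`) and raises `UnicodeDecodeError` on the 6-byte form, which matches the interoperability problem described above.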