Sending invalid Unicode via socket.io

Well, you can try, but it will almost certainly end in a disconnect caused by the browser.
As I have learned here.

Say you have a string that contains invalid Unicode, like:


This will cause trouble for the browser and the socket connection.

If you prepare your JSON with PHP's json_encode, the Unicode will be escaped into sequences like these:

\ud83d\ude31\ud83d\ude31\ud83d\ude04\ud83d\ude04\ud83d\udc9c\ud83d\udc9c\ud83d\udc4a

But on the client side it will still result in invalid Unicode.
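To see why escaping alone does not help, here is a small sketch in JavaScript of what happens when the escaped JSON is parsed again. It only uses the standard JSON.parse; the variable names are made up for illustration. The escapes turn back into the very same UTF-16 code units, so whatever was in the string before encoding is there again afterwards.

```javascript
// The escaped payload as it would arrive over the wire
// (the first pair from the escaped example above).
const wire = '"\\ud83d\\ude31"';
const decoded = JSON.parse(wire);

// The two escapes form one surrogate pair: two code units, one code point.
const units = decoded.length;
const codePoint = decoded.codePointAt(0).toString(16);

// A high surrogate without its partner is accepted by the parser as well
// and ends up in the string as a lone (unpaired) surrogate.
const broken = JSON.parse('"\\ud83d"');
const brokenUnit = broken.charCodeAt(0).toString(16);

console.log(units, codePoint, brokenUnit);
```

So the escaping only changes how the characters travel, not what arrives in the string on the other end.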

So after a lot of research I found this and used the decodeUnicodeString function from Zend to convert the escaped Unicode characters back to their unescaped representation. (Don't forget the replacement described in the Stack Overflow post if you extract it from Zend.)
If a character is invalid, it will be replaced by a question mark ‘?’.
At least that is what I assume.
The ? went through fine and the socket connection stayed alive.
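The same idea can be sketched in JavaScript, for example to clean a string before it is emitted. This is not the Zend function, only a rough equivalent under one assumption: that "invalid" means an unpaired UTF-16 surrogate. The function names replaceLoneSurrogates and hasLoneSurrogate are my own and not part of socket.io or any library.

```javascript
// Replace every unpaired surrogate with a placeholder (default '?').
// Valid surrogate pairs and all other characters are copied unchanged.
function replaceLoneSurrogates(input, replacement = '?') {
  let out = '';
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      // charCodeAt returns NaN past the end, which fails the range check.
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += input[i] + input[i + 1]; // complete pair, keep it
        i++;
      } else {
        out += replacement; // high surrogate without a partner
      }
    } else if (isLow) {
      out += replacement; // low surrogate without a preceding high one
    } else {
      out += input[i];
    }
  }
  return out;
}

// Detection only: true if the string contains at least one unpaired surrogate.
function hasLoneSurrogate(input) {
  return replaceLoneSurrogates(input, '?') !== input;
}
```

A call like replaceLoneSurrogates(message) right before sending would keep complete pairs intact and only swap out the broken halves.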

OK, the downside is that you lose the odd characters in exchange for a ‘?’, but I can live with that.
But I'm not really convinced by that solution, so if anybody knows another way to detect invalid Unicode sequences, let me know!
