OpenSSL and Breaking UTF-8 Change (fixed in Node v0.8.27 and v0.10.29)
The Node.js Project
Today we are releasing new versions of Node:
First and foremost these releases address the current OpenSSL vulnerability CVE-2014-0224, for both 0.8 and 0.10 we've upgraded the version of the bundled OpenSSL to their fixed versions v1.0.0m and v1.0.1h respectively.
Additionally these releases address the fact that V8 UTF-8 encoding would allow
unmatched surrogate pairs. That is to say, previously you could construct a
valid JavaScript string (which are stored internally as UCS-2), pass it to a
Buffer
as UTF-8, send and consume that string in another process and it would
fail to interpret because the UTF-8 string was invalid.
Note, the results encoded by V8 in this case are exactly what was passed into the encoding routine. There is no overflow, underflow, or the inclusion of other arbitrary memory, merely an unmatched UTF-8 surrogate resulting in invalid UTF-8.
As of these releases, if you try and pass a string with an unmatched surrogate
pair, Node will replace that character with the unknown unicode character
(U+FFFD). To preserve the old behavior set the environment variable
NODE_INVALID_UTF8
to anything (even nothing). If the environment variable is
present at all it will revert to the old behavior.
This breaks backward compatibility for the specific reason that unsanitized
strings sent as a text payload for an RFC compliant WebSocket implementation
should result in the disconnection of the client. If the client attempts to
reconnect and receives another invalid payload it must disconnect again. If
there is no logic to handle the reconnection attempts, this may lead to a
denial of service attack. For instance socket.io
attempts to reconnect by
default.
// Prior to these releases:
new Buffer('ab\ud800cd', 'utf8');
// <Buffer 61 62 ed a0 80 63 64>
// After this release:
new Buffer('ab\ud800cd', 'utf8');
// <Buffer 61 62 ef bf bd 63 64>
// This is an explicit conversion to a Buffer, but the implicit
// .write('ab\ud800cd') also results in the same pattern
websocket.write(new Buffer('ab\ud800cd', 'utf8'));
// This would result in the client disconnecting.
Node's default encoding for strings is UTF-8
, so even if you're not
explicitly creating Buffer
s out of strings, Node may be doing so under the
hood. If what you're passing is not actually UTF-8
then when you call
.write(str)
you could be specific and say .write(str, 'binary')
which
signals Node to pass the string through without interpreting it.
You can also mitigate this in pure JavaScript by sanitizing your strings, as an example see node-unicode-sanitize which will similarly replace unmatched surrogate pairs with the unknown unicode character.
Thanks to Node.js alum Felix Geisendörfer for finding, getting the fixes upstreamed, and helping with the testing and mitigation. Also for helping to inform and improve the process for Node.js security issues.
To float these fixes in your own builds you can apply the following patch with
git am