Duktape 3.0 release notes
Main changes in this release (see RELEASES.rst for full details):
- Switch to WTF-8 internal string representation. Matching surrogate pairs, both inside ECMAScript strings and in CESU-8 format provided by native code, are automatically combined to non-BMP UTF-8 codepoints; unpaired surrogates remain in CESU-8 style (= WTF-8). Invalid WTF-8 byte sequences are replaced with U+FFFD replacements to guarantee interned strings are now always valid WTF-8. ECMAScript code will still see standard ECMAScript strings where surrogates are individual characters. However, native code now sees standard UTF-8 strings whenever possible (and WTF-8 otherwise), simplifying integration with other UTF-8 based APIs.
- Switch to new internal object property table representation and new property handling code.
Upgrading from Duktape 2.x
- If your application deals with UTF-8 and/or CESU-8 string representations, the WTF-8 string representation may change or eliminate some conversion needs.
- Source variants have been removed from the distributable, with the
src/directory containing combined sources with default options and without
src-noline/). You can recreate the removed variants using
More details on WTF-8 changes
- WTF-8 sanitization, i.e. combining of paired surrogates and using U+FFFD replacements for invalid sequences, is applied during the string intern check which ensures all string data created by any means goes through the sanitization.
- WTF-8 sanitization uses U+FFFD replacements for invalid byte sequences. The replacement policy matches http://unicode.org/review/pr-121.html (same as TextDecoder).
- When native code pushes strings, non-BMP codepoints encoded in UTF-8 style (one combined codepoint) and CESU-8 style (surrogate pair encoded as two codepoints) now both result in UTF-8 style being used in the internal string representation. Previously one could push strings in either form and they would behave slightly differently for both native and ECMAScript code.
- ECMAScript now always sees standard strings consisting of 16-bit codepoints, extended codepoint support previously available to ECMAScript code has been removed.
- String.fromCharCode() now always uses standard ToUint16() coercion. The option
DUK_USE_NONSTD_STRING_FROMCHARCODE_32BIThas been removed (it allowed non-standard ToUint32() codepoint coercion). The standard function String.fromCodePoint() already allows the creation of non-BMP characters which now appear as UTF-8 to native code.