WASM is a low-level virtual machine, so it should be able to handle strings represented as binary arrays.
There are handy methods .fromUTF8 and .toUTF8: https://github.com/AssemblyScript/assemblyscript/blob/master/std/assembly/string.ts#L499
However, they are non-symmetrical in three ways:
1. fromUTF8 requires the knowledge of the string length, while toUTF8 does not return that value. You have to call lengthUTF8 separately, which is wasteful (it's already called in toUTF8).
2. fromUTF8 handles \0 Unicode characters in strings correctly, while toUTF8 gives an impression these are not supported.
3. fromUTF8 requires pure size of encoded string in bytes, while lengthUTF8 returns size with zero byte padding. Even existing tests have to adjust for that explicitly (assemblyscript/tests/compiler/std/string-utf8.ts, line 24 in b7e7be2).
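To make the three points concrete, here is a minimal sketch in plain TypeScript. The functions below are hypothetical stand-ins that only mimic the behaviour described above (a Uint8Array plays the role of the raw pointer, and scanForTerminator is an invented helper); they are not the AssemblyScript implementations.

```typescript
// Hypothetical stand-ins mimicking the behaviour described in this issue.
// They are NOT the AssemblyScript implementations.
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8");

// Mimics lengthUTF8: encoded size in bytes plus one for the trailing zero byte.
function lengthUTF8(s: string): number {
  return encoder.encode(s).length + 1;
}

// Mimics toUTF8: zero-terminated bytes (a Uint8Array stands in for the pointer).
function toUTF8(s: string): Uint8Array {
  const encoded = encoder.encode(s);
  const out = new Uint8Array(encoded.length + 1); // zero-filled, so the last byte is 0
  out.set(encoded);
  return out;
}

// Mimics fromUTF8: decodes exactly `len` bytes and expects no terminator.
function fromUTF8(bytes: Uint8Array, len: number): string {
  return decoder.decode(bytes.subarray(0, len));
}

// Invented helper: how a C-style consumer would find the length,
// by scanning for the first zero byte.
function scanForTerminator(bytes: Uint8Array): number {
  const i = bytes.indexOf(0);
  return i < 0 ? bytes.length : i;
}

const text = "a\0b";
const ptr = toUTF8(text);
const len = lengthUTF8(text); // points 1 and 3: a separate call, and it counts the terminator

const roundTrip = fromUTF8(ptr, len - 1);             // caller must subtract 1 by hand
const withTerminator = fromUTF8(ptr, len);            // passing len as-is appends a stray "\0"
const cStyle = fromUTF8(ptr, scanForTerminator(ptr)); // point 2: truncated at the embedded "\0"
```

Under these assumptions, the only way to get the original string back is to call the length function a second time and subtract one, and any consumer that relies on the terminator loses everything after an embedded \0.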
This is an inefficient and confusing approach. If the goal of AssemblyScript is to be a high-level WASM-friendly language, then having C-isms in the standard library like naked pointers to null-terminated strings feels like going against those goals.
My suggestion would be to rename .lengthUTF8 and .toUTF8 to .lengthUTF8ZeroTerminated and .toUTF8ZeroTerminated, and to introduce .toUTF8Buffer, which returns an ArrayBuffer populated with the correct content and size. This API would be much clearer and more convenient for users.
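A rough sketch of what the proposed method could look like, written as free functions in plain TypeScript for illustration. toUTF8Buffer is the name suggested above; fromUTF8Buffer is a hypothetical counterpart added here only to show the round trip, and neither is an existing AssemblyScript API.

```typescript
// Hypothetical sketch of the proposed API as free functions.
// Neither function exists in AssemblyScript; names and shapes are assumptions.
function toUTF8Buffer(s: string): ArrayBuffer {
  const encoded = new TextEncoder().encode(s);
  const buffer = new ArrayBuffer(encoded.length); // exact encoded size, no terminator
  new Uint8Array(buffer).set(encoded);
  return buffer;
}

// Hypothetical counterpart: the buffer carries its own length,
// so no separate length argument is needed.
function fromUTF8Buffer(buffer: ArrayBuffer): string {
  return new TextDecoder("utf-8").decode(new Uint8Array(buffer));
}

const buf = toUTF8Buffer("héllo\0!");
const restored = fromUTF8Buffer(buf);
```

Because the ArrayBuffer knows its own byteLength, the caller never computes the length separately or adjusts for a terminator, and embedded \0 characters survive the round trip.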