[Go] Support convert utf16 encoded string to utf8 string #1545

Open
chaokunyang opened this issue Apr 19, 2024 · 8 comments · May be fixed by #1561


@chaokunyang
Collaborator

Is your feature request related to a problem? Please describe.

Currently, Fury xlang serialization uses UTF-8 for string encoding, which is not efficient in many languages.

We introduced UTF-16 in the spec (https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#string). However, Go strings are conventionally UTF-8 and the language has no native UTF-16 string type, so Fury Go should transcode UTF-16 encoded strings to UTF-8 strings during deserialization.

Describe the solution you'd like

Implement UTF-16 to UTF-8 transcoding in Fury Go. The implementation should use SIMD for speed.
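As a baseline for what the transcoding has to do, here is a minimal scalar sketch using only the Go standard library. It is not the Fury Go implementation and does not use SIMD. The helper name `utf16LEToString` is hypothetical, and little-endian byte order for the UTF-16 payload is an assumption made for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"unicode/utf16"
)

// utf16LEToString is a hypothetical helper (not part of Fury Go).
// It assumes the input holds UTF-16 code units in little-endian order
// and returns the equivalent Go string, which is UTF-8 encoded.
func utf16LEToString(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", errors.New("utf16: payload has an odd number of bytes")
	}
	// Reassemble the 16-bit code units from the raw bytes.
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = binary.LittleEndian.Uint16(b[2*i:])
	}
	// utf16.Decode combines surrogate pairs into single code points and
	// replaces unpaired surrogates with U+FFFD. Converting the resulting
	// []rune to string produces the UTF-8 bytes.
	return string(utf16.Decode(units)), nil
}
```

For example, calling `utf16LEToString([]byte{0x68, 0x00, 0x69, 0x00})` returns `"hi"`. A SIMD version would need to produce the same results, so a scalar routine like this can also serve as a reference in tests.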

Additional context

#1413

@chaokunyang chaokunyang added the enhancement New feature or request label Apr 19, 2024
@chaokunyang chaokunyang changed the title [Go] Support transcode utf16 encoded string to utf8 string [Go] Support convert utf16 encoded string to utf8 string Apr 19, 2024
@LiangliangSui
Contributor

Hi @chaokunyang, have you started implementing this feature? If not, I can take it over and implement it.

@chaokunyang
Collaborator Author

@LiangliangSui I haven't. Feel free to take it over.

@LiangliangSui
Contributor

Okay, I will do this.

@LiangliangSui LiangliangSui self-assigned this Apr 24, 2024
@LiangliangSui LiangliangSui linked a pull request Apr 24, 2024 that will close this issue
2 tasks
@LiangliangSui
Contributor

@chaokunyang We currently use UTF8 for cross-language serialization, and only Java (not cross-language) uses Latin1/UTF16.

  public void writeString(MemoryBuffer buffer, String value) {
    if (isJava) {
      writeJavaString(buffer, value);
    } else {
      writeUTF8String(buffer, value);
    }
  }

Will we use UTF16 as the default cross-language String encoding in the future?

I see that the cross-language format currently described in fury_xlang_serialization_spec still uses UTF8 as the default.
image

@chaokunyang
Collaborator Author

It depends on the language and the string. In Go, strings are already UTF-8 encoded, so Fury Go will write the data as a UTF-8 string with a plain copy. But java/javascript/python may encode strings as latin1 or utf16 and send them to furygo, so we need to support utf16 too. And depending on the peer language, we may configure furygo to use latin1/utf16 by default too.
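To illustrate the receiving side described here, the following is a rough sketch of how a Go reader might dispatch on the sender's string encoding. The `stringEncoding` type, its constant values, and the `decodeString` function are hypothetical names invented for this example and are not taken from Fury Go. The numeric tags and the little-endian UTF-16 byte order are assumptions too, so check the spec for the actual wire format.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// stringEncoding is a hypothetical tag identifying how the peer
// encoded a string payload. The values are illustrative only.
type stringEncoding uint8

const (
	encLatin1 stringEncoding = iota
	encUTF16LE
	encUTF8
)

// decodeString is a hypothetical helper that turns a payload in any of
// the three encodings into a Go (UTF-8) string.
func decodeString(enc stringEncoding, payload []byte) (string, error) {
	switch enc {
	case encUTF8:
		// Already in Go's native encoding: a single copy is enough.
		return string(payload), nil
	case encLatin1:
		// Each Latin-1 byte maps to the code point with the same value.
		runes := make([]rune, len(payload))
		for i, c := range payload {
			runes[i] = rune(c)
		}
		return string(runes), nil
	case encUTF16LE:
		if len(payload)%2 != 0 {
			return "", fmt.Errorf("utf16 payload has odd length %d", len(payload))
		}
		units := make([]uint16, len(payload)/2)
		for i := range units {
			units[i] = binary.LittleEndian.Uint16(payload[2*i:])
		}
		// Surrogate pairs are combined; unpaired ones become U+FFFD.
		return string(utf16.Decode(units)), nil
	default:
		return "", fmt.Errorf("unknown string encoding %d", enc)
	}
}
```

Only the UTF-8 case avoids transcoding, which is why a Go writer would default to UTF-8 while a Go reader still has to handle the other two encodings.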

@LiangliangSui
Contributor

But java/javascript/python may encode string as latin1 or utf16 and send to furygo.

Latin1/UTF16 is only used in Language.JAVA and will not be sent to furygo.

@LiangliangSui
Contributor

Okay, I got it.

@chaokunyang
Collaborator Author

In the future, java/javascript/python may all encode strings as latin1/utf16 and send them to furygo.

@LiangliangSui LiangliangSui linked a pull request Apr 25, 2024 that will close this issue
2 tasks