[Go] Implement meta string encoding algorithm for golang #1540

Open
chaokunyang opened this issue Apr 19, 2024 · 8 comments

Comments

@chaokunyang
Collaborator

Is your feature request related to a problem? Please describe.

We've implemented the meta string encoding algorithm described in https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#meta-string for Java in #1514. It's time to implement it in Go.

Describe the solution you'd like

The Java implementation in #1514 can be taken as a reference. Note, however, that in Go the meta string encoding algorithm is used to encode field names only, so the special characters can't be . or $, and the implementation will therefore be simpler.

Additional context

#1413

@chaokunyang chaokunyang added enhancement New feature or request good first issue Good for newcomers labels Apr 19, 2024
@qingoba
Contributor

qingoba commented Apr 20, 2024

Could you assign it to me? This is my first attempt at contributing to open source, and I'm very interested in this task. Thanks.

@chaokunyang
Collaborator Author

Great, thanks for your willingness to contribute to Fury.

@qingoba
Contributor

qingoba commented Apr 22, 2024

In function public MetaString encode(String input, Encoding encoding) in file MetaStringEncoder.java, there is a section of code:

default:
  byte[] bytes = input.getBytes(StandardCharsets.UTF_8);
  return new MetaString(
      input, Encoding.UTF_8, specialChar1, specialChar2, bytes, bytes.length * 8, 0);

Why is numBits 0 rather than bytes.length * 8?
Why is numChars bytes.length * 8 rather than bytes.length?
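
For illustration, here is a minimal Go sketch of how the UTF-8 fallback could report the two values the way the question expects. The `MetaString` struct, its field names, and `encodeUTF8` are hypothetical and are not the Fury Go API. Treating each encoded byte as one "char" follows the reading proposed in the question and is an assumption, not something the spec confirms.

```go
package main

import "fmt"

// MetaString is a hypothetical container used only for this sketch.
type MetaString struct {
	Input    string
	Encoded  []byte
	NumChars int
	NumBits  int
}

// encodeUTF8 stores the raw UTF-8 bytes of the input unchanged.
func encodeUTF8(input string) MetaString {
	b := []byte(input)
	return MetaString{
		Input:    input,
		Encoded:  b,
		NumChars: len(b),     // assumption: one "char" per encoded byte
		NumBits:  len(b) * 8, // every byte is fully used, so there is no padding
	}
}

func main() {
	ms := encodeUTF8("name")
	fmt.Println(ms.NumChars, ms.NumBits) // prints "4 32"
}
```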

@chaokunyang
Collaborator Author

Hmm, this is a bug. UTF-8 is rarely used in meta strings, since most chars are actually ASCII chars, so this path isn't covered by the Fury serialization tests. We need to fix it and add some unit tests.

Thanks for pointing out this bug @qingoba

@chaokunyang
Collaborator Author

I have a new idea: we can add a bit to indicate whether to strip the last char in the encoded meta string if the encoding is not UTF-8. This way, we don't have to store num bits and num chars in MetaString.

@qingoba
Contributor

qingoba commented Apr 24, 2024

Exactly.
Because 5 + 5 > 8, the padding in the last byte can hold at most one empty character.
Suppose we use a flag, empty, to mark whether the last char is empty (1 if so, 0 otherwise). Then the actual number of characters is len(bytes) * 8 / 5 - empty, using integer division.
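
The counting rule above can be sketched in Go as follows. The function name `numChars` and its parameters are hypothetical. The sketch follows the formula exactly as stated in this comment and therefore ignores the extra bit needed to store the flag itself.

```go
package main

import "fmt"

// numChars returns how many characters a packed buffer holds, following
// the formula from the comment: len(bytes) * 8 / bitsPerChar - empty.
func numChars(encodedLen, bitsPerChar int, lastCharEmpty bool) int {
	n := encodedLen * 8 / bitsPerChar // integer division drops leftover bits
	if lastCharEmpty {
		n-- // the final slot is padding, not a real character
	}
	return n
}

func main() {
	// 2 chars * 5 bits = 10 bits -> 2 bytes, 6 padding bits (one empty slot).
	fmt.Println(numChars(2, 5, true)) // 2
	// 3 chars * 5 bits = 15 bits -> 2 bytes, 1 padding bit (no empty slot).
	fmt.Println(numChars(2, 5, false)) // 3
}
```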

@qingoba
Contributor

qingoba commented Apr 24, 2024

This way, the Decoder does not need to accept a numBits argument.

@chaokunyang
Collaborator Author

I have a new idea: we can add a bit to indicate whether to strip the last char in the encoded meta string if the encoding is not UTF-8. This way, we don't have to store num bits and num chars in MetaString.

Hi @qingoba, I added a strip-last-char flag to the spec in #1565. I believe this will make the implementation simpler.
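
To make the flag idea concrete, here is a rough Go sketch of a 5-bit packer that round-trips without storing the bit or char count. It rests on several assumptions that are not confirmed in this thread: the flag is the most significant bit of the first byte, the characters follow it directly, and the alphabet is restricted to a-z (value 0 to 25). The names `encode` and `decode` are hypothetical; consult the spec for the real layout and character table.

```go
package main

import (
	"errors"
	"fmt"
)

const bitsPerChar = 5

// encode packs lowercase letters at 5 bits each. The most significant bit
// of the first byte is reserved for the strip-last-char flag (assumed layout).
func encode(s string) ([]byte, error) {
	totalBits := len(s)*bitsPerChar + 1 // +1 for the flag bit
	out := make([]byte, (totalBits+7)/8)
	bit := 1 // start after the flag bit
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c < 'a' || c > 'z' {
			return nil, errors.New("unsupported character")
		}
		v := c - 'a'
		for k := bitsPerChar - 1; k >= 0; k-- {
			if (v>>uint(k))&1 == 1 {
				out[bit/8] |= 0x80 >> uint(bit%8)
			}
			bit++
		}
	}
	// If the padding could hold one whole extra character, set the flag
	// so the decoder knows to drop it.
	if len(out)*8 >= totalBits+bitsPerChar {
		out[0] |= 0x80
	}
	return out, nil
}

// decode derives the character count from the byte length and the flag alone.
func decode(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	n := (len(b)*8 - 1) / bitsPerChar
	if b[0]&0x80 != 0 {
		n-- // the last 5-bit slot is padding
	}
	out := make([]byte, n)
	bit := 1
	for i := range out {
		var v byte
		for k := 0; k < bitsPerChar; k++ {
			v = v<<1 | (b[bit/8]>>uint(7-bit%8))&1
			bit++
		}
		out[i] = 'a' + v
	}
	return string(out)
}

func main() {
	for _, s := range []string{"ab", "abc", "fieldname"} {
		enc, err := encode(s)
		if err != nil {
			panic(err)
		}
		// Print encoded length, whether the strip flag is set, and the decoded text.
		fmt.Println(len(enc), enc[0]&0x80 != 0, decode(enc))
	}
}
```

In this sketch the decoder takes only the byte slice, which matches the point made earlier in the thread that a numBits argument becomes unnecessary.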

chaokunyang added a commit that referenced this issue Apr 24, 2024
## What does this PR do?

add strip flag in meta string encoding spec

## Related issues

#1540

## Does this PR introduce any user-facing change?


- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?


## Benchmark
