Document handling of non-UTF8 paths in TOC #1611

Open
aochagavia opened this issue Mar 14, 2024 · 0 comments
aochagavia commented Mar 14, 2024

When building the TOC, it is necessary to provide the full path of each entry, as mentioned in the docs:

- **`name`** *string*
This REQUIRED property contains the name of the tar entry.
This MUST be the complete path stored in the tar file.

However, on many systems a path is not guaranteed to be valid UTF-8, and blindly including its raw bytes here could result in invalid JSON (as defined in RFC 8259). For the sake of interoperable implementations of estargz, it would be useful to document what an implementation should do when creating an estargz layer that contains non-UTF-8 file paths. The only options that come to mind are:

  1. Creating non-compliant JSON anyway, assuming whoever loads the layer will be able to handle it.
  2. Using some form of escaping when encoding the paths, which get unescaped when decoding them.

Could anyone tell me what the current implementation does? I find the code difficult to read because I'm unfamiliar with Go. With that information I'd gladly put together a PR later.

Note: this issue also applies to the `link_name` field of `TOCEntry`.
