gojq -nr '["\u0000"] | @tsv' #200

Closed
pkoppstein opened this issue Nov 13, 2022 · 6 comments

@pkoppstein

Compare:

$ jq -nr '["\u0000"] | @tsv'
\0

and the output produced by gojq:

$ gojq -nr '["\u0000"] | @tsv'
�
# In the line immediately above, a single literal NUL should appear.

In 2015, there was discussion about what jq should do with NUL, and
even though it may not be documented in the manual, the decision
to have @tsv present NUL as \0 was part of the discussion at
jqlang/jq#759

@itchyny
Owner

itchyny commented Nov 13, 2022

Please let me know if there is any application that recognizes \0 as a NUL character in TSV or CSV. When I opened the output in Numbers, the gojq output was recognized correctly, and \0 was read as the literal text \0.

@pkoppstein
Author

pkoppstein commented Nov 13, 2022

That's not quite the right question. The point is that a TSV file should be processable by tools that are NOT well-disposed to NULs:

$ /usr/bin/awk --version
awk version 20200816
$ /usr/bin/awk  -F\\t '{print $2}' null.tsv
b

$ cat null.tsv
a	b
c�	d
$ 
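One plausible reading of the awk output above is that the tool treats each line as a NUL-terminated C string, so the rest of the line after the NUL is never seen. The sketch below only simulates that assumption; `second_field_c_style` is a hypothetical helper and says nothing definitive about how any particular awk is implemented.

```python
def second_field_c_style(line: bytes) -> bytes:
    # Assumption: the tool stores each line as a NUL-terminated C string,
    # so everything from the first NUL onward is invisible to it.
    visible = line.split(b"\x00", 1)[0]
    fields = visible.split(b"\t")
    # Return the second tab-separated field, or an empty value if it is missing.
    return fields[1] if len(fields) > 1 else b""


# The two rows of null.tsv as shown above; the second has a NUL after "c".
rows = [b"a\tb", b"c\x00\td"]
for row in rows:
    print(second_field_c_style(row))
```

Under this assumption the first row yields `b` and the second yields an empty field, which matches the transcript.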

@itchyny
Owner

itchyny commented Nov 14, 2022

So, there is no way to deal with NUL characters properly, and we should not have them in the input to these filters. Rather than making a NUL indistinguishable from "\\\0", I prefer to keep it as it is, so that the user can see where the input data is wrong.

@itchyny
Owner

itchyny commented Nov 14, 2022

Hmm, I'd rather reject them with an error.

@pkoppstein
Author

@itchyny wrote:

there is no way to deal with NUL characters properly

But when generating a TSV file, special characters (notably tabs, linefeeds, and returns) must be handled sensibly, and, I believe, the same logic applies to NULs as well. Agreed, there is no "TSV standard" that mandates allowing raw NULs in a TSV file, but (a) there are plenty of applications which recognize the two-character sequences '\t', '\n', '\r' and '\0' as stand-ins; and (b) in the present context, jq itself establishes a time-tested and reasonable standard.
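As a rough illustration of the stand-in idea, the sketch below maps each special character to a two-character sequence before joining fields with tabs. The names `TSV_ESCAPES`, `tsv_escape`, and `tsv_row` are hypothetical, and the mapping is an assumption based on the behavior described in this thread; it is not jq's or gojq's actual implementation.

```python
# Hypothetical mapping, modeled on the stand-ins discussed in this thread.
TSV_ESCAPES = {
    "\\": "\\\\",  # the backslash itself, so the stand-ins stay unambiguous
    "\t": "\\t",   # tab
    "\n": "\\n",   # linefeed
    "\r": "\\r",   # carriage return
    "\0": "\\0",   # NUL
}


def tsv_escape(field: str) -> str:
    """Replace each special character with its stand-in sequence."""
    return "".join(TSV_ESCAPES.get(ch, ch) for ch in field)


def tsv_row(fields) -> str:
    """Escape every field, then join the fields with literal tabs."""
    return "\t".join(tsv_escape(f) for f in fields)


print(tsv_row(["\0"]))            # a backslash followed by 0, no raw NUL
print(tsv_row(["a\tb", "c\nd"]))  # the only raw tab left is the separator
```

Because the translation is done one character at a time, the order of the entries in the mapping does not matter, and the output never contains a raw NUL, linefeed, or return inside a field.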

@itchyny
Owner

itchyny commented Nov 14, 2022

Okay, I noticed that the escape character (the backslash) is itself escaped in @tsv, so that's fine. I'm worried about @csv and @sh, which yield the same string for different input strings (["\u0000", "\\0"] | @tsv,@csv,@sh,map(explode)). But after a few minutes of research, it seems that dealing with null characters in CSV and in shells is very difficult. After all, most people are happy as long as the tool works on their real data, and few people care about its behavior on null characters.
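The ambiguity concern can be seen in a small sketch. Both encoders below are hypothetical and are not taken from jq or gojq: one rewrites only NUL, the other also escapes the backslash. The inputs are the two strings from the comment above, a single NUL and a backslash followed by "0".

```python
def escape_nul_only(s: str) -> str:
    # Hypothetical encoder that rewrites NUL but leaves backslashes alone.
    return s.replace("\0", "\\0")


def escape_nul_and_backslash(s: str) -> str:
    # Escaping the backslash before NUL keeps the two inputs distinguishable.
    return s.replace("\\", "\\\\").replace("\0", "\\0")


inputs = ["\0", "\\0"]  # a NUL character vs. a backslash followed by "0"

# Both inputs collapse to the same two-character output.
print([escape_nul_only(s) for s in inputs])

# The outputs differ: one backslash for NUL, two for the literal backslash.
print([escape_nul_and_backslash(s) for s in inputs])
```

A format that has no escape for the backslash itself has no obvious way to avoid the first outcome, which is one reason NUL handling is harder there.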
