Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero width chars not showing in diff #10584

Open
aryzing opened this issue Oct 4, 2020 · 12 comments
Open

Zero width chars not showing in diff #10584

aryzing opened this issue Oct 4, 2020 · 12 comments

Comments

@aryzing
Copy link

aryzing commented Oct 4, 2020

🐛 Bug Report

Differences in strings containing zero width chars are not highlighted, making them incredibly hard to spot and debug. Eg, one of the following strings has as its first character (character at position 0) a byte order mark character as its first character, and the other has the letter "T" as its first character.

image

Additional useful info

To Reproduce

Just create two strings that look identical, one of which contains a character such as the BOM character.

Expected behavior

A clear indication of the difference.

envinfo

  System:
    OS: Linux 5.4 Ubuntu 18.04.5 LTS (Bionic Beaver)
    CPU: (8) x64 Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
  Binaries:
    Node: 14.4.0 - ~/.nvm/versions/node/v14.4.0/bin/node
    Yarn: 1.21.1 - /usr/bin/yarn
    npm: 6.14.5 - ~/.nvm/versions/node/v14.4.0/bin/npm
  npmPackages:
    jest: ^26.0.1 => 26.0.1 
@devil-cyber
Copy link

hey can I work on this issue

@SimenB
Copy link
Member

SimenB commented Oct 6, 2020

@devil-cyber sure, thanks!

@devil-cyber
Copy link

devil-cyber commented Oct 6, 2020 via email

@SimenB
Copy link
Member

SimenB commented Oct 6, 2020

This is really @pedrottimark's domain, but it should be somewhere in https://github.com/facebook/jest/tree/5da90b58eb6673e6f406f9ebc369b1f0fb747384/packages/jest-diff. Maybe @jeysal could provide some more details?

@devil-cyber
Copy link

devil-cyber commented Oct 6, 2020 via email

@jeysal
Copy link
Contributor

jeysal commented Oct 6, 2020

I would expect most of the changes to be in pretty-format more likely than jest-diff.
We want to serialize a string starting with certain characters in a different way.
There should probably be a brief discussion on how exactly this should be visualized, ideally with input from @aryzing as well.
Maybe "\uFEFFbla", but then how do we distinguish it from a string actually containing those characters (afaik backslashes are not an escape character in formatted strings)? Maybe move it before the "?

@devil-cyber
Copy link

devil-cyber commented Oct 6, 2020 via email

@aryzing
Copy link
Author

aryzing commented Oct 6, 2020

@jeysal on my side no strong preferences on display, just thought I'd bring this up as it might trip up others in the future. Perhaps there's appetite for visually separating parts of the string corresponding to invisible characters or whitespace characters other than the typical whitespace you get with the spacebar.

Maybe (fyi, 00A0 is no-break space)

Expected: "Rest of string"
Received: \uFEFF + "Rest of" + \u00A0 + "string"

Like I said, no strong feelings either way, I'm sure that any reasonable indication of the presence of such characters will be fine.

@github-actions
Copy link

This issue is stale because it has been open for 1 year with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 25, 2022
@aryzing
Copy link
Author

aryzing commented Mar 1, 2022

Believe this isn't resolved yet, is it?

@github-actions github-actions bot removed the Stale label Mar 1, 2022
@github-actions
Copy link

github-actions bot commented Mar 1, 2023

This issue is stale because it has been open for 1 year with no activity. Remove stale label or comment or this will be closed in 30 days.

@ibuibu
Copy link
Contributor

ibuibu commented Mar 11, 2023

I was just thinking about a solution for the issue.

Sorry I couldn't declare it before working on it.

I will try to create a PR as well.

I thought #13997 was a very good approach.

I thought the issue was that simply replacing a zero-width space with a no-break space makes it difficult to recognize that it is a zero-width space.
Also, there are multiple zero-width spaces other than FEFF, so I think this needs to be detected as well.

I took the approach of using Note to tell that a zero-width space exists. The disadvantages of this approach are that it is difficult to visually see where the zero-width space exists and that it is impossible to detect when there are multiple zero-width spaces.

I hope my PR will be of help and reference for a good solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants