Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selection with search and unicode #1686

Closed
jerch opened this issue Sep 13, 2018 · 29 comments · Fixed by #3236
Closed

Selection with search and unicode #1686

jerch opened this issue Sep 13, 2018 · 29 comments · Fixed by #3236

Comments

@jerch
Copy link
Member

jerch commented Sep 13, 2018

Combining, surrogate or fullwidth chars in the line and/or the search string lead to weird selection offset problems. Steps to repro:

  • insert into demo: echo -en 'combining: ééé\nfullwidth: ¥¥¥\nsurrogate: 𓂀𓂀𓂀\n'
  • search for 'ééé', '¥¥¥' and '𓂀𓂀𓂀'

The selection is kinda off for all 3 types, it gets even worse if the line contains any of these before their occurence. It seems the renderer and the selection manager do not agree on the chars widths and lengths.

Since I had a similar problem with the linkifier, it might be fixable the same way (#1678).

@jerch jerch added the type/bug Something is misbehaving label Sep 13, 2018
@Tyriar
Copy link
Member

Tyriar commented Sep 17, 2018

Seems to work fine for me on mac/master, let me know if you still see it.

@Tyriar Tyriar closed this as completed Sep 17, 2018
@jerch
Copy link
Member Author

jerch commented Sep 24, 2018

@Tyriar Nope its not gone, still the same here. Maybe its a platform issue?

Looks like this atm:
grafik
grafik

Looks like the accent char is accounted for 2 halfwidth chars by the selector, while the ¥ symbol gets treated as one halfwidth.

Found this in the code:

this._terminal._core.selectionManager.setSelection(result.col, result.row, result.term.length);

Imho the last argument should be the sum of wcwidth instead of the string length (not tested yet).

@jerch jerch reopened this Sep 24, 2018
@Tyriar
Copy link
Member

Tyriar commented Sep 24, 2018

@jerch are you on Linux?

@jerch
Copy link
Member Author

jerch commented Sep 24, 2018

Yes Ubuntu 16 here.

@Tyriar
Copy link
Member

Tyriar commented Sep 24, 2018

I guess we need to have a setting for this stuff like you were suggesting before. Still not sure the best way of querying the platform for these character widths though, I doubt we can rely on all Linux distros being the same and macOS being a different case.

@Tyriar
Copy link
Member

Tyriar commented Sep 24, 2018

Same underlying issue to #1059?

@jerch
Copy link
Member Author

jerch commented Sep 24, 2018

Nope, this time its not wcwidth's fault, changing the argument I mentioned above fixes the problems (tested a few minutes ago)

@jerch
Copy link
Member Author

jerch commented Sep 26, 2018

Currently blocked by #1707 and #1709.

@jerch
Copy link
Member Author

jerch commented Dec 16, 2018

Some background on this:
The way the start and end pos of the selection is determined still does not work for all surrogates and fullwidth chars combinations - thus if there are any of those in the line before the match or in the match itself start and end offsets can occur.

This can be fixed the same way I had to fix the linkifier underlining in #1769, by mapping a string index back to the buffer index:

const bufferIndex = this._terminal.buffer.stringIndexToBufferIndex(rowIndex, stringIndex, true);

If done twice (match start and end) the selection will correctly point to the underlying cells.

@Tyriar
Copy link
Member

Tyriar commented Oct 7, 2019

This still happens. This is the problem line:

terminal.select(result.col, result.row, result.term.length);

result.term.length for ééé is 6, the fix likely involves returning from _fineInLine an end row and col instead of the actual term.

@Tyriar Tyriar changed the title selection with search and unicode Selection with search and unicode Oct 7, 2019
@Silvyre
Copy link

Silvyre commented Oct 28, 2019

Hello, I would like to join my peer @miggs125 in contributing to xterm by tackling this issue.

I will first attempt to improve selection of strings that include diacritical marks.

@jerch
Copy link
Member Author

jerch commented Oct 29, 2019

@Silvyre Sure thing. Note that the terminal buffer already accounts diacritical characters into one cell with the main character, thus the issue comes from the string position to cell back-mapping.

@Silvyre
Copy link

Silvyre commented Oct 30, 2019

@jerch Thanks!

Note that the terminal buffer already accounts diacritical characters into one cell with the main character

Are you referring to the JoinedCellData type? As far as I can tell, this base type is not currently used within search selection (search selection appears to handle buffer cells as objects of IBufferCell type, which is not part of the ICellData hierarchy).

@Silvyre
Copy link

Silvyre commented Oct 30, 2019

changing the argument I mentioned above fixes the problems (tested a few minutes ago)

Starting to get on the same page. OK, modifying _findInLine to return getStringCellWidth(term) instead of term appears to improve the selection of diacritical characters, e.g. ééé (at least on Ubuntu 18.04; getStringCellWidth() calls wcwidth(), which may perform differently on other platforms?).

I can't imagine this to be a satisfactory solution, considering that, as you mentioned, this does not work for all surrogate/fullwidth character combinations [across various platforms] (e.g. selection of is still not great, at least on Ubuntu 18.04).

@Silvyre
Copy link

Silvyre commented Oct 30, 2019

selection of ¥ is still not great, at least on Ubuntu 18.04

To clarify, it sometimes works, as shown in this GIF, which I created after replacing every instance of line/term/cell.length with getStringCellWidth(...) in SearchAddon.ts. I'm going to try to tweak the find functions a bit more and see if I can improve behaviour that way.

@jerch
Copy link
Member Author

jerch commented Oct 30, 2019

@Silvyre Yes working with wcwidth correction is the right way to go here. Imho needed once for the search term itself (in case it contains weird chars) to get the amount of cells taken ("cell length"), then you'd need to correct every start offset found likewise to find the real cell offset. That cell-offset + term-cell-length % cols should give the real start and end position in the buffer.

@Silvyre
Copy link

Silvyre commented Oct 30, 2019

@jerch Excellent, I'll work on that. Thanks again!

@Silvyre
Copy link

Silvyre commented Oct 30, 2019

I have a general question regarding addons and dependencies: how are helper functions in src/common (e.g. getStringCellWidth, wcwidth from CharWidth.ts) imported into addons (e.g. addons/xterm-addon-search)?

@jerch
Copy link
Member Author

jerch commented Oct 30, 2019

@Silvyre They arent yet, the public API gets extended on request. Thus you'd have to go with internal refs for now. Maybe open an issue regarding this so we can decide how and where to put it.

@Silvyre
Copy link

Silvyre commented Oct 30, 2019

Sure thing, I'll open an issue.

@Silvyre
Copy link

Silvyre commented Oct 31, 2019

you'd need to correct every start offset found likewise to find the real cell offset

@jerch I'm having a bit of a difficult time determining how and where cell offsetting should be (or is) implemented. Within BufferLine.ts?

@Silvyre
Copy link

Silvyre commented Oct 31, 2019

I've also noticed that selectionEnd appears to spend most of its time undefined, while finalSelectionEnd gets defined. A related bug, maybe?

@Tyriar
Copy link
Member

Tyriar commented Oct 31, 2019

Maybe, it's meant to be undefined for various types of selection if I remember right though (word, line, select all).

@jerch
Copy link
Member Author

jerch commented Oct 31, 2019

@jerch I'm having a bit of a difficult time determining how and where cell offsetting should be (or is) implemented. Within BufferLine.ts?

Ah yepp thats abit hidden in the codebase, the code regarding this is in Buffer.ts and BufferLine.ts, both contain several methods that demostrate how to walk cells, easiest startpoint might be this:

public stringIndexToBufferIndex(lineIndex: number, stringIndex: number, trimRight: boolean = false): BufferIndex {

Not sure if you can directly use this method, you have to take care where your string index origin is (whether col 0 of wrapped or unwrapped lines).

@JasinYip
Copy link

image

I'm using VSCode(1.50.0) on macOS Catalina(10.15.6) and this issue still happening.

@JasinYip
Copy link

@Tyriar Hi, so the issue has any solutions? I tried to load xterm-addon-unicode11, it can only fix emoji chars viewing but searching for Chinese chars still having the issue.

@Tyriar
Copy link
Member

Tyriar commented Jan 19, 2021

Been a while since I looked at this code but I think we could expose the active IUnicodeVersionProvider's wcwidth to extensions via IUnicodeHandling.activeProvider.wcwidth or similar to solve this.

@JasinYip
Copy link

@Tyriar How could we fix or adapt this in our production? Seems it was internal code you write above and I have no idea how should I do...

@Tyriar
Copy link
Member

Tyriar commented Aug 13, 2021

@JasinYip there's some discussion about the fix in #3236, been a while since I looked and don't have time atm though.

@Tyriar Tyriar added this to the 4.16.0 milestone Dec 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants