Both _selected_range and _replacement_range parameters are ignored in NSTextInputClient implementation. #3617

Open
ShikiSuen opened this issue Apr 1, 2024 · 16 comments
Labels
B - bug Dang, that shouldn't have happened DS - macos

Comments

@ShikiSuen

ShikiSuen commented Apr 1, 2024

Description

TypeAlias "ICB" = "Inline Composition Buffer" = "Inline PreEdit"

Both _selected_range and _replacement_range parameters are ignored in the NSTextInputClient implementation. As a result, these parameters, which the IME sends through the IMKTextInput APIs, are completely discarded, and the preedit cursor is always placed at the end of the preedit string:

Some((preedit_string.len(), preedit_string.len()))

These parameters are crucial for IMEs that use nested inline composition buffers (i.e. nested preedits), e.g. the Traditional Chinese Zhuyin IME shipped with macOS. These IMEs depend on a cursor inside the ICB.

Note: These parameters are UTF-16 ranges. Apple introduced these APIs in macOS 10.5 Leopard. See the IMKTextInput API header for details: https://github.com/phracker/MacOSX-SDKs/blob/041600eda65c6a668f66cb7d56b7d1da3e8bcc93/MacOSX10.5.sdk/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/Headers/IMKInputSession.h#L74-L85

I was about to send a PR to fix this issue, but one thing needs to be figured out first:

In src/events.rs, pub enum Ime declares the variant Preedit(String, Option<(usize, usize)>). Are these cursor positions supposed to be UTF-8 or UTF-16 offsets? This has to be settled before patching this issue.

[Screenshot: winit documentation for Ime::Preedit]

macOS version

ProductName:		macOS
ProductVersion:		14.4.1
BuildVersion:		23E224

Winit version

Master Branch 44aabdd

@kchibisov
Member

Everything is UTF-8; it's also documented that way.

@ShikiSuen
Author

@kchibisov Thanks. Could you please explain why the documentation (in the screenshot above) says "The cursor position is byte-wise indexed"?

@kchibisov
Member

It's an offset into the UTF-8 byte representation, because String is UTF-8.
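To make the difference concrete, here is a small plain-Rust sketch (no winit or AppKit involved) that measures the same caret position in both units. The helper names are made up for illustration: the UTF-16 count is what an NSRange from the IME carries, and the UTF-8 count is the byte offset Ime::Preedit expects.

```rust
/// UTF-16 code-unit offset of the first `n_chars` characters of `s`
/// (the unit used by an NSRange coming from the IME).
fn utf16_offset(s: &str, n_chars: usize) -> usize {
    s.chars().take(n_chars).map(char::len_utf16).sum()
}

/// UTF-8 byte offset of the first `n_chars` characters of `s`
/// (the unit used by winit's Ime::Preedit cursor).
fn utf8_offset(s: &str, n_chars: usize) -> usize {
    s.chars().take(n_chars).map(char::len_utf8).sum()
}
```

For the Zhuyin preedit "ㄓㄨˋ" with the caret after "ㄨ", utf16_offset gives 2 while utf8_offset gives 6, since each Bopomofo letter takes 3 bytes in UTF-8. Outside the BMP the two diverge differently again: "😀" is 2 UTF-16 code units but 4 bytes.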

@ShikiSuen
Author

ShikiSuen commented Apr 1, 2024

It seems that the information available so far is enough. Here are the proposed steps:

  1. Use Objective-C NSString APIs to split the preedit text into at most 3 substrings at the boundaries given by the NSRange parameters.

  2. Use the resulting substrings and Rust's string API to compute the new cursor position parameters for Ime::Preedit(A, Option<(B, C)>).
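The two steps can be sketched in plain Rust, with the NSString slicing replaced by a walk over the characters of a &str so that it can be tested without AppKit. The names utf16_to_byte, split_preedit, and cursor_from_parts are hypothetical; the sketch assumes the selected range arrives as UTF-16 (location, length) and rejects ranges that run past the end of the text or split a surrogate pair.

```rust
/// Map a UTF-16 offset into `text` to a UTF-8 byte offset.
/// Returns `None` if the offset is past the end or inside a surrogate pair.
fn utf16_to_byte(text: &str, target_u16: usize) -> Option<usize> {
    let mut u16_pos = 0;
    for (byte_pos, ch) in text.char_indices() {
        if u16_pos >= target_u16 {
            // Exactly on a boundary is fine; overshooting means the target
            // fell inside the previous (surrogate-pair) character.
            return (u16_pos == target_u16).then_some(byte_pos);
        }
        u16_pos += ch.len_utf16();
    }
    // The target may also be the very end of the string.
    (u16_pos == target_u16).then_some(text.len())
}

/// Step 1: split the preedit into (before, selected, after) at the UTF-16
/// boundaries `location` and `location + length`.
fn split_preedit(text: &str, location: usize, length: usize) -> Option<(&str, &str, &str)> {
    let start = utf16_to_byte(text, location)?;
    let end = utf16_to_byte(text, location.checked_add(length)?)?;
    Some((&text[..start], &text[start..end], &text[end..]))
}

/// Step 2: the UTF-8 cursor pair (B, C) follows from the byte lengths of the parts.
fn cursor_from_parts(parts: (&str, &str, &str)) -> (usize, usize) {
    let (before, selected, _after) = parts;
    (before.len(), before.len() + selected.len())
}
```

For example, selecting "ㄨ" in "ㄓㄨˋ" (UTF-16 location 1, length 1) splits the text into "ㄓ" / "ㄨ" / "ˋ" and gives the byte cursor (3, 6). A zero-length range yields B == C, i.e. a plain caret.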

@ShikiSuen
Author

I just wrote a utility function to convert a UTF-16 NSRange into UTF-8 byte bounds:

```rust
// Assumption: NSRange and NSString come from the Foundation bindings used by
// winit's macOS backend (icrate::Foundation at the time; objc2_foundation later).
use icrate::Foundation::{NSRange, NSString};

// Convert a UTF-16 NSRange (as sent by the IME) into the UTF-8 byte bounds
// used by this crate's Ime::Preedit. Returns `None` if the range is invalid.
unsafe fn get_actual_range_bounds(source: &NSString, range: NSRange) -> Option<(usize, usize)> {
    let length_u16 = source.length();
    let lowerbound_u16 = range.location;
    // Sanity check: `checked_add` rejects ranges whose end would overflow,
    // and the range must end within the string (this also rejects NSNotFound).
    let upperbound_u16 = lowerbound_u16.checked_add(range.length)?;
    if upperbound_u16 > length_u16 {
        return None;
    }
    // The UTF-8 byte length of each prefix is the byte offset of that bound.
    // Caveat: a bound inside a surrogate pair leaves a lone surrogate, so the
    // converted prefix may not match the original bytes.
    let prefix_a = unsafe { source.substringWithRange(NSRange::new(0, lowerbound_u16)) };
    let prefix_b = unsafe { source.substringWithRange(NSRange::new(0, upperbound_u16)) };
    Some((prefix_a.to_string().len(), prefix_b.to_string().len()))
}
```

@ShikiSuen
Author

A PR dedicated to _selected_range is here:
#3619

TODO: We need help from Nihonjin IME devs on how to use _replacement_range correctly. The _replacement_range parameter is used at least in Japanese IMEs. Therefore, Nihonjin IME devs are more familiar with Japanese IMEs on mac.

@kchibisov
Member

Therefore, Nihonjin IME devs are more familiar with Japanese IMEs on mac.

You mean 上手 (skilled) ones? They are quite rare, at least in the projects I work on. In general, you can just compare with other apps by doing the same input and figure it out from there.

I can also try to take a look once I have time...

@ShikiSuen
Author

@kchibisov

I forgot to say:

TypeAlias "Nihonjin" = "Japanese People"

In Japanese IMEs, the replacement range is used for reconversion. For example, in Google Japanese Input for macOS:

https://github.com/google/mozc/blob/641260dbe918d2f6ddb3168cd43d8664ea08ca43/src/mac/GoogleJapaneseInputController.mm#L541

@kchibisov
Member

Just FYI, I use mozc daily, though maybe not in such an advanced way yet.

Do you mean reconverting from the selected range? And what is the selected range in that case? I know about conversions during the preedit; is this related to the surrounding text? Could you maybe make a GIF of what you're talking about, so I can understand it without digging into it too much yet?

@ShikiSuen
Author

@kchibisov

That's why I said we need some Japanese IME developers (preferably Mozc ones) to help explain how reconversion works.

I only have experience developing Chinese IMEs. It seems that the reconversion feature is a shortcut that removes the need for a composition buffer, since it converts text that has already been committed. This makes sense because bare kana are also important elements of written Japanese (which mixes kana and kanji). Written Chinese, however, never mixes in Pinyin or Zhuyin (except for special purposes).

@ShikiSuen
Author

@kchibisov Also, the reconversion feature needs a JIS keyboard. I don't have one.

The JIS keyboard has a dedicated physical key for reconversion.

@kchibisov
Member

Ah, I think I get what you mean. I'm pretty sure it works by reporting the surrounding text, so you can't really do that without adding a new API. Such an API was discussed in the past, but that's about it.

@ShikiSuen maybe you need a special key on macOS, but on Linux I just have a binding to convert between 漢字 (kanji) and kana. Though I'm not sure there's a way to trigger the conversion just by making a selection; at least I've never seen how.

@ShikiSuen
Author

@kchibisov The replacementRange is used for interacting with the surrounding context.

@kchibisov
Member

Yeah, so you need an API to indicate such a range, then delete it and send the new text; that's pretty much how it works on Wayland. So this could be left untouched for a while, until we have a separate Window::set_ime_surrounding_text or similar APIs. Though I won't stop you from drafting one.
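To illustrate the delete-then-insert flow described above, here is a plain-Rust sketch that applies a UTF-16 replacement range, such as the one _replacement_range carries, to a surrounding-text string. This is not an existing winit or AppKit API: apply_replacement is hypothetical, and it assumes the range is relative to the start of the surrounding text and that the caret should land right after the inserted text.

```rust
/// Map a UTF-16 offset into `text` to a UTF-8 byte offset.
/// Returns `None` if the offset is past the end or inside a surrogate pair.
fn utf16_to_byte(text: &str, target_u16: usize) -> Option<usize> {
    let mut u16_pos = 0;
    for (byte_pos, ch) in text.char_indices() {
        if u16_pos >= target_u16 {
            return (u16_pos == target_u16).then_some(byte_pos);
        }
        u16_pos += ch.len_utf16();
    }
    (u16_pos == target_u16).then_some(text.len())
}

/// Hypothetical helper: delete the UTF-16 range `location..location + length`
/// from `surrounding`, insert `new_text` in its place, and return the new
/// string together with the byte offset of the caret after the inserted text.
fn apply_replacement(
    surrounding: &str,
    location: usize,
    length: usize,
    new_text: &str,
) -> Option<(String, usize)> {
    let start = utf16_to_byte(surrounding, location)?;
    let end = utf16_to_byte(surrounding, location.checked_add(length)?)?;
    let result = format!("{}{}{}", &surrounding[..start], new_text, &surrounding[end..]);
    Some((result, start + new_text.len()))
}
```

For a reconversion-style edit, replacing "きょう" (UTF-16 range 0..3) in "きょうはいい天気" with "今日" yields "今日はいい天気" with the caret at byte 6.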

@ShikiSuen
Author

@kchibisov For now we can just fix the use of _selected_range.

@kchibisov
Member

Yeah, as far as I can tell, the selected range pretty much matches the start/end cursor APIs we have, so it shouldn't be that hard, I guess.


2 participants