Incorrect Output rfind() #175

Open
arshidkv12 opened this issue Jan 16, 2024 · 7 comments

Comments

@arshidkv12

    let haystack_str = BStr::new("日本語テキストです。01234５６７８９。E");
    let pos = haystack_str.rfind("。");
    println!("{}", pos.unwrap());

Output: 50
Expected output: 20

How to fix it?

@arshidkv12 changed the title from "Incorrect Output rfind" to "Incorrect Output rfind()" on Jan 16, 2024
@lopopolo
Contributor

bstr, the library, universally operates in terms of byte offsets. If you'd like to count in terms of characters, use one of the iterators like ByteSlice::chars. Something like this:

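    use bstr::{BStr, ByteSlice};

    let haystack_str = BStr::new("日本語テキストです。01234５６７８９。E");
    // rfind returns a byte offset into the haystack.
    let pos = haystack_str.rfind("。").unwrap();
    // Count the characters that precede that byte offset.
    let s = &haystack_str[..pos];
    let char_pos = s.chars().count();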
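
The difference between the two numbers in this thread can be reproduced with the standard library alone. The sketch below is an editorial illustration, not part of the original comment: it uses `str::rfind` as a stand-in for bstr's `rfind` (both report byte offsets), the helper name `rfind_byte_and_char` is hypothetical, and it assumes the digits 5 to 9 in the haystack are fullwidth (５６７８９), as in the PHP test linked later in this thread. That assumption is what makes the byte offset 50; with ASCII digits throughout it would be 40.

```rust
/// Returns (byte offset, char offset) of the last occurrence of `needle`.
/// Hypothetical helper for illustration; std's `str::rfind` stands in for
/// bstr's `rfind`, since both return byte offsets.
fn rfind_byte_and_char(haystack: &str, needle: &str) -> Option<(usize, usize)> {
    // Byte offset of the last match, always on a char boundary.
    let byte_pos = haystack.rfind(needle)?;
    // Number of Unicode scalar values that precede the match.
    let char_pos = haystack[..byte_pos].chars().count();
    Some((byte_pos, char_pos))
}

fn main() {
    // Assumption: digits 5-9 are fullwidth (3 bytes each in UTF-8).
    let haystack = "日本語テキストです。01234５６７８９。E";
    let (byte_pos, char_pos) = rfind_byte_and_char(haystack, "。").unwrap();
    // 9 kana/kanji * 3 + 3 + 5 ASCII + 5 fullwidth * 3 = 50 bytes,
    // but only 9 + 1 + 5 + 5 = 20 characters.
    println!("byte offset: {byte_pos}, char offset: {char_pos}");
}
```

Under that assumption this prints `byte offset: 50, char offset: 20`, which matches both the reported output and the value the issue author expected.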

@BurntSushi
Owner

You need to explain why you expect the output to be 20. Because 50 looks correct.

While I'm not certain, you'll probably want to read this thread carefully: BurntSushi/aho-corasick#72

@BurntSushi
Owner

> if you'd like to count in terms of characters

And to be clear here, you almost never want this. Almost never. OP, if you describe the higher level problem you're trying to solve, we can probably guide you in the right direction.
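
One practical reason byte offsets are usually the more useful unit (an editorial illustration, not an argument stated in this comment): a byte offset can be used to slice a UTF-8 string directly, while a character offset has to be converted back by scanning the string from the start. The sketch below uses only the standard library; `char_to_byte_offset` is a hypothetical helper, and the haystack assumes fullwidth digits 5 to 9.

```rust
/// Converts a character offset into a byte offset by walking the string.
/// Hypothetical helper for illustration; this is O(n) in the string length.
fn char_to_byte_offset(s: &str, char_pos: usize) -> Option<usize> {
    s.char_indices()
        .map(|(byte_idx, _)| byte_idx)
        // Allow the one-past-the-end position as well.
        .chain(std::iter::once(s.len()))
        .nth(char_pos)
}

fn main() {
    // Assumption: digits 5-9 are fullwidth.
    let haystack = "日本語テキストです。01234５６７８９。E";

    // With a byte offset, slicing is direct.
    let byte_pos = haystack.rfind("。").unwrap();
    let tail_from_bytes = &haystack[byte_pos..];

    // With a character offset, the string must be re-scanned first.
    let char_pos = haystack[..byte_pos].chars().count();
    let recovered = char_to_byte_offset(haystack, char_pos).unwrap();
    let tail_from_chars = &haystack[recovered..];

    assert_eq!(tail_from_bytes, tail_from_chars);
    println!("{tail_from_bytes}");
}
```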

@arshidkv12
Author

arshidkv12 commented Jan 16, 2024

https://github.com/php/php-src/blob/master/ext/mbstring/tests/mb_strrpos_basic.phpt

Please check it

    $needle1 = base64_decode('44CC');
    var_dump(mb_strrpos($string_mb, $needle1));

Output: 
-- Multibyte string 1 --
int(20)

@arshidkv12
Author

> bstr, the library, universally operates in terms of byte offsets. If you'd like to count in terms of characters, use one of the iterators like ByteSlice::chars. Something like this:
>
>     let haystack_str = BStr::new("日本語テキストです。01234５６７８９。E");
>     let pos = haystack_str.rfind("。").unwrap();
>     let s = &haystack_str[..pos];
>     let char_pos = s.chars().count();

I think it is working. :)

@BurntSushi
Owner

Read the thread I linked. Different environments can return different offsets depending on how the string type is represented. Seriously, read the thread I linked. It should answer everything.
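
To make the point about representations concrete, here is a minimal sketch (an editorial addition, standard library only) that reports the same match position in three units: UTF-8 bytes, UTF-16 code units, and Unicode code points. The `Offsets` struct, the `last_offsets` function, and the sample haystack are all hypothetical and chosen so that the three numbers differ.

```rust
/// Position of a match expressed in three different units.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct Offsets {
    utf8_bytes: usize,
    utf16_units: usize,
    code_points: usize,
}

/// Finds the last occurrence of `needle` and reports where it starts
/// in each unit.
fn last_offsets(haystack: &str, needle: &str) -> Option<Offsets> {
    // `str::rfind` yields a UTF-8 byte offset.
    let utf8_bytes = haystack.rfind(needle)?;
    let prefix = &haystack[..utf8_bytes];
    Some(Offsets {
        utf8_bytes,
        // What a UTF-16 based string type would report.
        utf16_units: prefix.encode_utf16().count(),
        // What a code-point based API would report.
        code_points: prefix.chars().count(),
    })
}

fn main() {
    // U+1D11E (𝄞) takes 4 bytes in UTF-8, 2 units in UTF-16, 1 code point.
    let offsets = last_offsets("𝄞日本語。", "。").unwrap();
    println!("{offsets:?}");
}
```

For this sample the match starts at byte 13, UTF-16 code unit 5, and code point 4, so three environments could each report a different, equally valid number for the same position.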

@arshidkv12
Author

> Read the thread I linked. Different environments can return different offsets depending on how the string type is represented. Seriously, read the thread I linked. It should answer everything.

Thank you


3 participants