Subtitle track descriptions are garbled after upgrading vlcj from 4.7.2 to 4.8.2 #1198

Closed
tangshimin opened this issue Jul 13, 2023 · 16 comments

@tangshimin

tangshimin commented Jul 13, 2023

Platforms: Windows 11

I'm from China, so the default encoding of my Windows installation is GBK. After I upgraded vlcj from 4.7.2 to 4.8.2, the subtitle track descriptions are garbled. It looks as if vlcj reads the UTF-8 encoded Chinese text as GBK.

I wrote a small demo to reproduce this bug, but before submitting it I found that vlcj-player does not show garbled subtitle descriptions after its vlcj is upgraded to 4.8.2. After some analysis, I found that vlcj-player uses native-streams 1.0.0.

So I added the native-streams 1.0.0 dependency to my project, and the descriptions were no longer garbled. When I upgraded native-streams to the latest version, 3.0.0, they were garbled again. I then tried native-streams 2.0.0, which works fine.

vlcj 4.7.2: subtitle descriptions are fine
vlcj 4.8.2: subtitle descriptions are garbled
vlcj 4.8.2 + native-streams 1.0.0: subtitle descriptions are fine
vlcj 4.8.2 + native-streams 2.0.0: subtitle descriptions are fine
vlcj 4.8.2 + native-streams 3.0.0: subtitle descriptions are garbled

So my project now uses vlcj 4.8.2 + native-streams 2.0.0 after the upgrade.

@caprica
Owner

caprica commented Jul 13, 2023

Interesting.

I can't think why that would make any difference, but anyway, using the latest versions of these libraries fixes the problem?

@tangshimin
Author

tangshimin commented Jul 13, 2023

No, the latest version of native-streams cannot be used.
I'm using vlcj 4.8.2 + native-streams 2.0.0, and the subtitle descriptions work fine.

@caprica
Owner

caprica commented Jul 13, 2023

OK, so you wrote this:

vlcj 4.8.2 + native-streams 3.0.0 does not have garbled subtitle descriptions.

You mean with native-streams 3.0.0 it does have garbled descriptions?

I'll try and take a look soon-ish.

@tangshimin
Author

Sorry, I made a mistake.
It should be "vlcj 4.8.2 + native-streams 3.0.0 has garbled subtitle descriptions".

@caprica
Owner

caprica commented Jan 26, 2024

Just having a look at this...

The native-streams project doesn't really do anything much, I can't understand how it can impact subs. It basically redirects stdout and/or stderr, that's it.

The only thing I can think is that including it in a project makes it use a different version of JNA. So the problem is maybe the dependence on a different version of JNA, nothing to do with the native-streams code itself.

@tangshimin
Author

Just having a look at this...

The native-streams project doesn't really do anything much, I can't understand how it can impact subs. It basically redirects stdout and/or stderr, that's it.

The only thing I can think is that including it in a project makes it use a different version of JNA. So the problem is maybe the dependence on a different version of JNA, nothing to do with the native-streams code itself.

I tested it and it is indeed a JNA issue.
With JNA 5.9.0 the subtitle descriptions are displayed normally; with JNA 5.10.0 through 5.14.0 they are garbled.

@caprica
Owner

caprica commented Jan 27, 2024

Very interesting, thanks for posting your follow-up.

@caprica
Owner

caprica commented Jan 27, 2024

So there might still be something to look at in vlcj here, if something changed in relation to how JNA processes strings.

More investigation is needed.

@caprica
Owner

caprica commented Jan 29, 2024

So if I were to try and replicate this issue, do I just need a subtitle file with Unicode characters, or do I specifically need one in your encoding?

Could you maybe point me to a subtitle file I can use to test this?

@tangshimin
Author

To reproduce this problem, you may need to set the system language to Chinese first.
Run vlcj-player and open an mkv file that has subtitles in several languages.
Then open the subtitle menu and look at the subtitle track entries to see the problem.

The demo video is Sintel.

When the operating system language is English, the descriptions are displayed normally.

When the operating system language is Chinese and the JNA version is 5.9.0, they are also displayed normally.

When the operating system language is Chinese and the JNA version is newer than 5.9.0, Chinese characters are garbled. The reason may be that JNA decodes the UTF-8 encoded Chinese text as GBK.
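The suspected mechanism can be reproduced without vlcj or JNA. The following is a minimal sketch, not code from either library: `MojibakeDemo`, `misdecode`, and the sample string are hypothetical names chosen for illustration. It encodes a Chinese string as UTF-8 and then decodes the same bytes with a different charset, which is what the garbled menu entries look like.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    /** Encodes text as UTF-8, then decodes those bytes with the given charset. */
    static String misdecode(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Hypothetical track description, written with Unicode escapes so the
        // source file compiles the same way regardless of its own encoding.
        String description = "\u7b80\u4f53\u4e2d\u6587";

        // GBK is an extended charset and may be missing from minimal runtimes,
        // so fall back to ISO-8859-1 just to keep the demo runnable.
        Charset wrong = Charset.isSupported("GBK")
                ? Charset.forName("GBK")
                : StandardCharsets.ISO_8859_1;

        System.out.println("original        : " + description);
        System.out.println("decoded as " + wrong.name() + " : " + misdecode(description, wrong));
        System.out.println("decoded as UTF-8: " + misdecode(description, StandardCharsets.UTF_8));
    }
}
```

Only the UTF-8 decode returns the original text, which matches the observation that the problem appears when the decoding charset follows the system language.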

@tangshimin changed the title from "Subtitle descriptions are garbled after upgrading vlcj from 4.7.2 to 4.8.2" to "Subtitle track descriptions are garbled after upgrading vlcj from 4.7.2 to 4.8.2" on Jan 29, 2024
@caprica
Owner

caprica commented Jan 29, 2024

Thanks a lot, I'll look into it.

@caprica
Owner

caprica commented Jan 29, 2024

Could you try setting this system property:

jna.encoding=UTF8

@tangshimin
Author

Could you try setting this system property:

jna.encoding=UTF8

It's still not working.

@caprica
Owner

caprica commented Feb 2, 2024

Almost certainly this change is the cause of this issue:
java-native-access/jna#1393

@caprica
Owner

caprica commented Feb 2, 2024

I posted this to a similar question at StackOverflow:

This is not a vlcj/native-streams issue specifically, rather it became a problem when the JNA dependency version that vlcj and native-streams use got bumped to a version newer than 5.9.0.

With JNA 5.9.0, the strings are not garbled, but switching to 5.10.0 (or later) will lead to garbled strings.

This change in JNA is a possible reason - "Update native encoding detection for JEP400", java-native-access/jna#1393

This is the code in JNA from 5.9.0, in Native.java:

    public static final Charset DEFAULT_CHARSET = Charset.defaultCharset();
    public static final String DEFAULT_ENCODING = Native.DEFAULT_CHARSET.name();

This is the code in JNA from 5.10.0:

static {
    // JNA used the defaultCharset to determine which encoding to use when
    // converting strings to native char*. The defaultCharset is set from
    // the system property file.encoding. Up to JDK 17 its value defaulted
    // to the system default encoding. From JDK 18 onwards its default value
    // changed to UTF-8.
    // JDK 18+ exposes the native encoding as the new system property
    // native.encoding, prior versions don't have that property and will
    // report NULL for it.
    // The algorithm is simple: If native.encoding is set, it will be used
    // else the original implementation of Charset#defaultCharset is used
    String nativeEncoding = System.getProperty("native.encoding");
    Charset nativeCharset = null;
    if (nativeEncoding != null) {
        try {
            nativeCharset = Charset.forName(nativeEncoding);
        } catch (Exception ex) {
            LOG.log(Level.WARNING, "Failed to get charset for native.encoding value : '" + nativeEncoding + "'", ex);
        }
    }
    if (nativeCharset == null) {
        nativeCharset = Charset.defaultCharset();
    }
    DEFAULT_CHARSET = nativeCharset;
    DEFAULT_ENCODING = nativeCharset.name();
}

You should check the value of the native.encoding system property when your program runs, and perhaps explicitly set the value to your platform native encoding rather than UTF-8.
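One way to do that check is a small stand-alone program. This is only a sketch that mirrors the selection logic quoted above using plain JDK calls; it does not call into JNA, and `NativeEncodingCheck` and `resolve` are hypothetical names.

```java
import java.nio.charset.Charset;

class NativeEncodingCheck {

    /**
     * Mirrors the quoted logic: use the given native.encoding value if it names
     * a usable charset, otherwise fall back to Charset.defaultCharset().
     */
    static Charset resolve(String nativeEncoding) {
        if (nativeEncoding != null) {
            try {
                return Charset.forName(nativeEncoding);
            } catch (IllegalArgumentException ex) {
                // Illegal or unsupported charset name: fall through to the default.
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // May be null on older JDKs that do not define this property.
        String nativeEncoding = System.getProperty("native.encoding");

        System.out.println("native.encoding  = " + nativeEncoding);
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset   = " + Charset.defaultCharset());
        System.out.println("resolved charset = " + resolve(nativeEncoding));
    }
}
```

Running it on the affected machine, with the same JDK and launch options as the application, shows which charset the 5.10.0 logic would pick.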

@tangshimin
Author

You should check the value of the native.encoding system property when your program runs, and perhaps explicitly set the value to your platform native encoding rather than UTF-8.

Thanks a lot.
I checked native.encoding, and its value is GBK.

Then I called System.setProperty("native.encoding", "UTF-8") when my project starts, and the subtitle track descriptions are no longer garbled.
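Since the JNA code quoted earlier reads native.encoding in a static initializer, the override presumably has to run before JNA's Native class is first used. The sketch below illustrates that ordering with plain JDK code only; `EncodingOverrideDemo`, `CapturedEncoding`, and `overrideThenRead` are hypothetical names, and `CapturedEncoding` is a stand-in for a library class that captures the property once, not JNA itself.

```java
import java.nio.charset.Charset;

class EncodingOverrideDemo {

    /** Hypothetical stand-in for a class that reads the property once, at class initialization. */
    static final class CapturedEncoding {
        static final String NAME =
                System.getProperty("native.encoding", Charset.defaultCharset().name());
    }

    /** Sets the override and then touches the capturing class for the first time. */
    static String overrideThenRead(String encoding) {
        System.setProperty("native.encoding", encoding);
        return CapturedEncoding.NAME;
    }

    public static void main(String[] args) {
        // Set the override first, before anything initializes the capturing class.
        System.out.println("captured       : " + overrideThenRead("UTF-8"));

        // A later change has no effect because the value was already captured.
        System.setProperty("native.encoding", "GBK");
        System.out.println("after change   : " + CapturedEncoding.NAME);
    }
}
```

In a real application this suggests putting the System.setProperty call at the very start of main, ahead of any vlcj or JNA usage.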

@caprica caprica closed this as completed Feb 2, 2024