Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

weblate & icu format #2856

Closed
dialonso opened this issue Jul 4, 2019 · 28 comments · Fixed by #6582
Closed

weblate & icu format #2856

dialonso opened this issue Jul 4, 2019 · 28 comments · Fixed by #6582
Assignees
Labels
enhancement Adding or requesting a new feature.
Milestone

Comments

@dialonso
Copy link

dialonso commented Jul 4, 2019

Is your feature request related to a problem? Please describe.
Hello ! I would like to colorize the part of the text to be translated by translators because they are not able to know what should be translated
(e.g. {number, plural, =1{(# device)} other{(# devices)}} The translators had only to translate "device" and "devices" so I want thes two words to be colorized)

@nijel nijel added the enhancement Adding or requesting a new feature. label Jul 5, 2019
@nijel
Copy link
Member

nijel commented Jul 5, 2019

Weblate could definitely support this syntax and verify it.

On the other side please note that this is not the best way for handling plurals as it can not properly handle plural rules for many widely used languages, for example Russian or Arabic.

@dialonso
Copy link
Author

dialonso commented Jul 9, 2019

Thank you @nijel
I found it in the weblate doc (see screenshot1)
Nevertheless I can't configure it on my project, although having changed the flags by adapting it to "angularjs-format" (see scrrenshot2)
Do you have an idea ?
cap2
cap3

@nijel
Copy link
Member

nijel commented Jul 10, 2019

Probably because this is not a valid angularjs format string, we don't have support for ICU syntax right now.

@dialonso
Copy link
Author

sorry to insist but our json value string is well an angularjs format string Plural Syntax
The syntax for plural based message selection looks like the following:

extract from https://docs.angularjs.org/guide/i18n#plural-syntax
{{NUMERIC_EXPRESSION, plural,
=0 {MESSAGE_WHEN_VALUE_IS_0}
=1 {MESSAGE_WHEN_VALUE_IS_1}
=2 {MESSAGE_WHEN_VALUE_IS_2}
=3 {MESSAGE_WHEN_VALUE_IS_3}
...
zero {MESSAGE_WHEN_PLURAL_CATEGORY_IS_ZERO}
one {MESSAGE_WHEN_PLURAL_CATEGORY_IS_ONE}
two {MESSAGE_WHEN_PLURAL_CATEGORY_IS_TWO}
few {MESSAGE_WHEN_PLURAL_CATEGORY_IS_FEW}
many {MESSAGE_WHEN_PLURAL_CATEGORY_IS_MANY}
other {MESSAGE_WHEN_THERE_IS_NO_MATCH}
}}

our string is {{number, plural, =0{none} =1{(# device)} other{(# devices)}}}
the flag is set angularjs-format
1

and as you see in capture screen, no plural was detected, have i missed something ?
(we import as json nested because json i18next failed)

thx for your help

@nijel
Copy link
Member

nijel commented Jul 10, 2019

Yes, but I can still repeat: this is currently not supported.

What is supported is AngularJS interpolation string

@dialonso
Copy link
Author

Hello @nijel ,
Can you tell me please, in which file in the weblate code could I find the part of TextField where Translators have to enter the text and validate before clicking on « save » button
I intend to modify the code by adding a function that will verify if the ICU format is valid or not
If not valid, an error should be returned by the function when translator click on « save » button
Like on the screenshots below
Capture
Capture2

@nijel
Copy link
Member

nijel commented Jul 19, 2019

The existing AngularJS check is here: https://github.com/WeblateOrg/weblate/blob/master/weblate/checks/angularjs.py

This would deserve separate check with flag to enable it.

@jkuchar
Copy link

jkuchar commented Nov 1, 2019

What about reusing editor from https://format-message.github.io/icu-message-format-for-translators/editor.html ?

@radeklat
Copy link

radeklat commented Jun 29, 2020

We don't have support for ICU syntax right now

@nijel Is this the best issue to watch for checking for the supports? Or would be #2967 better? I started using Weblate for ARB files in Flutter and it uses the ICU format as the only option for plurals. #2967 makes it sound like a non-trivial implementation.

@nijel
Copy link
Member

nijel commented Jun 30, 2020

You can watch either. To big extent these are duplicates...

Some editor being capabale of syntax highlighting and linting should land as part of #4094, we can then start building the feature on top of that.

@theminecoder
Copy link

Is there any update to the progress of this? New to the platform however we are already heading deep into ICU/MessageFormat land

@nijel
Copy link
Member

nijel commented Feb 17, 2021

I don't think there has been any progress on ICU format support. Weblate is now using Prism for highlighting in the editor, adding ICU format to that should not be hard and could be a good starting point.

@rudzikdawid
Copy link

rudzikdawid commented Mar 2, 2021

PrismJS/prism#2675
merged PrismJS/prism#2745

@SirStendec
Copy link
Contributor

Hello,

I've been looking at using Weblate for the FrankerFaceZ project (browser extension for Twitch) and, as a result, getting ICU MessageFormat working within Weblate.

At this time, the Prism support for the format still hasn't landed in a release on npm, but it works to at least a basic level after pulling in the script from GitHub directly.

To that end, I've ended up:

  • Introducing a dummy "icu-message-format" flag for strings, until such time as I figure out how to write some decent server-sided checks.
  • Pulling in Prism head from GitHub
  • Including prism-icu-message-format.js in vendor
  • Modifying loader-bootstrap.js to introduce special logic for ICU MessageFormat.

Specifically, I've added special logic that changes how placeables work. Rather than being a root element, I moved them into the argument part of the highlighter as the name by itself doesn't matter and including the { and } seems counter-productive once you get to formats like plural.

I ended up not entirely satisfied with how that works out, though. I find it's easy to make a mistake when typing messages like that, especially if escaped characters are involved. To that end, I've looked into a proper parser that can live in the page and give live and direct feedback.

In the end, I ended up re-implementing the Online ICU Message Editor within the translation page. It indicates errors and allows translators to test strings with different values in real time.

It's very hacky at this point, but before attempting any sort of polish I thought I should reach out and see what other people think. In order to accomplish this editor I have:

  • Included format-message-parse and format-message-interpret as JS requirements.
  • Added a significant method to loader-bootstrap.js with logic for updating the live preview.
  • Introduced another section to the template in translate.html that is hidden by default, with the new JS unhiding and populating it when ICU MessageFormat is detected.

Having said all that, here are a pair of screenshots. One showing an error state and one not:

image

image

Let me emphasize again that this is a hacky implementation, currently only present in the front-end with no back-end checks, and I'm posting to ask about what people think of it.

As I'm looking at it now, I'm not sure if I'd prefer the live preview to be better integrated into the existing panel or not. It should also likely be made generic so that any other formats in the future can take advantage of it, such as Fluent. I'm also thinking that it might be better to not use Prism for highlighting these messages and instead do it manually if we're already parsing them into an AST for the preview and error checking.

This post is getting too long, though, so please let me know what you think of this direction. The code so far is at https://github.com/SirStendec/weblate/tree/icu-message-format if you want to see it but I shall emphasize again that this is still basically a big hack and not necessarily well structured or following code conventions.

Thanks.

@nijel
Copy link
Member

nijel commented Jun 22, 2021

Added a significant method to loader-bootstrap.js with logic for updating the live preview.

Should probably go to editor/full.js instead. Or even a separate js file which would be included only if the string has the appropriate flag.

Introduced another section to the template in translate.html that is hidden by default, with the new JS unhiding and populating it when ICU MessageFormat is detected.

Looks good. I think having it in the bottom tabs as "Live preview" is the best location. It will allow extending this to other formats as well.

I'm also thinking that it might be better to not use Prism for highlighting these messages

Prism is already used for highlighting other content from Weblate (like whitespace), so better to stick with it.

@SirStendec
Copy link
Contributor

SirStendec commented Jun 27, 2021

Should probably go to editor/full.js instead. Or even a separate js file which would be included only if the string has the appropriate flag.

A separate file definitely seems like the best route to me, since only a couple of flags will trigger the behavior.

Prism is already used for highlighting other content from Weblate (like whitespace), so better to stick with it.

Ah, yeah. I wasn't thinking about the whitespace highlighting and such. Prism it is.


To give a general update, I have spent some time to familiarize myself more with how the server application works and I've been implementing checks there.

That also involved writing a parser for ICU messages in Python, as the previously mentioned parser pyseeyou has issues. I effectively ported the parser from https://github.com/format-message/format-message to Python, with a couple tweaks to allow it to be a bit more permissive. Notably, format-message doesn't allow spaces in placeholder formats while they are valid as far as I can tell by the ICU spec. That new library is located at https://github.com/SirStendec/pyicumessageformat

I have ended up implementing two separate check flags. icu-message-format and icu-xml-format which handles simple <xml>style</xml> tags. A few libraries, including format-message, support the syntax so I thought I may as well. The code is largely re-used, with just a few bits toggled on or off depending on if tag support is desired.

Currently, all the checks are in one big class, and it handles:

  • Syntax errors
  • Missing placeholders
  • Placeholders with the wrong type
  • Unexpected placeholders that weren't in the source string
  • Placeholders with sub-messages but no "other" sub-message.
  • plural or selectordinal placeholders with an invalid key for a sub-message.
  • A placeholder that should be an XML tag / vice versa
  • A tag has content that was empty in the source
  • A tag is empty that had content in the source

I suppose it should be branched out to multiple classes, but I'm not sure the best way to handle that.

On the front-end side of things, I've:

  • Put all the Live Preview JS into its own file, static/editor/live_preview.js while intending that individual formats with Live Preview support would extend window.LivePreview.Parser with logic for parsing, rendering, and extracting type info.
  • In translate.html, created an optional second JS bundle sent when editing a unit with Live Preview support with JS for parsing ICU strings (mostly format-message with a bit of custom logic for extracting placeholder info from parsed ASTs).
  • Moved Live Preview to its own tab that's only generated when editing a unit with Live Preview support.
  • Added a method, in trans/views/edit.py for now, for determining if a unit supports Live Preview based on its flags. Not quite sure where to put it, so I just kind of tossed it there for the moment.

As for screenshots:
FrankerFaceZstrings_—German@_Devel_Weblate_Hel_2021-06-27_00-23-27

That's the editor in its current state looking at an icu-xml-format string. Note that there aren't any fields for changing the tag placeholders. Since tags are meant to either modify existing content, or fill in some other rich content, I thought it best to represent them visually in the output as you can see there.

FrankerFaceZstrings_—German@_Devel_Weblate_Hel_2021-06-27_00-24-22

If your string has an error, the Live Preview panel gets the panel-danger class and it displays the error message given by whichever library is handling the parsing / rendering / etc. Also, you'll note there's a badge on the Live Preview tab. It's a badge-danger to help an error stand out a bit if you're not on the Live Preview tab:

image

The code behind all of this is sitting in https://github.com/SirStendec/weblate/tree/icu-checks currently, but I don't think it's ready for a pull request yet. If nothing else, I would want PrismJS to cut a release first so there could be a dependency on a proper version. EDIT: PrismJS just released v1.24. Nice timing.

I also want to more thoroughly test everything. I've written a few tests so far, but nothing exhaustive yet.

@SirStendec
Copy link
Contributor

Since yesterday, I have:

  • Updated my branch to use the latest PrismJS release from npm
  • Increased test coverage for the ICU checks.
  • Moved the "should this have Live Preview?" logic from edit.py into unit.py
  • Added a check for select sub-message selector mismatches:

image

I'm still not satisfied with how the JavaScript for Live Preview is laid out, but it works well enough for now. Ideally, there'd be a proper place in the source tree for JS that gets run through build tools and a JS test runner.

Depending on what feedback I get on all the checks being in one class, I think this might be ready for an initial pull request?

@nijel
Copy link
Member

nijel commented Jun 30, 2021

Ideally, there'd be a proper place in the source tree for JS that gets run through build tools and a JS test runner.

Yes, that would be great, but a bit out of scope here. But contributions in this area are definitely welcome ;-). Right now, there is a custom build used for third-party code, see https://docs.weblate.org/en/latest/contributing/frontend.html

Depending on what feedback I get on all the checks being in one class, I think this might be ready for an initial pull request?

Having it all in a single class is okay. In the past, the checks had to be more fine-grained as they could not provide additional feedback to the user, but with the current API, it makes more sense to have a single check that would tell what is wrong in the description.

@SirStendec
Copy link
Contributor

Having it all in a single class is okay. In the past, the checks had to be more fine-grained as they could not provide additional feedback to the user, but with the current API, it makes more sense to have a single check that would tell what is wrong in the description.

Alright, sounds good to me. I've been adding a few extra things that I thought of while setting up my project's new i18n pipeline. On the server side of things I've got:

  • Flags to disable specific sub-checks, such as for example disabling the sub-check that requires an "other" sub-message when sub-messages are present.
  • A flag with all the valid type names, for validating placeholder types better.

For Live Preview specifically, I was thinking about helping to pre-populate the fields with example values for translators and I came up with the idea of an lp-defaults flag. The flag has a string parameter that expects JSON which is used to pre-populate the Live Preview fields, and there can be multiple presets. I think this GIF shows it off better:

2021-07-01_20-50-52

I'm just wondering how much sense this makes to other people. My own i18n layer detects new/changed strings at runtime during development and automatically captures placeholder values, so I can easily dump sample data in for every unit. I also haven't looked into that backend that hard, so I'm not clear about, for example, the maximum length of a flag string and if storing some JSON would present an issue there.


Also, I want to improve highlight extraction so it stops grabbing entire placeholders with sub-messages. It should definitely not pull in anything with sub-messages. Just haven't gotten around to that yet. Been a bit busy with other work projects.

@nijel
Copy link
Member

nijel commented Jul 21, 2021

I'm not sure about using JSON here - it is easy to get it wrong when editing manually. But I don't see a better choice here... The length is not an issue for flags.

@orangesunny
Copy link
Member

Hi @SirStendec,

Do you have any progress with this? Do you need any help?

@SirStendec
Copy link
Contributor

Hi @orangesunny,

I ended up busy with some other projects so I had to put this on hold for a bit. I should be able to finish things up soon. I am basically done with what I wanted to accomplish, though I'm not entirely happy with the Live Preview that is mostly to do with how JavaScript assets are structured in weblate.

The code in https://github.com/SirStendec/weblate/tree/icu-checks is current with my copy, so you can grab that if you want to test it yourself.

It's probably ready for a PR, but ideally I'd like some more ICU formatted strings from other projects to test it with to help make sure everything's working well and that there's no weird input case I'm not handling or catching with unit tests.

@nijel
Copy link
Member

nijel commented Sep 10, 2021

how JavaScript assets are structured in weblate

There are definitely better ways to do that, but somebody with the expertise would have to redesign this and so far, most contributions to Weblate comes from the backend devs... The current status is documented at https://docs.weblate.org/en/latest/contributing/frontend.html

Anyway, you are welcome to open a pull request to get a review of that code.

@SirStendec
Copy link
Contributor

I was able to better test the ICU checks recently thanks to another project that has just over 350,000 strings. (My own project is only around 10,000.)

As such, I'm feeling much more confident about the state of the server-side code and I'll be putting in a pull request for that soon. Possibly tomorrow if my schedule remains clear.


The Live Preview feature will be coming in a second pull request at a later time. I've got a good general idea how I want to handle its internals now... because I went ahead and implemented support for Fluent into it as well, in celebration of beta support for Fluent documents landing in Weblate. Having two supported formats makes it more obvious where the shortcomings are.

@HcgRandon
Copy link

Eagerly awaiting this PR for many years. I can also offer some testing but I don't have a lot of strings atm to test with.

@nijel nijel linked a pull request Oct 27, 2021 that will close this issue
5 tasks
@nijel nijel added this to the 4.9 milestone Oct 27, 2021
Translation editor automation moved this from To do to Done Oct 27, 2021
@github-actions
Copy link

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.

@DjLeChuck
Copy link

Hi,

Sorry for the bump of a closed issue, but I would like to know if the LivePreview component is always on your sight @SirStendec? :)

It's exactly what we are looking for the manage translations of ours projects with Symfony.

If you need help, don't hesitate to ask!

@ValentinFrancois
Copy link

Same here, the Live Preview feature is a game changer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Adding or requesting a new feature.
Projects
Development

Successfully merging a pull request may close this issue.