Introduce LazyLoad Backend #3

Closed (8 commits)

@paarthmadan paarthmadan commented Jan 19, 2022

Note: I'm using this PR to collect feedback internally. I'll close it once we're aligned and propose this upstream.

What's in this PR

This PR introduces a new LazyLoad backend following this discussion in ruby-i18n#592.

What does the LazyLoad Backend offer?

The Simple backend can't infer which files belong to which locale, so it loads every file in the load path and resolves the locale by inspecting the loaded translations. This strategy is foolproof, but it means loading all files for all locales even when only one locale is needed.

This backend avoids the cost of loading unnecessary translation files by carefully selecting only those files which are needed for the current locale. It lazily initializes translations on a per locale basis.

How does the LazyLoad Backend work?

This backend trades the expensive cost of I/O for the cost of performing string matching on files in the load path. It makes assumptions about which files belong to a locale and selectively loads only those files.
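To illustrate the lazy, per-locale initialization described above, here is a minimal sketch. It is not the backend's implementation: `LazyTranslations`, `for_locale`, and the injected `loader` callable are hypothetical names, and the in-memory `fake_files` hash stands in for real file I/O.

```ruby
# Hypothetical sketch: loads translations for a locale on first access only.
class LazyTranslations
  attr_reader :loaded_paths

  def initialize(load_path, loader)
    @load_path = load_path
    @loader = loader          # callable: path -> hash of translations
    @translations = {}
    @loaded_paths = []        # records which files were actually read
  end

  # Memoized per locale, so each locale's files are read at most once.
  def for_locale(locale)
    @translations[locale] ||= load_locale(locale)
  end

  private

  # Selects files by string matching on the filename, then loads only those.
  def load_locale(locale)
    matching = @load_path.select { |path| File.basename(path).start_with?(locale.to_s) }
    matching.each_with_object({}) do |path, merged|
      @loaded_paths << path
      merged.merge!(@loader.call(path))
    end
  end
end

# In-memory stand-in for translation files (assumption for the example).
fake_files = {
  "locales/en.yml" => { greeting: "Hello" },
  "locales/fr.yml" => { greeting: "Bonjour" },
}
store = LazyTranslations.new(fake_files.keys, ->(path) { fake_files.fetch(path) })
english = store.for_locale(:en)
```

After the call for `:en`, only the English file has been passed to the loader; the French file is untouched until `:fr` is requested.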

How does the LazyLoad Backend know which files belong to which locale?

It makes assumptions about how files are named. Clients must abide by this naming system if they decide to use this backend.

The heuristic used to bind a file to its locale can be defined as follows:

  1. the filename is in the I18n load path
  2. the filename ends in a supported extension (ie. .yml, .json, .po, .rb)
  3. the filename starts with the locale identifier (ex: "translations/en_001.yml")
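The three conditions above can be sketched as a single predicate. This is an illustration only: `belongs_to_locale?` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the plain prefix check is my assumption about what "starts with the locale identifier" means.

```ruby
# Hypothetical sketch of the file-to-locale heuristic.
SUPPORTED_EXTENSIONS = %w[.yml .json .po .rb].freeze

def belongs_to_locale?(path, locale, load_path)
  # 1. the filename is in the I18n load path
  return false unless load_path.include?(path)
  # 2. the filename ends in a supported extension
  return false unless SUPPORTED_EXTENSIONS.include?(File.extname(path))

  # 3. the filename starts with the locale identifier
  File.basename(path).start_with?(locale.to_s)
end

load_path = ["translations/en_001.yml", "translations/fr.yml", "translations/notes.txt"]
en_files = load_path.select { |path| belongs_to_locale?(path, :en, load_path) }
```

Note that a bare prefix check would also bind a file such as `en-GB.yml` to `:en`, so a real implementation likely needs a stricter boundary after the identifier.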

When should someone use this backend?

Workloads that operate in the context of a single locale at a time and have many translation files for many locales. For instance, a large Rails workload would benefit from this backend.

It's designed for test environments, not environments where eager loading is preferred.

Benchmarks: Comparing the Simple backend to the LazyLoad backend

A benchmark setup was used to compare the performance of these two backends.

Table 1: Setup with 10 files per locale, 100 keys in each file:

| Backend | Work Performed | User | Sys | Total | Real |
|---|---|---|---|---|---|
| Simple | Eager load (:en) | 0.012764 | 0.000721 | 0.013485 | 0.013503 |
| Simple | 3 Eager loads (:en, :fr, :de) | 0.012364 | 0.000675 | 0.013039 | 0.013038 |
| LazyLoad | Eager load (:en) | 0.004820 | 0.000330 | 0.005150 | 0.005137 |
| LazyLoad | 3 Eager loads (:en, :fr, :de) | 0.019816 | 0.000847 | 0.020663 | 0.020674 |

Table 2: Setup with 100 files per locale, 1000 keys in each file:

| Backend | Work Performed | User | Sys | Total | Real |
|---|---|---|---|---|---|
| Simple | Eager load (:en) | 1.342190 | 0.020641 | 1.362831 | 1.363569 |
| Simple | 3 Eager loads (:en, :fr, :de) | 1.344860 | 0.018035 | 1.362895 | 1.363284 |
| LazyLoad | Eager load (:en) | 0.478600 | 0.011205 | 0.489805 | 0.489951 |
| LazyLoad | 3 Eager loads (:en, :fr, :de) | 1.357584 | 0.026064 | 1.383648 | 1.384148 |

Exploring the results

The Simple backend takes the same amount of time whether it needs translations for a single locale or for all locales. This makes sense, as the backend loads all translations irrespective of the current locale.

The LazyLoad backend reduces working time because it avoids loading unnecessary files. When loading a single locale, the LazyLoad backend outperforms Simple: 0.005 vs 0.013 in Table 1 and 0.490 vs 1.363 in Table 2.
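To make the single-locale comparison concrete, the ratio of the Real column values can be computed directly. The snippet below only re-derives numbers from the two tables; the variable names are mine.

```ruby
# "Real" values copied from the single-locale rows of Tables 1 and 2.
timings = {
  "Table 1" => { simple: 0.013503, lazy_load: 0.005137 },
  "Table 2" => { simple: 1.363569, lazy_load: 0.489951 },
}

# Speed-up = Simple time / LazyLoad time, rounded to two decimals.
speedups = timings.transform_values { |t| (t[:simple] / t[:lazy_load]).round(2) }
speedups.each { |table, ratio| puts "#{table}: #{ratio}x" }
```

In both setups the single-locale load comes out between roughly 2.6x and 2.8x faster with LazyLoad.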

The LazyLoad backend performs roughly on par with the Simple backend when it needs to load all translations. The additional overhead of string matching hurts performance in small workloads (0.021 vs 0.013 in Table 1), but it's negligible in any significant workload compared to the time spent in I/O (1.384 vs 1.363 in Table 2).

Remarks

This backend is designed to bring performance improvements to workloads with a large volume of locales, translation files, and translation keys.

Performance isn't guaranteed for all applications, which is why the backend is designed to be opt-in.

At Shopify, we've patched ruby-i18n locally to implement a similar strategy. We've observed close to 10x speed-ups locally in specific tests and a roughly 20% speed-up across the suite.

@@ -0,0 +1,61 @@
require 'test_helper'
Author

This is a temporary test that's used to set up the benchmarks. It'll be presented upstream, but I don't intend to merge it.

@paarthmadan
Author

Please review @Shopify/rails

cc: @adrianna-chang-shopify, @shioyama

@adrianna-chang-shopify adrianna-chang-shopify left a comment

This is great work, Paarth! 👏 🚀

As discussed IRL: #available_locales doesn't work right now, because it will only look at loaded locales, which is likely to be incomplete. I think we'll need to do something similar to what @shioyama did in Core and use our selection heuristics to grab available locales from the load path.
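One way to derive available locales from the load path without loading any file is sketched below. It is an assumption-laden illustration, not what was done in Core: `available_locales_from` is a hypothetical helper, and it assumes files are named `<locale>.<ext>` or `<locale>_<suffix>.<ext>`.

```ruby
# Hypothetical sketch: infer locales from filenames only, no file is read.
# Assumes the locale is everything in the basename before the first "_".
def available_locales_from(load_path)
  load_path
    .map { |path| File.basename(path, ".*").split("_").first }
    .compact
    .uniq
    .map(&:to_sym)
end

locales = available_locales_from(%w[
  config/locales/en.yml
  config/locales/en_models.yml
  config/locales/fr.yml
])
```

Any file that does not follow the assumed naming would yield a bogus locale here, which is why a verification step against the real translation keys is worth having.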

Here are a couple of points we talked about IRL that are not blockers for this PR, but that we might want to think about in terms of delivering this as a feature upstream. I'm writing them down here in case anyone else has thoughts / feedback on them, and to remind myself what was discussed 😄

  1. We might want to ensure that this can work out of the box with Rails. There are default translations in Rails that don't conform to the path-matching regex we've specified. Maybe this means additional support in https://github.com/svenfuchs/rails-i18n to ensure that these translations are always loaded by default, similar to what we did in Core.
  2. Should we have some sort of sanity check that apps can use to verify their translations are set up correctly? Might be nicer than failing silently if a translation has an unusual file path and doesn't get picked up properly. Possibly a rake task?

I don't have sufficient context on the failing JRuby tests -- maybe someone else from the team can step in there.

One more question: are we intending that folks use this in production? Obviously the use case we have in Core is strictly for our test environment. Prod is eager loaded so this is a no-go 🤦‍♀️

I think our next step is to try to adopt this in Core, and make sure all the tests continue to pass as expected.

Member

@shioyama shioyama left a comment

Looks great! 👍 Just some initial comments, mainly about the format conventions for locale filenames.

@casperisfine

When should someone use this backend?

I believe you should make it clear in that section that this backend is for development and test environments, and shouldn't be used in production environments.

@paarthmadan paarthmadan force-pushed the pm/lazy-load-backend branch 4 times, most recently from 8a801a9 to 74091d5 Compare January 21, 2022 20:09
@paarthmadan
Author

Thanks for the reviews, I incorporated all stylistic changes, nits, and minor feedback.

I believe you should make it clear in that section that this backend is for development and test environments, and shouldn't be used in production environments.

I agree. I've updated the documentation to make this clearer, and I've changed the behaviour of eager_load! to reinforce that this shouldn't be used in envs where eager loading is the best practice.

@paarthmadan
Author

The crux of this problem, now, seems to be which naming format the backend assumes and how we can generate available locales from this.

I'd like to collect your feedback on some approaches for how to proceed.

Here's a small set of criteria I've used to evaluate the approaches:

  1. The solution should be generalizable. In other words, how many apps actually abide by this format? Could a simple Rails app immediately make use of the new backend?
  2. How difficult is it to use the solution in core?
  3. How much overhead is introduced with this solution?

With these criteria in mind, here are some approaches Adrianna and I discussed:

Approach 1: Assume all files abide by <locale>-translation.yml format (ie. locale at the start of the file)

Advantages:

  • This would make the #available_locales implementation straightforward
  • The constraint on the naming format is small, so backwards compatibility isn't severed (if we start too large, it'll be hard to move back)

Disadvantages:

  • Solution isn't necessarily generalizable. (ie. many apps use paths like /path/en/views/file.yml)
  • Usage in core isn't trivial, because not all locale files in core abide by this rule.

Approach 2: Assume all files abide by <locale>-translation.yml format OR /path/<locale>/translation.yml (ie. locale at the start of the file OR identifier in the path)

Advantages:

  • The constraint on the naming format is still relatively small, so backwards compatibility isn't severed

Disadvantages:

  • Solution is more generalizable, but as highlighted by this comment, supporting paths like these makes it hard to extract the locales
  • #available_locales implementation is virtually impossible without loading all translations or making hasty generalizations

Approach 3: Assume all files are formatted in a specified way and introduce configuration for "path to locale" resolution

This is the same solution as the two above, but now we create an interface that allows clients to specify how paths are related to their locale.

Imagine a proc in a configuration file, such as the following or something similar:

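# Example: /config/initializers/i18n.rb
# Proposed (not yet existing) interface: the proc returns the locale for a path.
I18n.lazy_load.path_resolve = ->(path) do
  # Pick one strategy; the proc's last expression is its return value.
  extract_locale(path)   # Example implementation 1: parse the locale from the path
  # LookupTable[path]    # Example implementation 2: consult a precomputed table
  # ... etc
end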

Advantages:

  • This helps generalization: we give the client an interface to easily work around edge cases such as files that don't abide by the naming convention.
  • Enables usage in core

Disadvantages:

  • Onboarding is very involved. Have to write custom logic per app to configure load path selection heuristic.
  • #available_locales will need to be left to client, which will still be tricky.

Approach 4: Generate / Dump mapping between locale file and locales (use this instead of string matching)

This approach varies the most from what's already presented.

At a high-level:

  1. Generate static mapping by loading all translations and relating a locale to the files that it came from.
  2. This dump can be enforced in the CI build; the mapping will need to be checked into version control.
  3. Locally, the LazyLoad backend will load this mapping into memory and consult the lookup table to deduce which files need to be loaded. Instead of searching through the load path, we consult the table. #available_locales are simply the keys of the table.
  4. Locally, when new translation files are introduced or a translation file changes, a re-dump will be required. I believe this can be handled more intelligently (ie. if a file is modified, load it regardless of the locale).

Note: The Simple backend loads all translations to determine available locales anyways. We would have to pay this cost once, preferably as part of the build process. If we can define a way to resolve this mapping when local changes occur, then this solution seems promising (ie. using a digest for modified files, or always including translations that are dirty in the context of the git branch)
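Step 1 of this approach could look roughly like the sketch below. It assumes YAML translation files whose top-level keys are locale identifiers; `build_locale_map` is a hypothetical name, and the temporary files exist only to make the example self-contained.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical sketch: build { locale => [paths] } by loading every file once
# and reading its top-level keys.
def build_locale_map(load_path)
  load_path.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |path, map|
    YAML.load_file(path).each_key { |locale| map[locale.to_sym] << path }
  end
end

# Demo with throwaway files; basenames are kept so the result is stable.
locale_map = Dir.mktmpdir do |dir|
  files = {
    "shared.yml" => { "en" => { "hello" => "Hello" }, "fr" => { "hello" => "Bonjour" } },
    "checkout.yml" => { "en" => { "pay" => "Pay" } },
  }
  paths = files.map do |name, content|
    File.join(dir, name).tap { |path| File.write(path, YAML.dump(content)) }
  end
  build_locale_map(paths).transform_values { |list| list.map { |path| File.basename(path) } }
end
```

The resulting hash is the artifact that would be dumped and checked in; its keys are the available locales, and a file containing several locales simply appears under each of them.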

Advantages:

  • The most generalizable. It doesn't define a specific format, or require a specific hierarchy
  • Doesn't require client side configuration
  • Easily works with Core

Disadvantages:

  • Produces an artifact that must be persisted
  • Mapping will need to be updated locally (or local changes will need to be handled differently). If this can't be done without reloading all translations locally, then it defeats the purpose.

The list certainly isn't exhaustive, nor are any of the solutions well-defined. I wanted to collect feedback early to see if anyone had any strong opinions or had some ideas they'd like to share.

cc: @shioyama, @casperisfine, @adrianna-chang-shopify

Ultimately, this problem is tricky because we don't know which files belong to a locale without loading them. The complexity rises in trying to do this without actually doing it 😄

@casperisfine

casperisfine commented Jan 24, 2022

which naming format the backend assumes and how we can generate available locales from this.

While it's the most accurate, I don't think the mapping solution has good ergonomics.

I think we should either enforce a naming convention and/or have a list of regexps or other types of callbacks to extract a locale from a path. This means that we should have some kind of strict mode in case the assumptions end up wrong, and they should be tested by loading all test translations.
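A rough sketch of the "list of regexps plus strict mode" idea follows. The patterns, `resolve_locale`, and `UnresolvedLocaleError` are all hypothetical; the locale shape (`xx` or `xx-YY`) is an assumption chosen to keep the example small.

```ruby
# Hypothetical patterns, tried in order; each captures the locale by name.
LOCALE_PATTERNS = [
  %r{(?:\A|/)(?<locale>[a-z]{2}(?:-[A-Z]{2})?)/},              # .../<locale>/views/file.yml
  %r{(?:\A|/)(?<locale>[a-z]{2}(?:-[A-Z]{2})?)[-_.][^/]*\z},   # .../<locale>-translation.yml
].freeze

class UnresolvedLocaleError < StandardError; end

# Returns the locale as a symbol, or nil when no pattern matches.
# In strict mode an unmatched path raises instead of being silently skipped.
def resolve_locale(path, strict: false)
  LOCALE_PATTERNS.each do |pattern|
    match = pattern.match(path)
    return match[:locale].to_sym if match
  end
  raise UnresolvedLocaleError, "cannot infer a locale from #{path}" if strict

  nil
end

from_directory = resolve_locale("config/locales/en/views/file.yml")
from_filename  = resolve_locale("config/locales/fr-translation.yml")
unmatched      = resolve_locale("config/locales/devise.yml")
```

Patterns this loose can still mistake a two-letter directory for a locale, so strict mode alone does not replace checking the result against the real translation keys.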

@adrianna-chang-shopify

adrianna-chang-shopify commented Jan 24, 2022

I'd lean towards approach 3. As Jean said, option 4 guarantees accuracy but is a bit tedious and involved for users to set up. I think that especially because this is a test environment optimization, it should be as simple as possible for users to set up / opt into. Going with 3 allows us to start very simple and assume that 95% of apps can comply with those expectations out of the box (I think we should start with the assumption that "all files abide by <locale>-translation.yml format"). We can then figure out what an ideal API looks like as we extend this to Core and figure out how to make it work there.

Disadvantages:

  • Onboarding is very involved. Have to write custom logic per app to configure load path selection heuristic.
  • #available_locales will need to be left to client, which will still be tricky.

I'd actually argue that this is not so involved. For most applications, going with the default format should be enough. We can be pretty explicit about what we expect from the "path resolve" feature -- as Jean suggested, it could be as simple as providing a regex / list of regexes that extracts the locale from the filepath. Users wouldn't necessarily need to write #available_locales themselves -- we should just be able to go through the files in the load path and extract the locales that meet the patterns. The biggest thing to watch out for here is matching locales that aren't really locales, but maybe we could offer a script / CLI command that "sanity checks" the app's translation setup and ensures that the list of locales provided by #available_locales is equivalent to the set of locale keys generated by reading all of the translation files.
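The "sanity check" could be sketched as a set comparison between locales inferred from paths and locale keys found in the files. Everything here is hypothetical: `locale_mismatches`, the `extractor` callable, and the in-memory hash that stands in for already-parsed translation files.

```ruby
require "set"

# Hypothetical sketch: compare path-derived locales with the locale keys
# actually present in the (already parsed) translation files.
def locale_mismatches(translations_by_path, extractor)
  from_paths = translations_by_path.keys.map { |path| extractor.call(path) }.compact.to_set
  from_contents = translations_by_path.values.flat_map(&:keys).map(&:to_sym).to_set

  {
    only_in_paths: (from_paths - from_contents).to_a,      # "locales" that aren't really locales
    only_in_contents: (from_contents - from_paths).to_a,   # real locales the paths would miss
  }
end

# Assumed convention for the example: locale is the basename before the first "_".
extractor = ->(path) { File.basename(path, ".*").split("_").first&.to_sym }

report = locale_mismatches(
  {
    "config/locales/en.yml" => { "en" => {} },
    "config/locales/fr.yml" => { "fr" => {} },
    "config/locales/devise.yml" => { "de" => {} },
  },
  extractor
)
```

Here `devise.yml` is reported on both sides: the path suggests a non-existent `:devise` locale, while its real `:de` content would never be found by path matching.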

TL;DR - I think we can keep things simple for now and be rigid about the naming configuration we expect, and figure out a way to offer configuration as a follow-up.

@paarthmadan
Author

After syncing with Jean and Adrianna, here's an approach to get a first iteration out and possibly integrate with Core in the future:

  1. Use simplest locale format, with either a) demarcation or b) tracking loaded_paths
  2. Implement available_locales
  3. Implement varied behaviour with configured mode (ie. eager vs. lazy)
  4. Propose PR upstream to ruby-i18n

If the PR is well-received, the plan for Core / Rails can be as simple as:

  1. Moving all locale files to abide by new format by renaming / splitting files. This can be done with a script that introspects the load path, checks if the file abides by the convention, and if not, renames and splits the file appropriately.
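The splitting part of such a script might look like the sketch below. It only computes the new layout and does not touch the filesystem; `split_by_locale` and the `<locale>_<original name>` target naming are assumptions for illustration.

```ruby
# Hypothetical sketch: given one parsed multi-locale file, compute the
# per-locale files it would be split into ({ new_path => content }).
def split_by_locale(path, translations)
  dir = File.dirname(path)
  ext = File.extname(path)
  name = File.basename(path, ext)

  translations.map do |locale, content|
    # Assumed target convention: "<locale>_<original name><ext>".
    [File.join(dir, "#{locale}_#{name}#{ext}"), { locale => content }]
  end.to_h
end

plan = split_by_locale(
  "config/locales/models.yml",
  { "en" => { "user" => "User" }, "fr" => { "user" => "Utilisateur" } }
)
```

A real migration would then write each entry of `plan` to disk and remove the original file once the split files load correctly.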

The amount of work remaining to propose this upstream as defined above is limited. The calendar time for reviews may be longer, but if the PR does make it upstream, the Core work can also be done at any time in the future. It should take a few days at most, mostly scripting and testing.

@casperisfine casperisfine left a comment

Some minor stuff, but looks like it's on the right track, and it really isn't that much code.

@paarthmadan paarthmadan force-pushed the pm/lazy-load-backend branch 2 times, most recently from 5841e5b to 996965f Compare January 31, 2022 19:24
@adrianna-chang-shopify adrianna-chang-shopify left a comment

This looks great, Paarth! I really appreciate the amount of detail you've put into test cases and documentation ❤️ I have a bunch of minor comments (feel free to contest any suggestions I've made that don't make sense 😆 ), and I'd like to give it a tophat on a simple Rails app, but this looks pretty much ready to be proposed upstream IMO 🚀

Comment on lines 152 to 171
def file_named_correctly?(path, translations)
  locales = translations.keys.map(&:to_sym)
  return false unless locales.one?

  LocaleExtractor.locale_from_path(path) == locales.first
end

Rather than return true/false this method should raise a final error. This way we can include in the error message which assumption was broken, and in which file, making it much easier to fix.

Author

I was on the fence about which way to go. I agree that knowing which assumption was broken will help, and this is something I'll add with the current approach.

The reason why I decided to raise with all the offending files at once is mainly a DX concern. If someone's onboarding using this strategy and would like to acquire a complete list of offending files, it would be handy to raise this in a single error.

My worry is that people will fight with repeated errors being raised and only be able to fix them one by one.

Thoughts?

You are right, collecting all the offenses and raising once is even better. But I think you can do both, it just can't be expressed by simple booleans since you have multiple types of offense (e.g. path not matching at all, and content not respecting what the path claims).

So maybe you can collect a list of error messages in an array and then raise if it's not empty.
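A minimal sketch of that "collect, then raise once" shape is below. It is not the code that went upstream: `validate_files!`, `InvalidLocaleFilesError`, and the `extractor` callable are hypothetical, and the two offense types mirror the ones mentioned above.

```ruby
class InvalidLocaleFilesError < StandardError; end

# Hypothetical sketch: gather one message per offending file, covering both
# offense types, and raise a single error listing all of them.
def validate_files!(translations_by_path, extractor)
  errors = []

  translations_by_path.each do |path, translations|
    expected = extractor.call(path)
    locales = translations.keys.map(&:to_sym)

    if expected.nil?
      errors << "#{path}: cannot infer a locale from the path"
    elsif locales != [expected]
      errors << "#{path}: path implies :#{expected} but file defines #{locales.inspect}"
    end
  end

  raise InvalidLocaleFilesError, errors.join("\n") unless errors.empty?
end

# Assumed convention for the example: the basename is exactly a two-letter locale.
extractor = lambda do |path|
  name = File.basename(path, ".*")
  name.match?(/\A[a-z]{2}\z/) ? name.to_sym : nil
end
```

With this shape, someone onboarding sees every offending file, and which assumption each one broke, in a single failure rather than fixing them one run at a time.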

We reached the same conclusion when pairing on this yesterday! 😄 I believe Paarth has submitted the changes upstream now: https://github.com/ruby-i18n/i18n/pull/612/files#diff-8e52202d93e7810223b33ab64de65635e73f04ffa8d7d38b14b5e656039d3ea1R152-R162

@paarthmadan paarthmadan force-pushed the pm/lazy-load-backend branch 4 times, most recently from 50765ec to 05605be Compare February 2, 2022 23:38
@paarthmadan
Author

Closing in favour of ruby-i18n#612

@paarthmadan paarthmadan closed this Feb 3, 2022