Introduce LazyLoad Backend #3

paarthmadan · 2022-01-19T18:51:19Z

Note: I'm using this PR to collect feedback internally. I'll close it once we're aligned and propose this upstream.

What's in this PR

This PR introduces a new LazyLoad backend following this discussion in ruby-i18n#592.

What does the `LazyLoad` Backend offer?

The Simple backend can't infer which files belong to which locale, so it loads all files in the load path and resolves the locale by inspecting the translations that are loaded. This is a foolproof strategy, but it means every file for every locale must be loaded, no matter which locale is actually requested.

This backend avoids the cost of loading unnecessary translation files by selecting only the files needed for the current locale. It lazily initializes translations on a per-locale basis.

How does the `LazyLoad` Backend work?

This backend trades the expensive cost of I/O for the cheaper cost of performing string matching on the files in the load path. It makes assumptions about which files belong to a locale and selectively loads only those files.
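To make the trade-off concrete, here is a minimal sketch of per-locale lazy loading. It is not the code in this PR: the `LazyStore` class, its `loader` callable, and the underscore-delimited filename prefix are all assumptions made for illustration.

```ruby
# Hypothetical sketch of lazy, per-locale loading (not the PR's implementation).
class LazyStore
  attr_reader :loaded_files

  # load_path: array of file paths.
  # loader: callable that reads one file and returns a hash of translations.
  def initialize(load_path, loader)
    @load_path = load_path
    @loader = loader
    @translations = {}
    @loaded_files = []
  end

  # Loads the files for a locale on first access and memoizes the result.
  def translations_for(locale)
    @translations[locale.to_sym] ||= load_locale(locale.to_sym)
  end

  private

  # Cheap string matching on the path replaces reading every file.
  # Assumption: files are named "<locale>_<anything>.<ext>".
  def load_locale(locale)
    files = @load_path.select { |path| File.basename(path).start_with?("#{locale}_") }
    files.each_with_object({}) do |path, merged|
      @loaded_files << path
      merged.merge!(@loader.call(path))
    end
  end
end
```

Asking for `:en` twice reads the `en_` files once and never touches the files of other locales.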

How does the `LazyLoad` Backend know which files belong to which locale?

It makes assumptions about how files are named. Clients must abide by this naming system if they decide to use this backend.

The heuristic used to bind a file to its locale can be defined as follows:

the filename is in the I18n load path
the filename ends in a supported extension (i.e. .yml, .json, .po, .rb)
the filename starts with the locale identifier (e.g. "translations/en_001.yml")
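The three conditions above could be sketched roughly as follows. The method name `locale_from_path` and the choice of an underscore as the delimiter after the locale are assumptions based on the `en_001.yml` example, not a statement of what the backend actually does.

```ruby
SUPPORTED_EXTENSIONS = %w[.yml .json .po .rb].freeze

# Returns the locale (as a symbol) that a path is assumed to belong to,
# or nil when the path does not satisfy the heuristic.
def locale_from_path(path, load_path)
  # Condition 1: the file must be in the load path.
  return nil unless load_path.include?(path)
  # Condition 2: the file must end in a supported extension.
  return nil unless SUPPORTED_EXTENSIONS.include?(File.extname(path))

  # Condition 3: the filename starts with the locale identifier.
  # Assumption: the locale is everything before the first underscore.
  prefix = File.basename(path, ".*").split("_").first
  prefix.nil? || prefix.empty? ? nil : prefix.to_sym
end
```

For example, `locale_from_path("translations/en_001.yml", ["translations/en_001.yml"])` yields `:en`, while a path outside the load path or with an unsupported extension yields `nil`.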

When should someone use this backend?

This backend suits workloads that operate in the context of a single locale at a time and have many translation files for many locales. For instance, a large Rails workload would benefit from this backend.

It's designed for test environments, not environments where eager loading is preferred.

Benchmarks: Comparing the `Simple` backend to the `LazyLoad` backend

A benchmark setup was used to compare the performance of these two backends.

Table 1: Setup with 10 files per locale, 100 keys in each file:

Backend	Work Performed	User	Sys	Total	Real
Simple	Eager load (:en)	0.012764	0.000721	0.013485	0.013503
Simple	3 Eager loads (:en, :fr, :de)	0.012364	0.000675	0.013039	0.013038
LazyLoad	Eager load (:en)	0.004820	0.000330	0.005150	0.005137
LazyLoad	3 Eager loads (:en, :fr, :de)	0.019816	0.000847	0.020663	0.020674

Table 2: Setup with 100 files per locale, 1000 keys in each file:

Backend	Work Performed	User	Sys	Total	Real
Simple	Eager load (:en)	1.342190	0.020641	1.362831	1.363569
Simple	3 Eager loads (:en, :fr, :de)	1.344860	0.018035	1.362895	1.363284
LazyLoad	Eager load (:en)	0.478600	0.011205	0.489805	0.489951
LazyLoad	3 Eager loads (:en, :fr, :de)	1.357584	0.026064	1.383648	1.384148

Exploring the results

The Simple backend takes the same amount of time whether it loads translations for a single locale or for all locales. This makes sense, as the backend loads all translations irrespective of the current locale.

The LazyLoad backend reduces working time because it avoids loading unnecessary files. When loading a single locale, the LazyLoad backend outperforms Simple: 0.005 vs 0.013 in Table 1 and 0.490 vs 1.363 in Table 2 (Real column).

The LazyLoad backend performs roughly on par with the Simple backend when it needs to load all translations. The additional overhead of string matching hurts performance in small workloads, but it's negligible in any significant workload compared to the time spent in I/O.
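As a quick check on these observations, the ratios implied by the Real column of the two tables can be computed directly. The hash keys below are labels invented for this sketch; the numbers are copied from Tables 1 and 2.

```ruby
# Real (wall-clock) times copied from Tables 1 and 2.
REAL_TIMES = {
  table1: { simple_one: 0.013503, simple_three: 0.013038, lazy_one: 0.005137, lazy_three: 0.020674 },
  table2: { simple_one: 1.363569, simple_three: 1.363284, lazy_one: 0.489951, lazy_three: 1.384148 }
}.freeze

# Speed-up of LazyLoad for one locale, and its slowdown when loading three.
def ratios(times)
  {
    single_locale_speedup: (times[:simple_one] / times[:lazy_one]).round(2),
    three_locale_slowdown: (times[:lazy_three] / times[:simple_three]).round(2)
  }
end

REAL_TIMES.each { |name, times| puts "#{name}: #{ratios(times)}" }
```

This works out to a single-locale speed-up of about 2.6x in Table 1 and 2.8x in Table 2, and a three-locale slowdown of about 1.6x in the small setup that shrinks to about 1.02x in the large one.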

Remarks

This backend is designed to bring performance improvements to workloads with a large volume of locales, translation files, and translation keys.

Performance isn't guaranteed for all applications, which is why the backend is designed to be opt-in.

At Shopify, we've patched ruby-i18n locally to implement a similar strategy. We've observed close to 10x speed-ups locally in specific tests and a roughly 20% speed-up across the suite.

paarthmadan · 2022-01-19T18:56:57Z

test/benchmark/benchmark_lazyload_test.rb

@@ -0,0 +1,61 @@
+require 'test_helper'


This is a temporary test that's used to set up the benchmarks. It'll be presented upstream, but I don't intend to merge it.

paarthmadan · 2022-01-19T18:58:03Z

Please review @Shopify/rails

cc: @adrianna-chang-shopify, @shioyama

adrianna-chang-shopify

This is great work, Paarth! 👏 🚀

As discussed IRL: #available_locales doesn't work right now, because it will only look at loaded locales, which is likely to be incomplete. I think we'll need to do something similar to what @shioyama did in Core and use our selection heuristics to grab available locales from the load path.
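One way to picture the suggestion is to derive the available locales from the load path itself rather than from what has been loaded so far. The sketch below is an illustration only: `available_locales_from` is a hypothetical helper, and it assumes filenames of the form `<locale>_<anything>.<ext>`.

```ruby
# Derives locales from filenames in the load path without reading any file.
# Assumption: the locale is the part of the basename before the first underscore.
def available_locales_from(load_path)
  load_path
    .map { |path| File.basename(path, ".*").split("_").first }
    .reject { |prefix| prefix.nil? || prefix.empty? }
    .map(&:to_sym)
    .uniq
end
```

Any file that does not follow the naming scheme would produce a bogus locale here, which is why a verification step is discussed later in the thread.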

A couple of points we talked about IRL that are not blockers for this PR, but that we might want to think about in terms of delivering this as a feature upstream. I'm writing them down here in case anyone else has thoughts / feedback on them, and to remind myself what was discussed 😄

We might want to ensure that this can work out of the box with Rails. There are default translations in Rails that don't conform to the path-matching regex we've specified. Maybe this means additional support in https://github.com/svenfuchs/rails-i18n to ensure that these translations are always loaded by default, similar to what we did in Core.
Should we have some sort of sanity check that apps can use to verify their translations are set up correctly? Might be nicer than failing silently if a translation has an unusual file path and doesn't get picked up properly. Possibly a rake task?

I don't have sufficient context on the failing JRuby tests -- maybe someone else from the team can step in there.

~~One more question: are we intending that folks use this in production? Obviously the use case we have in Core is strictly for our test environment.~~ Prod is eager loaded so this is a no-go 🤦‍♀️

I think our next step is to try to adopt this in Core, and make sure all the tests continue to pass as expected.

test/backend/lazy_load_test.rb

lib/i18n/backend/lazy_load.rb

test/backend/lazy_load_test.rb

shioyama

Looks great! 👍 Just some initial comments, mainly about the format conventions for locale filenames.

lib/i18n/backend/lazy_load.rb

casperisfine · 2022-01-20T09:00:05Z

When should someone use this backend?

I believe you should make it clear in that section that this backend is for development and test environment, and shouldn't be used in production environments.

lib/i18n/backend/lazy_load.rb

test/backend/lazy_load_test.rb

paarthmadan · 2022-01-21T21:08:57Z

Thanks for the reviews, I incorporated all stylistic changes, nits, and minor feedback.

I believe you should make it clear in that section that this backend is for development and test environment, and shouldn't be used in production environments.

I agree, I've updated documentation to make this more clear, and I've changed the behaviour of eager_load! to solidify that this shouldn't be used in envs where eager loading is the best practice.

paarthmadan · 2022-01-21T21:22:58Z

The crux of this problem, now, seems to be which naming format the backend assumes and how we can generate available locales from this.

I'd like to collect your feedback on some approaches for how to proceed.

Here's a small set of criteria I've used to evaluate the approaches:

The solution should be generalizable. In other words, how many apps actually abide by this format? Could a simple Rails app immediately make use of the new backend?
How difficult is it to use the solution in core?
How much overhead is introduced with this solution?

With these criteria in mind, here are some approaches Adrianna and I discussed:

Approach 1: Assume all files abide by `<locale>-translation.yml` format (ie. locale at the start of the file)

Advantages:

This would make the #available_locales implementation straightforward
The constraint on the naming format is small, so backwards compatibility isn't severed (if we start too large, it'll be hard to move back)

Disadvantages:

Solution isn't necessarily generalizable (e.g. many apps use paths like /path/en/views/file.yml).
Usage in core isn't trivial, because not all locale files in core abide by this rule.

Approach 2: Assume all files abide by `<locale>-translation.yml` format OR `/path/<locale>/translation.yml` (ie. locale at the start of the file OR identifier in the path)

Advantages:

The constraint on the naming format is still relatively small, so backwards compatibility isn't severed

Disadvantages:

Solution is more generalizable but, as highlighted by this comment, supporting paths like these makes it hard to extract the locales
#available_locales implementation is virtually impossible without loading all translations or making hasty generalizations

Approach 3: Assume all files are formatted in a specified way and introduce configuration for "path to locale" resolution

This is the same solution as the two above, but now we create an interface that allows clients to specify how paths are related to their locale.

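Imagine a proc in a configuration file, such as the following or something similar:

# Example: /config/initializers/i18n.rb
I18n.lazy_load.path_resolve = ->(path) do
  # Return the locale for the given path, using whichever strategy fits the app:
  extract_locale(path)   # Example implementation 1: parse the locale out of the path
  # LookupTable[path]    # Example implementation 2: consult a precomputed table
  # ... etc
end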

Advantages:

This helps generalization: we give the client an interface to work around edge cases such as files that don't abide by the naming convention.
Enables usage in core

Disadvantages:

Onboarding is very involved. Have to write custom logic per app to configure load path selection heuristic.
#available_locales will need to be left to client, which will still be tricky.

Approach 4: Generate / Dump mapping between locale file and locales (use this instead of string matching)

This approach varies the most from what's already presented.

At a high-level:

Generate static mapping by loading all translations and relating a locale to the files that it came from.
This dump can be enforced in the CI build. The mapping will need to be checked into version control.
Locally, the LazyLoad backend will load this mapping into memory and consult the lookup table to deduce which files need to be loaded. Instead of searching through the load path, we consult the table. #available_locales are simply the keys of the table.
Locally, when new translation files are introduced or a translation file changes, a re-dump will be required. I believe this can be handled more intelligently (i.e. if a file is modified, load it regardless of the locale).
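The steps above could look roughly like this. It is a sketch under several assumptions: the helper names and the JSON mapping file are hypothetical, and only plain YAML translation files are handled.

```ruby
require "json"
require "yaml"

# Loads every file once and records which files contribute to which locale.
def build_locale_mapping(load_path)
  mapping = Hash.new { |hash, key| hash[key] = [] }
  load_path.each do |path|
    translations = YAML.safe_load(File.read(path)) || {}
    translations.each_key { |locale| mapping[locale.to_s] << path }
  end
  mapping
end

# Persists the mapping so it can be checked into version control.
def dump_locale_mapping(load_path, mapping_file)
  File.write(mapping_file, JSON.pretty_generate(build_locale_mapping(load_path)))
end

# Consults the table instead of searching the load path.
def files_for(locale, mapping_file)
  JSON.parse(File.read(mapping_file)).fetch(locale.to_s, [])
end

# The available locales are simply the keys of the table.
def available_locales(mapping_file)
  JSON.parse(File.read(mapping_file)).keys.map(&:to_sym)
end
```

Note that a file defining several locales appears under each of them, so no naming convention is needed. Keeping the dump fresh after local edits is not addressed by this sketch.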

Note: The Simple backend loads all translations to determine available locales anyway. We would have to pay this cost once, preferably as part of the build process. If we can define a way to resolve this mapping when local changes occur, then this solution seems promising (e.g. using a digest for modified files, or always including translations that are dirty in the context of the git branch).

Advantages:

The most generalizable. It doesn't define a specific format, or require a specific hierarchy
Doesn't require client side configuration
Easily works with Core

Disadvantages:

Produces an artifact that must be persisted
Mapping will need to be updated locally (or local changes will need to be handled differently). If this can't be done without reloading all translations locally, then it defeats the purpose.

The list certainly isn't exhaustive, nor are any of the solutions well-defined. I wanted to collect feedback early to see if anyone had any strong opinions or had some ideas they'd like to share.

cc: @shioyama, @casperisfine, @adrianna-chang-shopify

Ultimately, this problem is tricky because we don't know which files belong to a locale without loading them. The complexity rises in trying to do this without actually doing it 😄

lib/i18n/backend/lazy_load.rb

casperisfine · 2022-01-24T11:45:04Z

which naming format the backend assumes and how we can generate available locales from this.

While it's the most accurate, I don't think the mapping solution has good ergonomics.

I think we should either enforce a naming convention and / or have a list of regexp or other type of callbacks to extract a locale from a path. This means that we should have some kind of strict mode if the assumptions end up wrong, and they should be tested via loading all ~~tests~~ translations.
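A list of regexps or callbacks with a strict mode might be sketched like this. Everything here is hypothetical: the resolver patterns, the `resolve_locale` name, the `strict:` flag, and the error class are illustrations, not an existing i18n API.

```ruby
class UnresolvablePathError < StandardError; end

# Each resolver is either a Regexp with a named group `locale` or a callable
# that takes a path and returns a locale (or nil).
DEFAULT_RESOLVERS = [
  %r{(?:\A|/)(?<locale>[a-z]{2}(?:-[A-Z]{2})?)_[^/]*\z}, # .../en_001.yml
  %r{(?:\A|/)(?<locale>[a-z]{2}(?:-[A-Z]{2})?)/[^/]+\z}  # .../en/views.yml
].freeze

# Tries each resolver in order and returns the first locale found.
# In strict mode an unresolvable path raises instead of returning nil.
def resolve_locale(path, resolvers: DEFAULT_RESOLVERS, strict: false)
  resolvers.each do |resolver|
    locale =
      if resolver.is_a?(Regexp)
        match = resolver.match(path)
        match && match[:locale]
      else
        resolver.call(path)
      end
    return locale.to_sym if locale
  end

  raise UnresolvablePathError, "no locale could be extracted from #{path}" if strict
  nil
end
```

The directory pattern will also match any two-letter directory that is not a locale, so the assumptions would still need to be verified by loading all translations in a test.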

adrianna-chang-shopify · 2022-01-24T16:37:06Z

I'd lean towards approach 3. As Jean said, option 4 guarantees accuracy but is a bit tedious and involved for users to set up. I think that especially because this is a test environment optimization, it should be as simple as possible for users to set up / opt into. Going with 3 allows us to start very simple and assume that 95% of apps can comply with those expectations out of the box (I think we should start with the assumption that "all files abide by <locale>-translation.yml format"). We can then figure out what an ideal API looks like as we extend this to Core and figure out how to make it work there.

Disadvantages:

Onboarding is very involved. Have to write custom logic per app to configure load path selection heuristic.

#available_locales will need to be left to client, which will still be tricky.

I'd actually argue that this is not so involved. For most applications, going with the default format should be enough. We can be pretty explicit about what we expect from the "path resolve" feature -- as Jean suggested, it could be as simple as providing a regex / list of regexes that extracts the locale from the filepath. Users wouldn't necessarily need to write #available_locales themselves -- we should just be able to go through the files in the load path and extract the locales that meet the patterns. The biggest thing to watch out for here is matching locales that aren't really locales, but maybe we could offer a script / CLI command that "sanity checks" the app's translation setup and ensures that the list of locales provided by #available_locales is equivalent to the set of locale keys generated by reading all of the translation files.
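Such a sanity check could be as small as the sketch below, which compares the locales implied by filenames with the locale keys found by reading every file. The helper names are hypothetical, and it assumes plain YAML files named `<locale>_<anything>.yml`.

```ruby
require "yaml"

# Locales implied by the filenames alone (no file is read).
def locales_from_paths(load_path)
  load_path
    .map { |path| File.basename(path, ".*").split("_").first.to_s }
    .reject(&:empty?)
    .uniq
    .sort
end

# Locales found by actually loading every translation file.
def locales_from_contents(load_path)
  load_path
    .flat_map { |path| (YAML.safe_load(File.read(path)) || {}).keys.map(&:to_s) }
    .uniq
    .sort
end

# Reports the differences between the two views; both lists are empty
# when the setup is consistent.
def check_translation_setup(load_path)
  from_paths = locales_from_paths(load_path)
  from_contents = locales_from_contents(load_path)
  {
    only_in_paths: from_paths - from_contents,
    only_in_contents: from_contents - from_paths
  }
end
```

A non-empty `only_in_paths` points at filenames matched as locales that aren't really locales, and a non-empty `only_in_contents` points at locales the path heuristic would miss.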

TL;DR - I think we can keep things simple for now and be rigid about the naming configuration we expect, and figure out a way to offer configuration as a follow-up.

paarthmadan · 2022-01-28T14:36:26Z

After syncing with Jean and Adrianna, here is an approach to get a first iteration out and possibly integrate with Core in the future:

Use simplest locale format, with either a) demarcation or b) tracking loaded_paths
Implement available_locales
Implement varied behaviour with configured mode (ie. eager vs. lazy)
Propose PR upstream to ruby-i18n

If the PR is well-received, the plan for Core / Rails can be as simple as:

Moving all locale files to abide by new format by renaming / splitting files. This can be done with a script that introspects the load path, checks if the file abides by the convention, and if not, renames and splits the file appropriately.
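The splitting step of such a script might look like the following. This is only a sketch: the helper names, the `<locale>_<original name>` target naming, and the restriction to plain YAML files are assumptions, and the renaming or deleting of the original file is left out.

```ruby
require "yaml"

# True when the file holds exactly one locale and its name starts with it.
def conforms?(path, translations)
  translations.keys == [File.basename(path, ".*").split("_").first]
end

# For a non-conforming file, returns { new_path => yaml_string } with one
# entry per locale found in the file. Returns {} when nothing needs to change.
def split_by_locale(path)
  translations = YAML.safe_load(File.read(path)) || {}
  return {} if conforms?(path, translations)

  dir = File.dirname(path)
  name = File.basename(path)
  translations.each_with_object({}) do |(locale, subtree), files|
    files[File.join(dir, "#{locale}_#{name}")] = YAML.dump(locale => subtree)
  end
end
```

Writing the returned contents to disk and removing the original file would complete the migration for that file.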

The amount of work remaining to propose this upstream as defined above is limited. The calendar time for reviews may be longer, but if the PR does make it upstream, the Core work can also be done at any time in the future. It should take a few days at most, mostly scripting and testing.

lib/i18n/backend/lazy_loadable.rb

casperisfine

Some minor stuff, but looks like it's on the right track, and it really isn't that much code.

lib/i18n/exceptions.rb

lib/i18n/backend/lazy_loadable.rb

adrianna-chang-shopify

This looks great, Paarth! I really appreciate the amount of detail you've put into test cases and documentation ❤️ I have a bunch of minor comments (feel free to contest any suggestions I've made that don't make sense 😆 ), and I'd like to give it a tophat on a simple Rails app, but this looks pretty much ready to be proposed upstream IMO 🚀

lib/i18n/backend/base.rb

lib/i18n/backend/lazy_loadable.rb

lib/i18n/tests/basics.rb

lib/i18n/exceptions.rb

casperisfine · 2022-01-31T20:59:52Z

lib/i18n/backend/lazy_loadable.rb

+      def file_named_correctly?(path, translations)
+        locales = translations.keys.map(&:to_sym)
+        return false unless locales.one?
+
+        LocaleExtractor.locale_from_path(path) == locales.first
+      end


Rather than return true/false this method should raise a final error. This way we can include in the error message which assumption was broken, and in which file, making it much easier to fix.

paarthmadan · 2022-02-02

I was on the fence about which way to go. I agree that knowing which assumption was broken will help, and this is something I'll add with the current approach.

The reason why I decided to raise with all the offending files at once is mainly a DX concern. If someone's onboarding using this strategy and would like to acquire a complete list of offending files, it would be handy to raise this in a single error.

My worry is that people will fight with repeated errors being raised and only able to fix them on a one-by-one basis.

Thoughts?

casperisfine · 2022-02-03

You are right, collecting all the offenses and raising once is even better. But I think you can do both, it just can't be expressed by simple booleans since you have multiple types of offense (e.g. path not matching at all, and content not respecting what the path claims).

So maybe you can collect a list of error message in an array and then raise if it's not empty.
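A rough sketch of that idea follows: collect one message per broken assumption, then raise once with the full list. The error class and method names are hypothetical and this is not the code that was submitted upstream. It assumes the locale is the filename prefix before the first underscore.

```ruby
class InvalidLocaleFilesError < StandardError; end # hypothetical name

# Returns a list of human-readable offenses for one file (empty when valid).
def locale_file_offenses(path, translations)
  expected = File.basename(path, ".*").split("_").first.to_s
  locales = translations.keys.map(&:to_s)
  offenses = []

  # Offense type 1: the path does not match the naming convention at all.
  offenses << "#{path}: cannot extract a locale from the filename" if expected.empty?
  # Offense type 2: the file does not hold exactly one locale.
  offenses << "#{path}: expected exactly one locale, found #{locales.inspect}" unless locales.one?
  # Offense type 3: the content does not respect what the path claims.
  if !expected.empty? && locales.one? && locales.first != expected
    offenses << "#{path}: filename implies #{expected.inspect} but contents define #{locales.first.inspect}"
  end

  offenses
end

# files: hash of path => loaded translations. Raises once with every offense.
def validate_locale_files!(files)
  offenses = files.flat_map { |path, translations| locale_file_offenses(path, translations) }
  raise InvalidLocaleFilesError, offenses.join("\n") unless offenses.empty?
end
```

Someone onboarding gets the complete list of offending files in a single error, and each line still says which assumption was broken.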

adrianna-chang-shopify · 2022-02-03

We reached the same conclusion when pairing on this yesterday! 😄 I believe Paarth has submitted the changes upstream now: https://github.com/ruby-i18n/i18n/pull/612/files#diff-8e52202d93e7810223b33ab64de65635e73f04ffa8d7d38b14b5e656039d3ea1R152-R162

paarthmadan · 2022-02-03T01:32:15Z

Closing in favour of ruby-i18n#612

paarthmadan force-pushed the pm/lazy-load-backend branch from 5454e9b to b6e1c13 Compare January 19, 2022 18:52

paarthmadan commented Jan 19, 2022

adrianna-chang-shopify reviewed Jan 19, 2022

shioyama reviewed Jan 20, 2022

shioyama reviewed Jan 20, 2022

casperisfine reviewed Jan 20, 2022

paarthmadan force-pushed the pm/lazy-load-backend branch 4 times, most recently from 8a801a9 to 74091d5 Compare January 21, 2022 20:09

paarthmadan requested review from shioyama, adrianna-chang-shopify and casperisfine January 21, 2022 21:26

casperisfine reviewed Jan 24, 2022

paarthmadan added 3 commits January 28, 2022 15:35

Sort imports

3afb608

Introduce LazyLoad backend

39229a4

Benchmark: Compare LazyLoad vs Simple backend

e1865b7

paarthmadan force-pushed the pm/lazy-load-backend branch from 1a8c5ce to 0709ec7 Compare January 28, 2022 20:36

paarthmadan requested a review from casperisfine January 28, 2022 20:40

casperisfine reviewed Jan 31, 2022

casperisfine reviewed Jan 31, 2022

paarthmadan added 2 commits January 31, 2022 14:08

Support multiple modes, implement #available_locales, use simple format

5087ffe

Rename LazyLoad to LazyLoadable backend

a7cfb12

paarthmadan force-pushed the pm/lazy-load-backend branch 2 times, most recently from 5841e5b to 996965f Compare January 31, 2022 19:24

adrianna-chang-shopify reviewed Jan 31, 2022

casperisfine reviewed Jan 31, 2022

Accept block in #load_translations, yield translations

6847747

paarthmadan force-pushed the pm/lazy-load-backend branch 4 times, most recently from 50765ec to 05605be Compare February 2, 2022 23:38

Raise invalid files in test, ensure file only loads for one locale

8a633b8

paarthmadan force-pushed the pm/lazy-load-backend branch from 05605be to 8a633b8 Compare February 2, 2022 23:40

Add #lookup

4d24983

paarthmadan closed this Feb 3, 2022
