Add benchmarking to faker-ruby #2851

stefannibrasil · 2023-11-13T18:50:18Z

[this issue will be assigned to participants of the RubyConf2023 Hack Day]

There are a couple of issues regarding performance:

As a first step to tackling these performance issues, we would like to start measuring the performance of faker-ruby as part of the test suite by adding a benchmarking tool.

This tool or script would run as part of the test suite and measure:

load time for the library
generator execution speed
changing locales and measuring load time and performance

Resources:

Other gems do something similar, here are some examples:

stefannibrasil · 2023-11-13T19:03:30Z

assign to @salochara

jhawthorn · 2023-11-13T19:53:31Z

$ time ruby -e 'require "faker"; Faker::Coin.flip'
ruby -e 'require "faker"; Faker::Coin.flip'  0.47s user 0.05s system 99% cpu 0.519 total

$ vernier run -- ruby -e 'require "faker"; Faker::Coin.flip'
starting profiler with interval 500
#<Vernier::Result 0.514918 seconds, 1 threads, 927 samples, 185 unique>
written to /var/folders/0_/nnfg8lzj3t31jks4jdkccl4r0000gn/T/profile20231113-87519-a67roy.vernier.json

https://share.firefox.dev/49wviXk

Add benchmark rake task for evaluating time execution. Fixes Co-authored-by: Rubens Fernandes <rubens.fernandes97@gmail.com>

stefannibrasil · 2023-11-21T03:31:55Z

Next steps is to try to use I18n LazyLoadable: https://github.com/ruby-i18n/i18n/blob/master/lib/i18n/backend/lazy_loadable.rb to see if it makes a difference in terms of loading time and performance. Not sure if we will need to update the locale files paths.

* Add benchmark gem to Faker * Fix #2851 Add benchmark rake task for evaluating time execution. Fixes Co-authored-by: Rubens Fernandes <rubens.fernandes97@gmail.com> * Commit PR suggestions by Thiago. - Build all_methods outside the benchmark since we don't want to measure that. * Commit PR suggestions by Steffani. - Avoid calling the same object twice --------- Co-authored-by: Rubens Fernandes <rubens.fernandes97@gmail.com>

salochara · 2024-01-12T20:25:55Z

Hello @stefannibrasil 👋
I hope you're doing great.

I spent some time today looking at LazyLoadable. It seems like a pretty powerful alternative for improving time performance in testing environments.

I read through the implementation and discussion in this PR ruby-i18n/i18n#612.
It turns out, LazyLoadable was implemented by Shopify devs and the industry proof they have, is that their test suite was 20% faster (as shown in the PR above).

What I'm not sure, is why they suggest this LazyLoadable backend is only meant to be for testing environments. I'm not sure what's the reasoning behind.

In any case, regarding Faker and using the LazyLoadable backend, I did some testing.
In faker.rb, I simply added the following line:
I18n.backend = I18n::Backend::LazyLoadable.new(lazy_load: true)
That's all there is really to enable LazyLoadable.

The thing is that, whenever that's enabled, when running rake a bunch of errors come up.

LazyLoadable is expecting a certain naming system for the locales, and in some cases, we don't have that in Faker.
These are: (as specified here: ruby-i18n/i18n#612)

How does the LazyLoadable Backend know which files belong to which locale?
It makes assumptions about how files are named. Clients must abide by this naming system if they decide to use this backend.

The heuristic used to bind a file to its locale can be defined as follows:

the filename is in the I18n load path
the filename ends in a supported extension (ie. .yml, .json, .po, .rb)
the filename starts with the locale identifier
the locale identifier and optional proceeding text is separated by an underscore, ie. "_".

So to move forward, I have mainly 2 issues/questions

I'm not sure why LazyLoadable is only proposed for testing enviroments. If the testing enviorment is the only one we can enable it in, do you think is worth adding it to Faker?
There are tons of errors that show up when enabling the LazyLoadable backend. If we do want to implement it, we would have to start by fixing those errors, and then be able to measure the benefits from having it enabled.

What are your thoughts on this?

All the best!
Salomón.

salochara · 2024-01-15T20:06:23Z

Hello! @stefannibrasil

Just a quick update.
I started a discussion on this PR on the reasoning behind this feature being only for testing environment. The authors of that PR very kindly explained the reasoning (see here).

Bottom line... the lazy loading feature only makes sense in the testing environment.
With the context I provided on my previous comment.. do you think is worth adding it to our testing suite?

All the best!
Salo.

thdaraujo · 2024-01-15T21:33:27Z

hey @salochara , thanks for the investigation!

And yeah, what they said in here makes a lot of sense.

You want translations to be available in memory so you can use them as fast as possible in production. But in your test environment, you're not going to use all the translations in your tests anyway. So adding lazy loading gives you faster boot times, which lets your tests run quickly.

I think the way Faker is used puts us in a tricky situation:

On one hand, when you're using Faker in your test environment, you want generators to be fast so your test runs quickly, and for that you do want all translations already loaded in memory.

On the other hand, you also want your tests to boot fast, so running all your specs doesn't take too long. And in that case, you don't want to load all your translations, because it takes time to read and parse them.

Also, it can be hard to know which translation file Faker will need at boot time, because this depends on the generator you're using, so lazy loading would certainly help here.

In summary, we want it all. 😆

When running faker in production/staging for generating example data, this is not a problem, because you won't be rebooting the app very often, so lazy loading is not needed here.

--

But I think there are a couple of ideas we could think about:

what if we only load the default locale on boot, and only load other locales when they're needed / when a new locale is set?

We might be doing this right now, but I'm not sure, would have to see how i18n behaves. But if not, maybe this could be a potential improvement. Assuming that we only load :en locales, for example, would that make any difference?

if adding lazy loading to i18n is really a requirement, could we build a custom lazy loader for i18n that understands the way Faker locales are organized?

I think we could come up with a loader that could map a given generator to its required locale and load it when necessary.

can we just load locales faster?

IMO I think this is the more interesting question. Can we just make locales load very fast?

Assuming that the user won't be adding their own custom translations/locales to Faker, we can say that all possible locales are fixed when we generate a new release of Faker.

If that's true, then we could just transform all locales into something that can be loaded quickly.

Then this becomes a problem of data serialization.

One option that someone suggested was transforming our yaml locales into json, because json parsing is probably way faster.

Another option would be to turn the locales into a binary format that is easy to load into memory via marshalling. This wold then be used as a backend for all generators.

Our locales could still be yaml files, but we transform them into something that can be loaded faster and package that with the gem.

I think option 3 is worth exploring more, but I'd be interested in your opinion @salochara @stefannibrasil

stefannibrasil added 💡 Issue: Feature Request rubyconf2023 labels Nov 13, 2023

stefannibrasil changed the title ~~Add benchmarking for faker-ruby~~ Add benchmarking to faker-ruby Nov 13, 2023

salochara added a commit to salochara/faker that referenced this issue Nov 13, 2023

Fix faker-ruby#2851

397e9e5

Add benchmark rake task for evaluating time execution. Fixes Co-authored-by: Rubens Fernandes <rubens.fernandes97@gmail.com>

salochara added a commit to salochara/faker that referenced this issue Nov 13, 2023

Fix faker-ruby#2851

b72260f

Add benchmark rake task for evaluating time execution. Fixes Co-authored-by: Rubens Fernandes <rubens.fernandes97@gmail.com>

salochara mentioned this issue Nov 13, 2023

Add benchmark #2855

Merged

2 tasks

stefannibrasil closed this as completed in #2855 Nov 21, 2023

salochara mentioned this issue Jan 26, 2024

Benchmarking loading JSON file vs YML file #2897

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add benchmarking to faker-ruby #2851

Add benchmarking to faker-ruby #2851

stefannibrasil commented Nov 13, 2023 •

edited

stefannibrasil commented Nov 13, 2023

jhawthorn commented Nov 13, 2023

stefannibrasil commented Nov 21, 2023

salochara commented Jan 12, 2024 •

edited

salochara commented Jan 15, 2024

thdaraujo commented Jan 15, 2024 •

edited

Add benchmarking to faker-ruby #2851

Add benchmarking to faker-ruby #2851

Comments

stefannibrasil commented Nov 13, 2023 • edited

stefannibrasil commented Nov 13, 2023

jhawthorn commented Nov 13, 2023

stefannibrasil commented Nov 21, 2023

salochara commented Jan 12, 2024 • edited

salochara commented Jan 15, 2024

thdaraujo commented Jan 15, 2024 • edited

stefannibrasil commented Nov 13, 2023 •

edited

salochara commented Jan 12, 2024 •

edited

thdaraujo commented Jan 15, 2024 •

edited