Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memoization strategy #539

Closed
wants to merge 4 commits into from
Closed

Memoization strategy #539

wants to merge 4 commits into from

Conversation

jmeinerz
Copy link

@jmeinerz jmeinerz commented May 10, 2023

Memoization is useful if you want to run a command once but be able to access that value many times.

However, the popular ||= operator is not great for when your command might return a falsey value. By guarding memoization instance methods with if defined?, we make sure it only runs once.

This is particularly important to stop expensive queries that may return nil or false from running needlessly.

Memoization is useful if you want to run a command once but be able to access that value many times.

However, the popular `||=` operator is not great for when your command might return a falsey value. By guarding memoization instance methods with `if defined?`, we make sure it only runs once.
@jmeinerz jmeinerz requested a review from a team as a code owner May 10, 2023 09:09
@jmeinerz jmeinerz self-assigned this May 10, 2023
Copy link
Contributor

@nunosilva800 nunosilva800 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great addition to the guide.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
jmeinerz and others added 2 commits May 10, 2023 10:22
Co-authored-by: Nuno Silva <nunosilva800@gmail.com>
Co-authored-by: Nuno Silva <nunosilva800@gmail.com>
README.md Outdated Show resolved Hide resolved
Copy link
Member

@cbothner cbothner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a little nuance to the recommendation? Most of the time the terser rose memoization is appropriate.


## Memoization

* Prefer `return @x if defined?(@x)` over a simple `||=`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Prefer `return @x if defined?(@x)` over a simple `||=`
* Prefer `return @x if defined?(@x)` over `||=` if the statement being memoized can evaluate to `nil` or `false`.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I didn't specify this nuance on purpose. I think having a single way of doing things simplifies the process of coding, and can protect the code from certain changes.

To illustrate my point, let's say we have the following method:

def merchants
  @merchant ||= Merchant.where(some_condition)
end

Now, even when no merchants are found, the result would be [], which according to your suggestion would not grant a guard statement because it would never be false-y. If for some reason, though, the method changes to just check if merchants exist, it would require the person changing it to remember this distinction in behaviour and then apply the guard.

Without having the guard in the original method, I find it very likely the method would evolve to something like this:

def merchants?
  @merchants_exist ||= Merchant.exists?(some_condition)
end

Which could in fact result in N queries.

Thoughts?

Copy link
Member

@cbothner cbothner May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, everything around memoization requires nuance, because memoization is caching and cache invalidation is one of the Two Hard Problems1. I think your addition is a good detail to include in our style guide to reduce the chance someone mindlessly repeats the rose memoization pattern. But it would also be suboptimal for someone to mindlessly repeat return @x if defined?(@x).

I don't support forbidding rose memoization, so I'll advocate for the nuance in our discussion of it.


On a similar note, a section on Memoization might also include a note about memoizing a method that takes arguments.

# Bad. A stale value can be returned when a different argument is passed for subsequent calls
def expensive_result(input)
  @expensive_result ||= ExpensiveResult.calculate(input)
end

# Better. Recalculates when a different argument is passed.
# Consider using an LRU or not memoizing at all if the argument cardinality
# is high or the object is long-lived.
def expensive_result(input)
  @expensive_results ||= {}
  @expensive_results[input] ||= ExpensiveResult.calculate(input)
end

# If the expensive result might be falsey
def expensive_result(input)
  @expensive_results ||= {}
  return @expensive_results[input] if @expensive_results.key?(input)

  @expensive_results[input] = ExpensiveResult.calculate(input)
end

Footnotes

  1. along with naming things and off-by-one errors, famously

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't support forbidding rose memoization, so I'll advocate for the nuance in our discussion of it.

I completely agree we shouldn't forbid anything. The way I think about these style guides are not as rules, but as suggestions that apply to at least 90% of cases. As I see it, in the vast majority of cases, there's no harm to "mindlessly repeat" the guard statement - but I wouldn't advocate for using it in more complex caching situations.

Given my understanding of what these guides are for, I do think adding too much nuance to the instruction makes it sort of void, though.

What is your view on how these guides should be used?

Co-authored-by: Tim Perkins <tjwp@users.noreply.github.com>
@sambostock
Copy link
Contributor

There was a previous discussion about this in #195.

Copy link
Member

@rafaelfranca rafaelfranca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is overhead of using defined?. Since it is a keyword, it is hard to optimize by the VM. In reality, memoization itself is hard to optimize in the VM right now.

If I'd write anything in the styleguide about memoization would be in the lines of telling people to avoid memoization. Judging by the code I see in our applications, most of the time memoization is done at the wrong time and in the wrong place.

But regardless, defined? over ||= isn't a matter of style, they have very different behavior and both can and should be used, so I don't think a style guide is the right place to include an explanation of when use one over the other. This is more content for a Ruby book, or a Ruby tutorial document. People need to learn when to use one over the other, or when not use any of them. This isn't the job of this styleguide.

@jmeinerz
Copy link
Author

There is overhead of using defined?. Since it is a keyword, it is hard to optimize by the VM. In reality, memoization itself is hard to optimize in the VM right now.

If both are hard to optimise, is there really a drawback in suggesting one over the other?

If I'd write anything in the styleguide about memoization would be in the lines of telling people to avoid memoization. Judging by the code I see in our applications, most of the time memoization is done at the wrong time and in the wrong place.

This is an interesting angle. I'm not opposed to it at all, but I think there's still value in nudging people to a way with fewer side effects.

People need to learn when to use one over the other, or when not use any of them. This isn't the job of this styleguide.

In my view, it does fit the styleguide. The way I think about it is that there's more harm to come from leaving this completely open than to make a loose suggestion for defined?. And I do agree with you that "people need to learn" but I also think we should try and support those who are earlier in their journey to write good code without having to know everything. For a junior developer, a day at Shopify is already a lot to take, and I would like to help them write code with fewer side effects.

In your opinion, what is the goal of the styleguide?

@jmeinerz
Copy link
Author

I just realised this rule is already kind of in the styleguide:

Use ||= to initialize variables only if they're not already initialized.

With that I'm happy to close this, unless we want to rephrase the existing rule

@rafaelfranca
Copy link
Member

I'm ok to expand that text to explain when ||= isn't good and provide examples

@jmeinerz jmeinerz closed this Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants