-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memoization strategy #539
Memoization strategy #539
Conversation
Memoization is useful if you want to run a command once but be able to access that value many times. However, the popular `||=` operator is not great for when your command might return a falsey value. By guarding memoization instance methods with `if defined?`, we make sure it only runs once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great addition to the guide.
Co-authored-by: Nuno Silva <nunosilva800@gmail.com>
Co-authored-by: Nuno Silva <nunosilva800@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a little nuance to the recommendation? Most of the time the terser rose memoization is appropriate.
|
||
## Memoization | ||
|
||
* Prefer `return @x if defined?(@x)` over a simple `||=` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Prefer `return @x if defined?(@x)` over a simple `||=` | |
* Prefer `return @x if defined?(@x)` over `||=` if the statement being memoized can evaluate to `nil` or `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I didn't specify this nuance on purpose. I think having a single way of doing things simplifies the process of coding, and can protect the code from certain changes.
To illustrate my point, let's say we have the following method:
def merchants
@merchant ||= Merchant.where(some_condition)
end
Now, even when no merchants are found, the result would be []
, which according to your suggestion would not grant a guard statement because it would never be false-y. If for some reason, though, the method changes to just check if merchants exist, it would require the person changing it to remember this distinction in behaviour and then apply the guard.
Without having the guard in the original method, I find it very likely the method would evolve to something like this:
def merchants?
@merchants_exist ||= Merchant.exists?(some_condition)
end
Which could in fact result in N queries.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, everything around memoization requires nuance, because memoization is caching and cache invalidation is one of the Two Hard Problems1. I think your addition is a good detail to include in our style guide to reduce the chance someone mindlessly repeats the rose memoization pattern. But it would also be suboptimal for someone to mindlessly repeat return @x if defined?(@x)
.
I don't support forbidding rose memoization, so I'll advocate for the nuance in our discussion of it.
On a similar note, a section on Memoization might also include a note about memoizing a method that takes arguments.
# Bad. A stale value can be returned when a different argument is passed for subsequent calls
def expensive_result(input)
@expensive_result ||= ExpensiveResult.calculate(input)
end
# Better. Recalculates when a different argument is passed.
# Consider using an LRU or not memoizing at all if the argument cardinality
# is high or the object is long-lived.
def expensive_result(input)
@expensive_results ||= {}
@expensive_results[input] ||= ExpensiveResult.calculate(input)
end
# If the expensive result might be falsey
def expensive_result(input)
@expensive_results ||= {}
return @expensive_results[input] if @expensive_results.key?(input)
@expensive_results[input] = ExpensiveResult.calculate(input)
end
Footnotes
-
along with naming things and off-by-one errors, famously ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't support forbidding rose memoization, so I'll advocate for the nuance in our discussion of it.
I completely agree we shouldn't forbid anything. The way I think about these style guides are not as rules, but as suggestions that apply to at least 90% of cases. As I see it, in the vast majority of cases, there's no harm to "mindlessly repeat" the guard statement - but I wouldn't advocate for using it in more complex caching situations.
Given my understanding of what these guides are for, I do think adding too much nuance to the instruction makes it sort of void, though.
What is your view on how these guides should be used?
Co-authored-by: Tim Perkins <tjwp@users.noreply.github.com>
There was a previous discussion about this in #195. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is overhead of using defined?
. Since it is a keyword, it is hard to optimize by the VM. In reality, memoization itself is hard to optimize in the VM right now.
If I'd write anything in the styleguide about memoization would be in the lines of telling people to avoid memoization. Judging by the code I see in our applications, most of the time memoization is done at the wrong time and in the wrong place.
But regardless, defined?
over ||=
isn't a matter of style, they have very different behavior and both can and should be used, so I don't think a style guide is the right place to include an explanation of when use one over the other. This is more content for a Ruby book, or a Ruby tutorial document. People need to learn when to use one over the other, or when not use any of them. This isn't the job of this styleguide.
If both are hard to optimise, is there really a drawback in suggesting one over the other?
This is an interesting angle. I'm not opposed to it at all, but I think there's still value in nudging people to a way with fewer side effects.
In my view, it does fit the styleguide. The way I think about it is that there's more harm to come from leaving this completely open than to make a loose suggestion for In your opinion, what is the goal of the styleguide? |
I just realised this rule is already kind of in the styleguide:
With that I'm happy to close this, unless we want to rephrase the existing rule |
I'm ok to expand that text to explain when |
Memoization is useful if you want to run a command once but be able to access that value many times.
However, the popular
||=
operator is not great for when your command might return a falsey value. By guarding memoization instance methods withif defined?
, we make sure it only runs once.This is particularly important to stop expensive queries that may return
nil
orfalse
from running needlessly.