Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved integer square root. #4403

Merged
merged 26 commits into from Feb 16, 2024
Merged

Improved integer square root. #4403

merged 26 commits into from Feb 16, 2024

Conversation

chgorman
Copy link
Contributor

@chgorman chgorman commented Jun 29, 2023

A more efficient implementation of integer square root within Solidity; improves upon #3242. Solves Issue #4402.

Proof of validity discussed here.

@changeset-bot
Copy link

changeset-bot bot commented Jun 29, 2023

⚠️ No Changeset found

Latest commit: e748bfc

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@frangio
Copy link
Contributor

frangio commented Jun 30, 2023

Thanks for this proposal.

Unfortunately our gas benchmarks are only able to measure non-view functions, so we can't see any improvement in the gas report for this PR.

Can you share how much of an improvement you are observing? Please share a few inputs that would allow us to confirm that too.

@chgorman
Copy link
Contributor Author

chgorman commented Jun 30, 2023

The previously linked article discusses integer square root computation within Solidity (in addition to giving proofs). Section 5 discusses gas costs; OZ had a mean gas cost of 948, while the new version had a mean gas cost of 757.

I am working on having a public repo where the gas analysis was performed but it is not ready yet; extended analysis here.

@chgorman chgorman marked this pull request as draft July 11, 2023 14:07
contracts/utils/math/Math.sol Outdated Show resolved Hide resolved
@Amxx
Copy link
Collaborator

Amxx commented Jul 24, 2023

The main change of this proposition is that it inlines initial estimation, instead of calling log2. That might indeed have some effect on the gas costs, but I would also argue it reduces the readability of the function.

@chgorman
Copy link
Contributor Author

chgorman commented Jul 24, 2023

Yes, the proposed change inlines a better initialization which requires only 6 Newton iterations rather than 7; the algorithm is provably correct. The current version has no proof of correctness.

The gas is reduced from 950 to 749 (mean gas cost from 2048 test values). These gas estimates and more will be in a public repo shortly for verification.

Shall I add comments to improve readability? Some comments have been added.

@Amxx
Copy link
Collaborator

Amxx commented Jul 24, 2023

The initial estimation looks very similar to what we had in the original sqrt PR

The consequence is that, appart from the readability, your ifs behave similarly to the log2 option. The "only" math difference that you PR makes is the result += result >> 1 part.

Where does this line come from ? When you say:

the proposed change inlines a better initialization which requires only 6 Newton iterations rather than 7; the algorithm is provably correct. The current version has no proof of correctness.

is that was you are talking about ?

If there is a proof of correctness, we should add it to the documentation. Saying that "it exist" is not enough IMO.

Also, if adding result += result >> 1 is the key part, then we should measure doing that change (and removing an interation) on the current code.


Edit: I think result += (result >> 1); is supposed to be an iteration. I'm not sure if that is correct of not. For the sake of readability I'd keep the same syntax as the other ones.


Edit2: I also measured ~200gas savings. Its distribued betwen:

  • the first iteration rewrite,
  • using result*result >= a instead of the Math.min → I'm not 100% but I was under the impression that the math.min includes an interation. I'd love to check the effect of it
  • doing a more "native" first estimation instead of relying on log2 (which does an extra interation we don't need here) and then dividing by 2.

Comment on lines 235 to 241
result += (result >> 1);
result = (result + a / result) >> 1;
result = (result + a / result) >> 1;
result = (result + a / result) >> 1;
result = (result + a / result) >> 1;
result = (result + a / result) >> 1;
result = (result + a / result) >> 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only count 7 iterations, yet the PDF you linked shows that in some cases, 8 are required. I'm worried about that part. Do you have an example of value that needs 8 iterations that we could run both version of the code with ?

Copy link
Contributor Author

@chgorman chgorman Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, Newton iteration refers to result = (result + a/result) >> 1. This is the Newton iteration step when computing square roots via Newton's method; see here.

So, result += (result >> 1); is not a Newton iteration but rather part of the initialization. Yes, you could replace it another Newton iteration, but that is more expensive.

As to your comments about "8 cases are required", you are misunderstanding the comment and figure. If only Newton iterations are performed and the same initialization is used that is currently used by OZ, then it sometimes takes 8 Newton iterations to reach the correct value. Note that this does not apply either in the current OZ code nor the proposed change. That figure was primarily meant to show why a good initialization value is required for efficient computation.

@chgorman
Copy link
Contributor Author

"The "only" math difference that you PR makes is the result += result >> 1 part." This is what allows the reduction of 7 to 6 Newton iterations. I could not find a similar initialization elsewhere. The other math difference is the return logic; with it, it is possible to prove that the algorithm is correct. For the proof, see Appendix B.

"Where does this line come from?" (referring to result += result >> 1): this is so that $result == 2^{e-1} + 2^{e-2}$ if $2^{e-1} \le sqrt(x) < 2^{e}$

"I'm not 100% but I was under the impression that the math.min includes an interation. I'd love to check the effect of it": please provide a proof that "math.min includes an iteration".

" I think result += (result >> 1); is supposed to be an iteration. I'm not sure if that is correct of not. For the sake of readability I'd keep the same syntax as the other ones.": As mentioned above, it is not a Newton iteration, but rather part of initialization.

@Amxx
Copy link
Collaborator

Amxx commented Jul 25, 2023

"I'm not 100% but I was under the impression that the math.min includes an interation. I'd love to check the effect of it": please provide a proof that "math.min includes an iteration".

The iteration is doign an arithmetic average between result and a / result. Also we know that at some point, the two values (result and a/result) will be (ceil(sqrt(a)) and floor(sqrt(a))). So at some point, when the precision is down to the last digit, result and a/result are two good candidates, and the smallest one is the expected value (the other one is +1).
That is the logic behind the min operation. Now can it replace the last iterration, I'm not so sure, and I have not tested it.

@Amxx
Copy link
Collaborator

Amxx commented Jul 25, 2023

@PaulRBerg You are the "math done in solidity" expert. WDYT of that initialization change ?

// Take care of easy edge cases
if (a <= 1) { return a; }
// This check ensures no overflow
if (a >= ((1 << 128) - 1)**2) { return (1 << 128) - 1; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk of overflow ? Can you document which part is succeptible ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this specific method and return logic, it is possible to overflow when computing result**2 during the check result**2 <= a, because it may be that result == 2**128; thus, result**2 == 0 because this is unchecked.

Copy link
Collaborator

@Amxx Amxx Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interresting. I guess this is not necessary if you do min(result, a/result) so the current version is good in that regard.

I'd rewrite that as:

Suggested change
if (a >= ((1 << 128) - 1)**2) { return (1 << 128) - 1; }
if (a >= uint256(type(uint128).max)**2) { return type(uint128).max; }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the cost of that test outweight the cost of the min at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interresting. I guess this is not necessary if you do min(result, a/result) so the current version is good in that regard.

The current version has no proof of correctness. This proposed change does. See Appendix B.4.3. Have you looked at that report?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code was updated with suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, the overflow check (if (a >= uint256(type(uint128).max)**2) { return type(uint128).max; }) costs 23 gas.

@chgorman
Copy link
Contributor Author

chgorman commented Jul 25, 2023

"I'm not 100% but I was under the impression that the math.min includes an interation. I'd love to check the effect of it": please provide a proof that "math.min includes an iteration".

The iteration is doign an arithmetic average between result and a / result.

I really think it is better to think of the iteration not as the average of result and a / result, but rather what it is: a Newton iteration for computing square roots. This allows for using the error bounds for Newton iterations.

Also we know that at some point, the two values (result and a/result) will be (ceil(sqrt(a)) and floor(sqrt(a))).

Yes, it is true at some point that result == floor(sqrt(a)) (that is, the Integer Square Root) when Newton iterations continue to be performed. But that is not your stopping criterion. Your stopping criterion is: after doing 7 Newton iterations with a particular initialization, return min(result, a/result). What are the possible values for result after 7 Newton iterations, and does this ensure that min(result, a/result) always, in fact, returns floor(sqrt(a)) (Integer Square Root)?

This specific return logic (using min(result, a/result) is discussed in Appendix B.5. Either point out the logical flaw or provide a stronger analysis as to why those results don't apply. For the record, I have not found an instance where the original OZ sqrt function returns an incorrect value, but that is not a proof.

@chgorman
Copy link
Contributor Author

A repo with additional analysis may be found here.

@Amxx Amxx added this to the 5.1 milestone Aug 2, 2023
@Amxx
Copy link
Collaborator

Amxx commented Aug 2, 2023

We received a lot a gas optimization PRs that are great, but that we don't have time to process right now.
I'm tagging them for the 5.1 millestone so we don't lose track of it. Please understand that why it may look small, our processes, which are here for the security of everyone, makes processing them non trivial.

@chgorman
Copy link
Contributor Author

chgorman commented Aug 7, 2023

Yes, I understand needing to take the time for gas optimization and not having time now.

With that said, I would like to add that this is not just gas optimization but also replacing an algorithm (which has not been proven correct) with a provably-correct algorithm. If there are questions about the initialization choice or the math, please reach out.

@chgorman
Copy link
Contributor Author

I made a couple of small improvements to the algorithm. Additionally, please see the extended analysis here.

@chgorman chgorman marked this pull request as ready for review August 31, 2023 05:52
@chgorman chgorman requested a review from Amxx October 23, 2023 15:31
contracts/utils/math/Math.sol Outdated Show resolved Hide resolved
@chgorman
Copy link
Contributor Author

For the record, here has been my design philosophy:

  1. Provably correct Integer Square Root algorithm (followed closely by)
  2. Efficient as possible

Some Background: I first started investigating Integer Square Root algorithms because it was needed for an algebraic sigmoid bonding curve. I needed an integer square root function, so I looked at algorithms online. The current OZ alg was one I found online, and there were others which were very similar. I attempted to prove the algorithm's correctness, yet I ran into trouble with the return logic: the return logic (return min(result, a/result)) did not make sense and it is not clear that it ensures the correct result is returned (this was discussed more above). Does anyone know where that return logic (return min(result, a/result)) comes from?

Saying

It's been in the library for almost two years and it's been audited. That's enough for us.

is both not a proof and a weak argument. The Integer Square Root algorithm is a fundamental operation, and the proof of validity should be clear. Yes, the proof may require some knowledge of mathematics, but it's not particularly advanced. As noted above, I asked for a proof because if there was one, I wanted to see it because I could not construct one (as noted above).

We're not asking you for a proof

Yes, I understand. I wanted to be verbose and then wanted to iterate the discussion to make it concise yet understandable.

As to

we're trying to iterate the implementation before approving, that's the way we work and we generally come to better results this way.

Great. For the record, I iterated a lot on this algorithm as well, so if you are going to make changes to the code, would you please include Max, Mean, Median, Standard Deviation over 2048 samples? This allows for a systematic investigation of updates to the code and allows for an easy comparison. You could, for example, use the 2048 samples from this repo. The entire repo is devoted to one topic: provably correct efficient integer square root algorithms in Solidity. Please also ensure that all changes don't negate the proof of convergence.

Iterating comments are completely different. Yes, I understand not everyone "speaks math", but the comments should paint a clear picture of the algorithm and different parts.

Algorithm Overview: The algorithm has 3 parts:

  1. For 2**(2e-2) <= a < 2**(2e) (equivalently, 2**(e-1) <= sqrt(a) < 2**e)), set result == 2**(e-1) + 2**(e-2). With this initialization, we have abs(result - sqrt(a)) <= 2**(e-2).
  2. Perform Newton iterations. Because of the specified initialization, only 6 are required until Newton error bounds ensure that the error is less than or equal to 1. This means result == Isqrt(a) or result == Isqrt(a) + 1.
  3. Return result if result**2 <= a; otherwise, return result - 1.

@ernestognw
Copy link
Member

to iterate the discussion

We'd appreciate short and straight to the point discussions.

Iterating comments are completely different. Yes, I understand not everyone "speaks math", but the comments should paint a clear picture of the algorithm and different parts.

I think we've asked you to help with that and yet you kept expanding the explanation.

Great. For the record, I iterated a lot on this algorithm as well, so if you are going to make changes to the code, would you please include Max, Mean, Median, Standard Deviation over 2048 samples?

We'll keep iterating this. I'm sorry but this is an open source project and not an extension of your research. We'll dig into potential optimizations but I don't think we have time to give you a report with the samples.

@ernestognw
Copy link
Member

I tested the original version by @chgorman vs the version @Amxx is proposing:

FOUNDRY_FUZZ_RUNS=10000 forge snapshot --via-ir --optimizer-runs 1000000000 --use 0.8.24 --match-test testSqrt -vvvv

// chgorman
MathTest:testSqrt(uint256,uint8) (runs: 10000, μ: 4680, ~: 4857)

// Amxx
MathTest:testSqrt(uint256,uint8) (runs: 10000, μ: 4883, ~: 5012)

// master
MathTest:testSqrt(uint256,uint8) (runs: 10000, μ: 4925, ~: 5054)

Both @chgorman and @Amxx versions seem like an improvement over the current implementation. However, we discussed how much the following block makes readability more difficult:

if (a >= uint256(type(uint128).max) ** 2) {
    return type(uint128).max;
}

And we concluded that we won't merge something we're not comfortable with even if it's an improvement in gas. In this case, the gas improvement is around ~245 gas units using the original proposal (~200 vs @Amxx's). I still think the PR is good, but I'll leave the decision of whether to keep the original proposal or use the current version to @Amxx

ernestognw
ernestognw previously approved these changes Feb 15, 2024
Copy link
Member

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM again. Feel free to iterate in comments.

@Amxx
Copy link
Collaborator

Amxx commented Feb 15, 2024

I tested the original version by @chgorman vs the version @Amxx is proposing:

FOUNDRY_FUZZ_RUNS=10000 forge snapshot --via-ir --optimizer-runs 1000000000 --use 0.8.24 --match-test testSqrt -vvvv

// chgorman
MathTest:testSqrt(uint256,uint8) (runs: 10000, μ: 4680, ~: 4857)

// Amxx
MathTest:testSqrt(uint256,uint8) (runs: 10000, μ: 4883, ~: 5012)

// master
MathTest:testSqrt(uint256,uint8) (runs: 10000, μ: 4925, ~: 5054)

Both @chgorman and @Amxx versions seem like an improvement over the current implementation. However, we discussed how much the following block makes readability more difficult:

if (a >= uint256(type(uint128).max) ** 2) {
    return type(uint128).max;
}

And we concluded that we won't merge something we're not comfortable with even if it's an improvement in gas. In this case, the gas improvement is around ~245 gas units using the original proposal (~200 vs @Amxx's). I still think the PR is good, but I'll leave the decision of whether to keep the original proposal or use the current version to @Amxx

This estimation is with fuzzing numbers in which range? If the inputs are uint128 (don't go all the way to uint256) I don't get the same differences, because the number I put in are never taking the "shortcut" in @chgorman version. I think that realistic usage will probably not be using that shortcut either, as its very rare to see such large number manipulated for math (for cryptography we don't use this function anyway), and I think we should optimize "smaller" values

@Amxx
Copy link
Collaborator

Amxx commented Feb 15, 2024

Feel free to iterate in comments.

I will. I honestly think the current comments are very difficult to read. For those that read french, all the proofs, including convergence, fit into <1 page. We should aim for that level of clarity and simplicity.

result = (3 * result) >> 1;

// We define the error as ε = result - sqrt(a). Then we know that
// result = 2**e−1 + 2**e−2, and therefore ε0 = 2**e−1 + 2**e−2 - sqrt(n),
Copy link
Collaborator

@Amxx Amxx Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference of notation between this and the PDF.

Here we do result = 2**e <= sqrt(a) < 2**(e+1))
The PDF does 2**(e−1) ≤ √n < 2**e (B.2)

This causes much confusion.

result = (result + a / result) >> 1; // ε_5 := result - sqrt(a) <= 2**(e-72)
result = (result + a / result) >> 1; // ε_6 := result - sqrt(a) <= 2**(e-144)

// After 6 iterations, the precision of e is already above 128 (i.e. 144). Meaning that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precision of e

We computed e at step 0 (using all the ifs). Here we are dealing with the precision of x_n

@Amxx
Copy link
Collaborator

Amxx commented Feb 15, 2024

I've updated the comments in #4902. They are longer, but IMO explain everything clearly without eluding any point or refering to any external ressource. It includes minor variable renaming (for clarity) but no logic change. I'd be willing to merge that version.

If @ernestognw and @chgorman can review it, we can then forward this PR to include the commits in 4902, and merge here.

@Amxx
Copy link
Collaborator

Amxx commented Feb 15, 2024

Iterating comments are completely different. Yes, I understand not everyone "speaks math", but the comments should paint a clear picture of the algorithm and different parts.

Actually, that is my goal when iterating. IMO no commit in the history of this PR includes a "clear picture"

@chgorman
Copy link
Contributor Author

I'm not going to be able to devote time to look at this for at least 36 hours.

@ernestognw
Copy link
Member

Just reviewed and left my comments in #4902, I think we can move those comments here and merge.

@Amxx Amxx requested a review from ernestognw February 16, 2024 09:17
Copy link
Member

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially felt the comments were too much, and that we shouldn't expect every developer to read the math. However, we agreed the proof is still not immensely large and we will relax the comment length when adding proofs to the Math library, because that also helps auditability.

In regards to @Amxx's comments in regards of gas: Avoiding the case extra if (a >= uint256(type(uint128).max) ** 2) by bounding the input still gives me more gas consumption in average.

It seems expected because type(uint128).max) ** 2 is leaving a whole range of numbers that are increasing the average. I previoiusly thought that we might not want to optimize for a uniform distribution if the values are frequently in the low uint256's (for example), and the most popular cases I know (eg. Uniswap V2) are in the low uint256 order.

We appreciate the work @chgorman, in fact, challenging the code to be provable has gotten us to this conclusion, and it's a valuable contribution.

Looking good, merging.

@ernestognw ernestognw merged commit 140d66f into OpenZeppelin:master Feb 16, 2024
30 of 31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants