
Batch verification for range proofs #86

Status: Open. oleganza wants to merge 233 commits into `main`.

Conversation

@oleganza (Collaborator) commented May 3, 2018

Overview

This adds batch verification for any number of range proofs. The API allows mixing proofs with different sizes (in terms of both range size `n` and aggregation size `m`), and checks that the generators provided are enough to cover the largest `n*m` in the batch.

Addresses #22 and replaces older PR #27.
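For concreteness, here is a rough sketch of the intended call shape. The parameter names, order, and iterator item type below are assumptions for illustration, not the PR's final signature:

```rust
// Illustrative call shape only: the real signature may differ (rng, transcript,
// iterator item type, and argument order are assumptions here).
RangeProof::verify_batch(
    vec![
        (&proof_a, &commitments_a[..]), // e.g. n = 64, m = 1
        (&proof_b, &commitments_b[..]), // e.g. n = 32, m = 4
    ],
    &gens, // must cover the largest n*m in the batch
    // ... rng / transcript / whatever else the final signature requires
)?;
```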

Performance

Performance depends on Pippenger implementation (not merged yet): dalek-cryptography/curve25519-dalek#129

This PR does not have benchmarks for the batched verification yet.

The existing verify method is now implemented via the generalized batch verification to avoid code duplication. This should add negligible overhead, but I haven't re-run the benchmarks since the previous version #27 (where I did not find any statistically meaningful difference).

Commit: "Prevent a malicious dealer from retrieving the party's secrets"

@oleganza pointed out that when the dealer's polynomial challenge `x` is zero,
the parties will leak secrets, and suggested this check.

Informal analysis of the protocol flow:

dealer -> party: position

party -> dealer: `ValueCommitment`

    The `ValueCommitment` cannot leak information to the dealer, since
    it's blinded independently of any dealer messages

dealer -> party: `ValueChallenge`

    Contains `y, z` challenges.

party -> dealer: `PolyCommitment`

    Contains `T_1`, `T_2`, which are blinded independently from any
    dealer messages and therefore can't leak information

dealer -> party: `PolyChallenge`

    Contains `x` challenge

party -> dealer: `ProofShare`

    Up till now, we know that each of the party's messages can't
    reveal info because they're blinded independently of the dealer
    messages.  The paper notes that the blindings for the `l` and `r`
    vectors are chosen such that the prover can reveal `l(x), r(x)`
    for one challenge point `x \in \ZZ_p^{\times}` without revealing
    information, but if `x` is zero, then the blinding of `t(x)` is
    just `z^2` times the blinding of `v`, so the dealer could multiply
    by `z^{-2}` and recover the blinding for the original commitment.

    However, if the party checks that `x` is nonzero, then `x \in
    \ZZ_p^{\times}`, the blinding factors are not annihilated, and
    the proof share does not leak information.

In the previous (non-MPC) version of the code, the `x` was computed by the
prover out of the proof transcript, and so it wasn't necessary to check that
`x` is nonzero (which occurs with probability `~ 2^{-252}`): as noted by AGL,
the instructions required to check this are more likely to fail than the check
itself.
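
A minimal sketch of the party-side check described above (the error type and variant names here, like `MPCError::MaliciousDealer`, are placeholders, not necessarily the ones in this PR):

```rust
use curve25519_dalek::scalar::Scalar;

#[derive(Debug)]
enum MPCError {
    MaliciousDealer,
}

// Sketch: called when the party receives the dealer's `PolyChallenge`.
fn check_poly_challenge(x: Scalar) -> Result<(), MPCError> {
    // If x == 0, the blinding of t(x) collapses to z^2 times the value
    // commitment's blinding, so the dealer could unblind it; reject.
    if x == Scalar::zero() {
        return Err(MPCError::MaliciousDealer);
    }
    Ok(())
}
```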

Another concern is about replay attacks: a malicious dealer who was able to get
the party to apply the same `PolyChallenge` multiple times could use polynomial
interpolation to recover the coefficients of the `t` polynomial and reveal the
party's secrets.  (Since `t` is a quadratic polynomial, three evaluation points would suffice.)

However, because our encoding of the protocol flow into affine types ensures
that each `Party` state can be used at most once, any compilable instantiation
of the MPC protocol is invulnerable to replay attacks: typechecking the
program requires the compiler to prove that states are not reused.
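
As a minimal, self-contained illustration of the move-semantics point (stub types, not the PR's actual ones):

```rust
// Stub types standing in for the real Scalar / ProofShare / party-state types.
struct Scalar;
struct ProofShare;
struct PartyAwaitingChallenge;

impl PartyAwaitingChallenge {
    // Takes `self` by value: applying a challenge consumes the state.
    fn apply_challenge(self, _x: Scalar) -> ProofShare {
        ProofShare
    }
}

fn main() {
    let party = PartyAwaitingChallenge;
    let _share = party.apply_challenge(Scalar);
    // let _again = party.apply_challenge(Scalar);
    // ^ error[E0382]: use of moved value: `party` -- replay is a compile error.
}
```
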
@hdevalence (Contributor):

Instead of moving to a new verification module, I think it would be better to keep all of this as part of the rangeproof.

Currently, the Verification struct holds a bunch of temporary data. I'm wondering whether we can get away with not holding any temporaries, and have the verification struct act as a view into the proof data, computing things on the fly.

Related to the above is something we delayed thinking about earlier: whether the proof data should have compressed points, so that we don't have to decompress during deserialization and then recompress to hash during verification. This could impact whether we have to allocate temporaries, but I don't think we made a ticket for this.

@oleganza (Collaborator, Author) commented May 7, 2018

> Related to the above is something we delayed thinking about earlier: whether the proof data should have compressed points, so that we don't have to decompress during deserialization and then recompress to hash during verification. This could impact whether we have to allocate temporaries, but I don't think we made a ticket for this.

@hdevalence Does it depend on upstream support for deserializing compressed points? Do we need to get creative and have a `DecompressedRistretto` type that holds both the compressed bytes (for fast re-compression) and the decompressed point, so that we can verify point validity during deserialization and also avoid extra work?
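
A hedged sketch of what such a wrapper might look like (this type does not exist in curve25519-dalek; the name and API are hypothetical):

```rust
use curve25519_dalek::ristretto::{CompressedRistretto, RistrettoPoint};

/// Hypothetical wrapper: keep the compressed bytes (so hashing into the
/// transcript needs no re-compression) next to the decompressed point
/// (so verification needs no re-decompression).
pub struct DecompressedRistretto {
    compressed: CompressedRistretto,
    point: RistrettoPoint,
}

impl DecompressedRistretto {
    /// Validate and decompress exactly once, at deserialization time.
    pub fn from_bytes(bytes: [u8; 32]) -> Option<Self> {
        let compressed = CompressedRistretto(bytes);
        let point = compressed.decompress()?; // None if the encoding is invalid
        Some(DecompressedRistretto { compressed, point })
    }

    pub fn as_compressed(&self) -> &CompressedRistretto {
        &self.compressed
    }

    pub fn as_point(&self) -> &RistrettoPoint {
        &self.point
    }
}
```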

@hdevalence (Contributor):

We could just use compressed points in the struct and call `.map(|p| p.decompress())`, with some extra iterator combinators to handle the `None` case.
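
For illustration, one way the `None` handling can be done with plain iterator combinators (collecting `Option`s short-circuits on the first failure):

```rust
use curve25519_dalek::ristretto::{CompressedRistretto, RistrettoPoint};

// Decompress every point in the proof, rejecting the whole thing if any
// encoding is invalid. Collecting Option<T> items into Option<Vec<T>>
// yields None as soon as one decompression fails.
fn decompress_all(points: &[CompressedRistretto]) -> Option<Vec<RistrettoPoint>> {
    points.iter().map(|p| p.decompress()).collect()
}
```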

@oleganza (Collaborator, Author) commented May 8, 2018

I rewrote the verification so that the API does not expose the helper object with temporaries.

@hdevalence (Contributor):

Just wondering, what's the motivation for making the batch verification API mix all kinds of proofs in a single batch, instead of requiring that the batch contains only the same type of proof?

I'm worried that there's additional complexity and room for mistakes (both in the implementation and in the code that uses it) when mixing proof types, and I'm also not sure what kind of use-case that would be important for. Even supposing you had a protocol with different kinds of proofs, you could alternatively create a batch of proofs of type 1, a batch of proofs of type 2, etc.

@cathieyun (Member):

The code looks good, but I am also wondering the same as above: if we don't mix the different proof sizes, it seems like some of the harder-to-follow logic could be removed (e.g. figuring out the max generator sizes, the split of `n*m` into `(n*m, 1)`, padding proofs).

@oleganza (Collaborator, Author) commented May 10, 2018

I think a mix of differently-sized (`m`) aggregated proofs is very realistic, while a mix of differently-sized ranges (`n`) is very unlikely.

Why varying `m` is very likely: bulletproofs create an incentive to aggregate proofs both for better privacy (coinjoin) and for minimized cost. The process is interactive, so we can expect all sorts of differently-sized aggregations in the wild. Since all we care about is `n*m`, supporting varying `n`s comes at no cost if we support varying `m`s. Batching all proofs into one giant expression instead of grouping them into smaller batches should give a non-trivial performance advantage, since Pippenger's algorithm scales sub-linearly, and something hypothetical like "verification on a GPU" wants a large amount of work and has higher-than-trivial setup latency (per proof).

That said, I think the PR wouldn't be noticeably simpler if we fixed `m` and `n`. We'd still have to collect verification scalars in vecs, and verification objects in a single vector, because `m` wouldn't be statically defined anyway and we need to iterate over the per-proof data several times: (1) combining scalars on the static bases and (2, 3) concatenating the dynamic bases and scalars.

Re: the `(n*m, 1)` split - maybe we should simplify the generators API to take a single `usize`, which would be filled in with `n*m` by the range-proof protocol?

@hdevalence (Contributor):

The point about varying aggregation sizes (m) is convincing, and that seems like something we should support in the future.

But as things are now, I think it actually violates the contract of the Generators struct, which contains generators for aggregated range proofs of size (m, n). That API isn't designed for varying m, and I think the awkwardness of doing ad-hoc `m*n`s, etc., is really a symptom of trying to use the Generators in a way they weren't designed for. If we would like to support varying m, I think we should redesign the Generators API around the idea that m can vary, and then update the rest of the code accordingly.
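
A hedged sketch of what such a redesign might look like (names and shapes here are hypothetical, not code from this PR): generators are allocated once for a maximum capacity and handed out as views sized per proof.

```rust
/// Hypothetical generator set sized for up to `gens_capacity` bits per value
/// and up to `party_capacity` aggregated values.
pub struct AggregatedGens {
    pub gens_capacity: usize,  // max n
    pub party_capacity: usize, // max m
    // G_vec, H_vec, Pedersen generators, ...
}

/// A borrowed view covering one proof's n * m generators.
pub struct AggregatedGensView<'a> {
    pub gens: &'a AggregatedGens,
    pub n: usize,
    pub m: usize,
}

impl AggregatedGens {
    /// Hand out a view for an (n, m) proof if capacity allows, so varying `m`
    /// is part of the contract rather than an ad-hoc `n*m` computation.
    pub fn view(&self, n: usize, m: usize) -> Option<AggregatedGensView<'_>> {
        if n <= self.gens_capacity && m <= self.party_capacity {
            Some(AggregatedGensView { gens: self, n, m })
        } else {
            None
        }
    }
}
```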

/// Proofs may use different ranges (`n`) or different numbers of aggregated commitments (`m`).
/// You must provide a view into the generators (`gens`) large enough to cover
/// the biggest proof.
pub fn verify_batch<'a,'b,I,R,P,V>(

Contributor:

I don't see where 'a, 'b are used?

/// You must provide a view into the generators (`gens`) large enough to cover
/// the biggest proof.
pub fn verify_batch<'a,'b,I,R,P,V>(
proofs: I,

Contributor:

I think it would be much simpler if this just took an iterator of (&RangeProof, &[RistrettoPoint]).

Contributor:

We could also consider using a type alias to make the roles more clear in the signature, or have the function take two iterators (one for proofs and the other for commitments).
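
A sketch combining both suggestions (the `RangeProof` and `Generators` stubs below stand in for the PR's types; only the alias and signature shape are the point):

```rust
use curve25519_dalek::ristretto::RistrettoPoint;

pub struct RangeProof; // stub for the PR's proof type
pub struct Generators; // stub for the PR's generators type

/// One batch entry: a proof together with the commitments it covers.
pub type BatchItem<'a> = (&'a RangeProof, &'a [RistrettoPoint]);

pub fn verify_batch<'a, I>(proofs: I, _gens: &Generators) -> Result<(), ()>
where
    I: IntoIterator<Item = BatchItem<'a>>,
{
    for (_proof, _commitments) in proofs {
        // accumulate this proof's verification scalars into the batch ...
    }
    // ... then do one multiscalar multiplication for the whole batch
    Ok(())
}
```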


// First statement is used without a random factor
let mut pedersen_base_scalars: (Scalar, Scalar) = (Scalar::zero(), Scalar::zero());
let mut g_scalars: Vec<Scalar> = iter::repeat(Scalar::zero()).take(nm).collect();

Contributor:

let mut g_scalars = vec![Scalar::zero(); nm];

);
}

// First statement is used without a random factor

Contributor:

IMO it is cleaner not to unroll the first loop iteration, and just multiply all statements by a random factor. The code is simpler, the additional cost is minimal (and shrinks as the batch size grows).
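
A small self-contained illustration of that approach (variable names are placeholders, not this PR's code): every statement, including the first, gets its own random batching factor, so nothing is unrolled.

```rust
use curve25519_dalek::scalar::Scalar;
use rand::thread_rng;

// Combine per-statement scalars into one accumulator, weighting each
// statement (the first included) by an independent random factor.
fn accumulate(statement_scalars: &[Scalar]) -> Scalar {
    let mut rng = thread_rng();
    let mut acc = Scalar::zero();
    for &s in statement_scalars {
        let c = Scalar::random(&mut rng); // fresh random factor per statement
        acc = acc + c * s;
    }
    acc
}
```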

Collaborator (Author):

Ouch, that's a stale comment. The first iteration is no longer unrolled.

/// Represents a deferred computation to verify a single rangeproof.
/// Multiple instances can be verified more efficiently as a batch using
/// the `RangeProof::verify_batch` function.
struct Verification {

Contributor:

In the actual batch-verification code above, each of these fields is used only once, in the inner loop that accumulates the batched verification scalars.

So, instead of having a separate Verification struct that contains owned copies of the verification scalars for a given RangeProof, why not just have functions on the RangeProof struct that return iterators for each of the fields of what is currently the Verification struct?
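
A hedged sketch of that shape (stub names, not the PR's code): the proof exposes lazy iterators of verification scalars that the batching loop consumes directly, instead of a struct that owns copies.

```rust
use curve25519_dalek::scalar::Scalar;

// Stub standing in for the real RangeProof; only the shape of the API matters.
struct ProofView {
    t_x: Scalar,
    // s_vec, challenges, ...
}

impl ProofView {
    /// Scalars for the static G generators, produced lazily instead of being
    /// stored in an owned Vec inside a Verification struct.
    fn g_scalars<'a>(&'a self, z: Scalar, nm: usize) -> impl Iterator<Item = Scalar> + 'a {
        (0..nm).map(move |_i| z * self.t_x) // stand-in for the real per-index formula
    }
    // ... similar methods for h_scalars, dynamic bases, etc.
}
```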

@goldenMetteyya:

Hi @oleganza, what's the status of this PR? One suggestion is to have two APIs: one for a fixed `m` and one for a variable `m`.

Labels: PTAL (Ready for peer review), T-api (Change to the API), T-optimization (Making things faster/smaller)