EPIC: support multi-dimensional testing #53

Closed
1 of 9 tasks
laurentsenta opened this issue Oct 10, 2022 · 24 comments

@laurentsenta
Collaborator

laurentsenta commented Oct 10, 2022

As a libp2p maintainer, I want the ability to define test suites that combine many implementations, many muxers, many transports, etc. Defining and running these test suites should be simple, and the outcome should be clear. It should be easy to trigger these test suites before a release. It should be easy to display these results in a readable form on a website.

eta: 2022Q4

Tasks

Follow-up tasks

  • "optimize" by skipping library versions that are no longer used (see notes below),
  • "optimize" using artifact caching,

Description

A high-level approach:

  1. first, we use a versions resource file to generate "complex" test matrices,
  2. then we use another resource (data or code) to produce the expected RTT matrix,
  3. then we generate the relevant composition file (as shown below), with the expected RTT as a parameter,
  4. then we iterate through these test cases and call testground run.

Configurations

Ideally, the libp2p team provides a resource file that contains the versions and their features:

[[groups]]
# go v0.42
GoVersion = '1.18'
Modfile = "go.v0.22.mod"
Selector = 'v0.42'
Implementation = 'go'
SupportedTransports = ["tcp", "quic", "webrtc"]
Muxer = ["yamux"]

[[groups]]
# go v0.22
GoVersion = '1.18'
Modfile = "go.v0.22.mod"
Selector = 'v0.22'
Implementation = 'go'
SupportedTransports = ["tcp", "quic"]
Muxer = ["mplex"]


[[groups]]
# rust v0.51
Libp2pVersion = 'v0.51.0'
Implementation = 'rust'
SupportedTransports = ["tcp", "webrtc"]

[[groups]]
# rust v0.47.0
Libp2pVersion = 'v0.47.0'
Implementation = 'rust'
SupportedTransports = ["tcp"]
Muxer = ["yamux", "mplex"]

We'll also create some way to define "meta-compositions" that describe multiple tests and run many pairs together, something like this (pseudocode):

{ for every group }
    { if !group.SupportedTransports contains ENV.TESTED_TRANSPORT }
        { continue }
    { end }

    { if !group.Muxer contains ENV.TESTED_MUXER }
        { continue }
    { end }

    [testground_instance]
    {}
{ endfor }

Called with

ENV.TESTED_TRANSPORT = "tcp"
ENV.TESTED_MUXER = "yamux"
testground run composition file
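
A minimal Python sketch of the same filtering step, for illustration only. The `GROUPS` list is a hand-copied, in-memory version of the resource file entries above; `select_groups` and `label` are hypothetical helper names, and treating a group without a `Muxer` field as "supports no muxer" is an assumption, not something the thread decides.

```python
# Hypothetical in-memory copy of the resource file entries above.
GROUPS = [
    {"Implementation": "go", "Selector": "v0.42",
     "SupportedTransports": ["tcp", "quic", "webrtc"], "Muxer": ["yamux"]},
    {"Implementation": "go", "Selector": "v0.22",
     "SupportedTransports": ["tcp", "quic"], "Muxer": ["mplex"]},
    {"Implementation": "rust", "Libp2pVersion": "v0.51.0",
     "SupportedTransports": ["tcp", "webrtc"]},
    {"Implementation": "rust", "Libp2pVersion": "v0.47.0",
     "SupportedTransports": ["tcp"], "Muxer": ["yamux", "mplex"]},
]


def select_groups(groups, transport, muxer):
    """Keep the groups that support both the tested transport and muxer."""
    return [
        g for g in groups
        if transport in g.get("SupportedTransports", [])
        # Assumption: a missing Muxer field means no muxer is supported.
        and muxer in g.get("Muxer", [])
    ]


def label(group):
    """Build a short name such as 'go-v0.42' for display."""
    version = group.get("Selector") or group.get("Libp2pVersion")
    return f"{group['Implementation']}-{version}"


selected = select_groups(GROUPS, transport="tcp", muxer="yamux")
print([label(g) for g in selected])
```

Each selected group would then become one `[testground_instance]` entry in the generated composition.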

Related discussions and issues

@laurentsenta
Collaborator Author

laurentsenta commented Oct 11, 2022

note from sync w/ @marten-seemann (he was away for the chat yesterday and had related requests).

@mxinden @marten-seemann I have no strong feelings about this, but we might lose information between impromptu meetings. Do you feel the need to schedule some sort of "interop squad sync" where everyone joins at the same time?

Some more good ideas were brought up. A few notes:

Generating Matrix Parameters

The point of this epic is to let the libp2p team define a multi-dimensional test matrix,
and make sure it's maintainable. This is covered by the configuration above.

Using this configuration file, we can generate multiple tests, for example:

  • every pair that can communicate over quic.
  • every group of versions that can use TCP + yamux.

Implicitly this configuration represents compatible pairs. This means we have a matrix of implementation * versions * muxer * transports and each cell in this matrix is YES (interoperable) or NO (not interoperable).
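
As a rough sketch of how pairs could be derived from that configuration (not the project's actual generator), the snippet below lists every unordered pair of versions that share a transport. The `VERSIONS` mapping is a hand-written summary of the resource file, and `compatible_pairs` is a hypothetical helper; excluding a version paired with itself is an assumption made to keep the example short.

```python
from itertools import combinations

# Hand-written summary of the resource file: version label -> transports.
VERSIONS = {
    "go-v0.42": {"tcp", "quic", "webrtc"},
    "go-v0.22": {"tcp", "quic"},
    "rust-v0.51.0": {"tcp", "webrtc"},
    "rust-v0.47.0": {"tcp"},
}


def compatible_pairs(versions, transport):
    """Return every unordered pair of versions that both support `transport`."""
    names = sorted(name for name, ts in versions.items() if transport in ts)
    # Assumption: a version is not paired with itself.
    return list(combinations(names, 2))


print(compatible_pairs(VERSIONS, "quic"))
print(len(compatible_pairs(VERSIONS, "tcp")))
```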

An extension to this matrix is the RTT use case: @marten-seemann needs to add the expected handshake duration for each of these pairs. Each cell in the matrix will contain that duration expressed as a number of RTTs.

A high-level approach:

  • first, we use the versions resource file (above) to generate the test matrix,
  • then we use another resource (data or code) to produce the expected RTT matrix,
  • then we generate the relevant composition file (as shown above), with the expected RTT as a parameter,
  • then we iterate through these test cases and call testground run.

I plan to iterate through solutions:

  • first, implement an interop suite that runs in CI, tests pairs of instances, and generates "some" dashboards.
  • second, extend this feature to support the full matrix of muxer * transports * etc.
  • third, extend this feature to support the RTT matrix.

Unknown: this is focused on generating test pairs. How do we express "I want to create a group with every webrtc-compatible implementation"? Do we need it?

Supported versions

The size of the test matrix will explode quickly. That should not be a problem for creating the configuration files, but it will be a problem for the time needed to build and run the test suites.

I believe EKS will solve some of this problem (with caching out of the box and having enough resources to enable parallelization), but we'll have to pick and choose which versions and combinations to test eventually.

One option is to test the most used versions, see the diagram:

http://162.55.187.75:3000/d/CSQsORs7k/nebula?orgId=1&viewPanel=11
We have ~10 versions that cover ~60% of the network + "others"

We can use these metrics to enable/disable versions, probably something like:

  • X latest versions,
  • X most used versions
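
One way this selection rule could look, as a sketch under stated assumptions: `select_versions` is a hypothetical helper, the release list is assumed to be ordered newest first, and the usage shares below are made-up placeholder numbers, not values read from the dashboard.

```python
def select_versions(releases, usage_share, latest=2, most_used=2):
    """Pick the `latest` newest releases plus the `most_used` most deployed ones.

    releases:    version names, newest first (assumed ordering).
    usage_share: version name -> fraction of the network running it.
    """
    picked = list(releases[:latest])
    by_usage = sorted(usage_share, key=usage_share.get, reverse=True)
    for version in by_usage[:most_used]:
        if version not in picked:  # avoid testing the same version twice
            picked.append(version)
    return picked


# Placeholder data for illustration only.
releases = ["v0.5", "v0.4", "v0.3", "v0.2", "v0.1"]
usage_share = {"v0.5": 0.05, "v0.4": 0.10, "v0.3": 0.30, "v0.2": 0.15, "v0.1": 0.02}

print(select_versions(releases, usage_share))
```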

This solution sounds reasonable enough to keep this problem for later.

Config Structure

Keep in mind: muxer + security apply only to TCP, so it might be helpful to use a nested structure instead of:

conf:
  SupportedTransports = ["tcp", "quic", "webrtc"]
  Muxer = ["yamux"]

Something like this might represent our matrix better, and it might be easier to expand.

conf:
   transports:
       TCP:
          muxer: yamux
       quic: true

We don't have to decide on this now (data is easy to transform), I recommend we stick to the flat structure for now.
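
To illustrate why the flat structure can be kept for now, here is a small sketch of the transformation from the flat form to the nested one. `to_nested` is a hypothetical helper, and the exact shape of the nested output (lower-case keys, a list of muxers under TCP) is an assumption based on the example above.

```python
def to_nested(flat):
    """Convert the flat config into a nested one where muxers hang off TCP."""
    transports = {}
    for transport in flat["SupportedTransports"]:
        if transport == "tcp":
            # Muxers only apply to TCP, so they are nested under it.
            transports["tcp"] = {"muxer": list(flat.get("Muxer", []))}
        else:
            transports[transport] = True
    return {"transports": transports}


flat = {"SupportedTransports": ["tcp", "quic", "webrtc"], "Muxer": ["yamux"]}
print(to_nested(flat))
```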

@marten-seemann
Contributor

An extension to this matrix is the RTT use case: @marten-seemann needs to add "expected RTT" for each of these pairs. Each cell in the matrix will contain an expected RTT in milliseconds.

Minor correction: It’s expected handshake duration, and it’s a dimensionless number: the number of RTTs. Calculating the actual duration will be done by the test plan, based on that number and the RTT that’s used for the run.
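
A tiny sketch of that calculation, assuming the test plan receives the expected number of RTTs as a parameter. The function name and the 100 ms RTT are illustrative placeholders, not values from the thread.

```python
def expected_handshake_ms(expected_rtts: int, rtt_ms: float) -> float:
    """Turn the dimensionless matrix cell into a duration for one run."""
    return expected_rtts * rtt_ms


# Placeholder numbers: a cell saying "3 RTTs" on a run configured with a 100 ms RTT.
print(expected_handshake_ms(expected_rtts=3, rtt_ms=100.0))
```

Keeping the matrix cell dimensionless means the same matrix can be reused for runs with different network latencies.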

mxinden added a commit to mxinden/rust-libp2p that referenced this issue Oct 11, 2022
This is tracked on the [libp2p test-plans](libp2p/test-plans#44) see also
libp2p/test-plans#53.
@mxinden
Member

mxinden commented Oct 12, 2022

keep in mind: muxer + security applies only to TCP, so it might be helpful to use a nested structure instead of:

We will face the same problem on the dimension of multiplexer-negotiation (via multistream-select or via security protocol) as this is only relevant for TCP.

I think expressing hierarchies within dimensions (e.g. tcp/noise/mplex, tcp/noise/yamux) is a valid option.
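
A possible sketch of such hierarchies, assuming a nested config where only TCP carries security and muxer choices. The `stacks` helper and the config shape are hypothetical, chosen only to show how paths like tcp/noise/mplex could be enumerated.

```python
from itertools import product


def stacks(conf):
    """Expand a nested config into path-like stack names."""
    out = []
    for transport, layers in conf.items():
        if layers is None:
            # Transports such as QUIC have no separate security or muxer layer.
            out.append(transport)
        else:
            for sec, mux in product(layers["security"], layers["muxers"]):
                out.append(f"{transport}/{sec}/{mux}")
    return out


conf = {
    "tcp": {"security": ["noise", "tls"], "muxers": ["mplex", "yamux"]},
    "quic": None,
}
print(stacks(conf))
```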

@mxinden
Member

mxinden commented Oct 12, 2022

@mxinden @marten-seemann I have no strong feelings about this, but we might lose information between impromptu meetings. Do you feel the need to schedule some sort of "interop squad sync" where everyone joins at the same time?

I would suggest continuing to do these ad hoc. That said, this is not a strong opinion.

@mxinden
Member

mxinden commented Oct 12, 2022

Unknown: this is focused on generating test pairs, how to express "I want to create a group with every webrtc-compatible implementation"? Do we need it?

While long-term we will need this, I do think we should focus on point-to-point testing for now, i.e. two nodes (potentially 3 including a relay node for WebRTC browser-to-browser) instead of a group of nodes.

@julian88110

I have created a Google Doc to capture our test case requirements and filled it in with the info I know. Please take a look and, if you can, fill in the parts regarding JS and Rust in particular. We can migrate this doc to GitHub once it is in good shape. https://docs.google.com/document/d/1-akPPFW7kko9SkpedxXOV2foWJhnliELGd1WKn-0RFw/edit#

@John-LittleBearLabs

I plan to iterate through solutions:

Does this mean that you're working on this, @laurentsenta ?

then we iterate through these test cases and call testground run.

... & ...

first, implement an interop suite that runs in CI and test pairs of instances, and generate "some" dashboards.

I had thought the matrix would be a single composition with some elaborate template structure. But it sounds like you're thinking of writing a higher-level script that orchestrates calls into testground?

If so, could/should this sort of matrix instead be a new feature added to testground itself, for reuse?

@laurentsenta
Collaborator Author

laurentsenta commented Oct 17, 2022

@John-LittleBearLabs thanks for raising these questions,

Does this mean that you're working on this, @laurentsenta ?

I am working on the first step here: #55 which should add support for matrixes. I moved the list of tasks to the top of the issue description. @julian88110 is also working on the test matrix definition.

I had thought the matrix would be a single composition with some elaborate template structure. But it sounds like you're thinking of writing a higher-level script that orchestrates calls into testground?

You need both: with the composition, we can describe "many" test plans using templates and env parameters.
But we also need a way to run many compositions and gather their outcomes.
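
The "run many compositions and gather their outcomes" part could be sketched as a small driver loop like the one below. This is only an illustration: `run_matrix` and `fake_run` are hypothetical names, and the real runner (which would invoke `testground run` on the composition file) is replaced by a stand-in because the exact invocation is not spelled out in this thread.

```python
import os


def run_matrix(cases, run_composition):
    """Run one composition per (transport, muxer) case and collect outcomes."""
    outcomes = {}
    for transport, muxer in cases:
        # Same variables as the "Called with" example in the issue description.
        env = dict(os.environ, TESTED_TRANSPORT=transport, TESTED_MUXER=muxer)
        outcomes[(transport, muxer)] = run_composition(env)
    return outcomes


def fake_run(env):
    # Stand-in for a real runner; pretends only yamux runs succeed.
    return env["TESTED_MUXER"] == "yamux"


outcomes = run_matrix([("tcp", "yamux"), ("tcp", "mplex")], fake_run)
for case, ok in outcomes.items():
    print(case, "PASS" if ok else "FAIL")
```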

If so, could/should this sort of matrix instead be a new feature added to testground itself for reuse?

You're correct, we can and we should.
I shared a solution in #55 (comment).

Ideally, we validate this solution with the team and iterate on it for a while, then we can split:

The hard problem is making the interop matrix maintainable. We can implement "anything" in Testground as long as we don't leak interop-related matters into Testground APIs.

@laurentsenta
Collaborator Author

(updated the task description with a clearer definition and added a few steps)

@julian88110

julian88110 commented Oct 19, 2022

Testground multi-dimensional test matrix

Tests are to be composed from the information extracted out of the resource files.

An example resource file entry may look like this:

[[groups]]
# go v0.42
GoVersion = '1.18'
Modfile = "go.v0.22.mod"
Selector = 'v0.42'
Implementation = 'go'
SupportedTransports = ["tcp", "quic", "webrtc"]
SupportedSecurityProtos = ["tls", "noise"]
SupportedMuxers = ["yamux", "mplex"]

A test peer/host is customized by the following parameters:

testHost = Host(implementation, version, transport, securityProto, supportedMuxers)

A test case is composed by two or more test hosts:

testInstance = TestInstance(testHost1, testHost2, …)
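
As a sketch only, these two definitions could be modelled with dataclasses as shown below. The field names follow the parameters listed above; the `flipped` method is a hypothetical addition that covers running a case with source and destination swapped, which this comment also asks for in its test matrix.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Host:
    implementation: str
    version: str
    transport: str
    security_proto: str
    supported_muxers: Tuple[str, ...]


@dataclass(frozen=True)
class TestInstance:
    __test__ = False  # keeps pytest from trying to collect this class

    hosts: Tuple[Host, ...]

    def flipped(self):
        """Return the same test case with the host order reversed."""
        return TestInstance(tuple(reversed(self.hosts)))


go_host = Host("go", "cur", "tcp", "noise", ("/yamux/1.0.0", "/mplex/6.7.0"))
rust_host = Host("rust", "cur", "tcp", "noise", ("/yamux/1.0.0", "/mplex/6.7.0"))

case = TestInstance((go_host, rust_host))
print([h.implementation for h in case.flipped().hosts])
```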

Go transport list:

  Go-libp2p-transports = ["TCP", "QUIC", "Webtransport", "Websocket"]

Rust transport list:

  Rust-libp2p-transports = ["TCP", "WebRTC"]

JS transport list:

  JS-libp2p-transports = ["ToDo"]

Test Matrix

Test matrix for libp2p multi dimensional tests. (Test cases should also be run with source/destination flipped)

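| Test case | Source Imp | Source Ver | Source Trans | Source Sec | Source Muxs | Run Test | Dest Imp | Dest Ver | Dest Trans | Dest Sec | Dest Muxs | Expected Muxer | Expected RTT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | go | cur | tcp | tls | ML1 | X | go | 1 | tcp | tls | ML1 | M1 | rtt-1 |
| 2 | | | | | | | go | cur | tcp | tls | ML2 | M2 | rtt-1 |
| 3 | | | | | | | go | cur-1 | tcp | tls | ML1 | M1 | rtt |
| 4 | | | | | | | go | cur-2 | tcp | tls | ML1 | M1 | rtt |
| 5 | | | | | | | go | cur-3 | tcp | tls | ML1 | M1 | rtt |
| 6 | go | cur | tcp | noise | ML1 | X | go | cur | tcp | noise | ML1 | M1 | rtt-1 |
| 7 | | | | | | | go | cur | tcp | noise | ML-2 | M1 | rtt-1 |
| 8 | | | | | | | go | cur-1 | tcp | noise | ML1 | M1 | rtt |
| 9 | | | | | | | go | cur-2 | tcp | noise | ML1 | M1 | rtt |
| 10 | | | | | | | go | cur-3 | tcp | noise | ML1 | M1 | rtt |
| 11 | go | cur | tcp | noise | ML1 | X | rust | cur | tcp | noise | ML1 | M1 | rtt |
| 12 | | | | | | | rust | cur-1 | tcp | noise | ML1 | M1 | rtt |
| 13 | | | | | | | rust | cur-2 | tcp | noise | ML1 | M1 | rtt |
| 14 | | | | | | | rust | cur-3 | tcp | noise | ML1 | M1 | rtt |
| 15 | go | cur | tcp | tls | ML1 | X | JS | cur | tcp | noise | ML1 | M1 | rtt |
| 16 | | | | | | | JS | cur | tcp | noise | ML-2 | M1 | rtt |
| 17 | | | | | | | JS | cur-1 | tcp | noise | ML1 | M1 | rtt |
| 18 | | | | | | | JS | cur-2 | tcp | noise | ML1 | M1 | rtt |
| 19 | | | | | | | JS | cur-3 | tcp | noise | ML1 | M1 | rtt |
| | go | cur | QUIC | - | - | X | go | cur | QUIC | - | - | - | |
| | | | | | | | go | cur-1 | QUIC | - | - | - | |
| | | | | | | | go | cur-2 | QUIC | - | - | - | |
| | | | | | | | go | cur-3 | QUIC | - | - | - | |
| | go | cur | WebTransport | - | - | X | go | cur | WT | - | - | - | |
| | | | | | | | go | cur-1 | WT | - | - | - | |
| | | | | | | | go | cur-2 | WT | - | - | - | |
| | | | | | | | go | cur-3 | WT | - | - | - | |
| | go | cur | WS | - | - | X | go | cur | WS | - | - | - | |
| | | | | | | | go | cur-1 | WS | - | - | - | |
| | | | | | | | go | cur-2 | WS | - | - | - | |
| | | | | | | | go | cur-3 | WS | - | - | - | |
| | rust | cur | TCP | noise | - | X | JS | cur | TCP | noise | - | | |
| | | | | | | | JS | | | | | | |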

ML1 = ["/yamux/1.0.0", "/mplex/6.7.0"]
ML2 = ["/mplex/6.7.0", "/yamux/1.0.0"]
ML3 = ["/mplex/6.7.0"]
M1 = "/yamux/1.0.0"
M2 = "/mplex/6.7.0"

@mxinden
Member

mxinden commented Nov 4, 2022

Looking for an owner

While the IPDX team (i.e. @laurentsenta) works on the necessary support in testground/testground (see testground/testground#1493), I think there is value in us (the libp2p team) starting to work on our part, namely the generation of composition files based on a go.toml and rust.toml. See above and the proof of concept in #55 (comment).

Any volunteers? @jxs or @julian88110 would either of you like to and have the capacity to own this?

@John-LittleBearLabs

(libp2p team) to start working on our part, namely the generation of composition files based on a go.toml and rust.toml.

Whoever starts working on the official/permanent version - let me know so I can start basing my somewhat hacky version of working my webrtc test into this scheme on your work, rather than on PoC

@mxinden
Member

mxinden commented Nov 7, 2022

Any volunteers? @jxs or @julian88110 would either of you like to and have the capacity to own this?

Discussed out of band. @jxs will own this.

@MarcoPolo
Contributor

@mxinden / @jxs can you expand a bit? Is @jxs going to own the whole project or just the rust specific bits? I'm happy to be the DRI for the whole effort (full libp2p interop testing.)

@julian88110

Thanks @mxinden @jxs and @MarcoPolo! I will have some bandwidth to help once the integration test and related code refactoring effort is done. We can discuss the details.

@mxinden
Member

mxinden commented Nov 7, 2022

I see the following work streams here:

(Enumeration is for reference purpose, not to signal ordering.)

  1. Support the runs feature in composition.yml files. Owned by @laurentsenta. See tracking issue EPIC: Implement multiple runs per compositions testground/testground#1493.
  2. Generating composition.yml files based on rust.toml, go.toml and in the future js.toml files. Now owned by @jxs. See draft in Create a "simple" interop dashboard #55 (comment) written by @laurentsenta.
  3. Updating the corresponding ping/xxx implementations to support the various dimensions. Ideally just forwarding command line flags.
  4. Updating the various CI configurations in libp2p/test-plans and libp2p/{go,rust,js,nim}-libp2p.
  5. Implementing a visualization of the test matrix, partially tracked in Canonical interop tests & dashboard #62

Is @jxs going to own the whole project or just the rust specific bits?

The plan was for @jxs to own (2) for both rust and go, potentially js in the future.

I'm happy to be the DRI for the whole effort (full libp2p interop testing.)

Thus far I was the DRI. That said, I am happy to hand that over to you @MarcoPolo. Let me know. We should probably do a hand-over in some fashion.

@BigLep
Contributor

BigLep commented Nov 7, 2022

Thanks @mxinden. This is helpful.

I want to minimize the number of people involved, and I also know @mxinden is juggling a lot, so I am supportive if we're handing ownership over to others.

A few thoughts:

  1. This endeavor needs technical ownership (make sure we're building the right thing in the right way) and project management ownership (communication, tracking, coordinating).
  2. By default, given it is spanning all the libp2p implementations, I would expect @p-shahi to own the project management side at the minimum, but it's also ok if we're being intentional to have someone else take it.
  3. This initiative is at the top of our roadmap, so I want to make sure it doesn't slip through the cracks. These kinds of projects tend to take longer than expected, which is why I want us to attack it actively rather than be passive. Otherwise I worry this will drag on for months. We need to make sure someone is responsible for a clear checklist of the task definitions, task owners, and task dependencies. The list above from Max is a good start. I think it makes sense to move the relevant portions to the issue description.

@p-shahi
Member

p-shahi commented Nov 7, 2022

  2. Generating composition.yml files based on rust.toml, go.toml and in the future js.toml files. Now owned by @jxs. See draft in Create a "simple" interop dashboard #55 (comment) written by @laurentsenta.
  3. Updating the corresponding ping/xxx implementations to support the various dimensions. Ideally just forwarding command line flags.
  4. Updating the various CI configurations in libp2p/test-plans and libp2p/{go,rust,js,nim}-libp2p.

My preference is to make #61 the tracking issue for these efforts and tidy this Epic a bit. We can create child issues and assign to each team/DRI where necessary.

@MarcoPolo
Contributor

@mxinden sounds good. Let's do a handoff at some point this week or next as your schedule allows. I think having @jxs handle point 2 is good. I can put myself as the fallback for things without owners and delegate appropriately.

@GlenDC
Contributor

GlenDC commented Nov 13, 2022

FYI, the upcoming week I'm planning to start getting the js-libp2p ping test to work in this repo, on a new branch.
I'll link it once I have something that I can share.

It will be based on the "finished" work done in open PR: testground/testground#1502

@p-shahi p-shahi pinned this issue Nov 18, 2022
@p-shahi
Member

p-shahi commented Nov 22, 2022

Thoughts on including plaintext in addition to TLS and Noise? I think libp2p/js-libp2p#1110 provides motivation to include it, at least as a "nice to have"

@mxinden
Member

mxinden commented Nov 23, 2022

Including plaintext works for me. Just need to make sure we don't advertise it as a to-be-used-in-production protocol.

@p-shahi
Member

p-shahi commented Feb 7, 2023

closing in favor of #61

@p-shahi p-shahi closed this as completed Feb 7, 2023
@p-shahi p-shahi unpinned this issue Feb 7, 2023
10 participants