Characterization

Paul Lorenz edited this page Jul 17, 2020 · 6 revisions

Overview

We need to characterize Ziti performance so that we can compare it against the plain internet, against other technologies, and against itself, so we can tell whether performance is improving, holding steady, or degrading over time.

Characterization scenarios will vary along three axes.

  • The model
    • This includes the number and interactions of services, identities and policies
  • The deployment
    • This includes the number and type of instances and the regions they are deployed in. It also includes whether we are using tunnelers or native Ziti applications
  • The traffic
    • This includes the number of concurrent sessions, the amount of data sent and the number of iterations.
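One way to picture the three axes is as a matrix of scenarios, where each scenario picks one value per axis. The Go sketch below is illustrative only: the type and field names (`Model`, `Deployment`, `Traffic`, `Scenario`, `Matrix`) and the sample instance type are hypothetical, not part of any Ziti test harness. It also shows how quickly the full cross product grows, which is one reason not to run every deployment variation for every scenario.

```go
package main

import "fmt"

// Hypothetical types for the three characterization axes.
type Model string // e.g. "baseline", "small", "medium", "large"

type Deployment struct {
	InstanceType string // illustrative value only, e.g. "t2.medium"
	Tunneler     bool   // true: tunneler; false: native Ziti (SDK) application
}

type Traffic struct {
	ConcurrentSessions int
	BytesPerSession    int64
	Iterations         int
}

// Scenario is one point in the model x deployment x traffic space.
type Scenario struct {
	Model      Model
	Deployment Deployment
	Traffic    Traffic
}

// Matrix returns the full cross product of the three axes, model-major.
func Matrix(models []Model, deployments []Deployment, traffic []Traffic) []Scenario {
	out := make([]Scenario, 0, len(models)*len(deployments)*len(traffic))
	for _, m := range models {
		for _, d := range deployments {
			for _, t := range traffic {
				out = append(out, Scenario{Model: m, Deployment: d, Traffic: t})
			}
		}
	}
	return out
}

func main() {
	models := []Model{"baseline", "small", "medium", "large"}
	deployments := []Deployment{
		{InstanceType: "t2.medium", Tunneler: true},
		{InstanceType: "t2.medium", Tunneler: false},
	}
	traffic := []Traffic{
		{ConcurrentSessions: 1, BytesPerSession: 1 << 30, Iterations: 5},    // sustained throughput
		{ConcurrentSessions: 100, BytesPerSession: 1 << 10, Iterations: 50}, // many short connections
	}
	fmt.Println(len(Matrix(models, deployments, traffic))) // 16
}
```

Even two deployment variants and two traffic shapes already turn four models into 16 scenarios.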

Models

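Model    | Services | Identities | Edge Routers | Service Policies | Edge Router Policies | Service Edge Router Policies
=========|==========|============|==============|==================|======================|=============================
Baseline | 1        | 1          | 1            | 1                | 1                    | 1
Small    | 20       | 100        | 10           | 10               | 10                   | 10
Medium   | 100      | 5,000      | 100          | 50               | 50                   | 10
Large    | 200      | 100,000    | 500          | 250              | 250                  | 100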

For models with multiple edge routers, do we need to set up the runtime so that only one is active, for consistency in test results (and to keep testing costs down)?

For each policy type relating A and B (A <-> B), ensure the model has at least:

  1. an A with a policy which has all Bs
  2. a B with a policy which has all As
  3. an A with all policies
  4. a B with all policies
  5. Ensure that the A and B we test with are the worst case: they have access to the maximum number of entities on both sides and sort lexically last, to expose slowdowns in scans
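A model generator could satisfy rules 1 through 5 for a single policy type (for example, service policies relating identities to services) by placing the worst-case A and B in every policy and letting one policy contain every entity. This is a minimal sketch under those assumptions; `Policy`, `BuildPolicies` and the zero-padded naming scheme are hypothetical. Zero padding keeps lexical order identical to numeric order, so the last generated name is also the one that sorts last.

```go
package main

import "fmt"

// Policy links a set of A entities (e.g. identities) to a set of B entities (e.g. services).
type Policy struct {
	Name string
	As   []string
	Bs   []string
}

// names returns zero-padded names such as "identity-000042". Padding keeps lexical
// order equal to numeric order, so the last name is also the one that sorts last.
func names(prefix string, n int) []string {
	out := make([]string, n)
	for i := range out {
		out[i] = fmt.Sprintf("%s-%06d", prefix, i)
	}
	return out
}

// BuildPolicies is a hypothetical generator covering rules 1-5 for one A <-> B policy type.
// It assumes len(as) > 0, len(bs) > 0 and nPolicies >= 1.
func BuildPolicies(as, bs []string, nPolicies int) []Policy {
	worstA, worstB := as[len(as)-1], bs[len(bs)-1]
	policies := make([]Policy, nPolicies)
	for i := range policies {
		policies[i].Name = fmt.Sprintf("policy-%06d", i)
		// Rules 3-5: the worst-case A and B belong to every policy.
		policies[i].As = []string{worstA}
		policies[i].Bs = []string{worstB}
	}
	// Rules 1 and 2: policy 0 links every A to every B.
	policies[0].As = append([]string(nil), as...)
	policies[0].Bs = append([]string(nil), bs...)
	// Spread the remaining entities round-robin over the other policies.
	if nPolicies > 1 {
		for i, a := range as[:len(as)-1] {
			p := &policies[1+i%(nPolicies-1)]
			p.As = append(p.As, a)
		}
		for i, b := range bs[:len(bs)-1] {
			p := &policies[1+i%(nPolicies-1)]
			p.Bs = append(p.Bs, b)
		}
	}
	return policies
}

// contains reports whether xs includes x.
func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

func main() {
	// "Small" model sizes: 100 identities, 20 services, 10 service policies.
	ps := BuildPolicies(names("identity", 100), names("service", 20), 10)
	fmt.Println(len(ps), len(ps[0].As), len(ps[0].Bs)) // 10 100 20
}
```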

Pure Model Tests

We can test the model in isolation, outside the context of full deployment, throughput, and scale testing, to ensure that the queries the SDK relies on will scale well. Ideally, permission checks would be O(1), so that the only non-constant cost would be service lookups (as a user gains access to more services, listing them will naturally take more time).
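To illustrate what O(1) permission checks could look like, this sketch assumes policies are denormalized into an in-memory index from identity to the set of services it may use. A check is then a pair of hash-map lookups regardless of how many policies exist, while listing an identity's services costs time proportional to that identity's service count. `ServicePolicy`, `PermissionIndex`, `BuildIndex`, `CanDial` and `Services` are hypothetical names; this is not the Ziti controller's data model or storage layer.

```go
package main

import "fmt"

// ServicePolicy is a simplified, hypothetical policy: every identity listed may use every service listed.
type ServicePolicy struct {
	Identities []string
	Services   []string
}

// PermissionIndex is the denormalized form: identity -> set of permitted services.
type PermissionIndex map[string]map[string]struct{}

// BuildIndex walks all policies once (e.g. whenever policies change), so that
// per-request checks no longer have to scan them.
func BuildIndex(policies []ServicePolicy) PermissionIndex {
	idx := PermissionIndex{}
	for _, p := range policies {
		for _, id := range p.Identities {
			set, ok := idx[id]
			if !ok {
				set = map[string]struct{}{}
				idx[id] = set
			}
			for _, svc := range p.Services {
				set[svc] = struct{}{}
			}
		}
	}
	return idx
}

// CanDial is an expected O(1) check: two map lookups, independent of the number of policies.
// An unknown identity yields a nil inner map, which safely reports false.
func (idx PermissionIndex) CanDial(identity, service string) bool {
	_, ok := idx[identity][service]
	return ok
}

// Services lists an identity's services; its cost grows only with that identity's service count.
func (idx PermissionIndex) Services(identity string) []string {
	out := make([]string, 0, len(idx[identity]))
	for svc := range idx[identity] {
		out = append(out, svc)
	}
	return out
}

func main() {
	idx := BuildIndex([]ServicePolicy{
		{Identities: []string{"alice", "bob"}, Services: []string{"ssh"}},
		{Identities: []string{"alice"}, Services: []string{"web", "ssh"}},
	})
	fmt.Println(idx.CanDial("alice", "web"), idx.CanDial("bob", "web"), len(idx.Services("alice"))) // true false 2
}
```

The trade-off is that the index must be rebuilt or updated whenever policies change, moving cost from the read path to the write path.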

This testing can be done locally, exercising only the APIs used by the SDK. If we can eliminate poor performance here, we can focus on edge router performance during the throughput and connection scale testing.

Results

baseline             | small                | medium               | large
=====================|======================|======================|=====================
Create API Session:  | Create API Session:  | Create API Session:  | Create API Session:
    Min  : 6ms       |     Min  : 8ms       |     Min  : 8ms       |     Min  : 15ms
    Max  : 46ms      |     Max  : 53ms      |     Max  : 66ms      |     Max  : 58ms
    Mean : 23.3ms    |     Mean : 20.45ms   |     Mean : 24.4ms    |     Mean : 28.85ms
    95th : 45.9ms    |     95th : 52.39ms   |     95th : 65.6ms    |     95th : 57.24ms
Refresh API Session: | Refresh API Session: | Refresh API Session: | Refresh API Session:
    Min  : 0ms       |     Min  : 0ms       |     Min  : 0ms       |     Min  : 0ms
    Max  : 0ms       |     Max  : 0ms       |     Max  : 0ms       |     Max  : 0ms
    Mean : 0ms       |     Mean : 0ms       |     Mean : 0ms       |     Mean : 0ms
    95th : 0ms       |     95th : 0ms       |     95th : 0ms       |     95th : 0ms
Get Services:        | Get Services:        | Get Services:        | Get Services:
    Min  : 14ms      |     Min  : 156ms     |     Min  : 785ms     |     Min  : 3521ms
    Max  : 17ms      |     Max  : 187ms     |     Max  : 848ms     |     Max  : 3705ms
    Mean : 16ms      |     Mean : 169.6ms   |     Mean : 805.4ms   |     Mean : 3620.5ms
    95th : 17ms      |     95th : 187ms     |     95th : 848ms     |     95th : 3705ms
Create Session:      | Create Session:      | Create Session:      | Create Session:
    Min  : 6ms       |     Min  : 8ms       |     Min  : 18ms      |     Min  : 2033ms
    Max  : 36ms      |     Max  : 49ms      |     Max  : 38ms      |     Max  : 4951ms
    Mean : 15.75ms   |     Mean : 20.35ms   |     Mean : 24.05ms   |     Mean : 3386.95ms
    95th : 35.9ms    |     95th : 48.95ms   |     95th : 37.9ms    |     95th : 4944.65ms
Refresh Session:     | Refresh Session:     | Refresh Session:     | Refresh Session:
    Min  : 0ms       |     Min  : 0ms       |     Min  : 0ms       |     Min  : 0ms
    Max  : 0ms       |     Max  : 0ms       |     Max  : 0ms       |     Max  : 0ms
    Mean : 0ms       |     Mean : 0ms       |     Mean : 0ms       |     Mean : 0ms
    95th : 0ms       |     95th : 0ms       |     95th : 0ms       |     95th : 0ms

Model Performance Improvements

After denormalizing policy data and adding some query optimizations, results are much improved.

baseline               small                 medium                 large
========================================================================================
Create API Session:    Create API Session:   Create API Session:    Create API Session:
    Min  : 5ms             Min  : 6ms             Min  : 7ms            Min  : 16ms
    Max  : 29ms            Max  : 66ms            Max  : 73ms           Max  : 80ms
    Mean : 17.16ms         Mean : 18.69ms         Mean : 20.52ms        Mean : 29ms
    95th : 25ms            95th : 33ms            95th : 31.54ms        95th : 49.85ms
Refresh API Session:   Refresh API Session:  Refresh API Session:   Refresh API Session:
    Min  : 0ms             Min  : 0ms             Min  : 0ms            Min  : 0ms
    Max  : 0ms             Max  : 0ms             Max  : 0ms            Max  : 0ms
    Mean : 0ms             Mean : 0ms             Mean : 0ms            Mean : 0ms
    95th : 0ms             95th : 0ms             95th : 0ms            95th : 0ms
Get Services:          Get Services:         Get Services:          Get Services:
    Min  : 5ms             Min  : 12ms            Min  : 10ms           Min  : 48ms
    Max  : 25ms            Max  : 37ms            Max  : 63ms           Max  : 132ms
    Mean : 9.28ms          Mean : 23.02ms         Mean : 29.95ms        Mean : 73.9ms
    95th : 19ms            95th : 32.94ms         95th : 44ms           95th : 108.84ms
Create Session:        Create Session:       Create Session:        Create Session:
    Min  : 6ms             Min  : 7ms             Min  : 8ms            Min  : 14ms
    Max  : 23ms            Max  : 35ms            Max  : 41ms           Max  : 60ms
    Mean : 12.42ms         Mean : 12.86ms         Mean : 14.36ms        Mean : 29.4375ms
    95th : 22ms            95th : 25ms            95th : 28.19ms        95th : 52.55ms
Refresh Session:       Refresh Session:      Refresh Session:       Refresh Session:
    Min  : 0ms             Min  : 0ms             Min  : 0ms            Min  : 0ms
    Max  : 0ms             Max  : 0ms             Max  : 0ms            Max  : 0ms
    Mean : 0ms             Mean : 0ms             Mean : 0ms            Mean : 0ms
    95th : 0ms             95th : 0ms             95th : 0ms            95th : 0ms
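Comparing mean latencies for the larger models before and after the change gives a sense of scale for "much improved":

```latex
\begin{aligned}
\text{Get Services (large):}   &\quad \frac{3620.5\ \text{ms}}{73.9\ \text{ms}} \approx 49 \\
\text{Get Services (medium):}  &\quad \frac{805.4\ \text{ms}}{29.95\ \text{ms}} \approx 27 \\
\text{Create Session (large):} &\quad \frac{3386.95\ \text{ms}}{29.4375\ \text{ms}} \approx 115
\end{aligned}
```

Create API Session, by contrast, is essentially unchanged for the large model (28.85 ms vs. 29 ms).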

Deployments

We should test with a variety of instance types, from t2 on up. Until we start testing, it will be hard to say what is needed. High-bandwidth applications often need larger instance types for their network capacity, even if the extra CPU and memory aren't required.

The controller should require smaller instances than the router, at least in terms of network use.

We shouldn't need to test every deployment variation, such as tunneler vs. SDK-enabled application, in every scenario. We can pick one or two scenarios to find out whether there are noticeable differences.

Traffic

There are several traffic types we should test:

  1. iPerf, for sustained throughput testing. This can be done with various degrees of parallelism.
  2. Something like a web service or HTTP server, for many concurrent, short-lived connections, to get a feel for connection setup/teardown overhead.