Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

h2/ready #996

Closed
wants to merge 27 commits into from
Closed

h2/ready #996

wants to merge 27 commits into from

Conversation

howardjohn
Copy link
Member

howardjohn and others added 27 commits April 22, 2024 00:00
* Refactor connection tracker

* Track outbound connections as well

* fix test
Bumps [rustls](https://github.com/rustls/rustls) from 0.23.4 to 0.23.5.
- [Release notes](https://github.com/rustls/rustls/releases)
- [Changelog](https://github.com/rustls/rustls/blob/main/CHANGELOG.md)
- [Commits](rustls/rustls@v/0.23.4...v/0.23.5)

---
updated-dependencies:
- dependency-name: rustls
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This is a super security sensitive codepath, and mostly and experiment.
Make it opt in so we don't stumble into a CVE when we ship beta.
* metrics: Don't log 'hbone_target' as a string.

The other SocketAddr fields (src.addr and dst.addr) don't log as
strings, so make them all consistent.

* metrics: Fix linter errors from 'hbone_target' commit.
* metrics: make traffic metrics faster

Before, each r/w we lookup the metric and increment it.

Now, we look it up once and then just increment. The increment is just
Atomic.increment, which on x86 is the same as a simple `add`, which is
very fast (measured 1ns, but presumably its so fast benchmarking isn't
meaningful).

Since this is on the hot path, this is meaningful.

Given this, I don't think we will ever need to do
istio#927 (comment).

Also, addressed
istio#927 (comment) and
cleaned up inbound which broke a function up into 2 which made things
more complex, not simpler.

* Drop the return type
Enhance the `Forwarded` header we already send to include `host` as part
of the header.

This is used to guess the service for inbound telemetry (as an extra
input to our existing heuristic), and also reported in the connection
dump.
* test: run `namespaced` tests without root

This gives full coverage, without requiring root. It also prevents
issues around failure to cleanup network namespaces, etc, and generally
allows us to mess with the environment a lot more than we were doing
now.

* try to create
* rbac: log unknown policies for debugging purposes

* Update src/state.rs

Co-authored-by: Ben Leggett <854255+bleggett@users.noreply.github.com>

---------

Co-authored-by: Ben Leggett <854255+bleggett@users.noreply.github.com>
* Cleanup nesting

* e2e tests: use inpod mode more

* Assert access logs

* wip

* better logs

* WIP admin tasks

(cherry picked from commit ba9c6c9)

* wip

* fix

* Revert "WIP admin tasks"

This reverts commit 0ca900b.

* fix

* re-enable

* cleanup

* more cleanup, need to undo the prefix

* Revert "Cleanup nesting"

This reverts commit cfb81b0.

* Add more tests

* Malicious calls test

* mismatch test

* policy test

* vip test

* cleanups

* fmt

* lint scripts
We only did this for inbound. This double check provides a stronger
guarantee than IP address matching
* Attempt at smarter HBONE pooling between ztunnels

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Lints

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* lints 2

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Hmm

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fixup

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fixup

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* More comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* cleanup

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* fixup

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Clean

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fix jemalloc

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* WIP: move out of proxyinfo

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Evict pooled conns after $INTERVAL

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Update src/proxy/pool.rs

Co-authored-by: Ian Rudie <ilrudie@gmail.com>

* Evict pooled conns after $INTERVAL

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* For now, just do the foolproof collision check

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Don't be silly

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Naming, review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Tidy Arcs+drains

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Cleanups

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Format

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Use the fancy lockless outer map, drop realm-io

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Cleanup comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fix outdent (review comment)

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fixups/review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* resync

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Droptests

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* fix testhang

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* add smarter evict test

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Interesting failure

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* No, it's not

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Make this a bit simpler

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Separate out the connspawner

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Tidy logging a bit

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Add serverside keepalive

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* fixup

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Just for kicks

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* D'oh - use mthread runtime for tests

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fix none race

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Propagate connection establish errors

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Cleanup

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Work around local test server getting overloaded

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Move the rest to multi_thread, chill out on iterations, work around test
rig flakes

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Tidy comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* lints

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Clarify comment

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

---------

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Co-authored-by: Ian Rudie <ilrudie@gmail.com>
(cherry picked from commit c68413be4387680372df53fb496b50132fb6fef3)
* performance: reduce and test size of futures

Prior to this PR, each outbound connection future was 22k. This is
really big.

The reason behind this is how Rust sizes Futures; the stack size is
allocated based on the maximum required size through any branch. This
means even obscure code paths can bloat up the entire future for
everyone.

This PR optimizes a bunch of these. This is done in a few ways:
* Box::pin expensive futures. While we have to allocate these when we
  hit them, of course, when we don't hit these codepaths they are now
"free". We have a lot of uncommon expensive code paths. Especially HBONE
vs passthrough; hbone is way more expensive. But also on demand XDS,
DNS, etc
* Arc the Config which is large
* A variety of other micro-optimizations

A new `assertions` module is added to verify we do not regress

* Fix the upgrade drop issue

* fmt
We are storing these in a list and never removing. We don't use them, so
no need to store.
* refactor pool

* wip h2

* wip

* WIP

* first test passing

* test working reliably

* working except idle_eviction_with_persistent

* Ready to go

* Ready to go

* fixes

* lock
This is "required" per the docs. I don't think it really is, but good to
be safe and follow the docs
@howardjohn howardjohn requested a review from a team as a code owner May 1, 2024 22:46
@howardjohn howardjohn added the release-notes-none Indicates a PR that does not require release notes. label May 1, 2024
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label May 1, 2024
@istio-testing
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@istio-testing istio-testing added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 1, 2024
@istio-testing
Copy link
Contributor

@howardjohn: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
test_ztunnel_release-1.22 600b7f0 link true /test test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@howardjohn howardjohn closed this May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR needs to be rebased before being merged release-notes-none Indicates a PR that does not require release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants