fix(ci): Increase the Google Cloud instance sshd connection limit #5365

Closed
wants to merge 8 commits into from


teor2345
Contributor

@teor2345 teor2345 commented Oct 9, 2022

Motivation

SSH connections to Google Cloud test instances are failing with:

ERROR: (gcloud.compute.start-iap-tunnel) Error while connecting [4010: 'destination read failed'].
kex_exchange_identification: Connection closed by remote host

https://github.com/ZcashFoundation/zebra/actions/runs/3203311358/jobs/5234000346#step:6:106

Closes #5362

Reference

Weirdly, none actually try to authenticate to open a session.

Some spiders and services like Shodan scan public IPv4 addresses for open services, e.g. salt masters, ftp servers, RDPs, and also SSH services. These spiders usually only connect to the services without doing any valid authentication steps.

https://serverfault.com/questions/1015547/what-causes-ssh-error-kex-exchange-identification-connection-closed-by-remote/1015554#1015554

I got this error when using docker command with remote host
docker -H ssh://user@server compose up
after some digging i found on my remote server in auth logs (/var/log/auth.log) this:
Aug 8 14:51:46 user sshd[1341]: error: beginning MaxStartups throttling
Aug 8 14:51:46 user sshd[1341]: drop connection #10 from [some_ip]:32992 on [some_ip]:22 past MaxStartups

https://stackoverflow.com/questions/67000681/kex-exchange-identification-connection-closed-by-remote-host/73292521#73292521

MaxStartups
Specifies the maximum number of concurrent unauthenticated connections to the SSH daemon.
Additional connections will be dropped until authentication succeeds or the LoginGraceTime
expires for a connection.

https://man7.org/linux/man-pages/man5/sshd_config.5.html
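
For context, the same man page entry also allows a start:rate:full form (default 10:30:100): sshd starts refusing new unauthenticated connections at random once "start" of them are pending, and refuses all of them at "full". A minimal Python sketch of that ramp, assuming the linear interpolation the man page describes. The function name and parameters are illustrative and not part of sshd, and sshd itself uses integer arithmetic, so the values are approximate:

```python
def drop_probability(startups: int, start: int = 10, rate: int = 30, full: int = 100) -> float:
    """Approximate chance that sshd refuses one more unauthenticated connection.

    startups: unauthenticated connections currently pending.
    start, rate, full: the three fields of "MaxStartups start:rate:full".
    """
    if startups < start:
        # Below the threshold nothing is dropped.
        return 0.0
    if startups >= full:
        # At or above the hard limit everything is dropped.
        return 1.0
    # Between the two, the chance ramps linearly from rate% up to 100%.
    ramp = (100 - rate) * (startups - start) / (full - start)
    return (rate + ramp) / 100.0
```

With the default 10:30:100, 10 pending connections already give about a 30% chance of a drop, and 55 give about 65%. With start and full both set to 500 there is no ramp: nothing is dropped below 500 pending connections.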

Solution

Raise the sshd MaxStartups limit so Google Cloud instances allow up to 500 concurrent unauthenticated connections.
This tolerates a larger number of port-scanning bots.
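
This page does not show how the PR applies the setting, so the following is only a sketch of one way to do it, assuming the instance's sshd_config can be rewritten as text before sshd is reloaded. The helper name set_max_startups is hypothetical. It puts the directive on the first line because sshd uses the first value it finds for a keyword, and because MaxStartups is not allowed inside a Match block:

```python
import re

# Matches an active (uncommented) MaxStartups line; sshd keywords are case-insensitive.
_ACTIVE_MAX_STARTUPS = re.compile(r"^\s*MaxStartups\b", re.IGNORECASE)


def set_max_startups(config_text: str, value: str = "500") -> str:
    """Return sshd_config text with a single active MaxStartups line at the top."""
    # Drop existing active MaxStartups lines; commented-out lines are left alone.
    kept = [
        line
        for line in config_text.splitlines()
        if not _ACTIVE_MAX_STARTUPS.match(line)
    ]
    # Prepend the new directive so it sits before any Match block.
    return "\n".join([f"MaxStartups {value}", *kept]) + "\n"
```

After writing the result back, it would be sensible to validate the file with sshd -t before restarting the daemon.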

Review

These errors are blocking most other PR merges.

Reviewer Checklist

  • Will the PR name make sense to users?
    • Does it need extra CHANGELOG info? (new features, breaking changes, large changes)
  • Are the PR labels correct?
  • Does the code do what the ticket and PR says?
  • How do you know it works? Does it have tests?

Follow Up Work

If this change works, we should:

@teor2345 teor2345 requested a review from a team as a code owner October 9, 2022 23:52
@teor2345 teor2345 requested review from gustavovalverde and removed request for a team October 9, 2022 23:52
@teor2345 teor2345 self-assigned this Oct 9, 2022
@github-actions github-actions bot added C-bug Category: This is a bug C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG labels Oct 9, 2022
@teor2345 teor2345 added A-devops Area: Pipelines, CI/CD and Dockerfiles P-Critical 🚑 I-integration-fail Continuous integration fails, including build and test failures I-cost Zebra infrastructure costs and removed C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG labels Oct 9, 2022
@github-actions github-actions bot added the C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG label Oct 10, 2022
@teor2345
Contributor Author

Finally got this working:

root 667 0.0 0.0 6324 4800 ? Ss 04:17 0:00 sshd: /usr/sbin/sshd -D -e [listener] 0 of 10-100 startups
..
root 667 0.0 0.0 6324 4680 ? Ss 04:17 0:00 sshd: /usr/sbin/sshd -D -e [listener] 0 of 500-500 startups

https://github.com/ZcashFoundation/zebra/actions/runs/3216832174/jobs/5259141110#step:9:507
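
The two ps lines above differ only in the listener's process title: "10-100" matches the default limits, and "500-500" matches the new setting. A small Python sketch for checking this automatically, assuming the title keeps the "[listener] N of START-FULL startups" shape shown in the output above. The function name is illustrative:

```python
import re
from typing import Optional, Tuple

# Shape taken from the ps output above: "[listener] 0 of 500-500 startups".
_LISTENER_TITLE = re.compile(r"\[listener\] (\d+) of (\d+)-(\d+) startups")


def parse_listener_title(ps_line: str) -> Optional[Tuple[int, int, int]]:
    """Return (current, start, full) from an sshd listener ps line, or None."""
    match = _LISTENER_TITLE.search(ps_line)
    if match is None:
        return None
    current, start, full = (int(group) for group in match.groups())
    return current, start, full
```

A CI step could then fail early if the parsed limits are not (500, 500), instead of waiting for a later ssh connection to be dropped.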

@teor2345
Contributor Author

One test failed, but I think the failure is caused by the compute-ssh action changes in PR #5330:

client_loop: send disconnect: Broken pipe

https://github.com/ZcashFoundation/zebra/actions/runs/3216898611/jobs/5259351189#step:6:6089

Previously, this broken pipe error was handled using a tee option. We can't adjust the same option on ssh until we revert PR #5330. (And we might not even need to if we do the revert.)

@gustavovalverde
Member

Superseded by #5367

@teor2345
Contributor Author

Superseded by #5367

I think this PR fixes some of the ssh connection issues we were having before #5330, but I'll need to rewrite it to use the original ssh command syntax.

@teor2345 teor2345 marked this pull request as draft October 10, 2022 19:46
@teor2345
Contributor Author

Ah, I see you covered that in the new PR

@teor2345 teor2345 closed this Oct 10, 2022
Labels
A-devops Area: Pipelines, CI/CD and Dockerfiles C-bug Category: This is a bug C-trivial Category: A trivial change that is not worth mentioning in the CHANGELOG do-not-merge Tells Mergify not to merge this PR I-cost Zebra infrastructure costs I-integration-fail Continuous integration fails, including build and test failures
Development

Successfully merging this pull request may close these issues.

Google Cloud ssh-compute action: "destination read failed"
2 participants