CPU burn / slow >= v0.14.0 #3881

Open
guyguy333 opened this issue Oct 22, 2023 · 6 comments
Labels: bug (Something isn't working), needs-response, triage

@guyguy333

guyguy333 commented Oct 22, 2023

Describe the bug

Running HashiCorp Boundary in a container (at least) burns CPU, and each API request from the UI is slow (~1 s).

[Image: Screenshot 2023-10-22 at 11 42 22]

To Reproduce

Steps to reproduce the behavior in dev mode:

  1. Provision a fresh Linux VM running Ubuntu 22.04 LTS (AMD EPYC, 4 cores, 16 GB RAM)
  2. Install Docker
  3. Fix permissions for the test (a horrible fix, but it's only to reproduce): sudo chmod 777 /var/run/docker.sock
  4. Run Boundary in dev mode: docker run --net=host -v /var/run/docker.sock:/var/run/docker.sock --rm hashicorp/boundary:latest dev
  5. Boundary starts, and then the CPU burns

Steps to reproduce the behavior in dev mode with an external Postgres 15 (on another machine):

  1. Provision a fresh Linux VM running Ubuntu 22.04 LTS (AMD EPYC, 4 cores, 16 GB RAM)
  2. Install Docker
  3. Run Boundary in dev mode: docker run --rm hashicorp/boundary:latest dev -database-url=XXXXX
  4. Boundary starts (but startup is really slow, taking a few minutes), and then the CPU burns

Steps to reproduce the behavior in production mode with an external Postgres 15 (on another machine):

  1. Provision a fresh Linux VM running Ubuntu 22.04 LTS (AMD EPYC, 4 cores, 16 GB RAM)
  2. Install Docker
  3. Run Boundary in production mode with a config that enables the worker
  4. Boundary starts (but startup is really slow), and then the CPU burns. Each API request from the UI takes about 1 s

However, in production mode, I found that disabling the worker stops the CPU burn, but API requests from the UI are still really slow.

If I run dev mode on my laptop (Apple M1 Pro) without Docker, I don't have the CPU burn issue and everything is fast. If I run on the VMs without a container, I also have the issue (is it related to Ubuntu 22.04 LTS?).

This was tested on VMs from two different cloud providers, both running Ubuntu 22.04 LTS.

I don't have the CPU burn issue with v0.13.1, but I do with v0.14.0 and v0.14.1. However, API requests are still really slow with v0.13.1, so the UI is slow there too.
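
To put a number on "slow" from the client side, independent of the UI, something like the following could be used to time a few API requests. This is only a rough sketch: the helper names `time_requests` and `summarize` are made up for illustration, and the URL in the usage note assumes the API listener is reachable on port 9200 of the local machine.

```python
import statistics
import time
import urllib.error
import urllib.request


def time_requests(url, n=5, timeout=10.0, opener=urllib.request.urlopen):
    """Return wall-clock latencies (in ms) for n sequential GET requests to url.

    `opener` is injectable so the function can be exercised without a network.
    """
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        try:
            with opener(url, timeout=timeout) as resp:
                resp.read()  # include the body transfer in the measurement
        except urllib.error.HTTPError as err:
            # A 4xx/5xx response still made the full round trip, so count it.
            err.read()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to min / median / max."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }
```

For example, `summarize(time_requests("http://127.0.0.1:9200/v1/scopes"))` (address assumed, adjust to your listener) gives numbers that can be compared between versions or between the laptop and the VM.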

Expected behavior

HashiCorp Boundary is responsive and no longer burns CPU.


@guyguy333 guyguy333 added the bug label Oct 22, 2023
@guyguy333 guyguy333 changed the title from "CPU burn / slow (v0.14.1)" to "CPU burn / slow >= v0.14.0" Oct 22, 2023
@elimt elimt self-assigned this Oct 23, 2023
@elimt
Member

elimt commented Oct 23, 2023

I looked into the issue, and it seems to be caused by a long-running background check that was using a lot of CPU.

We have started working on a fix for this issue: #3884

@elimt
Member

elimt commented Nov 2, 2023

@guyguy333 The latest Boundary release, 0.14.2, includes the fix for this issue. Let me know if it helps.

@guyguy333
Author

Thanks a lot @elimt, I can confirm it solved the CPU burn issue.

However, I still see heavy slowness in production mode, with latency-ms around 900 ms for most requests.

{"id":"7JE1s4IfXH","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.165161,"request_info":{"id":"gtraceid_qEvPzHFlfTv5afaUN18p","method":"GET","path":"/assets/chunk.143.00d6b02cc76cee2b78af.css","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.173461747Z","status":200,"stop":"2023-11-03T10:25:05.173626908Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:05.17368055Z"}
{"id":"oBfB2hT88d","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":87.068323,"request_info":{"id":"gtraceid_m9aY7XMw4e7BQznWx0Rh","method":"GET","path":"/assets/admin-af689c1f154f54624ca33cae48e25b28.js","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.172516341Z","status":200,"stop":"2023-11-03T10:25:05.259584645Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:05.259649468Z"}
{"id":"Ov4VjUvk8a","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":775.192092,"request_info":{"id":"gtraceid_cz5gffR4UiJdnGvNu1RN","method":"POST","path":"/v1/auth-methods/amoidc_e3nnv6V9iW:authenticate","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:04.649536737Z","status":200,"stop":"2023-11-03T10:25:05.424728829Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:05.424770388Z"}
{"id":"UlC7nw8Jqc","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.082395,"request_info":{"id":"gtraceid_u45HSFxDh04IwApjfBXB","method":"GET","path":"/metadata.json","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:06.38484681Z","status":200,"stop":"2023-11-03T10:25:06.384929205Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.384938022Z"}
{"id":"6fwtp8bCs8","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":972.740802,"request_info":{"id":"gtraceid_ykSoQDOJxeyDsYUV1yGq","method":"GET","path":"/v1/scopes","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.629973416Z","status":200,"stop":"2023-11-03T10:25:06.602714228Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.602746249Z"}
{"id":"gW9n1x0HW4","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":989.508232,"request_info":{"id":"gtraceid_v7EVJ5z8C832jsvq1iBF","method":"GET","path":"/v1/scopes/global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.638015536Z","status":200,"stop":"2023-11-03T10:25:06.627523798Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.627539117Z"}
{"id":"0nAfUIaAlM","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":991.043242,"request_info":{"id":"gtraceid_ZeNqL68w8Ep6k89PbiiS","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.637693848Z","status":200,"stop":"2023-11-03T10:25:06.62873713Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.62877961Z"}
{"id":"dmo3UDeyhx","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":1034.120546,"request_info":{"id":"gtraceid_vSNloLY7eOOyGeMICnkM","method":"GET","path":"/v1/auth-tokens/at_qeTkERl3Kn","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:06.400616576Z","status":200,"stop":"2023-11-03T10:25:07.434737122Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:07.434803057Z"}
{"id":"wxJ0JH4n36","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":921.721597,"request_info":{"id":"gtraceid_FTcYuCbTBy9plwjom4hk","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:06.709372454Z","status":200,"stop":"2023-11-03T10:25:07.631094051Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:07.631149856Z"}
{"id":"TxhfbhHCK4","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":817.078066,"request_info":{"id":"gtraceid_GvSTQVGSLlW9dzh1Zd1y","method":"GET","path":"/v1/scopes/o_HdUl3IdO2r","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:11.130487691Z","status":200,"stop":"2023-11-03T10:25:11.947565747Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:11.947604821Z"}
{"id":"S0K7cZqppj","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":829.705026,"request_info":{"id":"gtraceid_3Lmc0JvTAItSFaRuTUG5","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:11.121902266Z","status":200,"stop":"2023-11-03T10:25:11.951607302Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:11.951625405Z"}
{"id":"6UDn6LHuZp","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":794.742167,"request_info":{"id":"gtraceid_OdsIzgdP7IqEUAI98Lzc","method":"GET","path":"/v1/scopes/global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:12.027060117Z","status":200,"stop":"2023-11-03T10:25:12.821802244Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:12.821868149Z"}
{"id":"zVfv7ZhDK0","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":837.209621,"request_info":{"id":"gtraceid_OovTBM7tqQyYfLWBOuoN","method":"GET","path":"/v1/scopes?scope_id=o_HdUl3IdO2r","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:12.027403876Z","status":403,"stop":"2023-11-03T10:25:12.864613497Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:12.864644635Z"}
{"id":"3I1Z01FrTl","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":917.06497,"request_info":{"id":"gtraceid_O8XAuCtUbexM1ZrtKIcv","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:14.604024804Z","status":200,"stop":"2023-11-03T10:25:15.521089724Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:15.521150309Z"}
{"id":"2R8E3gK70f","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":968.21363,"request_info":{"id":"gtraceid_IzyR4Go2cX603TWf9Q4X","method":"GET","path":"/v1/scopes/global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:14.604082573Z","status":200,"stop":"2023-11-03T10:25:15.572296163Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:15.572363149Z"}
{"id":"q0EfrkM7rA","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":998.116061,"request_info":{"id":"gtraceid_9Qo8NLOuhNR3chIFLRjh","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:15.575405335Z","status":200,"stop":"2023-11-03T10:25:16.573521366Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:16.573599573Z"}
{"id":"Eaj6I2Jil7","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.576419,"request_info":{"id":"gtraceid_JWkXTK7wGREHnh3PJu4A","method":"GET","path":"/","client_ip":"100.64.5.87"},"start":"2023-11-03T10:25:31.438582075Z","status":200,"stop":"2023-11-03T10:25:31.439158494Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:31.439219119Z"}
{"id":"7zSXs64jqr","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.97378,"request_info":{"id":"gtraceid_FipMaGN8ciNNg7zdyWMi","method":"GET","path":"/","client_ip":"100.64.5.87"},"start":"2023-11-03T10:25:31.438571835Z","status":200,"stop":"2023-11-03T10:25:31.439545605Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:31.439597793Z"}
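
The events above are easier to compare once grouped: static paths return in under 100 ms, while everything under /v1/ sits around 0.8 to 1 s. A small sketch of how such a dump could be summarized is below. The function names are made up for illustration, and `SAMPLE` contains three of the events above trimmed to the fields the code reads.

```python
import json
from collections import defaultdict

# Three events from the log above, trimmed to the fields used here.
SAMPLE = (
    '{"data":{"latency-ms":0.165161,"request_info":{"method":"GET",'
    '"path":"/assets/chunk.143.00d6b02cc76cee2b78af.css"},"status":200}} '
    '{"data":{"latency-ms":972.740802,"request_info":{"method":"GET",'
    '"path":"/v1/scopes"},"status":200}}\n'
    '{"data":{"latency-ms":989.508232,"request_info":{"method":"GET",'
    '"path":"/v1/scopes/global"},"status":200}}'
)


def iter_events(text):
    """Yield each JSON object in text, even when several share one line."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def latency_by_kind(events):
    """Group latency-ms into 'api' (paths under /v1/) and 'static' (the rest)."""
    groups = defaultdict(list)
    for ev in events:
        data = ev["data"]
        path = data["request_info"]["path"]
        kind = "api" if path.startswith("/v1/") else "static"
        groups[kind].append(data["latency-ms"])
    return dict(groups)
```

Calling `latency_by_kind(iter_events(log_text))` on the full dump, with `log_text` holding the pasted log, separates the two populations, which makes it clearer that the delay is specific to API handlers and not to the HTTP listener itself.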

@elimt
Member

elimt commented Nov 3, 2023

@guyguy333

  1. Could you please provide details about your setup?
  2. Could you also share your system specs? Boundary has a page about system requirements

@AdamBouhmad
Contributor

Hey @guyguy333 -- are you still encountering this issue?

@guyguy333
Author

Hi @AdamBouhmad, yes, I still have the issue. The app is slow and request latency is about 900 ms, resulting in a really slow UI.

To answer @elimt and provide more details: I run the container on a K8S cluster (hashicorp/boundary:0.14.3). There is no CPU or memory limit for this container.
I set these env vars: BOUNDARY_POSTGRES_URL, HOSTNAME (= boundary) and VAULT_TOKEN.
The server is started using boundary server -config /boundary/config.hcl

My config:

disable_mlock = true
log_format    = "json"

controller {
  name        = "kubernetes-controller"
  description = "Boundary Controller"
  public_cluster_addr = "boundary-cluster.example.com:443"

  database {
    url = "env://BOUNDARY_POSTGRES_URL"
    max_open_connections = 10
    max_idle_connections = 10
  }
}

# Ingress TCP Route
worker {
  name              = "kubernetes-worker"
  description       = "Boundary Worker"
  address           = "localhost"
  initial_upstreams = ["boundary:9201"]
  public_addr       = "boundary-worker.example.com:443"
}

# Ingress
listener "tcp" {
  address              = "0.0.0.0"
  purpose              = "api"
  tls_disable          = true
  cors_enabled         = true
  cors_allowed_origins = ["https://boundary.example.com"]
}

listener "tcp" {
  address       = "0.0.0.0"
  purpose       = "cluster"
  tls_cert_file = "/certs/tls.crt"
  tls_key_file  = "/certs/tls.key"
}

listener "tcp" {
  address       = "0.0.0.0"
  purpose       = "proxy"
  tls_cert_file = "/certs/tls.crt"
  tls_key_file  = "/certs/tls.key"
}

kms "transit" {
  purpose            = "root"
  address            = "https://vault.example.com"

  // Key configuration
  key_name           = "boundary-root"
  mount_path         = "transit/"
}

kms "transit" {
  purpose            = "recovery"
  address            = "https://vault.example.com"

  // Key configuration
  key_name           = "boundary-recovery"
  mount_path         = "transit/"
}

kms "transit" {
  purpose            = "worker-auth"
  address            = "https://vault.example.com"

  // Key configuration
  key_name           = "boundary-worker-auth"
  mount_path         = "transit/"
}

Certs are mounted from a K8S secret at /certs.

The host machine is AMD64 with 4 cores and 32 GB RAM.

Currently there is no load; it's only me testing the setup to get something stable before opening it up.
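
Since the config above depends on two remote services (the Postgres instance behind BOUNDARY_POSTGRES_URL and the Vault transit KMS), one thing that might be worth ruling out is the network round trip from the pod to each of them. The sketch below only times a TCP connect. It is not a diagnosis, the function name is made up, and the Postgres host and port in the usage note are placeholders.

```python
import socket
import time


def connect_latency_ms(host, port, timeout=3.0):
    """Time one TCP connect to host:port in milliseconds.

    The measurement covers name resolution and the TCP handshake only;
    it does not include TLS negotiation or any application-level query.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed again on leaving the with block
    return (time.perf_counter() - start) * 1000.0
```

Run from inside the container, for example `connect_latency_ms("vault.example.com", 443)` and `connect_latency_ms("<postgres-host>", 5432)`. If either comes back in the hundreds of milliseconds, the ~900 ms per request may simply be several such round trips added together.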

Thanks for considering the issue :)
