Repeated rollback deployments can create excess processes #1858

Open
heyjcollins opened this issue Sep 23, 2020 · 1 comment

heyjcollins commented Sep 23, 2020

While trying to get a large number of revisions on an app, I repeatedly rolled back and forth between 2 revisions. Eventually my deployments started to fail. When @cwlbraa and I investigated further, we found:

  1. There were 100 Deployments associated with the app
  2. There were ~3100 "web" processes associated with the app
  3. The deployment updater logs were not failing in any obvious way
  4. There were 100 Revisions associated with the app

We suspect PruneExcessAppRevisions eventually deleted revisions 1 and 2 (each app can have 100 revisions at most), but that doesn't explain how we got thousands of web processes.
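The counts listed above can be re-checked from the CLI with cf curl. The following is a minimal sketch, not the method we actually used: it assumes the v3 list endpoints for deployments, revisions, and processes accept the filters shown and return a pagination.total_results field, and it parses that field with grep rather than a JSON parser. The helper names total_results and count_app_resources are hypothetical.

```shell
# Hypothetical helper: read a paginated v3 API response on stdin and print
# the first "total_results" value. Text-based parsing, so it assumes the
# field is rendered as "total_results": <number>.
total_results() {
  grep -o '"total_results": *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Hypothetical helper: print deployment, revision, and process counts for
# one app. The endpoint paths and query parameters are assumptions about
# the v3 API and should be checked against the deployed API version.
count_app_resources() {
  local app_name=$1
  local guid
  guid=$(cf app "$app_name" --guid) || return 1
  echo "deployments: $(cf curl "/v3/deployments?app_guids=$guid&per_page=1" | total_results)"
  echo "revisions:   $(cf curl "/v3/apps/$guid/revisions?per_page=1" | total_results)"
  echo "processes:   $(cf curl "/v3/processes?app_guids=$guid&per_page=1" | total_results)"
}
```

Running count_app_resources dora against the affected environment should make it easy to see when the process count drifts away from the deployment count.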

Context

I was using "dora" as the app in a single-instance configuration.
I then executed rollbacks in rapid succession.
Eventually rollbacks, pushes, and app summary requests all failed.

capi slack thread

Steps to Reproduce

  1. push dora 3x
  2. run a script to roll back repeatedly (the zsh script I used is below)
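# Alternate rollbacks between revisions 2 and 1, up to 10000 times.
function ten-thousand-revisions-dora(){
  i=0
  while [ $i -lt 10000 ]
  do
    # Even iterations roll back to revision 2, odd iterations to revision 1.
    if (( i % 2 == 0 )); then
      cf rollback dora --revision 2 -f
    else
      cf rollback dora --revision 1 -f
    fi
    i=$((i + 1))
  done
}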
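
For anyone reproducing this, a bounded variant of the loop above may be safer on a shared environment. This is a sketch only: it reuses the cf rollback flags from the script above, takes the app name, the two revisions, and an iteration cap as arguments, and stops at the first failed rollback instead of continuing. The function name alternate_rollbacks is hypothetical.

```shell
# Hypothetical helper: alternate rollbacks between two revisions and stop
# at the first failure.
# Usage: alternate_rollbacks APP REV_A REV_B MAX_ITERATIONS
alternate_rollbacks() {
  local app=$1 rev_a=$2 rev_b=$3 max=$4
  local i=0 rev
  while [ "$i" -lt "$max" ]; do
    # Even iterations target REV_A, odd iterations target REV_B.
    if [ $((i % 2)) -eq 0 ]; then rev=$rev_a; else rev=$rev_b; fi
    # Flags are copied from the original script in this issue.
    if ! cf rollback "$app" --revision "$rev" -f; then
      echo "rollback to revision $rev failed after $i successful rollbacks" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "completed $i rollbacks"
}
```

Calling alternate_rollbacks dora 2 1 10000 follows the same revision order as the original script, but the reported iteration count shows how many rollbacks succeeded before the first failure.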

After running the script again (slightly modified from above, since revisions 1 and 2 had been pruned), we see the following error states:

This command is in EXPERIMENTAL stage and may change without notice

Rolling back to revision 3175 for app dora in org o / space s as admin...

OK

This command is in EXPERIMENTAL stage and may change without notice

Rolling back to revision 3176 for app dora in org o / space s as admin...

OK

This command is in EXPERIMENTAL stage and may change without notice

Rolling back to revision 3175 for app dora in org o / space s as admin...

memory quota_exceeded
FAILED
This command is in EXPERIMENTAL stage and may change without notice

Rolling back to revision 3176 for app dora in org o / space s as admin...

Unable to rollback. The code and configuration you are rolling back to is the same as the deployed revision.
FAILED
This command is in EXPERIMENTAL stage and may change without notice

Rolling back to revision 3175 for app dora in org o / space s as admin...

memory quota_exceeded
FAILED

Expected result

Either:

  1. My deployments fail "gracefully" when the cluster runs out of resources
  2. I hit a limit on the number of deployments per app

And:

  1. There should never be more Processes than Deployments on the app.
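
The last expectation can be written as a simple check. A minimal sketch, assuming the process and deployment counts have already been obtained by some other means and are passed in as plain integers; check_process_invariant is a hypothetical name, not something that exists in the CLI or in Cloud Controller.

```shell
# Hypothetical helper: succeed only when the app has no more processes
# than deployments.
# Usage: check_process_invariant PROCESS_COUNT DEPLOYMENT_COUNT
check_process_invariant() {
  local processes=$1 deployments=$2
  if [ "$processes" -gt "$deployments" ]; then
    echo "invariant violated: $processes processes > $deployments deployments" >&2
    return 1
  fi
  echo "ok: $processes processes <= $deployments deployments"
}
```

With the numbers from this report (~3100 processes, 100 deployments) the check fails, which is the state the app should never reach.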

Current result

~3000 revisions were created before the script started failing on the front end.
Subsequent attempts to push different apps failed with "Insufficient Resources: insufficient resources" errors.
The CLI commands cf apps and cf app dora took hours to return results.

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/174940465

The labels on this github issue will be updated when the story is started.
