This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

Intermittent no allocations found after evaluation completed error from nomad waypoint server install #4557

Open
izaaklauer opened this issue Feb 27, 2023 · 1 comment
Labels
core/install, jira, plugin/nomad

Comments

@izaaklauer
Contributor

Describe the bug
Occasionally, for me, waypoint server install -platform=nomad fails with this error:

$ waypoint server install -platform=nomad -accept-tos -nomad-consul-service=false -nomad-host-volume=wp-server-vol -nomad-runner-host-volume=wp-runner-vol -nomad-consul-service=false
❌ Nomad allocation created
! Error installing server into nomad: no allocations found after evaluation completed

When this happens, I run nomad status and see the allocation come up and run successfully. It looks like Waypoint checks for the allocation too early and misses it.

Local Nomad agent command and config:

$ cat nomad_agent.hcl 
data_dir  = "/tmp"

bind_addr = "0.0.0.0" # the default

advertise {
  # Defaults to the first private IP address.
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1:5648" # non-default ports may be specified
}

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled       = true
  host_volume "wp-server-vol" {
    path      = "/tmp/nomad-wp-server-vol"
    read_only = false
  }
  host_volume "wp-runner-vol" {
    path      = "/tmp/nomad-wp-runner-vol"
    read_only = false
  }
}


$ nomad agent -config=/Users/izaaklauer/dev/nomad_agent.hcl

Steps to Reproduce

  • Run a local nomad server
  • Run waypoint server install
  • Note the intermittent error.


Waypoint Platform Versions

  • Waypoint CLI Version: v0.11.0
  • Waypoint Server Platform and Version: nomad
  • Waypoint Plugin: nomad

Additional context

I traced this in the debugger, and caught the error here:

return "", fmt.Errorf("no allocations found after evaluation completed")

It looks like waitForEvaluation isn't working as expected: it returns once the evaluation is complete, but when we then check for the alloc, it doesn't exist yet.

Should a completed evaluation guarantee that an alloc exists? If so, this is a bug in Nomad. If not, we should retry a few times when looking for the alloc.

func waitForEvaluation(
	ctx context.Context,
	s terminal.Step,
	client *api.Client,
	resp *api.JobRegisterResponse,
	qopts *api.QueryOptions,
) (*api.Evaluation, *api.QueryMeta, error) {
	for {
		eval, meta, err := client.Evaluations().Info(resp.EvalID, qopts)
		if err != nil {
			return nil, nil, err
		}
		qopts.WaitIndex = meta.LastIndex
		switch eval.Status {
		case "pending":
			s.Update("Nomad allocation pending...")
		case "complete":
			s.Update("Nomad allocation created")
			return eval, meta, nil
		case "failed", "canceled", "blocked":
			s.Update("Nomad failed to schedule the job")
			s.Status(terminal.StatusError)
			return nil, nil, fmt.Errorf("Nomad evaluation did not transition to 'complete'")
		default:
			return nil, nil, fmt.Errorf("receieved unknown eval status from Nomad: %q", eval.Status)
		}
	}
}
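
For illustration, the "retry a few times" option could look roughly like the sketch below. This is not the Waypoint code and not a confirmed fix. The names waitForAllocation, listAllocsFunc, and simulateLateAllocation are hypothetical, and the attempt count and delay are arbitrary. The lookup is abstracted behind a function so the sketch runs without a Nomad cluster; in the installer it would presumably wrap the existing allocation lookup for the evaluation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNoAllocations reuses the error text reported in this issue.
var errNoAllocations = errors.New("no allocations found after evaluation completed")

// listAllocsFunc is a hypothetical stand-in for the Nomad allocation lookup.
// It returns the IDs of the allocations currently visible for the evaluation.
type listAllocsFunc func(ctx context.Context) ([]string, error)

// waitForAllocation calls list up to attempts times, sleeping delay between
// calls, and returns the first allocation ID it sees. It gives up early if
// list returns an error or ctx is cancelled.
func waitForAllocation(ctx context.Context, list listAllocsFunc, attempts int, delay time.Duration) (string, error) {
	for i := 0; i < attempts; i++ {
		ids, err := list(ctx)
		if err != nil {
			return "", err
		}
		if len(ids) > 0 {
			return ids[0], nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(delay):
		}
	}
	return "", errNoAllocations
}

// simulateLateAllocation fakes the race described above: the first two
// lookups return nothing, and the third returns one allocation.
func simulateLateAllocation() (id string, calls int, err error) {
	list := func(ctx context.Context) ([]string, error) {
		calls++
		if calls < 3 {
			return nil, nil
		}
		return []string{"alloc-1"}, nil
	}
	id, err = waitForAllocation(context.Background(), list, 5, 10*time.Millisecond)
	return id, calls, err
}

func main() {
	id, calls, err := simulateLateAllocation()
	fmt.Println(id, calls, err)
}
```

A bounded retry keeps the failure mode intact: if no allocation ever appears, the install still fails with the same error, only later.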

izaaklauer added the new and jira labels on Feb 27, 2023
@requizm

requizm commented Jan 6, 2024

Same issue.


3 participants