Add cache to submitchecks (#2159)
* Add cache to submitchecks

On submission we check if jobs are schedulable on any known node
 - If they are not, submission is rejected

Often submissions contain many (100-500) jobs with the same scheduling requirements

So now, instead of checking whether each job is schedulable individually, we:
 - Hash the job's scheduling requirements
 - Cache the schedulability result for each distinct set of scheduling requirements
 - Use the cached value when it is present
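
The three steps above can be sketched as follows. This is a minimal illustration, not the code from this commit: `SchedulingRequirements`, `SubmitCheckCache` and the `check` callback are hypothetical names, and the key is built with the standard library's FNV hash, whereas the commit adds `github.com/mitchellh/hashstructure/v2` as a dependency. The sketch is not safe for concurrent use and does not handle hash collisions or cache expiry when the set of nodes changes.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// SchedulingRequirements is a hypothetical, simplified stand-in for the
// per-job requirements that the submit check inspects.
type SchedulingRequirements struct {
	Resources    map[string]int64  // e.g. "cpu" in millicores, "memory" in bytes
	NodeSelector map[string]string // node labels the job requires
}

// Key hashes the requirements in sorted order, so that two jobs with equal
// requirements get the same key regardless of map iteration order.
func (r SchedulingRequirements) Key() uint64 {
	h := fnv.New64a()
	names := make([]string, 0, len(r.Resources))
	for name := range r.Resources {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Fprintf(h, "resource:%q=%d;", name, r.Resources[name])
	}
	labels := make([]string, 0, len(r.NodeSelector))
	for label := range r.NodeSelector {
		labels = append(labels, label)
	}
	sort.Strings(labels)
	for _, label := range labels {
		fmt.Fprintf(h, "selector:%q=%q;", label, r.NodeSelector[label])
	}
	return h.Sum64()
}

// checkResult is the cached outcome of one schedulability check.
type checkResult struct {
	schedulable bool
	reason      string
}

// SubmitCheckCache memoises the (assumed expensive) schedulability check.
type SubmitCheckCache struct {
	results map[uint64]checkResult
	check   func(SchedulingRequirements) (bool, string)
}

// NewSubmitCheckCache wraps the given check function with an empty cache.
func NewSubmitCheckCache(check func(SchedulingRequirements) (bool, string)) *SubmitCheckCache {
	return &SubmitCheckCache{results: make(map[uint64]checkResult), check: check}
}

// Check returns the cached result when one exists for these requirements;
// otherwise it runs the underlying check once and stores the outcome.
func (c *SubmitCheckCache) Check(r SchedulingRequirements) (bool, string) {
	key := r.Key()
	if res, ok := c.results[key]; ok {
		return res.schedulable, res.reason
	}
	schedulable, reason := c.check(r)
	c.results[key] = checkResult{schedulable: schedulable, reason: reason}
	return schedulable, reason
}
```

With this shape, a submission of several hundred jobs that share the same requirements calls the underlying check once and serves the rest from the map.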

I have also made GangIdAnnotation and GangCardinalityAnnotation non-configurable; they are now defined as constants

* Add test

* revert unintended change

* go.sum change

* Only perform submission cache for non-gang jobs + remove annotations

* Add stack

* Rename

* go.mod

* Fix hashing for resource.Quantity

* Format hash.go

* Lint

* Update ci.go

---------

Co-authored-by: Albin Severinson <albin@severinson.org>
JamesMurkin and severinson committed Feb 22, 2023
1 parent 3871df5 commit bdfa894
Showing 14 changed files with 666 additions and 60 deletions.
2 changes: 0 additions & 2 deletions config/armada/config.yaml
@@ -107,8 +107,6 @@ scheduling:
 - memory
 minTerminationGracePeriod: 1s
 maxTerminationGracePeriod: 300s
-gangIdAnnotation: armadaproject.io/gangId
-gangCardinalityAnnotation: armadaproject.io/gangCardinality
 queueManagement:
 defaultPriorityFactor: 1000
 defaultQueuedJobsLimit: 0 # No Limit
1 change: 1 addition & 0 deletions go.mod
@@ -94,6 +94,7 @@ require (
 	github.com/kyleconroy/sqlc v1.16.0
 	github.com/magefile/mage v1.14.0
 	github.com/matryer/moq v0.3.0
+	github.com/mitchellh/hashstructure/v2 v2.0.2
 	github.com/openconfig/goyang v1.2.0
 	github.com/pingcap/log v0.0.0-20210906054005-afc726e70354
 	github.com/prometheus/common v0.37.0
2 changes: 2 additions & 0 deletions go.sum
@@ -998,6 +998,8 @@ github.com/mitchellh/go-testing-interface v1.0.0/go.mod h1:kRemZodwjscx+RGhAo8eI
 github.com/mitchellh/gox v0.4.0/go.mod h1:Sd9lOJ0+aimLBi73mGofS1ycjY8lL3uZM3JPS42BGNg=
 github.com/mitchellh/gox v1.0.1 h1:x0jD3dcHk9a9xPSDN6YEL4xL6Qz0dvNYm8yZqui5chI=
 github.com/mitchellh/gox v1.0.1/go.mod h1:ED6BioOGXMswlXa2zxfh/xdd5QhwYliBFn9V18Ap4z4=
+github.com/mitchellh/hashstructure/v2 v2.0.2 h1:vGKWl0YJqUNxE8d+h8f6NJLcCJrgbhC4NcD46KavDd4=
+github.com/mitchellh/hashstructure/v2 v2.0.2/go.mod h1:MG3aRVU/N29oo/V/IhBX8GR/zz4kQkprJgF2EVszyDE=
 github.com/mitchellh/iochan v1.0.0 h1:C+X3KsSTLFVBr/tK1eYN/vs4rJcvsiLU338UhYPJWeY=
 github.com/mitchellh/iochan v1.0.0/go.mod h1:JwYml1nuB7xOzsp52dPpHFffvOCDupsG0QubkSMEySY=
 github.com/mitchellh/mapstructure v0.0.0-20160808181253-ca63d7c062ee/go.mod h1:FVVH3fgwuzCH5S8UJGiWEs2h04kUh9fWfEaFds41c1Y=
14 changes: 14 additions & 0 deletions internal/armada/configuration/constants.go
@@ -0,0 +1,14 @@
+package configuration
+
+// GangIdAnnotation Jobs with equal value for this annotation make up a gang.
+// All jobs in a gang are guaranteed to be scheduled onto the same cluster at the same time.
+const GangIdAnnotation = "armadaproject.io/gangId"
+
+// GangCardinalityAnnotation All jobs in a gang must specify the total number of jobs in the gang via this annotation.
+// The cardinality should be expressed as an integer, e.g., "3".
+const GangCardinalityAnnotation = "armadaproject.io/gangCardinality"
+
+var ArmadaManagedAnnotations = []string{
+	GangIdAnnotation,
+	GangCardinalityAnnotation,
+}
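
As an illustration of how these two annotations might be read when deciding whether a job belongs to a gang (the commit message says the submit-check cache applies only to non-gang jobs), here is a minimal sketch. `GangFromAnnotations` is a hypothetical helper, not a function from this commit, and rejecting a non-positive cardinality is an assumption; the constants are copied from the file above so the example is self-contained.

```go
package main

import (
	"fmt"
	"strconv"
)

// Copied from the constants file above so this sketch compiles on its own.
const (
	GangIdAnnotation          = "armadaproject.io/gangId"
	GangCardinalityAnnotation = "armadaproject.io/gangCardinality"
)

// GangFromAnnotations is a hypothetical helper. It reports whether a job's
// annotations mark it as a gang member and, if so, returns the gang id and
// the declared gang size.
func GangFromAnnotations(annotations map[string]string) (id string, cardinality int, isGang bool, err error) {
	id, isGang = annotations[GangIdAnnotation]
	if !isGang {
		// No gang id: an ordinary job, which is the case the cache targets.
		return "", 0, false, nil
	}
	raw, ok := annotations[GangCardinalityAnnotation]
	if !ok {
		return id, 0, true, fmt.Errorf("gang %q is missing annotation %s", id, GangCardinalityAnnotation)
	}
	cardinality, err = strconv.Atoi(raw)
	if err != nil {
		return id, 0, true, fmt.Errorf("annotation %s: %w", GangCardinalityAnnotation, err)
	}
	if cardinality <= 0 {
		// Assumption: a gang must contain at least one job.
		return id, 0, true, fmt.Errorf("annotation %s must be positive, got %d", GangCardinalityAnnotation, cardinality)
	}
	return id, cardinality, true, nil
}
```

A caller could consult the cache only when `isGang` is false and fall back to an uncached check for gang jobs.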
6 changes: 0 additions & 6 deletions internal/armada/configuration/types.go
@@ -170,12 +170,6 @@ type SchedulingConfig struct {
 	// Should normally not be set greater than single-digit minutes,
 	// since cancellation and preemption may need to wait for this amount of time.
 	MaxTerminationGracePeriod time.Duration
-	// Jobs with equal value for this annotation make up a gang.
-	// All jobs in a gang are guaranteed to be scheduled onto the same cluster at the same time.
-	GangIdAnnotation string
-	// All jobs in a gang must specify the total number of jobs in the gang via this annotation.
-	// The cardinality should be expressed as an integer, e.g., "3".
-	GangCardinalityAnnotation string
 	// If an executor hasn't heartbeated in this time period, it will be considered stale
 	ExecutorTimeout time.Duration
 }
2 changes: 1 addition & 1 deletion internal/armada/server.go
@@ -201,7 +201,7 @@ func Serve(ctx context.Context, config *configuration.ArmadaConfig, healthChecks
 		PulsarSchedulerEnabled: config.PulsarSchedulerEnabled,
 		ProbabilityOfUsingPulsarScheduler: config.ProbabilityOfUsingPulsarScheduler,
 		Rand: util.NewThreadsafeRand(time.Now().UnixNano()),
-		GangIdAnnotation: config.Scheduling.GangIdAnnotation,
+		GangIdAnnotation: configuration.GangIdAnnotation,
 		IgnoreJobSubmitChecks: config.IgnoreJobSubmitChecks,
 	}
 	submitServerToRegister := pulsarSubmitServer