diff --git a/docs/content/rest-api.md b/docs/content/rest-api.md index d37c395af3..e417af5d77 100644 --- a/docs/content/rest-api.md +++ b/docs/content/rest-api.md @@ -1513,6 +1513,330 @@ The following table summarizes the behavior for partial evaluation results. > The partially evaluated queries are represented as strings in the table above. The actual API response contains the JSON AST representation. + +## Health API + +The `/health` API endpoint executes a simple built-in policy query to verify +that the server is operational. Optionally it can account for bundle activation as well +(useful for "ready" checks at startup). + +#### Query Parameters +`bundles` - Boolean parameter to account for bundle activation status in response. This includes + any discovery bundles or bundles defined in the loaded discovery configuration. +`plugins` - Boolean parameter to account for plugin status in response. +`exclude-plugin` - String parameter to exclude a plugin from status checks. Can be added multiple + times. Does nothing if `plugins` is not true. This parameter is useful for special use cases + where a plugin depends on the server being fully initialized before it can fully initialize + itself. + +#### Status Codes +- **200** - OPA service is healthy. If the `bundles` option is specified then all configured bundles have + been activated. If the `plugins` option is specified then all plugins are in an OK state. +- **500** - OPA service is not healthy. If the `bundles` option is specified this can mean any of the configured + bundles have not yet been activated. If the `plugins` option is specified then at least one + plugin is in a non-OK state. + +> *Note*: The bundle activation check is only for initial bundle activation. Subsequent downloads + will not affect the health check. The [Status](../management-status) + API should be used for more fine-grained bundle status monitoring.
+ +#### Example Request +```http +GET /health HTTP/1.1 +``` + +#### Example Request (bundle activation) +```http +GET /health?bundles HTTP/1.1 +``` + +#### Example Request (plugin status) +```http +GET /health?plugins HTTP/1.1 +``` + +#### Example Request (plugin status with exclude) +```http +GET /health?plugins&exclude-plugin=decision-logs&exclude-plugin=status HTTP/1.1 +``` + +#### Healthy Response +```http +HTTP/1.1 200 OK +Content-Type: application/json +``` +```json +{} +``` + +#### Unhealthy Response +```http +HTTP/1.1 500 Internal Server Error +Content-Type: application/json +``` +```json +{ + "error": "not all plugins in OK state" +} +``` + +Other error messages include: + +- `"unable to perform evaluation"` +- `"not all configured bundles have been activated"` + +### Custom Health Checks + +The Health API includes support for "all or nothing" checks that verify +configured bundles have activated and plugins are operational. In some cases, +health checks may need to perform fine-grained checks on plugin state or other +internal components. To support these cases, use the policy-based Health API. + +By convention, the `/health/live` and `/health/ready` API endpoints allow you to +use Rego to evaluate the current state of the server and its plugins to +determine "liveness" (when OPA is capable of receiving traffic) and "readiness" +(when OPA is ready to receive traffic). Policy for the `live` and `ready` rules +is defined under package `system.health`. + +> The "liveness" and "readiness" check convention comes from +> [Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) +> but they are just conventions. You can implement your own check endpoints +> under the `system.health` package as needed. Any rules implemented inside of +> `system.health` will be exposed at `/health/`. + +#### Policy Examples + +Here is a basic health policy for liveness and readiness. 
In this example, OPA is live once it is +able to process the `live` rule. OPA is ready once all plugins have entered the OK state at least once. + +```rego +package system.health + +# opa is live if it can process this rule +default live = true + +# by default, opa is not ready +default ready = false + +# opa is ready once all plugins have reported OK at least once +ready { + input.plugins_ready +} +``` + +Note that once `input.plugins_ready` is true, it stays true. If you want to fail the ready check when +a specific plugin leaves the OK state, try this: + +```rego +package system.health + +default live = true + +default ready = false + +# opa is ready once all plugins have reported OK at least once AND +# the bundle plugin is currently in an OK state +ready { + input.plugins_ready + input.plugin_state.bundle == "OK" +} +``` + +See the following section for all the inputs available to use in health policy. + +#### Policy Inputs + +- `input.plugins_ready`: Will be false until all registered plugins have started +and are reporting an `OK` state, at which point it will be true. Once true, it will stay true +until the process ends. +- `input.plugin_state.`: Shows the current state of a plugin, where `` +is replaced with the name of the plugin, e.g. `bundle`, `status`. + +#### Status Codes + +- **200** - OPA service is healthy. +- **500** - OPA service is not healthy because policy has not evaluated to true, or is missing. + +#### Example Requests + +```http +GET /health/ready HTTP/1.1 +``` + +```http +GET /health/live HTTP/1.1 +``` + +#### Healthy Response + +```http +HTTP/1.1 200 OK +Content-Type: application/json +``` + +```json +{} +``` + +#### Unhealthy Response + +```http +HTTP/1.1 500 Internal Server Error +Content-Type: application/json +``` + +```json +{ + "error": "health policy was not true at data.system.health."
+} +``` + +Other error messages include: + +- `"health policy was undefined at data.system.health."` + + +## Config API + +The `/config` API endpoint returns OPA's active configuration. When the discovery feature is enabled, this API can be +used to fetch the discovered configuration in the last evaluated discovery bundle. The `credentials` field in the +[Services](../configuration#services) configuration and the `private_key` and `key` fields in the [Keys](../configuration#keys) +configuration will be omitted from the API response. + +### Get Config + +``` +GET /v1/config HTTP/1.1 +``` + +#### Query Parameters + +- **pretty** - If parameter is `true`, response will be formatted for humans. + +#### Status Codes + +- **200** - no error +- **500** - server error + +#### Example Request +```http +GET /v1/config HTTP/1.1 +``` + +#### Example Response +```http +HTTP/1.1 200 OK +Content-Type: application/json +``` +```json +{ + "result": { + "services": { + "acmecorp": { + "url": "https://example.com/control-plane-api/v1" + } + }, + "labels": { + "id": "test-id", + "version": "0.27.0" + }, + "keys": { + "global_key": { + "scope": "read" + } + }, + "decision_logs": { + "service": "acmecorp" + }, + "status": { + "service": "acmecorp" + }, + "bundles": { + "authz": { + "service": "acmecorp" + } + }, + "default_authorization_decision": "/system/authz/allow", + "default_decision": "/system/main" + } +} +``` + +## Status API + +The `/status` endpoint exposes a pull-based API for accessing OPA +[Status](../management-status) information. Normally this information is pushed +by OPA to a remote service via HTTP, console, or custom plugins. However, in +some cases, callers may wish to poll OPA and fetch the information. + +### Get Status + +``` +GET /v1/status HTTP/1.1 +``` + +#### Query Parameters + +- **pretty** - If parameter is `true`, response will be formatted for humans.
+ +#### Status Codes + +- **200** - no error +- **500** - server error + +#### Example Request +```http +GET /v1/status HTTP/1.1 +``` + +#### Example Response +```http +HTTP/1.1 200 OK +Content-Type: application/json +``` +```json +{ + "result": { + "labels": { + "id": "7da62ac6-42e0-4b3c-b6d5-199239ad436e", + "version": "99.9.9-dev" + }, + "bundles": { + "play": { + "name": "play", + "active_revision": "b3BlbnBvbGljeWFnZW50Lm9yZw==", + "last_successful_activation": "2021-12-08T01:36:14.201927Z", + "last_successful_download": "2021-12-08T01:36:14.20038Z", + "last_successful_request": "2021-12-08T01:36:23.131346Z", + "last_request": "2021-12-08T01:36:23.131346Z", + "metrics": { + "timer_bundle_request_ns": 168273779 + } + } + }, + "metrics": { + "prometheus": { +<------------------8<------------------> + } + }, + "plugins": { + "bundle": { + "state": "OK" + }, + "decision_logs": { + "state": "OK" + }, + "discovery": { + "state": "OK" + }, + "status": { + "state": "OK" + } + } + } +} +``` + ## Authentication The API is secured via [HTTPS, Authentication, and Authorization](../security). @@ -1778,251 +2102,3 @@ OPA currently supports the following query provenance information: - **bundles**: A set of key-value pairs describing each bundle activated on the server. Includes the `revision` field which is the _revision_ string included in a .manifest file (if present) within a bundle - -## Health API - -The `/health` API endpoint executes a simple built-in policy query to verify -that the server is operational. Optionally it can account for bundle activation as well -(useful for "ready" checks at startup). - -#### Query Parameters -`bundles` - Boolean parameter to account for bundle activation status in response. This includes - any discovery bundles or bundles defined in the loaded discovery configuration. -`plugins` - Boolean parameter to account for plugin status in response. -`exclude-plugin` - String parameter to exclude a plugin from status checks. 
Can be added multiple - times. Does nothing if `plugins` is not true. This parameter is useful for special use cases - where a plugin depends on the server being fully initialized before it can fully intialize - itself. - -#### Status Codes -- **200** - OPA service is healthy. If the `bundles` option is specified then all configured bundles have - been activated. If the `plugins` option is specified then all plugins are in an OK state. -- **500** - OPA service is not healthy. If the `bundles` option is specified this can mean any of the configured - bundles have not yet been activated. If the `plugins` option is specified then at least one - plugin is in a non-OK state. - -> *Note*: The bundle activation check is only for initial bundle activation. Subsequent downloads - will not affect the health check. The [Status](../management-status) - API should be used for more fine-grained bundle status monitoring. - -#### Example Request -```http -GET /health HTTP/1.1 -``` - -#### Example Request (bundle activation) -```http -GET /health?bundles HTTP/1.1 -``` - -#### Example Request (plugin status) -```http -GET /health?plugins HTTP/1.1 -``` - -#### Example Request (plugin status with exclude) -```http -GET /health?plugins&exclude-plugin=decision-logs&exclude-plugin=status HTTP/1.1 -``` - -#### Healthy Response -```http -HTTP/1.1 200 OK -Content-Type: application/json -``` -```json -{} -``` - -#### Unhealthy Response -```http -HTTP/1.1 500 Internal Server Error -Content-Type: application/json -``` -```json -{ - "error": "not all plugins in OK state" -} -``` - -Other error messages include: - -- `"unable to perform evaluation"` -- `"not all configured bundles have been activated"` - -### Custom Health Checks - -The Health API includes support for "all or nothing" checks that verify -configured bundles have activated and plugins are operational. In some cases, -health checks may need to perform fine-grained checks on plugin state or other -internal components. 
To support these cases, use the policy-based Health API. - -By convention, the `/health/live` and `/health/ready` API endpoints allow you to -use Rego to evaluate the current state of the server and its plugins to -determine "liveness" (when OPA is capable of receiving traffic) and "readiness" -(when OPA is ready to receive traffic). Policy for the `live` and `ready` rules -is defined under package `system.health`. - -> The "liveness" and "readiness" check convention comes from -> [Kubernetes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) -> but they are just conventions. You can implement your own check endpoints -> under the `system.health` package as needed. Any rules implemented inside of -> `system.health` will be exposed at `/health/`. - -#### Policy Examples - -Here is a basic health policy for liveness and readiness. In this example, OPA is live once it is -able to process the `live` rule. OPA is ready once all plugins have entered the OK state at least once. - -```rego -package system.health - -// opa is live if it can process this rule -default live = true - -// by default, opa is not ready -default ready = false - -// opa is ready once all plugins have reported OK at least once -ready { - input.plugins_ready -} -``` - -Note that once `input.plugins_ready` is true, it stays true. If you want to fail the ready check when -specific a plugin leaves the OK state, try this: - -```rego -package system.health - -default live = true - -default ready = false - -// opa is ready once all plugins have reported OK at least once AND -// the bundle plugin is currently in an OK state -ready { - input.plugins_ready - input.plugin_state.bundle == "OK" -} -``` - -See the following section for all the inputs available to use in health policy. - -#### Policy Inputs - -- `input.plugins_ready`: Will be false until all registered plugins have started -and are reporting an `OK` state, at which point it will be true. 
Once true, it will stay true -until the process ends. -- `input.plugin_state.`: Shows the current state of a plugin, where `` -is replaced with the name of the plugin, e.g. `bundle`, `status`. - -#### Status Codes - -- **200** - OPA service is healthy. -- **500** - OPA service is not healthy because policy has not evaluated to true, or is missing. - -#### Example Requests - -```http -GET /health/ready HTTP/1.1 -``` - -```http -GET /health/live HTTP/1.1 -``` - -#### Healthy Response - -```http -HTTP/1.1 200 OK -Content-Type: application/json -``` - -```json -{} -``` - -#### Unhealthy Response - -```http -HTTP/1.1 500 Internal Server Error -Content-Type: application/json -``` - -```json -{ - "error": "health policy was not true at data.system.health." -} -``` - -Other error messages include: - -- `"health policy was undefined at data.system.health."` - - -## Config API - -The `/config` API endpoint returns OPA's active configuration. When the discovery feature is enabled, this API can be -used to fetch the discovered configuration in the last evaluated discovery bundle. The `credentials` field in the -[Services](../configuration#services) configuration and the `private_key` and `key` fields in the [Keys](../configuration#keys) -configuration will be omitted from the API response. - -### Get Config - -``` -GET /v1/config HTTP/1.1 -``` - -#### Query Parameters - -- **pretty** - If parameter is `true`, response will formatted for humans. 
- -#### Status Codes - -- **200** - no error -- **500** - server error - -#### Example Request -```http -GET /v1/config HTTP/1.1 -``` - -#### Example Response -```http -HTTP/1.1 200 OK -Content-Type: application/json -``` -```json -{ - "result": { - "services": { - "acmecorp": { - "url": "https://example.com/control-plane-api/v1" - } - }, - "labels": { - "id": "test-id", - "version": "0.27.0" - }, - "keys": { - "global_key": { - "scope": "read" - } - }, - "decision_logs": { - "service": "acmecorp" - }, - "status": { - "service": "acmecorp" - }, - "bundles": { - "authz": { - "service": "acmecorp" - } - }, - "default_authorization_decision": "/system/authz/allow", - "default_decision": "/system/main" - } -} -``` diff --git a/plugins/status/plugin.go b/plugins/status/plugin.go index f6c15c43b8..45c53cc7f0 100644 --- a/plugins/status/plugin.go +++ b/plugins/status/plugin.go @@ -41,25 +41,22 @@ type UpdateRequestV1 struct { // Plugin implements status reporting. Updates can be triggered by the caller. 
type Plugin struct { - manager *plugins.Manager - config Config - bundleCh chan bundle.Status // Deprecated: Use bulk bundle status updates instead - lastBundleStatus *bundle.Status // Deprecated: Use bulk bundle status updates instead - + manager *plugins.Manager + config Config + bundleCh chan bundle.Status // Deprecated: Use bulk bundle status updates instead + lastBundleStatus *bundle.Status // Deprecated: Use bulk bundle status updates instead bulkBundleCh chan map[string]*bundle.Status lastBundleStatuses map[string]*bundle.Status - - discoCh chan bundle.Status - lastDiscoStatus *bundle.Status - + discoCh chan bundle.Status + lastDiscoStatus *bundle.Status pluginStatusCh chan map[string]*plugins.Status lastPluginStatuses map[string]*plugins.Status - - stop chan chan struct{} - reconfig chan interface{} - metrics metrics.Metrics - logger logging.Logger - trigger chan trigger + queryCh chan chan *UpdateRequestV1 + stop chan chan struct{} + reconfig chan interface{} + metrics metrics.Metrics + logger logging.Logger + trigger chan trigger } // Config contains configuration for the plugin. @@ -197,6 +194,7 @@ func New(parsedConfig *Config, manager *plugins.Manager) *Plugin { stop: make(chan chan struct{}), reconfig: make(chan interface{}), pluginStatusCh: make(chan map[string]*plugins.Status), + queryCh: make(chan chan *UpdateRequestV1), logger: manager.Logger().WithFields(map[string]interface{}{"plugin": Name}), trigger: make(chan trigger), } @@ -276,6 +274,14 @@ func (p *Plugin) Reconfigure(_ context.Context, config interface{}) { p.reconfig <- config } +// Snapshot returns the current status. +func (p *Plugin) Snapshot() *UpdateRequestV1 { + ch := make(chan *UpdateRequestV1) + p.queryCh <- ch + s := <-ch + return s +} + // Trigger can be used to control when the plugin attempts to upload //status in manual triggering mode. 
func (p *Plugin) Trigger(ctx context.Context) error { @@ -339,6 +345,8 @@ func (p *Plugin) loop() { } case newConfig := <-p.reconfig: p.reconfigure(newConfig) + case respCh := <-p.queryCh: + respCh <- p.snapshot() case update := <-p.trigger: err := p.oneShot(update.ctx) if err != nil { @@ -360,17 +368,7 @@ func (p *Plugin) loop() { func (p *Plugin) oneShot(ctx context.Context) error { - req := &UpdateRequestV1{ - Labels: p.manager.Labels(), - Discovery: p.lastDiscoStatus, - Bundle: p.lastBundleStatus, - Bundles: p.lastBundleStatuses, - Plugins: p.lastPluginStatuses, - } - - if p.metrics != nil { - req.Metrics = map[string]interface{}{p.metrics.Info().Name: p.metrics.All()} - } + req := p.snapshot() if p.config.ConsoleLogs { err := p.logUpdate(req) @@ -424,6 +422,23 @@ func (p *Plugin) reconfigure(config interface{}) { p.config = *newConfig } +func (p *Plugin) snapshot() *UpdateRequestV1 { + + s := &UpdateRequestV1{ + Labels: p.manager.Labels(), + Discovery: p.lastDiscoStatus, + Bundle: p.lastBundleStatus, + Bundles: p.lastBundleStatuses, + Plugins: p.lastPluginStatuses, + } + + if p.metrics != nil { + s.Metrics = map[string]interface{}{p.metrics.Info().Name: p.metrics.All()} + } + + return s +} + func (p *Plugin) logUpdate(update *UpdateRequestV1) error { eventBuf, err := json.Marshal(&update) if err != nil { diff --git a/server/server.go b/server/server.go index f46f7e7f92..244d6b01a1 100644 --- a/server/server.go +++ b/server/server.go @@ -35,6 +35,7 @@ import ( "github.com/open-policy-agent/opa/metrics" "github.com/open-policy-agent/opa/plugins" bundlePlugin "github.com/open-policy-agent/opa/plugins/bundle" + "github.com/open-policy-agent/opa/plugins/status" "github.com/open-policy-agent/opa/rego" "github.com/open-policy-agent/opa/server/authorizer" "github.com/open-policy-agent/opa/server/identifier" @@ -81,6 +82,7 @@ const ( PromHandlerV1Policies = "v1/policies" PromHandlerV1Compile = "v1/compile" PromHandlerV1Config = "v1/config" + PromHandlerV1Status = 
"v1/status" PromHandlerIndex = "index" PromHandlerCatch = "catchall" PromHandlerHealth = "health" @@ -660,6 +662,7 @@ func (s *Server) initRouters() { s.registerHandler(mainRouter, 1, "/query", http.MethodPost, s.instrumentHandler(s.v1QueryPost, PromHandlerV1Query)) s.registerHandler(mainRouter, 1, "/compile", http.MethodPost, s.instrumentHandler(s.v1CompilePost, PromHandlerV1Compile)) s.registerHandler(mainRouter, 1, "/config", http.MethodGet, s.instrumentHandler(s.v1ConfigGet, PromHandlerV1Config)) + s.registerHandler(mainRouter, 1, "/status", http.MethodGet, s.instrumentHandler(s.v1StatusGet, PromHandlerV1Status)) mainRouter.Handle("/", s.instrumentHandler(s.unversionedPost, PromHandlerIndex)).Methods(http.MethodPost) mainRouter.Handle("/", s.instrumentHandler(s.indexGet, PromHandlerIndex)).Methods(http.MethodGet) @@ -2121,6 +2124,22 @@ func (s *Server) v1ConfigGet(w http.ResponseWriter, r *http.Request) { writer.JSON(w, http.StatusOK, resp, pretty) } +func (s *Server) v1StatusGet(w http.ResponseWriter, r *http.Request) { + pretty := getBoolParam(r.URL, types.ParamPrettyV1, true) + + p := status.Lookup(s.manager) + if p == nil { + writer.ErrorString(w, http.StatusInternalServerError, types.CodeInternal, errors.New("status plugin not enabled")) + return + } + + var st interface{} = p.Snapshot() + var resp types.StatusResponseV1 + resp.Result = &st + + writer.JSON(w, http.StatusOK, resp, pretty) +} + func (s *Server) checkPolicyIDScope(ctx context.Context, txn storage.Transaction, id string) error { bs, err := s.store.GetPolicy(ctx, txn, id) diff --git a/server/server_test.go b/server/server_test.go index fefba99bb8..56994b40b2 100644 --- a/server/server_test.go +++ b/server/server_test.go @@ -28,6 +28,7 @@ import ( "github.com/open-policy-agent/opa/metrics" "github.com/open-policy-agent/opa/plugins" pluginBundle "github.com/open-policy-agent/opa/plugins/bundle" + pluginStatus "github.com/open-policy-agent/opa/plugins/status" 
"github.com/open-policy-agent/opa/server/authorizer" "github.com/open-policy-agent/opa/server/identifier" "github.com/open-policy-agent/opa/server/types" @@ -2741,6 +2742,82 @@ func TestPoliciesUrlEncoded(t *testing.T) { } } +func TestStatusV1(t *testing.T) { + + f := newFixture(t) + + // Expect HTTP 500 before status plugin is registered + req := newReqV1(http.MethodGet, "/status", "") + f.server.Handler.ServeHTTP(f.recorder, req) + + if f.recorder.Result().StatusCode != http.StatusInternalServerError { + t.Fatal("expected internal error") + } + + // Expect HTTP 200 after status plugin is registered + manual := plugins.TriggerManual + bs := pluginStatus.New(&pluginStatus.Config{Trigger: &manual}, f.server.manager) + err := bs.Start(context.Background()) + if err != nil { + t.Fatal(err) + } + + f.server.manager.Register(pluginStatus.Name, bs) + + req = newReqV1(http.MethodGet, "/status", "") + f.reset() + f.server.Handler.ServeHTTP(f.recorder, req) + if f.recorder.Result().StatusCode != http.StatusOK { + t.Fatal("expected ok") + } + + var resp1 struct { + Result struct { + Plugins struct { + Status struct { + State string + } + } + } + } + if err := util.NewJSONDecoder(f.recorder.Body).Decode(&resp1); err != nil { + t.Fatal(err) + } else if resp1.Result.Plugins.Status.State != "OK" { + t.Fatal("expected plugin state for status to be 'OK' but got:", resp1) + } + + // Expect HTTP 200 and updated status after bundle update occurs + bs.BulkUpdateBundleStatus(map[string]*pluginBundle.Status{ + "test": { + Name: "test", + }, + }) + + req = newReqV1(http.MethodGet, "/status", "") + f.reset() + f.server.Handler.ServeHTTP(f.recorder, req) + + if f.recorder.Result().StatusCode != http.StatusOK { + t.Fatal("expected ok") + } + + var resp2 struct { + Result struct { + Bundles struct { + Test struct { + Name string + } + } + } + } + + if err := util.NewJSONDecoder(f.recorder.Body).Decode(&resp2); err != nil { + t.Fatal(err) + } else if resp2.Result.Bundles.Test.Name != "test" { +
t.Fatal("expected bundle to exist in status response but got:", resp2) + } +} + func TestQueryPostBasic(t *testing.T) { f := newFixture(t) f.server, _ = New(). diff --git a/server/types/types.go b/server/types/types.go index 6c85e8e7e3..c5d3d74ea3 100644 --- a/server/types/types.go +++ b/server/types/types.go @@ -379,6 +379,11 @@ type ConfigResponseV1 struct { Result *interface{} `json:"result,omitempty"` } +// StatusResponseV1 models the response message for Status API (pull) operations. +type StatusResponseV1 struct { + Result *interface{} `json:"result,omitempty"` +} + // HealthResponseV1 models the response message for Health API operations. type HealthResponseV1 struct { Error string `json:"error,omitempty"`