[EEM][POC] The POC for creating entity-centric indices using entity definitions #183205

Merged
merged 17 commits into elastic:main on May 17, 2024

Conversation

simianhacker

@simianhacker simianhacker commented May 10, 2024

Summary

This is a proof of concept for generating entity-centric indices for the OAM. It exposes an API (/api/entities) for creating "asset definitions" (EntityDefinition). Each definition manages a transform and an ingest pipeline that write documents into an index, which can then be used to build a search experience or lookups for different services.

Features

  • Data schema agnostic, works with known schemas OR custom logs
  • Supports defining multiple identityFields along with an identityTemplate for formatting the asset.id
  • Supports optional identityFields using { "field": "path-to-field", "optional": true } definition instead of a string.
  • Supports defining key metrics with equations which are compatible with the SLO product
  • Supports adding metadata fields which will include multiple values.
  • Supports re-mapping metadata fields to a new destination path using a { "source": "path-to-source-field", "limit": 1000, "destination": "path-to-destination-in-output" } definition instead of a string
  • Supports adding staticFields, which can also use template variables
  • Supports fine-grained control over the frequency and sync settings for the underlying transform
  • Installs the index template components and index template settings for the destination index
  • Allows the user to configure the index patterns and timestamp field along with the lookback
  • The documents for each definition are stored in their own index (.entities-observability.summary-v1.{definition.id})
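The string-or-object shorthand for identityFields and metadata described above can be sketched as a pair of normalizers. This is a minimal illustration, not the PR's actual schema: the type names, the helper functions, and the default limit of 1000 (borrowed from the example value in the list) are all assumptions.

```typescript
// Hypothetical shapes inferred from the feature list; not the PR's real schema.
type IdentityFieldInput = string | { field: string; optional: boolean };
type MetadataFieldInput =
  | string
  | { source: string; limit?: number; destination?: string };

interface IdentityField {
  field: string;
  optional: boolean;
}

interface MetadataField {
  source: string;
  limit: number;
  destination: string;
}

// Assumption: 1000 is used as the default only because it appears in the example.
const DEFAULT_METADATA_LIMIT = 1000;

// A plain string is treated as a required identity field.
function normalizeIdentityField(input: IdentityFieldInput): IdentityField {
  return typeof input === "string"
    ? { field: input, optional: false }
    : { field: input.field, optional: input.optional };
}

// A plain string maps the source field onto the same destination path.
function normalizeMetadataField(input: MetadataFieldInput): MetadataField {
  if (typeof input === "string") {
    return { source: input, limit: DEFAULT_METADATA_LIMIT, destination: input };
  }
  return {
    source: input.source,
    limit: input.limit ?? DEFAULT_METADATA_LIMIT,
    destination: input.destination ?? input.source,
  };
}
```

Normalizing up front means downstream code (for example, whatever generates the transform) only has to handle the object form.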

Notes

  • We are currently considering adding a historical index which will track changes to the assets over time. If we choose to do this, the summary index would remain the same, but we'd add a second transform with a group_by on the definition.timestampField and break the indices into monthly indices (configurable in the settings).
  • We are looking into ways to add firstSeenTimestamp. This is difficult due to scaling issues: we would need to find the minimum timestamp for each entity, which could be extremely costly on large datasets.
  • There is nothing stopping you from creating an asset definition that uses the .entities-observability.summary-v1.* index pattern to create summaries of summaries... it can be very "meta".

API

  • POST /api/entities/definition - Creates a new asset definition and starts the indexing. See examples below.
  • DELETE /api/entities/definition/{id} - Deletes the asset definition and cleans up the transform, ingest pipeline, and destination index.
  • POST /api/entities/definition/{id}/_reset - Resets the transform, ingest pipeline, and destination index. This is useful for upgrading asset definitions to new features.
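The three routes above can be sketched as small request builders. This is only an illustration: the DefinitionRequest type and the function names are hypothetical, and the builders return plain descriptors instead of issuing HTTP calls, so no client library is assumed.

```typescript
// Hypothetical request descriptor; only the methods and paths come from the list above.
interface DefinitionRequest {
  method: "POST" | "DELETE";
  path: string;
  body?: string;
}

const DEFINITION_PATH = "/api/entities/definition";

// POST /api/entities/definition
function createDefinition(definition: { id: string }): DefinitionRequest {
  return { method: "POST", path: DEFINITION_PATH, body: JSON.stringify(definition) };
}

// DELETE /api/entities/definition/{id}
function deleteDefinition(id: string): DefinitionRequest {
  return { method: "DELETE", path: `${DEFINITION_PATH}/${encodeURIComponent(id)}` };
}

// POST /api/entities/definition/{id}/_reset
function resetDefinition(id: string): DefinitionRequest {
  return { method: "POST", path: `${DEFINITION_PATH}/${encodeURIComponent(id)}/_reset` };
}
```

When sending these outside of Dev Tools, note that Kibana generally expects a kbn-xsrf header on non-GET API requests.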

Example Definitions and Output

Here is a definition for creating services for each of the custom log sources in the fake_stack dataset from x-pack/packages/data-forge.

POST kbn:/api/entities/definition
{
  "id": "admin-console-logs-service",
  "name": "Services for Admin Console",
  "type": "service",
  "indexPatterns": ["kbn-data-forge-fake_stack.*"],
  "timestampField": "@timestamp",
  "lookback": "5m",
  "identityFields": ["log.logger"],
  "identityTemplate": "{{log.logger}}",
  "metadata": [
    "tags",
    "host.name"
  ],
  "metrics": [
    {
      "name": "logRate",
      "equation": "A / 5", 
      "metrics": [
        {
          "name": "A",
          "aggregation": "doc_count",
          "filter": "log.level: *"
        }
      ]
    },
    {
      "name": "errorRate",
      "equation": "A / 5", 
      "metrics": [
        {
          "name": "A",
          "aggregation": "doc_count",
          "filter": "log.level: \"ERROR\""
        }
      ]
    }
  ]
}

Which produces:

{
  "host": {
    "name": [
      "admin-console.prod.020",
      "admin-console.prod.010",
      "admin-console.prod.011",
      "admin-console.prod.001",
      "admin-console.prod.012",
      "admin-console.prod.002",
      "admin-console.prod.013",
      "admin-console.prod.003",
      "admin-console.prod.014",
      "admin-console.prod.004",
      "admin-console.prod.015",
      "admin-console.prod.016",
      "admin-console.prod.005",
      "admin-console.prod.017",
      "admin-console.prod.006",
      "admin-console.prod.018",
      "admin-console.prod.007",
      "admin-console.prod.019",
      "admin-console.prod.008",
      "admin-console.prod.009"
    ]
  },
  "entity": {
    "latestTimestamp": "2024-05-10T22:04:51.481Z",
    "metric": {
      "logRate": 37.4,
      "errorRate": 1
    },
    "identity": {
      "log": {
        "logger": "admin-console"
      }
    },
    "id": "admin-console",
    "indexPatterns": [
      "kbn-data-forge-fake_stack.*"
    ],
    "definitionId": "admin-console-logs-service"
  },
  "event": {
    "ingested": "2024-05-10T22:05:51.955691Z"
  },
  "tags": [
    "infra:admin-console"
  ]
}

Here is an example of a definition for APM Services:

POST kbn:/api/entities/definition
{
  "id": "apm-services",
  "name": "Services for APM",
  "type": "service", 
  "indexPatterns": ["logs-*", "metrics-*"],
  "timestampField": "@timestamp",
  "lookback": "5m",
  "identityFields": ["service.name", "service.environment"],
  "identityTemplate": "{{service.name}}:{{service.environment}}",
  "metadata": [
    "tags",
    "host.name"
  ],
  "metrics": [
    {
      "name": "latency",
      "equation": "A",
      "metrics": [
        {
          "name": "A",
          "aggregation": "avg",
          "field": "transaction.duration.histogram"
        }
      ]
    },
    {
      "name": "throughput",
      "equation": "A / 5",
      "metrics": [
        {
          "name": "A",
          "aggregation": "doc_count"
        }
      ]
    },
    {
      "name": "failedTransRate",
      "equation": "A / B",
      "metrics": [
        {
          "name": "A",
          "aggregation": "doc_count",
          "filter": "event.outcome: \"failure\""
        },
        {
          "name": "B",
          "aggregation": "doc_count",
          "filter": "event.outcome: *"
        }
      ]
    }
  ]
}

Which produces:

{
  "host": {
    "name": [
      "simianhacker's-macbook-pro"
    ]
  },
  "entity": {
    "latestTimestamp": "2024-05-10T21:38:22.513Z",
    "metric": {
      "latency": 615276.8812785388,
      "throughput": 50.6,
      "failedTransRate": 0.0091324200913242
    },
    "identity": {
      "service": {
        "environment": "development",
        "name": "admin-console"
      }
    },
    "id": "admin-console:development",
    "indexPatterns": [
      "logs-*",
      "metrics-*"
    ],
    "definitionId": "apm-services"
  },
  "event": {
    "ingested": "2024-05-10T21:39:33.636225Z"
  },
  "tags": [
    "_geoip_database_unavailable_GeoLite2-City.mmdb"
  ]
}
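Both outputs show entity.id being formed from the identityTemplate and the entity.identity values. A minimal sketch of that substitution follows; the renderIdentityTemplate and getPath helpers are hypothetical, and the PR may well do this inside the ingest pipeline instead of in application code.

```typescript
// Walks a dotted path such as "service.name" through a nested object.
function getPath(doc: Record<string, unknown>, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (current, key) =>
      current !== null && typeof current === "object"
        ? (current as Record<string, unknown>)[key]
        : undefined,
    doc
  );
}

// Replaces each {{path}} placeholder with the value found at that path.
// Assumption: missing values render as an empty string.
function renderIdentityTemplate(
  template: string,
  identity: Record<string, unknown>
): string {
  return template.replace(/\{\{\s*([\w.@-]+)\s*\}\}/g, (_match, path: string) => {
    const value = getPath(identity, path);
    return value === undefined || value === null ? "" : String(value);
  });
}

// Mirrors the APM example: entity.identity plus the definition's identityTemplate.
const apmEntityId = renderIdentityTemplate("{{service.name}}:{{service.environment}}", {
  service: { name: "admin-console", environment: "development" },
});
console.log(apmEntityId); // admin-console:development
```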

Getting Started

The easiest way to get started is to use the kbn-data-forge config below. Save this YAML to ~/Desktop/fake_stack.yaml, then run node x-pack/scripts/data_forge.js --config ~/Desktop/fake_stack.yaml. Then create a definition using the first example above.

---
elasticsearch:
  installKibanaUser: false

kibana:
  installAssets: true
  host: "http://localhost:5601/kibana"

indexing:
  dataset: "fake_stack"
  eventsPerCycle: 50
  reduceWeekendTrafficBy: 0.5

schedule:
  # Start with good events
  - template: "good"
    start: "now-1d"
    end: "now-20m"
    eventsPerCycle: 50
    randomness: 0.8
  - template: "bad"
    start: "now-20m"
    end: "now-10m"
    eventsPerCycle: 50
    randomness: 0.8
  - template: "good"
    start: "now-10m"
    end: false
    eventsPerCycle: 50
    randomness: 0.8

@simianhacker simianhacker requested a review from a team as a code owner May 10, 2024 23:03
@botelastic botelastic bot added the ci:project-deploy-observability Create an Observability project label May 10, 2024
@apmmachine

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

x-pack/packages/kbn-oam-schema/src/schema/asset.ts (outdated)
id: z.string().regex(/^[\w-]+$/),
name: z.string(),
description: z.optional(z.string()),
type: assetTypeSchema,

this implies that only definitions using known types are allowed?

we should discuss this again with chris d to ensure it's extensible wrt user-defined entity definitions.

@simianhacker simianhacker marked this pull request as draft May 13, 2024 20:45
- making indexPattern plural
- removing categories and asset.category
- fixing typos
- adding clean up when creation fails
- changing path from `/api/oam` to `/api/oam/definition`
- removing unused `preview_transform.ts`
- updating fixtures and tests
- changing OAMNotFound to OAMDefinitionNotFound

@tommyers-elastic tommyers-elastic left a comment

awesome chris - thanks

@tommyers-elastic tommyers-elastic changed the title [OAM][POC] The POC for creating entity-centric indices using asset definitions [OAM][POC] The POC for creating entity-centric indices using entity definitions May 15, 2024
@simianhacker simianhacker marked this pull request as ready for review May 16, 2024 21:17
@simianhacker simianhacker added release_note:feature Makes this part of the condensed release notes v8.15.0 Team:obs-knowledge Observability Experience Knowledge team labels May 16, 2024
@elasticmachine

Pinging @elastic/obs-knowledge-team (Team:obs-knowledge)

@kibana-ci

kibana-ci commented May 17, 2024

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/entities-schema - 19 +19

Canvas Sharable Runtime

The Canvas "shareable runtime" is a bundle produced to enable running Canvas workpads outside of Kibana. This bundle is included in third-party webpages that embed Canvas and therefore should be as slim as possible.

id before after diff
module count - 5405 +5405
total size - 8.8MB +8.8MB
Unknown metric groups

API count

id before after diff
@kbn/entities-schema - 19 +19

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@@ -0,0 +1,3 @@
# @kbn/entities-schema

The entities schema for the asset model for Observability

we can leave this 'asset' here just for historical interest 🪦

...acc,
[`entity.identity.${field}`]: { terms: { field } },
[`entity.identity.${id.field}`]: {
terms: { field: id.field, missing_bucket: id.optional },

well, that was simple!
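For readers without the full diff, the fragment under review can be sketched as a standalone reducer that builds one terms group_by per identity field and sets missing_bucket for optional ones. The buildGroupBy name and the surrounding types are hypothetical; only the object shape inside the reducer is taken from the diff.

```typescript
// Hypothetical types; the shape of each group_by entry follows the diff fragment.
interface IdentityField {
  field: string;
  optional: boolean;
}

interface TermsGroupBy {
  terms: { field: string; missing_bucket: boolean };
}

// Builds the group_by section keyed by entity.identity.<field>.
// With missing_bucket: true, documents lacking the field still form a bucket.
function buildGroupBy(identityFields: IdentityField[]): Record<string, TermsGroupBy> {
  return identityFields.reduce<Record<string, TermsGroupBy>>(
    (acc, id) => ({
      ...acc,
      [`entity.identity.${id.field}`]: {
        terms: { field: id.field, missing_bucket: id.optional },
      },
    }),
    {}
  );
}
```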

@tommyers-elastic tommyers-elastic changed the title [OAM][POC] The POC for creating entity-centric indices using entity definitions [EEM][POC] The POC for creating entity-centric indices using entity definitions May 17, 2024
@simianhacker simianhacker merged commit 7ae07f8 into elastic:main May 17, 2024
37 checks passed
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label May 17, 2024
Labels
backport:skip This commit does not require backporting ci:project-deploy-observability Create an Observability project release_note:feature Makes this part of the condensed release notes Team:obs-knowledge Observability Experience Knowledge team v8.15.0
6 participants