
Idea to improve performance after missing a cache during scrape processing #13947

Open
maokitty opened this issue Apr 18, 2024 · 0 comments
maokitty commented Apr 18, 2024

Proposal

Problem

Current scrape logic:

```go
p.Metric(&lset)
hash = lset.Hash()

// Hash label set as it is seen local to the target. Then add target labels
// and relabeling and store the final label set.
lset = sl.sampleMutator(lset)

// The label set may be set to empty to indicate dropping.
if lset.IsEmpty() {
	sl.cache.addDropped(met)
	continue
}

if !lset.Has(labels.MetricName) {
	err = errNameLabelMandatory
	break loop
}
if !lset.IsValid() {
	err = fmt.Errorf("invalid metric name or label names: %s", lset.String())
	break loop
}

// If any label limits is exceeded the scrape should fail.
if err = verifyLabelLimits(lset, sl.labelLimits); err != nil {
	sl.metrics.targetScrapePoolExceededLabelLimits.Inc()
	break loop
}

// Append metadata for new series if they were present.
updateMetadata(lset, true)
```
On a cache miss, this path does avoidable work:

1. A duplicate `labels.Builder` is created.
   `p.Metric(&lset)` builds the labels with a `ScratchBuilder`, and `sampleMutator` then converts the resulting `labels.Labels` back into a `labels.Builder`, which causes a duplicate sort and extra memory allocations for the labels. The same round trip happens again in `relabel.Process`.

2. `lset` is traversed multiple times.
   `lset` is a `[]Label`, so each traversal is expensive. The metric-name lookup (`lset.Has`), the validity check (`lset.IsValid`), and `verifyLabelLimits` can be merged into a single traversal.

3. Memory allocation is not precise enough.
   Some operations could pass a more precise capacity to `make` to avoid the cost of repeatedly growing slices, for example in `sampleMutator` and `ScratchBuilder`.
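Point 2 above could be sketched roughly as follows. This is a simplified illustration, not Prometheus's real API: the `Label` struct, the `validateLabels` helper, and its validity rules are hypothetical stand-ins for `labels.Labels`, `lset.Has`/`lset.IsValid`, and `verifyLabelLimits`, folded into one pass.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Label mirrors the shape of a single name/value pair in a label set
// (a sketch, not the real labels.Label type).
type Label struct {
	Name, Value string
}

// validateLabels is a hypothetical helper that folds three separate
// traversals (metric-name lookup, validity check, label-limit check)
// into a single pass over the label set.
func validateLabels(lset []Label, maxLabels int) error {
	if maxLabels > 0 && len(lset) > maxLabels {
		return fmt.Errorf("label limit exceeded: %d > %d", len(lset), maxLabels)
	}
	hasName := false
	for _, l := range lset {
		if l.Name == "__name__" {
			hasName = true
		}
		// Simplified validity rule: non-empty, valid UTF-8 names and values.
		if l.Name == "" || !utf8.ValidString(l.Name) || !utf8.ValidString(l.Value) {
			return fmt.Errorf("invalid label: %q=%q", l.Name, l.Value)
		}
	}
	if !hasName {
		return fmt.Errorf("missing metric name (__name__) label")
	}
	return nil
}

func main() {
	lset := []Label{{"__name__", "up"}, {"job", "node"}}
	if err := validateLabels(lset, 5); err != nil {
		fmt.Println("rejected:", err)
	} else {
		fmt.Println("labels OK")
	}
}
```

The point is only that one loop can carry all three checks; the real checks in Prometheus are stricter than shown here.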

Advice

1. Use `labels.Builder` the whole time, until the cache-miss logic is finished, instead of converting back and forth between `labels.Labels` and `labels.Builder`.
2. Could the data structure backing `labels.Builder` use something more efficient, such as a map?
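The second advice item could look roughly like the sketch below. `mapBuilder` is hypothetical, not the real `labels.Builder`: mutations become O(1) map operations with no intermediate sorting, and the label set is sorted exactly once, when it is finally materialized.

```go
package main

import (
	"fmt"
	"sort"
)

// mapBuilder is a hypothetical map-backed label builder. Set/Del are
// O(1); sorting happens only once, in Labels().
type mapBuilder struct {
	m map[string]string
}

// newMapBuilder pre-sizes the map from a hint (e.g. the number of
// labels in the parsed sample) so the map never needs to grow.
func newMapBuilder(sizeHint int) *mapBuilder {
	return &mapBuilder{m: make(map[string]string, sizeHint)}
}

func (b *mapBuilder) Set(name, value string) { b.m[name] = value }
func (b *mapBuilder) Del(name string)        { delete(b.m, name) }

// Labels materializes the final, sorted label set in one pass, with
// the output slice allocated at exactly the needed capacity.
func (b *mapBuilder) Labels() [][2]string {
	out := make([][2]string, 0, len(b.m))
	for n, v := range b.m {
		out = append(out, [2]string{n, v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i][0] < out[j][0] })
	return out
}

func main() {
	b := newMapBuilder(4)
	b.Set("job", "node")
	b.Set("__name__", "up")
	b.Set("job", "api") // overwrite is O(1), no re-sort needed
	for _, l := range b.Labels() {
		fmt.Printf("%s=%s\n", l[0], l[1])
	}
}
```

A trade-off worth noting: maps cost more per entry than a sorted slice for very small label sets, so whether this wins likely depends on typical label counts and would need benchmarking.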