[receiver/prometheus] Add Target Info API #23244

Closed
wants to merge 19 commits into from


Aneurysm9
Member

Description: Adds the ability to provide confighttp.HTTPServerSettings to the prometheus receiver, which are used to expose a subset of the Prometheus API. At present this only includes the /targets resource, which returns information about active and discovered scrape targets, including debugging information that is typically not available without verbose debug logging.
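
For illustration, here is a minimal sketch of the kind of response a /targets resource might return, using only the Go standard library. The `target` type, its field selection, and the `targetsHandler` and `fetchTargets` helpers are hypothetical and are not the code in this PR; only the `status`/`data` envelope with `activeTargets` and `droppedTargets` follows the Prometheus HTTP API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// target is a hypothetical, trimmed-down view of a scrape target.
type target struct {
	ScrapePool string            `json:"scrapePool"`
	ScrapeURL  string            `json:"scrapeUrl"`
	Labels     map[string]string `json:"labels"`
	Health     string            `json:"health"`
	LastError  string            `json:"lastError"`
}

// targetsResponse mirrors the status/data envelope of the Prometheus HTTP API.
type targetsResponse struct {
	Status string `json:"status"`
	Data   struct {
		ActiveTargets  []target `json:"activeTargets"`
		DroppedTargets []target `json:"droppedTargets"`
	} `json:"data"`
}

// targetsHandler serves a snapshot of active and dropped targets as JSON.
func targetsHandler(active, dropped []target) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		var resp targetsResponse
		resp.Status = "success"
		resp.Data.ActiveTargets = active
		resp.Data.DroppedTargets = dropped
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(resp); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	}
}

// fetchTargets starts a throwaway test server and decodes /api/v1/targets.
func fetchTargets(active, dropped []target) (targetsResponse, error) {
	mux := http.NewServeMux()
	mux.Handle("/api/v1/targets", targetsHandler(active, dropped))
	srv := httptest.NewServer(mux)
	defer srv.Close()

	var out targetsResponse
	res, err := http.Get(srv.URL + "/api/v1/targets")
	if err != nil {
		return out, err
	}
	defer res.Body.Close()
	err = json.NewDecoder(res.Body).Decode(&out)
	return out, err
}

func main() {
	active := []target{{
		ScrapePool: "node",
		ScrapeURL:  "http://10.0.0.1:9100/metrics",
		Labels:     map[string]string{"job": "node"},
		Health:     "down",
		LastError:  "connection refused",
	}}
	out, err := fetchTargets(active, nil)
	fmt.Println(out.Status, len(out.Data.ActiveTargets), err)
}
```

The `lastError` field is the sort of debugging detail that otherwise only shows up in verbose debug logs.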

Signed-off-by: Anthony J Mirabella <a9@aneurysm9.com>
@github-actions github-actions bot added the Stale label Jan 3, 2024
@Aneurysm9 Aneurysm9 removed the Stale label Jan 9, 2024
Member

@bogdandrutu bogdandrutu left a comment


We should not expose yet another HTTP endpoint/server per component, we should use https://github.com/open-telemetry/opentelemetry-collector/tree/main/extension/zpagesextension

Member

@bogdandrutu bogdandrutu left a comment


Per author comment, I have to block this since I want my comment to be considered, see #30135 (comment)

@gracewehner
Contributor

Thanks @Aneurysm9 for re-opening this!

My suggestion would be to re-use the Prometheus API struct with agent mode set to true. That way we avoid duplicating as much of the API internals from the Prometheus repo, and avoid this code drifting from the main Prometheus branch.

I agree we don't want to add any additional API paths that don't apply to the Prometheus Receiver. With the API from the Prometheus repo, all paths with wrap() will return data, whereas all paths with wrapAgent() will return "unavailable with Prometheus agent": https://github.com/prometheus/prometheus/blob/main/web/api/v1/api.go#L362-L407. This way only the paths /targets, /scrape_pools, and /status/* actually return data and do any calculations/lookups.
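
To make the wrap()/wrapAgent() distinction concrete, here is a simplified sketch of the pattern using only the standard library. It is not the Prometheus implementation: the `api` and `apiFunc` types, the `newMux` and `statusFor` helpers, the example paths' payloads, and the 422 status code are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// apiFunc is a hypothetical handler type that returns data or an error.
type apiFunc func(r *http.Request) (any, error)

// api holds the single flag this sketch cares about.
type api struct{ isAgent bool }

// wrap turns an apiFunc into an http.HandlerFunc with a JSON envelope.
func (a *api) wrap(f apiFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		data, err := f(r)
		if err != nil {
			// The status code used here is an assumption of this sketch.
			w.WriteHeader(http.StatusUnprocessableEntity)
			json.NewEncoder(w).Encode(map[string]any{"status": "error", "error": err.Error()})
			return
		}
		json.NewEncoder(w).Encode(map[string]any{"status": "success", "data": data})
	}
}

// wrapAgent short-circuits the handler with an error when running in agent mode,
// so the wrapped function never executes and does no lookups.
func (a *api) wrapAgent(f apiFunc) http.HandlerFunc {
	return a.wrap(func(r *http.Request) (any, error) {
		if a.isAgent {
			return nil, errors.New("unavailable with Prometheus agent")
		}
		return f(r)
	})
}

// newMux registers one always-available path and one agent-restricted path.
func newMux(a *api) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/targets", a.wrap(func(*http.Request) (any, error) {
		return []string{"target-a"}, nil
	}))
	mux.Handle("/query", a.wrapAgent(func(*http.Request) (any, error) {
		return "result", nil
	}))
	return mux
}

// statusFor issues an in-memory GET and returns the response status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newMux(&api{isAgent: true})
	fmt.Println(statusFor(mux, "/targets"), statusFor(mux, "/query"))
}
```

With the flag set, the restricted handler is still registered but never runs, which is why nothing extra is computed for paths that don't apply to the receiver.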

I tried this below as a rough POC and verified it works. This sets up the API in the same way as the Prometheus web package, which sets up the API and hosts it in addition to hosting the UI. We can do the same, but without adding in any of the UI-related code for serving the react app paths:

func (r *pReceiver) initPrometheusComponents(ctx context.Context, host component.Host, logger log.Logger) error {

	// All existing code
	...
	...

	// Create Options purely for readability when constructing the API object.
	// These are the settings most relevant to expose as configuration for the Prometheus Receiver.
	o := &web.Options{
		ScrapeManager: r.scrapeManager,
		Context:       ctx,
		ListenAddress: ":9090",
		ExternalURL: &url.URL{
			Scheme: "http",
			Host:   "localhost:9090",
			Path:   "",
		},
		RoutePrefix: "/",
		ReadTimeout: time.Minute * readTimeoutMinutes,
		PageTitle:   "Prometheus Receiver",
		Version: &web.PrometheusVersion{
			Version:   version.Version,
			Revision:  version.Revision,
			Branch:    version.Branch,
			BuildUser: version.BuildUser,
			BuildDate: version.BuildDate,
			GoVersion: version.GoVersion,
		},
		Flags:          make(map[string]string),
		MaxConnections: maxConnections,
		IsAgent:        true,
		Gatherer:       prometheus.DefaultGatherer,
	}

	// Creates the API object in the same way as the Prometheus web package: https://github.com/prometheus/prometheus/blob/6150e1ca0ede508e56414363cc9062ef522db518/web/web.go#L314-L354
	// Anything not set in the options above is nil, such as o.QueryEngine, o.Storage, etc. Because IsAgent=true, Prometheus expects these to be nil.
	factorySPr := func(_ context.Context) api_v1.ScrapePoolsRetriever { return r.scrapeManager }
	factoryTr := func(_ context.Context) api_v1.TargetRetriever { return r.scrapeManager }
	factoryAr := func(_ context.Context) api_v1.AlertmanagerRetriever { return nil }
	factoryRr := func(_ context.Context) api_v1.RulesRetriever { return nil }
	var app storage.Appendable
	// POC only: this replaces the logger passed in with a no-op logger, so the log calls below emit nothing.
	logger = log.NewNopLogger()

	apiV1 := api_v1.NewAPI(o.QueryEngine, o.Storage, app, o.ExemplarStorage, factorySPr, factoryTr, factoryAr,
		func() config.Config {
			return *r.cfg.PrometheusConfig
		},
		o.Flags,
		api_v1.GlobalURLOptions{
			ListenAddress: o.ListenAddress,
			Host:          o.ExternalURL.Host,
			Scheme:        o.ExternalURL.Scheme,
		},
		// No-op wrapper: every request is passed straight through.
		func(f http.HandlerFunc) http.HandlerFunc {
			return func(w http.ResponseWriter, r *http.Request) {
				f(w, r)
			}
		},
		o.LocalStorage,
		o.TSDBDir,
		o.EnableAdminAPI,
		logger,
		factoryRr,
		o.RemoteReadSampleLimit,
		o.RemoteReadConcurrencyLimit,
		o.RemoteReadBytesInFrame,
		o.IsAgent,
		o.CORSOrigin,
		func() (api_v1.RuntimeInfo, error) {
			status := api_v1.RuntimeInfo{
				GoroutineCount: runtime.NumGoroutine(),
				GOMAXPROCS:     runtime.GOMAXPROCS(0),
				GOMEMLIMIT:     debug.SetMemoryLimit(-1),
				GOGC:           os.Getenv("GOGC"),
				GODEBUG:        os.Getenv("GODEBUG"),
			}
		
			return status, nil
		},
		nil,
		o.Gatherer,
		o.Registerer,
		nil,
		o.EnableRemoteWriteReceiver,
		o.EnableOTLPWriteReceiver,
	)

	// Create listener and monitor with conntrack in the same way as the Prometheus web package: https://github.com/prometheus/prometheus/blob/6150e1ca0ede508e56414363cc9062ef522db518/web/web.go#L564-L579
	level.Info(logger).Log("msg", "Start listening for connections", "address", o.ListenAddress)
	listener, err := net.Listen("tcp", o.ListenAddress)
	if err != nil {
		return err
	}
	listener = netutil.LimitListener(listener, o.MaxConnections)
	listener = conntrack.NewListener(listener,
		conntrack.TrackWithName("http"),
		conntrack.TrackWithTracing())

	// Run the API server in the same way as the Prometheus web package: https://github.com/prometheus/prometheus/blob/6150e1ca0ede508e56414363cc9062ef522db518/web/web.go#L582-L630
	mux := http.NewServeMux()
	router := route.New().WithInstrumentation(setPathWithPrefix(""))
	mux.Handle("/", router)

	// This is the path the web package uses; alternatively, apiV1 could register directly on the unprefixed router above.
	apiPath := "/api"
	if o.RoutePrefix != "/" {
		apiPath = o.RoutePrefix + apiPath
		level.Info(logger).Log("msg", "Router prefix", "prefix", o.RoutePrefix)
	}
	av1 := route.New().
		WithInstrumentation(setPathWithPrefix(apiPath + "/v1"))
	apiV1.Register(av1)
	mux.Handle(apiPath+"/v1/", http.StripPrefix(apiPath+"/v1", av1))

	errlog := stdlog.New(log.NewStdlibAdapter(level.Error(logger)), "", 0)
	spanNameFormatter := otelhttp.WithSpanNameFormatter(func(_ string, r *http.Request) string {
		return fmt.Sprintf("%s %s", r.Method, r.URL.Path)
	})
	httpSrv := &http.Server{
		Handler:     otelhttp.NewHandler(mux, "", spanNameFormatter),
		ErrorLog:    errlog,
		ReadTimeout: o.ReadTimeout,
	}
	webconfig := ""

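	// toolkit_web.Serve blocks until the server stops, so it runs in its own goroutine.
	// The POC only logs the returned error; an error channel will be needed for graceful
	// shutdown in the Shutdown() method for the receiver.
	go func() {
		if err := toolkit_web.Serve(listener, httpSrv, &toolkit_web.FlagConfig{WebConfigFile: &webconfig}, logger); err != nil {
			level.Error(logger).Log("msg", "API server stopped", "err", err)
		}
	}()

	return nil
}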
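
As a rough sketch of the error-channel idea mentioned at the end of the POC, the following shows one way to start an HTTP server in a goroutine and stop it gracefully. It uses the standard library's `http.Server` directly rather than `toolkit_web.Serve`; the `apiServer` type and the `start`, `shutdown`, and `runOnce` names are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// apiServer is a hypothetical holder for the server and its error channel.
type apiServer struct {
	srv   *http.Server
	errCh chan error
}

// start binds the listener synchronously, so bind errors reach the caller,
// then serves in a goroutine and reports the terminal error on errCh.
func start(addr string, h http.Handler) (*apiServer, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	s := &apiServer{
		srv:   &http.Server{Handler: h, ReadTimeout: time.Minute},
		errCh: make(chan error, 1),
	}
	go func() {
		// Serve always returns a non-nil error; ErrServerClosed means a clean stop.
		err := s.srv.Serve(ln)
		if errors.Is(err, http.ErrServerClosed) {
			err = nil
		}
		s.errCh <- err
	}()
	return s, nil
}

// shutdown stops accepting connections, waits for in-flight requests,
// and returns any error the serving goroutine reported.
func (s *apiServer) shutdown(ctx context.Context) error {
	if err := s.srv.Shutdown(ctx); err != nil {
		return err
	}
	select {
	case err := <-s.errCh:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runOnce starts a server on an ephemeral port and shuts it down again.
func runOnce() error {
	s, err := start("127.0.0.1:0", http.NotFoundHandler())
	if err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return s.shutdown(ctx)
}

func main() {
	fmt.Println("shutdown error:", runOnce())
}
```

In the receiver, `start` would correspond to the tail of `initPrometheusComponents` and `shutdown` would be called from the receiver's `Shutdown()` method.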

@bogdandrutu bogdandrutu dismissed their stale review January 19, 2024 16:29

I hope we can be reasonable and not need to block for a review to be considered.

@gracewehner
Contributor

Hi @Aneurysm9, any update on this PR? I have confirmed it's possible to host the Prom UI separately on a different port in Go and re-route its API calls to the prom receiver's API port, so this PR will still work well with the out-of-the-box Prom React app.
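
A minimal sketch of that re-routing, assuming the UI process simply proxies everything under /api/ to the receiver's API port. It uses `httputil.NewSingleHostReverseProxy` from the standard library; `newUIHandler`, `get`, and `demo` are hypothetical names, and both servers here are in-process test stand-ins rather than the real UI or receiver.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newUIHandler serves UI content itself and forwards /api/ requests to apiBase.
func newUIHandler(apiBase *url.URL, ui http.Handler) http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/api/", httputil.NewSingleHostReverseProxy(apiBase))
	mux.Handle("/", ui)
	return mux
}

// get fetches a URL and returns the response body as a string.
func get(rawURL string) (string, error) {
	res, err := http.Get(rawURL)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	b, err := io.ReadAll(res.Body)
	return string(b), err
}

// demo runs a stand-in API server and a stand-in UI server on separate ports,
// then requests one API path and one UI path through the UI server.
func demo() (string, string, error) {
	apiSrv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "api:%s", r.URL.Path)
	}))
	defer apiSrv.Close()

	apiURL, err := url.Parse(apiSrv.URL)
	if err != nil {
		return "", "", err
	}
	uiSrv := httptest.NewServer(newUIHandler(apiURL, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ui")
	})))
	defer uiSrv.Close()

	apiBody, err := get(uiSrv.URL + "/api/v1/targets")
	if err != nil {
		return "", "", err
	}
	uiBody, err := get(uiSrv.URL + "/graph")
	return apiBody, uiBody, err
}

func main() {
	apiBody, uiBody, err := demo()
	fmt.Println(apiBody, uiBody, err)
}
```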

@dashpole
Contributor

I would also prefer not to copy as much of the prometheus codebase if we can avoid it.

Contributor

github-actions bot commented Feb 7, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 7, 2024
@gracewehner
Contributor

Hi @Aneurysm9, friendly ping for this PR. I am happy to help with any changes needed for this PR to go in.

@github-actions github-actions bot removed the Stale label Feb 17, 2024
Contributor

github-actions bot commented Mar 2, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Mar 2, 2024
@mx-psi mx-psi removed the Stale label Mar 4, 2024
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Contributor

github-actions bot commented Apr 3, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Apr 3, 2024
@Aneurysm9 Aneurysm9 removed the Stale label Apr 8, 2024
@Aneurysm9
Member Author

@gracewehner I think we're going to run into issues with Prometheus having duplicated some Collector code. I get the following error, even after removing any explicit reference to the Prometheus storage package:

=== FAIL: internal/api  (0.00s)
panic: failed to register "pkg.translator.prometheus.PermissiveLabelSanitization": gate is already registered

goroutine 1 [running]:
go.opentelemetry.io/collector/featuregate.(*Registry).MustRegister(...)
        /home/ec2-user/go/pkg/mod/go.opentelemetry.io/collector/featuregate@v1.4.1-0.20240404121116-4f1a8936d26b/registry.go:114
github.com/prometheus/prometheus/storage/remote/otlptranslator/prometheus.init()
        /home/ec2-user/go/pkg/mod/github.com/prometheus/prometheus@v0.51.1/storage/remote/otlptranslator/prometheus/normalize_label.go:15 +0x390
FAIL    github.com/open-telemetry/opentelemetry-collector-contrib/receiver/prometheusreceiver/internal/api      0.060s

I suspect we're caught in a loop: Prometheus duplicates code from the Collector because we import code from them, which causes problems for them when updating that code. That in turn prevents us from importing any further code of theirs that references the duplicate, so we end up duplicating code from them. Since there's just a single module on the Prometheus side, we don't have the option of replacing their implementation with our own.
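
The failure mode in the panic above can be reproduced in miniature. The sketch below uses a hypothetical `registry` type, not the Collector's `featuregate` package: it only illustrates that when two packages register the same gate ID against one process-wide registry, the second registration panics.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a process-wide feature gate registry.
type registry struct {
	mu    sync.Mutex
	gates map[string]bool
}

func newRegistry() *registry { return &registry{gates: map[string]bool{}} }

// register records a gate ID and rejects IDs that are already present.
func (r *registry) register(id string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.gates[id] {
		return fmt.Errorf("failed to register %q: gate is already registered", id)
	}
	r.gates[id] = true
	return nil
}

// mustRegister panics on failure, as a registration made from init() would.
func (r *registry) mustRegister(id string) {
	if err := r.register(id); err != nil {
		panic(err)
	}
}

// registerTwice simulates two packages registering the same ID from their
// init() functions and returns the recovered panic message.
func registerTwice(id string) (msg string) {
	defer func() {
		if rec := recover(); rec != nil {
			msg = fmt.Sprint(rec)
		}
	}()
	reg := newRegistry()
	reg.mustRegister(id) // first copy of the package
	reg.mustRegister(id) // duplicated copy of the same package
	return ""
}

func main() {
	fmt.Println(registerTwice("pkg.translator.prometheus.PermissiveLabelSanitization"))
}
```

Because the registration happens at package initialization, merely linking both copies into one binary is enough to trigger it, which matches the observation that removing explicit references to the storage package did not help.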

@Aneurysm9
Member Author

Based on discussion at the WG last week I have created prometheus/prometheus#13932 to remove the conflicting feature gate registration from the copied translation packages in prometheus/prometheus.

@gracewehner
Contributor

Thanks @Aneurysm9 for investigating, I was also seeing that issue. I had also been working on a full PR for the alternative API approach I mentioned above, implemented as an extension. I just opened it here: #32646. We can discuss the approaches in the meeting tomorrow.

Contributor

github-actions bot commented May 8, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 8, 2024
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this May 22, 2024

8 participants