
ensure that only registered cluster workers are asked to report metrics #182

Open · wants to merge 4 commits into master
Conversation

@orestis commented Mar 26, 2018

This PR is related to #181 and is intended as a starting point for discussion.

Tests don't pass yet because I used spaces instead of tabs :)

@zbjornson (Collaborator) left a comment

The ordering issue (new AggregatorRegistry() vs. cluster.fork()) is a breaking change in this PR's current state. It might be sufficient for the worker to retry sending the init message until the master acknowledges, for up to some amount of time. Sort of adds a lot of code...
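Roughly the kind of thing I mean on the worker side (a sketch only; the message type names are made up here and not part of this PR):

```js
const RETRY_INTERVAL_MS = 200;
const RETRY_TIMEOUT_MS = 5000;

function registerWithMaster() {
	const started = Date.now();
	let timer;

	const onAck = message => {
		if (message && message.type === 'prom-client:register-ack') {
			clearInterval(timer); // master has seen us; stop retrying
			process.removeListener('message', onAck);
		}
	};
	process.on('message', onAck);

	timer = setInterval(() => {
		if (Date.now() - started > RETRY_TIMEOUT_MS) {
			// Give up after a bounded amount of time.
			clearInterval(timer);
			process.removeListener('message', onAck);
			return;
		}
		process.send({ type: 'prom-client:register' });
	}, RETRY_INTERVAL_MS);
}
```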

Didn't comment on the cleanup stuff (stray console.log, linter stuff).

lib/cluster.js Outdated
that.aliveWorkers.add(workerId);
console.log(that.aliveWorkers);
}
});
@zbjornson (Collaborator):
Instead of adding another cluster.on("message") listener, can you do this in addListeners?

@orestis (Author):
To do that I might have to move aliveWorkers to be a module-level variable. Would that be OK?
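Something like this, perhaps (just a sketch; the listenersAdded guard and the message-type constants are assumptions about lib/cluster.js, not the actual code):

```js
const cluster = require('cluster');

const GET_METRICS_RES = 'prom-client:getMetricsRes'; // existing response message (name assumed)
const WORKER_STARTED = 'prom-client:workerStarted';  // hypothetical registration message

// Module-level so both addListeners and the aggregation code can see it.
const aliveWorkers = new Set();
let listenersAdded = false;

function addListeners() {
	if (listenersAdded) return;
	listenersAdded = true;

	cluster.on('message', (worker, message) => {
		if (message.type === GET_METRICS_RES) {
			// ...existing handling of metrics responses...
		} else if (message.type === WORKER_STARTED) {
			aliveWorkers.add(worker.id);
		}
	});

	// Forget workers that have gone away so they are not asked for metrics.
	cluster.on('exit', worker => {
		aliveWorkers.delete(worker.id);
	});
}
```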

lib/cluster.js Outdated
worker.send(message)
}
}
that.aliveWorkers = new Set([...that.aliveWorkers].filter(x => !failedWorkers.has(x)));
@zbjornson (Collaborator):
Can you just do this.aliveWorkers.delete(...) in the loop, instead of making a new Set instance each time?

@orestis (Author):
I'll have to check the semantics of mutating the Set while iterating, but yes will change.
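For reference, deleting from a Set while iterating it with for...of is well-defined (removed entries simply won't be visited), and the method is Set.prototype.delete, so no intermediate Set is needed:

```js
const aliveWorkers = new Set([1, 2, 3, 4]);
const failedWorkers = new Set([2, 4]);

// Deleting entries during iteration is safe in JS.
for (const workerId of aliveWorkers) {
	if (failedWorkers.has(workerId)) {
		aliveWorkers.delete(workerId);
	}
}

console.log([...aliveWorkers]); // [ 1, 3 ]
```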


 * Options are:
 *   coordinated: If false (default), request metrics from all cluster workers.
 *                If true, request metrics only from workers that have required prom-client.
 * @param {object?} options object
@zbjornson (Collaborator):
Is the only reason this needs to be an option because setting it to true makes the order of forking vs. new AggregatorRegistry() matter? I'd rather go for an implementation that isn't sensitive to that, e.g. the worker repeatedly attempts to register with the master until the master acks the registration.

Having a heterogeneous pool of workers (not all of them setting up prom-client) is unusual, but I think the behavior achieved when this is true is what should happen by default.

@orestis (Author):
Yes, the only reason to make it an option is that changing the default behaviour would break backwards compatibility.

Given that a metrics client should be as unobtrusive as possible, I would be wary of adding a prolonged discovery phase. I would certainly prefer in my consuming codebase to keep things simple and accept that ordering matters.

In the case of a homogeneous cluster, the previous implementation cleanly sidesteps a lot of the issues with "garbage collecting" the workers, and nobody else has complained so far :)
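To illustrate the ordering requirement on the master side (assuming, per the JSDoc above, that the option is passed to the AggregatorRegistry constructor; the rest is illustrative):

```js
const cluster = require('cluster');
const { AggregatorRegistry } = require('prom-client');

if (cluster.isMaster) {
	// Construct the registry first so its message listeners are installed
	// before any worker announces itself.
	const aggregatorRegistry = new AggregatorRegistry({ coordinated: true });

	// Forking before the registry exists would mean early workers' registration
	// messages are lost, so they would never be asked for metrics.
	for (let i = 0; i < 4; i++) {
		cluster.fork();
	}

	// Later, e.g. from a /metrics handler:
	// aggregatorRegistry.clusterMetrics(...)
} else {
	// Workers just require prom-client as usual.
	require('prom-client');
}
```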

@zbjornson (Collaborator):
FWIW, we will be breaking backwards compatibility pretty hard soon-ish (see #177, #178 and #180), so don't let semver stop you from writing the code you'd like to write :D

help: 'Number of connected cluster workers reporting to prometheus',
registers: [registry]
});
g.set(request.workerCount);
@zbjornson (Collaborator):
Might want to move this new metric to a separate PR. (If @siimon and @SimenB are okay with it, it's a simple change that could land before this PR is ironed out.)

@orestis (Author):
Noted. However, this metric is very specific to the coordinated option, so perhaps it should only be exposed when coordinated is true. Alternatively, a similar metric that counts all cluster workers could be added to the default metrics?
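Something like this could work as a default metric (the metric name and wiring here are hypothetical, not part of this PR):

```js
const cluster = require('cluster');
const { Gauge, register } = require('prom-client');

// Hypothetical default metric: counts every forked worker, whether or not it
// reports metrics to the aggregator.
const workerCount = new Gauge({
	name: 'nodejs_cluster_workers_total',
	help: 'Number of cluster workers currently forked',
	registers: [register]
});

if (cluster.isMaster) {
	cluster.on('fork', () => workerCount.inc());
	cluster.on('exit', () => workerCount.dec());
}
```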

@zbjornson (Collaborator) commented
(Sorry for the delay, busy week. Will try to send more feedback later today.)

@SimenB (Collaborator) commented Sep 19, 2018

Any news here?

@zbjornson (Collaborator) commented
Sorry to drop the ball.

I'm sorta hesitant to add the complexity to support an unusual use case (heterogeneous workers in a cluster), but this approach works.

My preference would be a version that isn't sensitive to the order of new AggregatorRegistry() vs. cluster.fork() in the client code, but since that only matters if you opt-in to coordinated and it's an advanced usage scenario, those advanced users could be expected to understand/deal with that requirement.

The PR also still has some formatting/linting issues that need to be addressed.
