Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary gRPC rebalancing leads to occasional WARN logs #15821

Closed
kisunji opened this issue Dec 16, 2022 · 2 comments
Closed

Unnecessary gRPC rebalancing leads to occasional WARN logs #15821

kisunji opened this issue Dec 16, 2022 · 2 comments

Comments

@kisunji
Copy link
Contributor

kisunji commented Dec 16, 2022

While investigating #10603 I think I spotted a small bug in our router spinning up an unnecessary grpc rebalancer:

  1. A Router has a single grpcServerTracker
    // grpcServerTracker is used to balance grpc connections across servers,
    // and has callbacks for adding or removing a server.
    grpcServerTracker ServerTracker
  2. A Router will call AddArea with LAN and WAN separately
    if err := s.router.AddArea(types.AreaLAN, s.serfLAN, s.connPool); err != nil {
    s.Shutdown()
    return nil, fmt.Errorf("Failed to add LAN serf route: %w", err)
    }
    go s.lanEventHandler()
    // Start the flooders after the LAN event handler is wired up.
    s.floodSegments(config)
    // Add a "static route" to the WAN Serf and hook it up to Serf events.
    if s.serfWAN != nil {
    if err := s.router.AddArea(types.AreaWAN, s.serfWAN, s.connPool); err != nil {

    a. Each AddArea call will eventually call maybeInitializeManager
    r.maybeInitializeManager(area, r.localDatacenter)
    if err := r.addServer(areaID, area, parts); err != nil {
    // addServer does the work of AddServer once the write lock is held.
    func (r *Router) addServer(areaID types.AreaID, area *areaInfo, s *metadata.Server) error {
    // Make the manager on the fly if this is the first we've seen of it,
    // and add it to the index.
    manager := r.maybeInitializeManager(area, s.Datacenter)

    b. maybeInitializeManager will fetch a rebalancer closure from grpcServerTracker for a given dc
    func (r *Router) maybeInitializeManager(area *areaInfo, dc string) *Manager {
    info, ok := area.managers[dc]
    if ok {
    return info.manager
    }
    shutdownCh := make(chan struct{})
    rb := r.grpcServerTracker.NewRebalancer(dc)
    manager := New(r.logger, shutdownCh, area.cluster, area.pinger, r.serverName, rb)

    c. go manager.Run() spins up a goroutine where the closure from 2b is called periodically

The result is that the exact same rebalancer closure for a given dc is being run in separate goroutines. If they overlap, r.clientConn.UpdateState is called in quick succession, resulting in the infamous WARN (related to #10603)

2022-12-16T11:10:43.150-0500 [WARN]  agent: [core][Channel #1 SubChannel #9] grpc: addrConn.createTransport failed to connect to {
  "Addr": "dc1-127.0.0.2:8300",
  "ServerName": "server2.dc1",
  "Attributes": null,
  "BalancerAttributes": null,
  "Type": 0,
  "Metadata": null
}. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:0->127.0.0.2:8300: operation was canceled"

On agent startup it is likelier that these WARNS will be emitted since the goroutines kick off at nearly the same time. Over time, the jitter in the rebalancer will cause the timings to drift apart but does not eliminate the chance to overlap.

@kisunji
Copy link
Contributor Author

kisunji commented Dec 16, 2022

Unsure what a clean solution would be in this case; does the manager need to be aware of the grpc server shuffling? If we can rip it out of the manager and do it at a higher level that isn't scoped to an AreaID that would be great.

@kisunji kisunji changed the title Unnecessary gRPC rebalancing leads to WARN logs on startup Unnecessary gRPC rebalancing leads to occasional WARN logs Feb 21, 2023
@kisunji
Copy link
Contributor Author

kisunji commented Feb 23, 2023

@kisunji kisunji closed this as completed Feb 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant