Create Watchdog #337

Open
numeralnathan opened this issue Mar 4, 2022 · 2 comments

@numeralnathan

In the hardware world, there is a concept of a watchdog. The rest of the hardware or software has to periodically signal that it is operating properly. If it does not, the watchdog kicks in and performs some operation to get things running again, such as rebooting the machine or restarting the software.

In the world of server monitoring, a monitor periodically calls a health-check API, and the server replies with healthy or sick. If the server responds with sick, an alarm goes off. In Kubernetes, the server is quiesced (i.e., it receives no new requests) and eventually shut down while another server is started in the cluster.

These are large watchdogs that affect the entire program. I would like a watchdog at the task level. This is useful for a long-running task. As the task runs, it reports back to the watchdog that it is still making progress. If the task does not report back within the timeout, the watchdog times out.
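A minimal sketch of the task-level idea, assuming a single task and a hypothetical `TaskWatchdog` class (not an existing API): the task calls `heartbeat()` as it makes progress, and anyone can ask whether the heartbeat timeout has elapsed. The clock is injected as a `LongSupplier` of nanoseconds (an assumption made here so the logic can be checked without sleeping).

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Hypothetical task-level watchdog: tracks the last time a task reported progress.
final class TaskWatchdog {
    private final long timeoutNanos;
    private final LongSupplier nanoClock;
    // Written by the task thread, read by whichever thread checks for expiry.
    private volatile long lastHeartbeatNanos;

    TaskWatchdog(Duration timeout, LongSupplier nanoClock) {
        this.timeoutNanos = timeout.toNanos();
        this.nanoClock = nanoClock;
        this.lastHeartbeatNanos = nanoClock.getAsLong(); // the task starts out "alive"
    }

    TaskWatchdog(Duration timeout) {
        this(timeout, System::nanoTime);
    }

    // Called by the task each time it makes progress.
    void heartbeat() {
        lastHeartbeatNanos = nanoClock.getAsLong();
    }

    // True if no heartbeat has arrived within the timeout.
    boolean isExpired() {
        return nanoClock.getAsLong() - lastHeartbeatNanos > timeoutNanos;
    }
}
```

In a long-running loop, the task would call `heartbeat()` once per unit of work (for example, per processed record), so a stall at any point is flagged after one heartbeat timeout rather than after the task's total expected runtime.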

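```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class WatchDogExecutor
{
   // WatchDogExecutors running a task on the current thread; updated as tasks start and finish (not shown).
   private static final ThreadLocal<List<WatchDogExecutor>> EXECUTORS = ThreadLocal.withInitial(ArrayList::new);

   // Written by the task's thread and read by the background enforcement task, hence volatile.
   private volatile Instant m_lastHeartbeat = Instant.now();

   public static void heartbeat()
   {
      EXECUTORS.
         get().
         forEach(WatchDogExecutor::updateHeartbeat);

      // If there are no WatchDogExecutors, do we want to throw an IllegalStateException?  Maybe or maybe not.
      // I think this needs to be something the programmer can opt in to or opt out of.
   }

   private void updateHeartbeat()
   {
      m_lastHeartbeat = Instant.now();
   }
}
```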

The above WatchDogExecutor tracks which WatchDogExecutors are running on the current thread by modifying EXECUTORS (not shown). While a task is running, it needs to call WatchDogExecutor.heartbeat() before the configured timeout (not shown) elapses. Another repeating background task needs to go through all the WatchDogExecutors and enforce any timeouts.
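The background enforcement task described above could look roughly like the sketch below. It assumes a hypothetical `WatchDogRegistry` (not part of any library) in which each running task registers an entry with its own timeout and an `onTimeout` action; a `ScheduledExecutorService` scans the entries at a fixed rate and fires the action for any entry whose last heartbeat is too old. The scan is also exposed as `checkNow()` so it can be driven by hand.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical registry that enforces heartbeat timeouts for many tasks.
final class WatchDogRegistry implements AutoCloseable {
    // One entry per running task.
    static final class Entry {
        final long timeoutNanos;
        final Runnable onTimeout;
        volatile long lastHeartbeatNanos;

        Entry(long timeoutNanos, Runnable onTimeout, long now) {
            this.timeoutNanos = timeoutNanos;
            this.onTimeout = onTimeout;
            this.lastHeartbeatNanos = now;
        }
    }

    private final Map<Object, Entry> entries = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock;
    private final ScheduledExecutorService scanner = Executors.newSingleThreadScheduledExecutor();

    WatchDogRegistry(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    // Starts the repeating background scan.
    void start(Duration scanPeriod) {
        long period = scanPeriod.toNanos();
        scanner.scheduleAtFixedRate(this::checkNow, period, period, TimeUnit.NANOSECONDS);
    }

    void register(Object taskKey, Duration timeout, Runnable onTimeout) {
        entries.put(taskKey, new Entry(timeout.toNanos(), onTimeout, nanoClock.getAsLong()));
    }

    void heartbeat(Object taskKey) {
        Entry e = entries.get(taskKey);
        if (e != null) {
            e.lastHeartbeatNanos = nanoClock.getAsLong();
        }
    }

    void unregister(Object taskKey) {
        entries.remove(taskKey);
    }

    // One enforcement pass: removes expired entries and runs each one's timeout action once.
    void checkNow() {
        long now = nanoClock.getAsLong();
        entries.forEach((key, e) -> {
            if (now - e.lastHeartbeatNanos > e.timeoutNanos && entries.remove(key, e)) {
                e.onTimeout.run();
            }
        });
    }

    @Override
    public void close() {
        scanner.shutdownNow();
    }
}
```

One design caveat: if an `onTimeout` action throws, `scheduleAtFixedRate` suppresses all later runs of the scan, so a real implementation would need to catch and report exceptions from the actions.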

Why not just use a Timeout? Because the task runs for a long time, that timeout might have to be several minutes or longer. With a watchdog, a task that hangs at the very beginning can be timed out much sooner.
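To make the comparison concrete, assume (hypothetically) a heartbeat timeout $T_h$, a background scan period $P$, and an overall task timeout $T$. A task that hangs right after its last heartbeat (or right after it starts) is noticed at the first scan after $T_h$ has elapsed, whereas a plain Timeout fires only when $T$ runs out:

$$
t_{\text{detect}}^{\text{watchdog}} \le T_h + P, \qquad t_{\text{detect}}^{\text{Timeout}} = T
$$

For example, with illustrative values $T = 10\,\text{min}$, $T_h = 30\,\text{s}$, and $P = 5\,\text{s}$, a hang is detected within $35\,\text{s}$ instead of after $10\,\text{min}$.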

I'll admit I have only needed this kind of watchdog a few times in my career. However, having the functionality readily available would encourage greater use.

@magicprinc

The only problem I see: if you have such a rogue task, what can you do with it? You can't actually stop execution in Java. Thread.stop is deprecated, and Thread.interrupt is only a hint. The CompletableFuture designers understood this themselves, and you can't really cancel a CompletableFuture :-)
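The CompletableFuture point can be checked directly. Per its Javadoc, the `mayInterruptIfRunning` argument of `CompletableFuture.cancel` has no effect because interrupts are not used to control processing: cancelling completes the future with a `CancellationException`, but the supplier keeps running. A small demonstration (the class name `CancelDemo` is only for illustration):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class CancelDemo {
    // Returns true if the future reports itself cancelled while its work still runs to the end.
    static boolean workContinuesAfterCancel() throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);

        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> {
            started.countDown();
            try {
                Thread.sleep(200); // simulated work; an interrupt would abort this sleep
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
            finished.countDown(); // only reached if nobody interrupted the sleep
            return "done";
        });

        started.await();
        future.cancel(true); // "cancel and interrupt if running"
        boolean cancelled = future.isCancelled();
        boolean workFinished = finished.await(5, TimeUnit.SECONDS);
        return cancelled && workFinished;
    }
}
```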

@numeralnathan
Author

@magicprinc You can't stop its execution, but you can set a flag to tell it to quit. You can set a flag to invalidate any result it produces. You can log an error. You can tell the user the operation timed out. The world is your playground for how to respond.
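A minimal sketch of those responses, assuming a hypothetical `WatchedTask` whose `onTimeout()` would be wired to the watchdog: the timeout sets a flag and logs once, the task checks the flag between steps and quits, and a result that arrives after the timeout is discarded instead of returned.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.IntUnaryOperator;

// Hypothetical cooperative task: the watchdog cannot stop it, so it sets a flag instead.
final class WatchedTask {
    private final AtomicBoolean timedOut = new AtomicBoolean(false);

    // Called by the watchdog when the heartbeat deadline is missed; logs only the first time.
    void onTimeout() {
        if (timedOut.compareAndSet(false, true)) {
            System.err.println("watchdog: task timed out, its result will be discarded");
        }
    }

    // Sums step.applyAsInt(i) for i in [0, steps), checking the flag between steps.
    // Returns empty if the watchdog fired, so a late result is never used.
    Optional<Integer> run(int steps, IntUnaryOperator step) {
        int total = 0;
        for (int i = 0; i < steps; i++) {
            if (timedOut.get()) {
                return Optional.empty(); // the flag tells the task to quit
            }
            total += step.applyAsInt(i);
            // a real task would send its watchdog heartbeat here
        }
        return timedOut.get() ? Optional.empty() : Optional.of(total); // invalidate a late result
    }
}
```

The task still occupies its thread until it notices the flag, so this does not remove the limitation @magicprinc points out; it only ensures the caller stops waiting and never uses a late result.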
