Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time Cycle with start time is triggered multiple times after broker restart #9680

Closed
lzgabel opened this issue Jul 4, 2022 · 16 comments · Fixed by #10054
Closed

Time Cycle with start time is triggered multiple times after broker restart #9680

lzgabel opened this issue Jul 4, 2022 · 16 comments · Fixed by #10054
Assignees
Labels
blocker/stakeholder Marks an issue as blocked, waiting for stakeholder input kind/bug Categorizes an issue or PR as a bug version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0

Comments

@lzgabel
Copy link
Contributor

lzgabel commented Jul 4, 2022

Description

Hi team! I have a question about the last merge request #9418 and need to discuss expected behavior with the team.

When I set the timerCycle expression to R/2022-07-04T00:40:00+08:00/PT5M, after a period of execution stop the broker until 01:12:23 and restart the broker. After the process engine is replayed, does the record marked in the middle red box meet expectations?

<bpmn:startEvent id="StartEvent_1">
  <bpmn:outgoing>Flow_02haba2</bpmn:outgoing>
  <bpmn:timerEventDefinition id="TimerEventDefinition_06qnau9">
    <bpmn:timeCycle xsi:type="bpmn:tFormalExpression">R/2022-07-04T00:40:00+08:00/PT5M</bpmn:timeCycle>
  </bpmn:timerEventDefinition>
</bpmn:startEvent>

image

@lzgabel lzgabel added the kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. label Jul 4, 2022
@lzgabel lzgabel changed the title Discuss the expected behavior for timer Cycle after broker restart Discuss the expected behavior for timerCycle after broker restart Jul 4, 2022
@saig0 saig0 changed the title Discuss the expected behavior for timerCycle after broker restart Time Cycle with start time is triggered multiple times after broker restart Jul 4, 2022
@saig0 saig0 added kind/bug Categorizes an issue or PR as a bug team/process-automation and removed kind/toil Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc. labels Jul 4, 2022
@saig0
Copy link
Member

saig0 commented Jul 4, 2022

Let's clarify the expected behavior.

Assuming that the due date of the timer (with cycle-time and given start time) is exceeded and the next due date would be already exceeded.

Currently, the timer is triggered for each interval until it reaches a due date in the future.

Should we change the behavior to skip the intervals until the next due date that is not passed?

@korthout
Copy link
Member

korthout commented Jul 4, 2022

@saig0 It's easy to come up with an example that requires all intervals to have had a trigger at some point. For example, a business process to send some amount of money to a bank account in an interval. The use case will probably require that ALL the expected money has been sent at some point, even if a transaction couldn't be sent out at the original time.

It's much harder to come up with an example in favor of skipping the intervals from the past. Each example that I could think off, could be resolved by making the job worker idempotent. For example, on interval refilling something that empties should simply refill until full every time.

Unless we make it configurable for a user, I think most users will want to rely on Zeebe eventually triggering the event for all intervals that have passed.

@saig0
Copy link
Member

saig0 commented Jul 5, 2022

I checked the behavior of timer events of Camunda Platform 7.

It behaves differently in this situation. If the due date of a timer is in the past then it triggers the timer (i.e. create a new process instance) and schedules the next timer for the next due date in the future. It doesn't trigger the trimer for the intervals that are already in the past.

image

@lzgabel
Copy link
Contributor Author

lzgabel commented Jul 6, 2022

Hi @saig0 , @korthout . Is there a consensus within the team on the expected behavior for this issue? 🙇

@korthout
Copy link
Member

korthout commented Jul 7, 2022

@felix-mueller This behavior differs from C7, but it might be considered an improvement. For others (although I can't come up with use cases), it might come as a surprise. Can you share your opinion on this?

@korthout korthout added the blocker/stakeholder Marks an issue as blocked, waiting for stakeholder input label Jul 8, 2022
@lzgabel
Copy link
Contributor Author

lzgabel commented Jul 13, 2022

Hi guys. Is there any progress or conclusion on this issue? 🙇

@felix-mueller
Copy link
Member

@korthout Personally would also see it as an improvement and would be OK to do it differently than in C7 here.

@korthout
Copy link
Member

Whatever choice we make, we should document the chosen behavior so it's clear to users what to expect.

@korthout
Copy link
Member

@lzgabel to me, it feels like no decision has been made yet. Felix and I are in favor of the current behavior, but I'd be interested to hear more/other arguments.

If no further arguments are provided then I think we can safely choose the current behavior. So let's give it some more time.

@berndruecker
Copy link
Member

I honestly don't have a clear feeling about which behavior is more intuitive. Just quickly capturing my immediate thoughts:

  • I haven't found anything in the BPMN spec that is precise about the behavior - so I think it is up to us to define
  • I agree that business-wise it seems more correct to "catch up" with executions if the engine starts up.
  • At the same time, I doubt that the requirements of "regular payments" would be implemented this way. The timer start event is (to my feeling) used for periodic checks and stuff like this (so probably check if a regular payment has to be done right now). In that notion, running the process multiple times after an outage seems contra-intuitive.
  • That said, it might also be that in the past, people trusted the scheduling capabilities too little to do things like regular payments with it. But I actually doubt this will change.
  • I am a bit worried about developer & test systems that might be switched off for days - and then run a big bunch of timers when they have switched on again.
  • I wonder if we should compare this to cron expressions in a Linux system - as this is kind of well known. There, I don't think a Linux server would "catch up" when it was switched off for a while.

So in summary, I would tend towards the C7 behavior - but don't have good arguments, so from my perspective, we could also keep the current C8 behavior and document it properly.

Also pinging Mr. BPMN aka @falko if he has other thoughts

@falko
Copy link
Member

falko commented Aug 2, 2022

I would also tend towards the C7 behavior

@korthout
Copy link
Member

korthout commented Aug 2, 2022

I wonder if we should compare this to cron expressions in a Linux system

@berndruecker Interesting thought. Cron expressions work kinda similar to the interval but are able to express more complicated durations because it's a different format. For example, if you try to convert a CRON expression to a duration, then you can only do this at a given moment and determine the duration until the next hit. The duration after that next hit may differ from the duration until the first hit. In contrast, these intervals are always the same duration.

But, this entire issue is related to the combination of an interval with a start time. AFAIK, cron tools on Linux don't offer a way to define this start time. So they don't have this problem.

EDIT: This made me wonder if there are other scheduling systems that do allow this combination. I found one in Java: Timer.scheduleAtFixedRate, which uses the current C8 behavior. Note that this API is pretty old though... Example:

// prints 10 times "Triggered"
final Date startDateAndTime = Date.from(Instant.now().minus(Duration.ofMinutes(10)));
final var interval = Duration.ofMinutes(1);
final var task =
    new TimerTask() {
      @Override
      public void run() {
        System.out.println("Triggered");
      }
    };
new Timer().scheduleAtFixedRate(task, startDateAndTime, interval.toMillis());

Spring's Scheduled works different, because it uses an initial delay. So there the start cannot be in the past.

We could have a look at how Quartz deals with it, as it supports this combination. And it looks like this uses the C7 behavior. Which also pushes me to tend to the C7 behavior.

I would also tend towards the C7 behavior

@falko Could you please elaborate your reasoning?

@falko
Copy link
Member

falko commented Aug 2, 2022

I mostly see timer start events being used for Cron-like use case, i.e. the interval between the invocations is expected to be constant, e.g. daily, but there is no expectation of catching up with missed events, after the engine was down. If that is desired, one could start the process via API.

@korthout
Copy link
Member

korthout commented Aug 3, 2022

@saig0 Do you have an opinion on this topic?

@saig0
Copy link
Member

saig0 commented Aug 3, 2022

@korthout I have no strong option. I see the point from @berndruecker & @falko and also tend to the behavior of Camunda Platform 7.

@korthout
Copy link
Member

korthout commented Aug 3, 2022

So, I think most favor the C7 behavior, although there is no strong argument. Nonetheless, that sounds like a consensus to me.

Please speak up if you are against using the C7 behavior, within the next 7 days. If no counter-argument is provided, let's consider the current behavior to be defective and move forward with providing a fix.

@korthout korthout self-assigned this Aug 16, 2022
@Zelldon Zelldon added the version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0 label Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker/stakeholder Marks an issue as blocked, waiting for stakeholder input kind/bug Categorizes an issue or PR as a bug version:8.1.0-alpha5 Marks an issue as being completely or in parts released in 8.1.0-alpha5 version:8.1.0 Marks an issue as being completely or in parts released in 8.1.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants