Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TrackingEventProcessors shutdown is not working correctly in 4.5.13 #2287

Closed
nils-christian opened this issue Jul 14, 2022 · 9 comments
Closed
Assignees
Labels
Priority 1: Must Highest priority. A release cannot be made if this issue isn’t resolved. Status: Resolved Use to signal that work on this issue is done. Type: Bug Use to signal issues that describe a bug within the system.

Comments

@nils-christian
Copy link
Contributor

nils-christian commented Jul 14, 2022

Hi,

We switched from Axon 4.5.9 to 4.5.13 due to issue #1981. However, we noticed that the shutdown of the TrackingEventProcessors doesn't seem to work as expected anymore.

Basic information

  • Axon Framework version: 4.5.13
  • JDK version: 11

Steps to reproduce

I use almost the same demo program as for #1981:

@SpringBootApplication
public class DemoApplication {

	public static void main( final String[] args ) {
		final var applicationContext = new SpringApplicationBuilder( DemoApplication.class )
				.properties( "axon.axonserver.enabled=false" )
				.run( args );

		final var eventGateway = applicationContext.getBean( EventGateway.class );
		eventGateway.publish( 0 );

		await( ).until( ( ) -> MyEventHandler.elementProcessed.get( ) );

		applicationContext.close( );
	}

	@Autowired
	private void configure( final Configurer configurer ) {
		configurer.configureLifecyclePhaseTimeout( 30, TimeUnit.SECONDS );
	}

	@Autowired
	private void configure( final EventProcessingConfigurer configurer ) {
		configurer.registerTrackingEventProcessorConfiguration( MyEventHandler.PROCESSING_GROUP, c -> TrackingEventProcessorConfiguration
				.forParallelProcessing( 4 )
				.andEventAvailabilityTimeout( 20, TimeUnit.SECONDS ) );
	}

	@Component
	@ProcessingGroup( MyEventHandler.PROCESSING_GROUP )
	static class MyEventHandler {

		public static final String PROCESSING_GROUP = "MyEventHandler";

		private static AtomicBoolean elementProcessed = new AtomicBoolean( );

		@EventHandler
		public void on( final Integer i ) {
			elementProcessed.set( true );
		}

	}

}

Notice that I configured the lifecycle phase timeout. What happens is, that the timeout in DefaultConfigurer.invokeLifecycleHandlers is not exceeded (no TimeoutException occurs). But we still get a lot of exceptions afterwards in the log, as the event processor is still running after the database has been shut down. My workaround from #1981 won't work anymore as well, as it basically does the same thing as the modified code in DefaultConfigurer.invokeLifecycleHandlers.

We also checked the same code with 4.6.0-SNAPSHOT (the current master at f496fed) - in this case the exceptions do not occur and the event processor is correctly stopped.

We are not entirely sure if this is an existing bug or maybe the code which was merged for #1981 relies on some other code on the master.

Expected behaviour

The event processor is no longer running after the shutdown lifecycle handlers have run.

Actual behaviour

The event processor is still running after the shutdown lifecycle handlers have run which leads to exceptions.

@nils-christian nils-christian added the Type: Bug Use to signal issues that describe a bug within the system. label Jul 14, 2022
@smcvb
Copy link
Member

smcvb commented Jul 14, 2022

Hey Nils, curious, but did you test out whether the shutdown process worked with 4.6.0-SNAPSHOT?
In favor of getting 4.6.0 done (instead of again moving to patch releases) I'd like to rule out any effort of moving through commits from a half year ago to figure out where the discrepancy lies; if any at all.

If you could validate for me whether 4.6.0-SNAPSHOT does solve the predicament, that'll safe us a ton of time on our end.
Thanks!

@nils-christian
Copy link
Contributor Author

Hi @smcvb,

Yes, the above code works with 4.6.0-SNAPSHOT (f496fed) as expected. The shutdown occurs and no exceptions are visible.

@smcvb
Copy link
Member

smcvb commented Jul 14, 2022

Awesome, thanks for the confirmation, @nils-christian.

@smcvb smcvb self-assigned this Jul 14, 2022
@smcvb smcvb added Priority 1: Must Highest priority. A release cannot be made if this issue isn’t resolved. Status: In Progress Use to signal this issue is actively worked on. labels Jul 14, 2022
@smcvb
Copy link
Member

smcvb commented Jul 14, 2022

I have a hunch I found the predicament.
One of the commits that belonged to the TEP shutdown wasn't carrying the issues number.

Hence I missed it.

I believe this to be commit 696d8f5, which I've just cherry-picked to axon-4.5.x.
The commit checks whether the TEP threads have stopped during shutdown and if that didn't happen, they're interrupted.

I'll start making a 4.5.14-SNAPSHOT soon.
Once it's present, would you be able to validate whether that solves the predicament, @nils-christian?

@smcvb smcvb added this to the Release 4.5.14 milestone Jul 14, 2022
@nils-christian
Copy link
Contributor Author

Hi @smcvb,

I just built f6e33b5 locally and it looks much better. The shutdown is performed correctly. I now get something like the following:

2022-07-14 10:18:07.958 INFO 23468 --- [ main] o.a.e.TrackingEventProcessor : Shutdown state set for Processor 'MyEventHandler'.
2022-07-14 10:18:07.958 INFO 23468 --- [ main] o.a.e.TrackingEventProcessor : Processor 'MyEventHandler' awaiting termination...
2022-07-14 10:18:12.983 INFO 23468 --- [EventHandler]-3] o.a.e.TrackingEventProcessor : Released claim
2022-07-14 10:18:12.983 INFO 23468 --- [EventHandler]-2] o.a.e.TrackingEventProcessor : Released claim
2022-07-14 10:18:12.983 INFO 23468 --- [EventHandler]-0] o.a.e.TrackingEventProcessor : Released claim
2022-07-14 10:18:12.983 INFO 23468 --- [EventHandler]-1] o.a.e.TrackingEventProcessor : Released claim
2022-07-14 10:18:12.984 INFO 23468 --- [EventHandler]-0] o.a.e.TrackingEventProcessor : Worker for segment Segment[3/3] stopped.
2022-07-14 10:18:12.984 INFO 23468 --- [EventHandler]-2] o.a.e.TrackingEventProcessor : Worker for segment Segment[1/3] stopped.
2022-07-14 10:18:12.984 INFO 23468 --- [EventHandler]-1] o.a.e.TrackingEventProcessor : Worker for segment Segment[0/3] stopped.
2022-07-14 10:18:12.984 INFO 23468 --- [EventHandler]-3] o.a.e.TrackingEventProcessor : Worker for segment Segment[2/3] stopped.
2022-07-14 10:18:12.987 INFO 23468 --- [ main] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2022-07-14 10:18:12.987 INFO 23468 --- [ main] .SchemaDropperImpl$DelayedDropActionImpl : HHH000477: Starting delayed evictData of schema as part of SessionFactory shut-down'
2022-07-14 10:18:12.994 INFO 23468 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2022-07-14 10:18:12.996 INFO 23468 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.

No exceptions occur afterwards.

@smcvb
Copy link
Member

smcvb commented Jul 14, 2022

That's great news!
That means we likely have caught the issue.
Honestly, I'm happy we found it as quick as we did 😅.

@smcvb
Copy link
Member

smcvb commented Jul 14, 2022

So, how do you feel, @nils-christian.
Shall I close this issue as resolved because of cherry picking 696d8f5?

@nils-christian
Copy link
Contributor Author

I am happy too. As you know we are looking very forward to the fix for #1981 :-)

Yes, we can close this issue.

@smcvb smcvb added Status: Resolved Use to signal that work on this issue is done. and removed Status: In Progress Use to signal this issue is actively worked on. labels Jul 14, 2022
@smcvb
Copy link
Member

smcvb commented Jul 14, 2022

So, 4.5.14 has been released. Be sure to let me know if any other issues pop up, @nils-christian!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority 1: Must Highest priority. A release cannot be made if this issue isn’t resolved. Status: Resolved Use to signal that work on this issue is done. Type: Bug Use to signal issues that describe a bug within the system.
Projects
None yet
Development

No branches or pull requests

2 participants