Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[azeventhubs] Calling Run again on an already errored Processor causes a panic #22785

Closed
PaulBernier opened this issue Apr 26, 2024 · 5 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs needs-team-attention This issue needs attention from Azure service team or SDK team

Comments

@PaulBernier
Copy link

PaulBernier commented Apr 26, 2024

Bug Report

  • When a Processor.Run return with an error, I thought I could just restart that processor by calling Run again. Doing so result in a panic:
panic: close of closed channel

goroutine 46 [running]:
github.com/Azure/azure-sdk-for-go/sdk/messaging/azeventhubs.(*Processor).initNextClientsCh(0xc0002cd380, {0xf1e1668?, 0xc0000be6e0?})
        /Users/pbernier/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/messaging/azeventhubs@v1.1.0/processor.go:290 +0x145
github.com/Azure/azure-sdk-for-go/sdk/messaging/azeventhubs.(*Processor).runImpl(0xc0002cd380, {0xf1e1668, 0xc0000be6e0})
        /Users/pbernier/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/messaging/azeventhubs@v1.1.0/processor.go:249 +0xc7
github.com/Azure/azure-sdk-for-go/sdk/messaging/azeventhubs.(*Processor).Run(0xc0001c4700?, {0xf1e1668, 0xc0000be6e0})
        /Users/pbernier/go/pkg/mod/github.com/!azure/azure-sdk-for-go/sdk/messaging/azeventhubs@v1.1.0/processor.go:228 +0x1d
  • What did you expect or want to happen?
  • Maybe I missed it but I did not see in doc/comments that a Processor was not supposed to be re-used, I want to confirm that the right way to "restart" is to create a new Processor?
  • Rather than an uncaught panic a dedicated error should be produced
  • Ideally the recovery process could be as simple as:
for {
	if err := processor.Run(processorCtx); err != nil {
		r.logger.Error("Processor error", zap.Error(err))
		continue
	}
	break
}
@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team. labels Apr 26, 2024
Copy link

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jfggdl.

@jhendrixMSFT jhendrixMSFT removed question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team. labels Apr 26, 2024
@github-actions github-actions bot added the needs-team-triage This issue needs the team to triage. label Apr 26, 2024
@jhendrixMSFT jhendrixMSFT removed the needs-team-triage This issue needs the team to triage. label Apr 26, 2024
@PaulBernier
Copy link
Author

A bit more context on the why I even want to restart the processor: I implemented a AWS DynamoDB checkpoint store, and the calls can be throttled for some time until autoscaling kicks in. CheckpointStore failures will propagate back to the processor causing it to exit. But those errors are recoverable, as the checkpoint store will be able to recover. I know that the Processor could also fail on non-recoverable errors, such as an incorrect ConsumerGroup for instance.

So the crux of what I am trying to solve here is really be able to restart the processor on CheckpointStore errors, if you can think of any proper handling of this.

@richardpark-msft
Copy link
Member

@PaulBernier, I'm looking at this to see how difficult it would be as it doesn't seem unreasonable at all.

Now, for any CheckpointStore, you'll probably just want to add a more forgiving retry policy. It's not really something you need or want to do at the Event Hubs level. Does the AWS DynamoDB client have a configurable retry policy? In Azure we have a RetryPolicy that can get passed into our clients:

type RetryOptions struct {
// MaxRetries specifies the maximum number of attempts a failed operation will be retried

@PaulBernier
Copy link
Author

PaulBernier commented Apr 29, 2024

Yes AWS DynamoDB client have retries, but sometimes it can take more time than we would like to adjust. I agreed with the direction of making the checkpointing more robust. But I still want to have a fail-safe approach, and restarting the Processor would be the last resort mean to revive the consumption process.

For now I am implementing that last resort restart by recreating a new Processor, and that's fine, just more verbose/tricky than my ideal Processor restart:

for {
	if err := processor.Run(processorCtx); err != nil {
		r.logger.Error("Processor error", zap.Error(err))
		time.Sleep(20 * time.Seconds)
		continue
	}
	break
}

The panic I pointed out is the main thing I'd expect to be fixed from this issue, any other improvement would just be a bonus :)

@richardpark-msft
Copy link
Member

@PaulBernier, the fix here was just to make the Processor indicate it's single-use only - once stopped, it can't be restarted.

It'll return proper errors now, with the release that's coming out next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Event Hubs needs-team-attention This issue needs attention from Azure service team or SDK team
Projects
None yet
Development

No branches or pull requests

3 participants