Question: what is restartPolicy needed for? #209

Open
karpoftea opened this issue Jan 15, 2022 · 2 comments

Comments

@karpoftea

Hi,
I've read the CRD docs, but I can't understand what it means when "the job fails", so I can't understand the purpose of restartPolicy either. Could you kindly explain why the built-in checkpointing mechanism + HA is not enough to recover from a failure, so that we need restartPolicy=FromSavepointOnFailure? If this property covers a completely different case, could you explain it with an example?
Thanks!

@bgeng777

bgeng777 commented Jan 17, 2022

Hi Ilia, I just want to share my thoughts: I believe the two work at different levels:

  1. Checkpointing + HA (for a standalone cluster) is managed by Flink itself.
  2. restartPolicy is managed by this flink-on-k8s-operator (i.e. by k8s); the code can be found here.
    AFAIK, option 1 should be enough if we configure it correctly, e.g. by creating 2 JMs and a ZooKeeper service. Option 2 is a good attempt to make use of k8s's capabilities. Judging from the git history, it may have been implemented quite early, when Flink's HA was not yet mature.
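To make the "different levels" point concrete, here is a simplified Go model of who reacts to what. It is not the operator's actual code: `JobStatus`, `nextAction` and the constant names are hypothetical. It assumes that restartPolicy only comes into play once the Flink job itself has reached the terminal FAILED state (for example after Flink's own restart strategy gives up), and that `Never` is the other accepted value besides `FromSavepointOnFailure`. Note the two layers restore from different things: Flink HA resumes from the latest checkpoint, while the operator resubmits from a savepoint.

```go
package main

import "fmt"

// RestartPolicy models the operator's restartPolicy field.
// "FromSavepointOnFailure" comes from this thread; "Never" is assumed.
type RestartPolicy string

const (
	RestartNever                  RestartPolicy = "Never"
	RestartFromSavepointOnFailure RestartPolicy = "FromSavepointOnFailure"
)

// JobStatus is a hypothetical, simplified view of what the operator observes.
type JobStatus struct {
	State         string // Flink job state, e.g. "RUNNING" or "FAILED"
	LastSavepoint string // latest savepoint path; empty if none was taken
}

// nextAction decides what the operator layer does.
//
// Level 1 (Flink): with checkpointing + HA, a lost JobManager is replaced and
// the job resumes from the latest completed checkpoint. The job never becomes
// FAILED, so the operator has nothing to do.
//
// Level 2 (operator): only a job that ended in FAILED is handled here.
func nextAction(policy RestartPolicy, status JobStatus) string {
	if status.State != "FAILED" {
		return "none: Flink handles recovery itself"
	}
	if policy == RestartFromSavepointOnFailure && status.LastSavepoint != "" {
		return "resubmit job from " + status.LastSavepoint
	}
	return "leave job failed"
}

func main() {
	// JM pod deleted, Flink HA recovered it: job is still RUNNING.
	fmt.Println(nextAction(RestartFromSavepointOnFailure,
		JobStatus{State: "RUNNING", LastSavepoint: "gs://bucket/sp-1"}))
	// prints "none: Flink handles recovery itself"

	// Job exhausted its restart strategy and is FAILED.
	fmt.Println(nextAction(RestartFromSavepointOnFailure,
		JobStatus{State: "FAILED", LastSavepoint: "gs://bucket/sp-1"}))
	// prints "resubmit job from gs://bucket/sp-1"

	fmt.Println(nextAction(RestartNever,
		JobStatus{State: "FAILED", LastSavepoint: "gs://bucket/sp-1"}))
	// prints "leave job failed"
}
```

Under this model, deleting the JM pod (as in the scenario later in this thread) never reaches the restartPolicy branch, which would explain why Flink HA alone appeared sufficient there.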

Besides, it is worth mentioning that the Flink community has also done some work on k8s HA, like this. And since 1.12, Flink even supports native k8s HA. I am also interested in whether this operator can support that kind of setup.
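For reference, native k8s HA (Flink 1.12+) is enabled through flink-conf.yaml settings. The sketch below just builds and prints those entries; the cluster id, storage path and checkpoint interval are illustrative values, and passing them through the operator's `flinkProperties` field is an assumption, not something verified in this thread.

```go
package main

import (
	"fmt"
	"sort"
)

// haProperties returns flink-conf.yaml entries for Flink's native
// Kubernetes HA services plus periodic checkpointing, so that a
// replacement JobManager has state to recover from.
func haProperties(clusterID, storageDir, checkpointInterval string) map[string]string {
	return map[string]string{
		"high-availability":                "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory",
		"high-availability.storageDir":     storageDir,
		"kubernetes.cluster-id":            clusterID,
		"execution.checkpointing.interval": checkpointInterval,
	}
}

func main() {
	// Illustrative values only.
	props := haProperties("my-flink-job", "s3://my-bucket/flink/recovery", "60s")

	// Print in a stable order, in flink-conf.yaml "key: value" form.
	keys := make([]string, 0, len(props))
	for k := range props {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %s\n", k, props[k])
	}
}
```

Keep in mind that the JobManager's service account also needs permission to create, edit and delete ConfigMaps, since the Kubernetes HA services store leader and job metadata there.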

@karpoftea
Author

Thanks for sharing!
I'm running Flink 1.14 with this operator, in per-job mode with 1 JM and k8s HA. When I deleted the JM pod, k8s created a new one and the job continued from the point where it had stopped. That is what led me to ask in which cases restartPolicy is used. Maybe you are right and it applies to older versions of Flink, but it would be great to know for sure.
