Sparse Coordinator Objects #41

Open
honkfestival opened this issue Apr 7, 2017 · 3 comments

Comments

@honkfestival

TL;DR - please consider separate result types for the Jobs API and the Job API

It's a little confusing right now for the Jobs and Job APIs to reuse the same objects (e.g. pyoozie.model.Coordinator).

In particular, I didn't realise that an empty conf property meant I needed to call get_coordinator_info from the Job API to get the conf property populated.

I understand that the Oozie APIs work this way, but having separate types would make it easier to discover that you need a separate call to get all of the job details.

Mild suggestions:

  • CoordinatorReference and Coordinator
  • Coordinator and CoordinatorInfo
  • Coordinator and CoordinatorDetails

These might make the most sense in a porcelain-style section of the API with the plumbing still left for experts.
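To make the first suggestion concrete, here is a minimal sketch of what the split could look like. Everything in it is hypothetical: the field names, the `resolve` helper and the `fetch_conf` callable are illustration only and are not taken from pyoozie's actual model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CoordinatorReference:
    """Sparse result, as a Jobs API listing might return it (hypothetical type)."""
    coord_job_id: str
    name: str
    status: str
    # Deliberately no `conf` attribute: a listing does not carry it.


@dataclass(frozen=True)
class Coordinator:
    """Full result, as a Job API lookup might return it (hypothetical type)."""
    coord_job_id: str
    name: str
    status: str
    conf: Dict[str, str]


def resolve(ref: CoordinatorReference,
            fetch_conf: Callable[[str], Dict[str, str]]) -> Coordinator:
    """Upgrade a reference to a full object with one extra lookup.

    `fetch_conf` stands in for whatever Job API call returns the conf.
    """
    return Coordinator(ref.coord_job_id, ref.name, ref.status,
                       fetch_conf(ref.coord_job_id))
```

With this shape, reading `ref.conf` raises AttributeError instead of quietly returning {}, which is the discoverability I was after.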

@cfournie

@kmtaylor-github
Contributor

What call did you make that retrieved a coordinator with a null conf object? Most (all?) APIs should populate the details.

This whole scenario is a huge pain point. For example, the janky script I wrote for DD144 to pause all coordinators took ~15 minutes to run because of the overhead of fetching conf details for each of our ~700 coordinators; bypassing that would cut the time significantly.

The problem with the "different object" approach is that the Oozie API gives no clarity about what is and isn't retrieved on any particular call. For example, (some|all|none) of the actions will be present depending on the query you made.

An alternate approach would be to lazily update these fields on access. We'd need a magic marker to distinguish between "dunno" and "nope".
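A rough sketch of the lazy approach, assuming a module-level sentinel as the "magic marker". The class name `LazyCoordinator`, the `UNKNOWN` sentinel and the `fetch_conf` callable are all hypothetical; this is not how pyoozie currently works.

```python
class _Unknown:
    """Sentinel type meaning "not fetched yet" ("dunno")."""

    def __repr__(self):
        return "UNKNOWN"


UNKNOWN = _Unknown()


class LazyCoordinator:
    """Hypothetical coordinator whose conf is fetched on first access."""

    def __init__(self, coord_job_id, fetch_conf, conf=UNKNOWN):
        self.coord_job_id = coord_job_id
        # `fetch_conf` stands in for the Job API call that returns the conf.
        self._fetch_conf = fetch_conf
        self._conf = conf

    @property
    def conf(self):
        # Identity check against the sentinel: an empty dict ("nope") is a
        # real answer and must not trigger another fetch.
        if self._conf is UNKNOWN:
            self._conf = self._fetch_conf(self.coord_job_id)
        return self._conf
```

Code that never touches `conf` never pays for the extra request, and a genuinely empty conf is fetched only once.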


On a somewhat related note, the current approach for the Job API is very focused on ID/name, which means that many use cases will query coord/wf info repeatedly. Some APIs allow an optional coordinator kwarg and use it preferentially.
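The kwarg pattern could be sketched roughly as below. The helper name `resolve_coordinator`, its signature and the `lookup_by_name` callable are assumptions made for illustration; they are not pyoozie's real functions.

```python
from typing import Callable, Optional


def resolve_coordinator(
    lookup_by_name: Callable[[str], dict],
    name: Optional[str] = None,
    coordinator: Optional[dict] = None,
) -> dict:
    """Return the coordinator to operate on, querying only when necessary.

    `lookup_by_name` stands in for the network call that resolves a name.
    """
    if coordinator is not None:
        # The caller already holds the object: use it and skip the query.
        return coordinator
    if name is None:
        raise ValueError("pass either name or coordinator")
    return lookup_by_name(name)
```

A caller looping over the results of a listing can then pass each object straight through instead of triggering one more lookup per coordinator.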

@honkfestival
Author

I was using jobs_all_active_coordinators, which currently returns {} for the conf object.

> The problem with the "different object" approach is that there is no clarity in the Oozie API about what is and isn't retrieved on any particular call.

Yeah, there are really two problems here:

  1. I couldn't tell if it was actually {} or if I was doing it wrong.
  2. Upon learning that I was doing it wrong, I then had to make n network calls to get the info I did want.

Lazy fields would have solved the first problem, but not the second. Unfortunately, I think it's on Oozie itself to solve the second one.

("pull requests welcome")

@kmtaylor-github
Contributor

> I was using jobs_all_active_coordinators, which currently returns {} for the conf object.

This was actually a bug (#47) but the issue as a whole remains.
