New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow specifying details flag to avoid extra REST calls #72
Conversation
Lines need to be a bit shorter, otherwise LGTM |
ff74a46
to
4196b38
Compare
@kmtaylor-github I bumped the batch size to 5000 as discussed. Is this what you had in mind? 50k seemed a little extreme and could add memory pressure. |
Remaining/existing CI failures are handled in a separate PR. |
tests/pyoozie/test_client.py
Outdated
mock_get.assert_called_with('jobs?jobtype=coordinator&offset=1&len=5') | ||
api._jobs_query(model.ArtifactType.Coordinator, limit=6000) | ||
mock_get.assert_any_call('jobs?jobtype=coordinator&offset=1&len=5000') | ||
mock_get.assert_any_call('jobs?jobtype=coordinator&offset=5001&len=5000') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this test substantially different from / combinable with test_jobs_query_coordinator_pagination
?
pyoozie/tests/pyoozie/test_client.py
Lines 383 to 405 in 045fe6f
@mock.patch.object(model.Coordinator, 'fill_in_details', side_effect=lambda c: c, autospec=True) | |
def test_jobs_query_coordinator_pagination(self, _, api): | |
mock_results = iter( | |
[ | |
{ | |
'total': 5001, | |
'coordinatorjobs': [{'coordJobId': '1-C'}, {'coordJobId': '2-C'}] | |
}, | |
{ | |
'total': 5001, | |
'coordinatorjobs': [{'coordJobId': '3-C'}] | |
} | |
] | |
) | |
with mock.patch.object(api, '_get') as mock_get: | |
mock_get.side_effect = lambda url: next(mock_results) | |
result = api._jobs_query(model.ArtifactType.Coordinator) | |
assert len(result) == 3 | |
mock_get.assert_any_call('jobs?jobtype=coordinator&offset=1&len=5000') | |
mock_get.assert_any_call('jobs?jobtype=coordinator&offset=5001&len=5000') | |
with pytest.raises(StopIteration): | |
next(mock_results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference is testing pagination with and without an explicit limit. It's pretty subtle but the previous version of the limit code would effectively disable pagination by setting the chunk to be equal to the limit. This meant that if you set a limit of 20k then you'll request one chunk of 20k instead of X chunks of 500 up to the limit of 20k total results.
The change that I'm introducing will make sure that even a large limit will be paginated. This test checks for that by setting a limit of 6000
and checking that two REST calls were made with len=5000
(the new chunk size).
I could perhaps parametrize
the first test though 🤔
40aa194
to
0129272
Compare
Surface the
details
flag to avoid making additional REST calls (1 per item in the list) when the extra details are not necessary.