-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ec2: Wait for describe_spot_instance_requests() #5401
Conversation
6805398
to
c1d9da7
Compare
Codecov Report
@@ Coverage Diff @@
## master #5401 +/- ##
==========================================
+ Coverage 91.75% 91.84% +0.09%
==========================================
Files 345 345
Lines 36861 36866 +5
==========================================
+ Hits 33820 33860 +40
+ Misses 3041 3006 -35
Continue to review full report at Codecov.
|
Is it possible to wait for the request to show up specifically instead of relying on describe_spot_instance_requests erroring out when it does not exist? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5s looks pretty low threshold.
I would have just put the try catch line 554 and make a fake status out of that exception.
81c37aa
to
c75ff0c
Compare
logic looks better. We still need to adapt the unit tests |
I assume some of these test failures are to be expected?:
|
@tardyp what is left to do to get this merged? |
hi @tonyhutter we had a regression in our CI due to a new warning by setuptools. It took me a while to fix it, that is why it took so long. sorry about that. I rebased you work to let the test run again. lets see what happen. |
@tonyhutter now, I can see that there are a few issues in flake8/pylint, and that the coverage of the diff is not very good. Can you please look at it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need a unit test which hooks that API to simulate the case you are trying to add. This is important for maintainance so that your feature is not destroyed when somebody else do some refactor on this code later.
Ok, I'll add it in |
@tardyp I was planning to use https://github.com/buildbot/buildbot/blob/master/master/buildbot/test/unit/test_worker_ec2.py#L392 @mock_ec2
def test_start_spot_instance(self):
c, r = self.botoSetup('latent_buildbot_slave')
amis = list(r.images.all())
product_description = 'Linux/Unix'
bs = ec2.EC2LatentWorker('bot1', 'sekrit', 'm1.large',
identifier='publickey',
secret_identifier='privatekey',
keypair_name='keypair_name',
security_name='security_name',
ami=amis[0].id, spot_instance=True,
max_spot_price=1.5,
product_description=product_description
)
bs._poll_resolution = 0
instance_id, _, _ = bs._start_instance()
instances = r.instances.filter(
Filters=[{'Name': 'instance-state-name', 'Values': ['running']}])
instances = list(instances)
self.assertTrue(bs.spot_instance)
self.assertEqual(bs.product_description, product_description)
self.assertEqual(len(instances), 1)
self.assertEqual(instances[0].id, instance_id)
self.assertIsNone(instances[0].tags) It should be calling: instance_id, _, _ = bs._request_sport_instance() ... to actually exercise the spot instance code. I assume that's because you don't want to rack up a bunch of bills with AWS running test cases. So I'm not sure how I should proceed testing my changes to |
hi @tonyhutter notice the @mock_ec2 decorator... We don't really call ec2, but we rather mock any call to them using this hangly mock_ec2 library. |
@tardyp thanks for the info, I now see what you're talking about (using boto3 as a fake EC2). Give me some time to put together a test case. |
hi. Its actually moto which does the mocking of boto. |
59dd5ca
to
4f68182
Compare
@tardyp I wrote a test case, but it appears
Is it ok to skip the test case? |
@tonyhutter I would suggest implementing the test and adding a skip condition at the beginning of the test, with it printing a warning of it due to it being unimplemented in moto. You can make it skip when it detects the function returns the |
I wonder if we could use the mock library to mock moto itself :-) Can we maybe just return some dummy data out of that function? |
Thanks all for the suggestions. @tardyp do you have a preference on how to proceed with the test case? |
I agree with @p12tic that for moto unimplemented methods, we should just used mock. skipping is not really an option, as it will just skip unconditionally, so the test is never run and adds no value |
Quick updates:
|
@tonyhutter thanks for update.. much appreciated! |
Let me know if I can help with getting this added to Moto or with the bugs in the other endpoints. (long-time Buildbot user. thank you for all of your work) |
@spulec I guess any help is appreciated. You can tell here that you are working on this branch in order to avoid double work, and then create a new PR a side to show your progress. |
Looks like there's an existing |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@tonyhutter Any progress here? |
I've been working on some higher priority projects for the last couple months and unfortunately this has fallen by the wayside. It's probably going to be at least a month or more before I can look at it again, but I do plan to get back to it eventually. |
4f68182
to
ab63bb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ab63bb5
to
15647bb
Compare
Just to summarize where this PR is at:
I've rebased this PR on master so if you want to pull it in without the test case, great. If not, no big deal, I'll just include this PR's patch in our project's buildbot repo, and apply it manually. I'd recommend you do pull it though since it's a fairly low-risk patch and fixes a real issue. |
@tonyhutter Do you have your work on moto where the problem exists available somewhere? Maybe someone here can help with that. |
@Conan-Kudo: Here's my attempt at test case for posterity (doesn't work): buildbot: https://gist.github.com/tonyhutter/63e1a8a0ae7cb442c685097969009083 |
I frequently see this error when starting up buildbot: An error occurred (InvalidSpotInstanceRequestID.NotFound) when calling the DescribeSpotInstanceRequests operation: The spot instance request ID 'sir-abcd1234' does not exist After the error, I'll see "zombie" instances running with no tags in AWS. This is caused by EC2LatentWorker._wait_for_request() calling describe_spot_instance_requests() before the request is ready. I noticed it can sometimes take a second or so for the request to show up. This patch waits for describe_spot_instance_requests() to return successfully.
15647bb
to
613a48a
Compare
@tonyhutter I've hacked missing support into moto as part as Buildbot tests. As part of that I had to change exception capturing to the following:
That is, we catch Could you please check if this still works with real ec2? If yes, then this PR is ready for merge, thanks a lot for the work you put in. |
@p12tic thanks, I'll give that a test |
@p12tic I just tested and verified that your code works. Do you want me to make the change in my PR to: except ClientError as e:
if 'InvalidSpotInstanceRequestID.NotFound' in str(e):
requests = None
else:
raise ... or just leave my PR as-is? |
@tonyhutter Actually I've already pushed this change to your branch, I just needed a confirmation that it works and will merge the PR soon. Thanks a lot! |
@p12tic thanks 👍 |
I frequently see this error when starting up buildbot:
After the error, I'll see "zombie" instances running with no tags in AWS. This is caused by
EC2LatentWorker._wait_for_request()
callingdescribe_spot_instance_requests()
before the request is ready. I noticed it can sometimes take a second or so for the request to show up.This patch waits up to five seconds for
describe_spot_instance_requests()
to return successfully.Contributor Checklist:
master/buildbot/newsfragments
directory (and read theREADME.txt
in that directory)