You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Move the state machine that handles creation / deletion of Azure resources into its own class with its own state tracking and separate testing.
This lets us ensure this implementation is really bulletproof, while relying on it in the rest of the provider to simply "return a VM or fail trying" and "delete all resources associated with this VM".
The text was updated successfully, but these errors were encountered:
The big advantage to this refactor would be more easily testable code and secondarily more readable/modifyable code (say, if we find a need to create different resources later). The downside is introducing new bugs :)
There are a few "weird" things about this process that make it not a "normal" state machine:
state is both in the Azure API and in the workers DB table
some things can happen in parallel (disks) while others must be serialized (ip -> nic -> vm).
there are no "events" that transition from state to state -- we just poll the API
So, I think implementing this as a "normal" state machine with functions for each "state transition" would be difficult and not result in something especially readable or testable.
The current implementation of provisionResources reads something like
provision a NIC
if not done, return
provision an IP
if not done, return
provision a VM
if not done, return
on error, switch to removing the worker
which is a pretty good summary of what's happening. There are some callbacks and functions to modify the worker's provisionerData that don't read too clearly.
Another structure I could think of to handle this would be to set up nested function calls and use an exception to signal that execution is blocked and should be retried on the next poll. Something like
fn provisionResources():
try:
createVm()
on error = Blocked:
return
on any other error:
removeWorker
fn createVm():
if providerData.vm is complete:
return vm
if providerData.vm indicates creation is in progress:
if Azure API says creation is complete:
get information on disks and store it in providerData.vm
return vm
else:
raise blocked
nic = createNic()
call Azure API to create the VM with 'nic', store the operation in providerData.vm
raise blocked
fn createNic():
if providerData.nic is complete:
return nic
if providerData.nic indicates creation is in progress:
if Azure API says creation is complete:
return nic
else:
raise blocked
ip = createIp()
call Azure API to create the NIC with that IP, store the operation in providerData.nic
raise blocked
fn createIp():
if providerData.ip is complete:
return ip
if providerData.ip indicates creation is in progress:
if Azure API says creation is complete:
return ip
else:
raise blocked
call Azure API to create the IP, store the operation in providerData.ip
raise blocked
I'm not sure this is clearer than what we have! Also, the exception trick won't work too well with parallel operations. And, this is definitely no easier to test than what we have.
I'll think about this some more, but maybe what we have is good enough? Ideas?
Move the state machine that handles creation / deletion of Azure resources into its own class with its own state tracking and separate testing.
This lets us ensure this implementation is really bulletproof, while relying on it in the rest of the provider to simply "return a VM or fail trying" and "delete all resources associated with this VM".
The text was updated successfully, but these errors were encountered: