Azure Native: Azure Machine Learning Service: Managed Network Provisions #3212

Open
Werner-Swart-83 opened this issue Apr 11, 2024 · 2 comments

Comments

@Werner-Swart-83

Werner-Swart-83 commented Apr 11, 2024

Hello!

  • Vote on this issue by adding a 👍 reaction
  • If you want to implement this feature, comment to let us know (we'll work with you on design, scheduling, etc.)

Issue details

We are working with the Azure ML product team and have identified issues when creating Azure ML workspaces using managed isolated networking with Pulumi.

The scenario is as follows:

  1. We create all the storage accounts, key vaults and ACRs needed for the workspace.
  2. We create the workspace with additional outbound network rules (i.e. using Managed Isolated Networking).
  3. We then provision a compute cluster.

Because this is the first time we create a compute cluster, Azure ML creates the Managed Isolated Network and then joins the compute to it. This is where the problem comes in: race conditions and timeouts occur because ARM is trying to create the Managed Isolated Network, create and configure the outbound firewall, and join the compute cluster all at once.

MS has advised us to create the Managed Isolated Network separately, add the outbound rules once it exists, and only then join the cluster.

To do that, we need to add the Provision Managed Network endpoint to the v2-config.json file (if I understand the Contributing.md file correctly).

MS is investigating the issue, but it might take a long time before it is fixed.

Affected area/feature

Automation API


I am happy to do the work if someone will help guide me :)

@Werner-Swart-83 added the kind/enhancement and needs-triage labels Apr 11, 2024
@thomas11
Contributor

Hi @Werner-Swart-83, thank you for the detailed report and for your offer to help.

Unfortunately, this endpoint isn't trivial to add. Pulumi operates on a resource-based model, whereas this endpoint is only a side-effecting operation on another resource (the workspace). That's why we don't auto-detect it currently.

Further, this operation of enabling the workspace doesn't seem to have an inverse disable operation. This makes it hard to model the stateful CRUD lifecycle.

We could potentially support this by adding a custom implementation for Workspace but that would take a meaningful amount of work.

At this point, I'd suggest using the REST API or an Azure SDK directly, probably within pulumi.apply(), to start this operation. I'm open to other suggestions, of course.
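A minimal sketch of what the REST route suggested above might look like, shown only to make the shape of the call concrete. The URL path, the POST method, the empty JSON body, and the `2023-10-01` API version are assumptions based on the public Azure REST reference for "Managed Network Provisions" and should be checked against the current documentation. `build_provision_request` is a hypothetical helper name, and acquiring the bearer token is left out.

```python
import urllib.parse
import urllib.request


def build_provision_request(
    subscription_id: str,
    resource_group: str,
    workspace: str,
    token: str,
    api_version: str = "2023-10-01",  # assumption: verify against the Azure docs
) -> urllib.request.Request:
    """Build (but do not send) the ARM request that starts managed network provisioning."""

    def q(segment: str) -> str:
        # Escape each path segment so unusual names cannot break the URL.
        return urllib.parse.quote(segment, safe="")

    url = (
        "https://management.azure.com"
        f"/subscriptions/{q(subscription_id)}"
        f"/resourceGroups/{q(resource_group)}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{q(workspace)}"
        f"/provisionManagedNetwork?api-version={q(api_version)}"
    )
    return urllib.request.Request(
        url,
        data=b"{}",  # assumption: an empty JSON object is accepted as the body
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. In a Pulumi program the call would run inside an `apply` callback on the workspace outputs, so it only fires once the workspace exists.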

@thomas11 added the awaiting-feedback and area/custom-resource labels and removed the needs-triage label Apr 12, 2024
@Werner-Swart-83
Author

Werner-Swart-83 commented Apr 12, 2024

Thank you for your reply. The problem is that we have to wait for the managed isolated network to be provisioned before we add the additional outbound network rules and the compute. We could make the REST call, but we would need to poll until the response shows that the managed network is up. I don't think .apply is the best place to do that, and if we run a preview we won't see the full extent of what will be created.
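For reference, the polling described here could be sketched roughly as below. This is an illustration, not a proposed implementation: `wait_for_managed_network` and the `get_status` callback are hypothetical, and the `"Active"` ready value is an assumption about what the managed network status reports once provisioning is done. The sleep and clock functions are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable


def wait_for_managed_network(
    get_status: Callable[[], str],
    ready_status: str = "Active",  # assumption: value reported when provisioned
    timeout_s: float = 1800.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll get_status() until it returns ready_status.

    Returns the number of polls made, or raises TimeoutError once
    timeout_s has elapsed without seeing ready_status.
    """
    deadline = clock() + timeout_s
    polls = 0
    while True:
        polls += 1
        if get_status() == ready_status:
            return polls
        if clock() >= deadline:
            raise TimeoutError(
                f"managed network not {ready_status!r} after {polls} polls"
            )
        sleep(interval_s)
```

Here `get_status` would wrap whatever call reads the workspace's managed network status. A loop like this blocks the deployment while it runs and never executes during a preview, which is the limitation raised above.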

@pulumi-bot added the needs-triage label and removed the awaiting-feedback label Apr 12, 2024
@thomas11 added the impact/missing-api label and removed the needs-triage label Apr 15, 2024