
[WebNN EP] Move MLContext creation to a singleton #20600

Open · egalli wants to merge 1 commit into main

Conversation


@egalli egalli commented May 7, 2024

Description

This PR moves MLContext creation to a singleton. This enables us to share the same MLContext across multiple InferenceSessions.

Motivation and Context

In order to enable I/O Binding with the upcoming MLBuffer API in the WebNN specification, we need to share the same MLContext across multiple sessions. This is because MLBuffers are restricted to the MLContext where they were created.

@Honry (Contributor) left a comment


Thank you @egalli, a very good start for WebNN I/O binding support!

@Honry (Contributor) commented May 8, 2024

@fs-eire, @guschmue, @fdwr, PTAL, thanks!

@Honry (Contributor) left a comment

Generally LGTM modulo a nit. I will test it and let you know if there are any other issues. :)

@fs-eire (Contributor) commented May 9, 2024

The current implementation has a few problems:

  • If the user specifies multiple execution providers (which is legitimate in ORT), such as ['webgpu', { name: 'webnn', powerPreference: 'high-performance' }], it is hard to tell from the JavaScript code which EP is currently in use. Even after WebNN is initialized, JavaScript has no way to know which EP will actually run the current model, and reading the execution provider config is not sufficient to tell.

  • The current implementation splits the initialization process across multiple places: the C++ code does a few things and the JavaScript does others. The [webnn session options] to [context ID] mapping is 1:1, and the [session ID] to [context ID] mapping is many-to-one. In my understanding, a singleton map in C++ would be a better way to implement this requirement: it is much easier to read and understand, and keeping related code together reduces the chance of introducing bugs in future changes. Please let me know if I have understood this part wrong.

  • Modifying the user input (session options) is a concern. This is usually not expected behavior, yet the current implementation depends on adding a property to the user-specified session options.

In order to enable I/O Binding with the upcoming MLBuffer changes to
the WebNN specification, we need to share the same MLContext across
multiple sessions. This is because MLBuffers are tied to the MLContext
where they were created.
@egalli changed the title from "[WebNN EP] Move MLContext creation to TypeScript" to "[WebNN EP] Move MLContext creation to a singleton" on May 10, 2024
@egalli (Author) commented May 10, 2024

I have moved the context de-duplication to a singleton in C++.
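
A minimal sketch of what such a context-deduplicating singleton might look like (the class and method names are assumptions, std::unordered_map stands in for ORT's InlinedHashMap, and the options struct follows the diff quoted later in this thread):

#include <emscripten/val.h>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

namespace onnxruntime {

struct WebNNContextOptions {
  std::optional<std::string> device_type;
  std::optional<int> num_threads;
  std::optional<std::string> power_preference;

  // Converts the options to a JS object for navigator.ml.createContext().
  emscripten::val AsVal() const {
    emscripten::val o = emscripten::val::object();
    if (device_type) o.set("deviceType", *device_type);
    if (num_threads) o.set("numThreads", *num_threads);
    if (power_preference) o.set("powerPreference", *power_preference);
    return o;
  }

  bool operator==(const WebNNContextOptions& o) const {
    return device_type == o.device_type && num_threads == o.num_threads &&
           power_preference == o.power_preference;
  }
};

}  // namespace onnxruntime

// The hash specialization must be visible before the map is instantiated.
namespace std {
template <>
struct hash<::onnxruntime::WebNNContextOptions> {
  size_t operator()(const ::onnxruntime::WebNNContextOptions& o) const {
    size_t h = hash<string>()(o.device_type.value_or(""));
    h = h * 31 + hash<int>()(o.num_threads.value_or(0));
    h = h * 31 + hash<string>()(o.power_preference.value_or(""));
    return h;
  }
};
}  // namespace std

namespace onnxruntime {

// Sketch: one MLContext per distinct option set, shared process-wide.
class WebNNContextManager {
 public:
  static WebNNContextManager& GetInstance() {
    static WebNNContextManager instance;  // Meyers singleton
    return instance;
  }

  emscripten::val GetContext(const WebNNContextOptions& options) {
    auto it = contexts_.find(options);
    if (it != contexts_.end()) {
      return it->second;  // reuse the context created for identical options
    }
    emscripten::val ml = emscripten::val::global("navigator")["ml"];
    emscripten::val context =
        ml.call<emscripten::val>("createContext", options.AsVal()).await();
    contexts_.emplace(options, context);
    return context;
  }

 private:
  std::unordered_map<WebNNContextOptions, emscripten::val> contexts_;
};

}  // namespace onnxruntime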

}
};

namespace onnxruntime {
A contributor commented:

Duplicated namespace onnxruntime.

@egalli (Author) replied:

It is not duplicated; the onnxruntime namespace ends on line 47. This is because the template specialization for std::hash<::onnxruntime::WebNNContextOptions> must be declared inside the std namespace and before it is used on line 107 by InlinedHashMap<WebNNContextOptions, emscripten::val> contexts_;.
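
Schematically, the layout being described is the following (a minimal compilable skeleton; the placeholder hash body stands in for the real member-combining hash shown in the sketch earlier in this thread):

namespace onnxruntime {
struct WebNNContextOptions { /* members */ };
}  // namespace onnxruntime -- first close of the namespace (line 47)

namespace std {
template <>
struct hash<::onnxruntime::WebNNContextOptions> {
  size_t operator()(const ::onnxruntime::WebNNContextOptions&) const {
    return 0;  // placeholder; the real code combines the member hashes
  }
};
}  // namespace std

namespace onnxruntime {
// The namespace reopens for the code that instantiates the map, e.g.:
// InlinedHashMap<WebNNContextOptions, emscripten::val> contexts_;
}  // namespace onnxruntime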

ORT_THROW("Failed to get ml from navigator.");
}

emscripten::val context = ml.call<emscripten::val>("createContext", options.AsVal()).await();
A contributor commented:

So how do you get the global shared context from JS to create an ML buffer?

@egalli (Author) replied:

That's a good point; this PR does not expose the context to JS. We'll need access to the MLContext from JS so that ort-web can download and upload data to the MLBuffer.

@fs-eire do you have any suggestion on how to solve this issue without adding a WebNN TS backend?

A contributor replied:

I am not familiar with the Embind API, but I think the idea is the same:

Use the EMSCRIPTEN_BINDINGS() macro to export C++ functions to the Module object. For example, a function called getCurrentMLContext (takes no parameters) or getMLContext (takes a session ID as a parameter) that returns a reference to the object. Then you can call Module['getCurrentMLContext']() in js_internal_api.js (because this file is included in the final JS glue using --pre-js in emcc).
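
A rough sketch of that suggestion (the function name and the placeholder state are assumptions; how the "current" context would be tracked is exactly the open question here):

#include <emscripten/bind.h>
#include <emscripten/val.h>

emscripten::val GetCurrentMLContext() {
  // Placeholder: in real code this would return the MLContext tracked for
  // the session currently in use.
  static emscripten::val current = emscripten::val::undefined();
  return current;
}

EMSCRIPTEN_BINDINGS(webnn_context) {
  // Exports the function onto the wasm Module object, so the JS glue
  // (e.g. js_internal_api.js, included via --pre-js) can call
  // Module['getCurrentMLContext']().
  emscripten::function("getCurrentMLContext", &GetCurrentMLContext);
}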


A contributor commented:

Considering the actual usage of IO binding, users may need to access the MLContext object via the ORT JavaScript API. It could be something like:

const mySession = await ort.InferenceSession.create('...', { executionProviders: [{
  name: 'webnn',
  ...
}] });

const myMLContext = ort.env.webnn.getContext(mySession);

Considering this, there needs to be a [SessionObject(JS)] to [MLContext(JS)] mapping in JavaScript. We can get a [SessionObject(JS)] to [SessionID(JS&C++)] mapping and a [SessionID(JS&C++)] to [MLContext(JS)] mapping, so this is possible to implement.

@egalli (Author) commented May 16, 2024

Unless I'm missing something, there is no public API to get the WebNNExecutionProvider* from the [SessionID(JS&C++)] (i.e. OrtSession*), and Module.jsepSessionState.sessionHandle is only valid during run()/Compile().

A contributor replied:

I see your point. After checking the code, I realize that using the session ID is not a good idea. There is no existing place in the code where we can associate a session handle with a WebNN EP instance.

In C++ we can just expose the WebNNContextManager::GetContext() to JavaScript.

And in JavaScript API, we have 2 options:

  • Let user pass options directly:

    const myWebNNOptions = {
      name: 'webnn',
      ...
    };
    const mySession = await ort.InferenceSession.create('...', {
      executionProviders: [myWebNNOptions]
    });
    
    const myMLContext = ort.env.webnn.getContext(myWebNNOptions);
    • No need to add extra JS code
  • Let user pass session object:

    const mySession = await ort.InferenceSession.create('...', { executionProviders: [{
      name: 'webnn',
      ...
    }] });
    
    const myMLContext = ort.env.webnn.getContext(mySession);
    • May be a little easier for users, but requires maintaining a session-to-webnnOptions map in JS

A few of the changes to init.ts and backend-webnn.ts may have to be added back, as we need to allow assigning the ort.env.webnn object to a property of Module, so that in C++ embind can access the object and add a property (the getContext() function) to it. A sketch of that follows below.
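
A sketch of that last idea, assuming the JS side has already assigned ort.env.webnn to a Module property before this C++ runs (the property name "webnnEnv", the function name, and the registration hook are all assumptions):

#include <emscripten/val.h>

// Sketch: attach getContext() to the ort.env.webnn object exposed on Module.
void RegisterWebNNGetContext() {
  emscripten::val env = emscripten::val::module_property("webnnEnv");
  if (!env.isUndefined()) {
    // After this, JS code can call ort.env.webnn.getContext(...), reusing
    // the getCurrentMLContext function exported via EMSCRIPTEN_BINDINGS.
    env.set("getContext", emscripten::val::module_property("getCurrentMLContext"));
  }
}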

Comment on lines +23 to +25
std::optional<std::string> device_type;
std::optional<int> num_threads;
std::optional<std::string> power_preference;
A contributor commented:

  1. Is there a reason for using strings instead of enum types?

  2. What about default values? For example, according to the spec, if deviceType is not specified it is treated as 'cpu'. However, an instance of WebNNContextOptions with deviceType === 'cpu' is treated as a different option set from one that leaves deviceType unset, although they should be the same (see the sketch below).
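
One way to address point 2 would be to normalize the options (as defined in the diff above) to their spec defaults before they are hashed or compared, so that an unset deviceType and an explicit 'cpu' select the same context. A sketch, not what the PR currently does; the helper name is illustrative:

#include <optional>
#include <string>

// Sketch: fill in WebNN spec defaults so logically identical options
// compare equal when used as a map key.
WebNNContextOptions Normalize(WebNNContextOptions options) {
  if (!options.device_type.has_value()) {
    options.device_type = "cpu";  // spec default for deviceType
  }
  if (!options.power_preference.has_value()) {
    options.power_preference = "default";  // spec default for powerPreference
  }
  return options;
}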

@fs-eire (Contributor) commented May 20, 2024

Adding a few comments here:

A new issue (#20729) reveals a clearer picture of what an actual requirement looks like. Users may want to manipulate the MLContext with more flexibility. I am currently thinking it may be a good idea to let users create the MLContext themselves and just pass it to ORT via session options.

Considering the latest spec: https://www.w3.org/TR/webnn/#api-ml-createcontext

There will be WebNN-WebGPU interop, and createContext() may accept a WebGPU GPUDevice object. This would be even more difficult to implement inside ORT, so it is better to let users do that part themselves.
