Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand ONNX schema registration/deregistration API #5221

Merged
merged 11 commits into from
Jun 5, 2023

Conversation

q-ycong-p
Copy link
Contributor

Description

  • Introduces functions to deregister all schema of a domain. This clears map<OperatorSetVersion, OpSchema>, which should free up memory allocated for the schemas.

Motivation and Context

Pull#3266 introduced handle for runtime to specify opset to register. Similarly, it makes sense for a runtime to "offload" the schemas when it's not needed. This aims to (1) reduce runtime memory, and (2) improve the feature completeness for a runtime to control ONNX schema registration as it needs.

@jcwchen jcwchen added run release CIs Use this label to trigger release tests in CI utility labels May 18, 2023
@q-ycong-p q-ycong-p requested a review from a team as a code owner May 18, 2023 21:21
…ions

Signed-off-by: Yu Cong <congyc@amazon.com>
@q-ycong-p
Copy link
Contributor Author

Hi @jcwchen , I've updated the commit to:

  1. Allow the registrations of multiple schema versions via the existing RegisterOnnxOperatorSetSchema API. The behavior is as we discussed in Selective load ONNX schema by specific opset_version #3266, that the final schema follows: Registration(VA, VB) == Registration(VA) U Registration(VB).
  2. Keep the de-registration API DeregisterOnnxOperatorSetSchema to deregister all schemas, instead of allowing a specific version. This is because the later appears tricky to do right - when deregistering a version VX, it's not straight forward to identify which schema version should be kept/removed at ONNX level. For example, a runtime registers opset-1, which adds RandomNormal-1 (t0), then if it registers opset-7 and immediately de-register opset-7 (t1), we'd expect the schema state now (t1) to be equivalent to t0. However I don't see a quick way for ONNX to check whether RandomNormal-1 was brought in with opset-7 hence should be removed, or brought in by previous registrations hence should be kept. I think the current API to de-register all is the easiest, for both ONNX to implement, and for runtime to understand. Pls let me know if I understood wrong here.
  3. Add unit tests to cover several API call cases from a runtime perspective.

I have several remaining questions. Would really like to get your thoughts on!

  • Since the schema map is a static object, and that a runtime may run multi-threaded, do we need mutex to guard operations on the map?
  • Current schema registration handles are exposed to runtimes via C++ API. How can we expose the some handles to the Python users?

@q-ycong-p q-ycong-p changed the title Add handle to deregister ONNX opset schema Expand ONNX schema registration/deregistration API May 18, 2023
@q-ycong-p
Copy link
Contributor Author

Also, I think it makes sense to have a static bool variable, e.g. disable_onnx_static_registration which turns true/false based on #ifndef __ONNX_DISABLE_STATIC_REGISTRATION during compilation. It can be exposed to runtime by an additional func like ONNX_NAMESPACE::IsONNXStaticRegistrationDisabled(). The motivation is that a runtime may approach the freedom of schema registration differently based on how this compilation flag was initially set. @jcwchen , what do you think?

onnx/defs/operator_sets.h Show resolved Hide resolved
onnx/defs/operator_sets.h Outdated Show resolved Hide resolved
onnx/defs/schema.h Show resolved Hide resolved
… on duplicate

Signed-off-by: Yu Cong <congyc@amazon.com>
@q-ycong-p
Copy link
Contributor Author

q-ycong-p commented May 19, 2023

Hi @jcwchen thanks for the review. I've addressed your suggestion to expand user option. Would appreciate an approval if codes look ready. Thanks a ton!

My two remaining questions are: (1) Current schema registration handles are exposed to runtimes via C++ API. How can we expose the some handles to the Python users?
(2) Shall we add user API to reflect the compilation flag ONNX_DISABLE_STATIC_REGISTRATION? See explanation here.
(for these two points, I can follow up in separate PR. Just wanna take your thoughts!)

@q-ycong-p q-ycong-p requested a review from jcwchen May 19, 2023 23:05
@jcwchen
Copy link
Member

jcwchen commented May 19, 2023

My two remaining questions are: (1) Current schema registration handles are exposed to runtimes via C++ API. How can we expose the some handles to the Python users?
(2) Shall we add user API to reflect the compilation flag ONNX_DISABLE_STATIC_REGISTRATION? See explanation #5221 (comment).
(for these two points, I can follow up in separate PR. Just wanna take your thoughts!)

  1. Technically we can directly use pybind11 to expose Python API from C++ API. Here is a simple example PR: https://github.com/onnx/onnx/pull/4720/files.
  2. My first thought would be will exposing ONNX_DISABLE_STATIC_REGISTRATION complicate runtime too much? I can imagine much different code will be used with/without ONNX_DISABLE_STATIC_REGISTRATION during runtime, but if there is really a real-world use case for dynamic setting ONNX_DISABLE_STATIC_REGISTRATION, I am fine to have such an API.

Copy link
Member

@jcwchen jcwchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I think I miss some comments while submitting comments yesterday. Please let me know if I still miss something important. Thank you!

Since the schema map is a static object, and that a runtime may run multi-threaded, do we need mutex to guard operations on the map?

Good point. I think it's good to have from std library.

@q-ycong-p
Copy link
Contributor Author

q-ycong-p commented May 22, 2023

My two remaining questions are: (1) Current schema registration handles are exposed to runtimes via C++ API. How can we expose the some handles to the Python users?
(2) Shall we add user API to reflect the compilation flag ONNX_DISABLE_STATIC_REGISTRATION? See explanation #5221 (comment).
(for these two points, I can follow up in separate PR. Just wanna take your thoughts!)

  1. Technically we can directly use pybind11 to expose Python API from C++ API. Here is a simple example PR: https://github.com/onnx/onnx/pull/4720/files.
  2. My first thought would be will exposing ONNX_DISABLE_STATIC_REGISTRATION complicate runtime too much? I can imagine much different code will be used with/without ONNX_DISABLE_STATIC_REGISTRATION during runtime, but if there is really a real-world use case for dynamic setting ONNX_DISABLE_STATIC_REGISTRATION, I am fine to have such an API.
  1. Makes sense. I plan to follow up a separate PR to expose it to Python API once current feature PR is merged.
  2. Makes sense. My concern originated from that the map() func that get called at multiple places (e.g. in the getter Schema(...)) declares static SchemaRegisterer here based on __ONNX_DISABLE_STATIC_REGISTRATION's status. This makes me think, when this cmake args is OFF, a runtime should probably not mess with schema registration at all. But I experimented with a unit test, that even when this cmake arg is OFF, it can deregister, query, and register schema without issue. Pls correct me if I'm missing something here. I'll add this UT in PR.

Found another issue, loaded_schema_version here should probably be assigned based on __ONNX_DISABLE_STATIC_REGISTRATION, no? Like:

#ifdef __ONNX_DISABLE_STATIC_REGISTRATION
  int OpSchemaRegistry::loaded_schema_version = -1;
#else
  int OpSchemaRegistry::loaded_schema_version = 0;
#endif

…TIC_REGISTRATION is OFF

Signed-off-by: Yu Cong <congyc@amazon.com>
@q-ycong-p
Copy link
Contributor Author

Sorry I think I miss some comments while submitting comments yesterday. Please let me know if I still miss something important. Thank you!

Since the schema map is a static object, and that a runtime may run multi-threaded, do we need mutex to guard operations on the map?

Good point. I think it's good to have from std library.

I thought of having a OpSchemaRegistry-level mutex, which guards the critical sections in OpSchemaRegisterOnce and OpSchemaDeregisterAll that read/write the static OpSchema map. But OpSchemaRegisterOnce is a class inside OpSchemaRegistry. I am a bit confused what's the right mutex impl here. Could you shed some light? We can also merge the current feature PR and iterate with separate PR on the good-to-have features: (1) thread-safety (2) expose to Python API.

onnx/defs/schema.h Outdated Show resolved Hide resolved
onnx/defs/schema.cc Outdated Show resolved Hide resolved
…TRATIION flag handle to python setup.py

Signed-off-by: Yu Cong <congyc@amazon.com>
onnx/defs/schema.h Outdated Show resolved Hide resolved
Copy link
Member

@jcwchen jcwchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern originated from that the map() func that get called at multiple places (e.g. in the getter Schema(...)) declares static SchemaRegisterer here based on __ONNX_DISABLE_STATIC_REGISTRATION's status. This makes me think, when this cmake args is OFF, a runtime should probably not mess with schema registration at all. But I experimented with a unit test, that even when this cmake arg is OFF, it can deregister, query, and register schema without issue. Pls correct me if I'm missing something here. I'll add this UT in PR.

Thank you for elaboration! I got your point now. I think that would be helpful for runtime and so it's good to have.

onnx/defs/schema.h Outdated Show resolved Hide resolved
onnx/defs/schema.cc Outdated Show resolved Hide resolved
@q-ycong-p
Copy link
Contributor Author

q-ycong-p commented May 24, 2023

Looks like the release build's ORT tests failed (not sure why it didn't run before) on "Verify ONNX with ONNX Runtime PyPi package". The failed tests were due to ONNX schema not registered - the ONNX build must have turned on -DONNX_DISABLE_STATIC_REGISTRATION. I took a look but not fully understanding the CI pipeline here. Any thought?

Do we expect the release build to permute to all available cmake bool flags? Or else how did it got turned on in the release tests?

[Note: the commit 641cd5a doesn't address the rel build ORT tests failures - i haven't figured out. Help appreciated from devs!!]

…ATION. Default fail_duplicate_schema_registration to true. Add ONNX_DISABLE_STATIC_REGISTRATION to summary.cmake and flush indentation.

Signed-off-by: Yu Cong <congyc@amazon.com>
Copy link
Member

@jcwchen jcwchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the release build's ORT tests failed (not sure why it didn't run before) on "Verify ONNX with ONNX Runtime PyPi package". The failed tests were due to ONNX schema not registered - the ONNX build must have turned on -DONNX_DISABLE_STATIC_REGISTRATION. I took a look but not fully understanding the CI pipeline here. Any thought?

Those release CIs will be triggered only when having run release CIs (I added it recently because this PR is important and I would like to test it in advance). IIUC, this PR didn't change the default behavior for schema registration (ONNX_DISABLE_STATIC_REGISTRATION=OFF and it will register all opset schema). I guess the failure might be irrelevant to this PR. Please let me merge main branch with this PR to see whether it can help resolve those failures.

onnx/defs/schema.cc Outdated Show resolved Hide resolved
@jcwchen
Copy link
Member

jcwchen commented May 26, 2023

Those release CIs will be triggered only when having run release CIs (I added it recently because this PR is important and I would like to test it in advance). IIUC, this PR didn't change the default behavior for schema registration (ONNX_DISABLE_STATIC_REGISTRATION=OFF and it will register all opset schema). I guess the failure might be irrelevant to this PR. Please let me merge main branch with this PR to see whether it can help resolve those failures.

Such an error happened in other PRs as well. Probably because onnxruntime just has a fresh release (1.15.0) recently. We will have a PR to fix these failures and please just ignore those failures in release CIs for now. Thanks!

(Update) The PR to resolve those failures is here: #5261.

…s from tests.

Signed-off-by: Yu Cong <congyc@amazon.com>
@q-ycong-p
Copy link
Contributor Author

Those release CIs will be triggered only when having run release CIs (I added it recently because this PR is important and I would like to test it in advance). IIUC, this PR didn't change the default behavior for schema registration (ONNX_DISABLE_STATIC_REGISTRATION=OFF and it will register all opset schema). I guess the failure might be irrelevant to this PR. Please let me merge main branch with this PR to see whether it can help resolve those failures.

Such an error happened in other PRs as well. Probably because onnxruntime just has a fresh release (1.15.0) recently. We will have a PR to fix these failures and please just ignore those failures in release CIs for now. Thanks!

(Update) The PR to resolve those failures is here: #5261.

Yes, the Windows-CI failures log complains on the tests that involve the missing ops #5261 adds. I am not sure whats' the common practice here. Do we wait on #5261 to merge, then update this one to run release tests before move forward with review? Otherwise I think all comments are addressed and it's in state for review/merge. Thanks!

@q-ycong-p
Copy link
Contributor Author

It looks like that the only failing Lint test failed due to a Lint installation hiccup. Other PRs seems to be fine with the Lint check. Should/How can we rerun the Lint checker?

Copy link
Member

@jcwchen jcwchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it passed all CIs. It will still need approval from @onnx/sig-operators-approvers. It might take some time because some of them are OOF this week. Thank you for waiting.

@gramalingam gramalingam enabled auto-merge (squash) June 5, 2023 16:53
@gramalingam gramalingam merged commit d631800 into onnx:main Jun 5, 2023
58 checks passed
@q-ycong-p q-ycong-p deleted the schema_deregisttration branch June 5, 2023 21:55
@q-ycong-p
Copy link
Contributor Author

Hi @jcwchen , thank you very much for working with me on this. I followed up a separate PR #5291 aiming to address schema map thread-safety based on our discussion above. Would appreciate your suggestion if I miss something there - Ty!

@@ -49,6 +49,9 @@
ONNX_NAMESPACE = os.getenv("ONNX_NAMESPACE", "onnx")
ONNX_BUILD_TESTS = bool(os.getenv("ONNX_BUILD_TESTS") == "1")
ONNX_DISABLE_EXCEPTIONS = bool(os.getenv("ONNX_DISABLE_EXCEPTIONS") == "1")
ONNX_DISABLE_STATIC_REGISTRATION = bool(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a place where these options are documented?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of any, although these options are identical as the options in CMakeLists.txt:

onnx/CMakeLists.txt

Lines 25 to 28 in 391f570

option(ONNX_BUILD_TESTS "Build ONNX C++ APIs Tests" OFF)
option(ONNX_USE_LITE_PROTO "Use lite protobuf instead of full." OFF)
option(ONNX_DISABLE_EXCEPTIONS "Disable exception handling." OFF)
option(ONNX_DISABLE_STATIC_REGISTRATION "Disable static registration for onnx operator schemas." OFF)

which has a simple sentence as description.

adityagoel4512 pushed a commit to adityagoel4512/onnx that referenced this pull request Jul 28, 2023
Signed-off-by: Yu Cong <congyc@amazon.com>
Co-authored-by: Yu Cong <congyc@amazon.com>
Co-authored-by: Chun-Wei Chen <jacky82226@gmail.com>
Signed-off-by: Aditya Goel <agoel4512@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run release CIs Use this label to trigger release tests in CI utility
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants