
Reuse mocha tests for web platform tests #2

Open
huningxin opened this issue Sep 9, 2020 · 16 comments

@huningxin
Contributor

Raised by @anssiko in 3 September 2020 WebML CG meeting,

anssik: interested in figuring out if we can reuse mocha tests for w-p-t tests

https://web-platform-tests.org/

@BruceDai
Collaborator

@huningxin @anssiko I converted the WebNN add and mul tests to the Web Platform Tests testharness.js framework, starting from the add tests and mul tests of @huningxin's PR #1. My converted tests are at this link, and the report of these tests is here. PTAL.

Following the Writing Tests guidelines of Web Platform Tests, I made three small modifications to convert @huningxin's mocha tests.

  1. Created a test HTML page, then imported both the testharness.js and testharnessreport.js scripts.
  2. Used the testharness.js promise_test interface instead of the mocha it interface.
  3. Used testharness.js assertions instead of the chai assertions in the mocha tests.

What are your opinions? Thanks.

@anssiko
Member

anssiko commented Sep 15, 2020

Thanks @BruceDai! Given that the conversion from mocha to the w-p-t test harness seems to be a reasonable effort, I'd recommend we focus on w-p-t tests that are reusable in the future standardization phases (in particular, at the latest by the CR transition).

@wchao1115 might be interested in taking a look at your converted tests, since he is familiar with conformance testing of ML operations.

@BruceDai
Collaborator

BruceDai commented Jul 6, 2021

I will implement a conversion tool for the WPT tests, thanks.

@BruceDai
Collaborator

BruceDai commented Sep 16, 2021

Update.

We've drafted some tests for later WPT tests contribution, including:

  1. IDL tests
    Implemented idlharness.https.any.js, which generates tests for Web IDL fragments using the JavaScript tests (testharness.js) infrastructure. There are now 351 tests, of which 216 pass on the latest Chrome stable (93.0.4577.82).

  2. JavaScript tests (testharness.js)
    Implemented these tests by converting the 328 op tests and 60 model tests of the webnn-polyfill tests. There are now 388 passing tests on the latest Chrome stable (93.0.4577.82).

The above tests use the webnn.idl from w3c/webref, which is automatically extracted from https://www.w3.org/TR/webnn/, and the latest unminified webnn-polyfill.js, generated by building this webnn-polyfill project.

I'm not sure whether the WebML community (Working Group) would be interested in such tests, or whether I'm doing the right thing.

Here's a preview at https://brucedai.github.io/wpt/webnn/index.html, PTAL, thanks.

@anssiko @wchao1115 @huningxin Would you please give me some advice and guidance? Thanks.

Note: it may take some time to load the tests, since the unminified webnn-polyfill.js script and the weights files of the model tests are large.

@BruceDai BruceDai self-assigned this Sep 27, 2021
@huningxin
Contributor Author

Thanks much, @BruceDai. The preview looks nice. As @anssiko mentioned, WPT is requested for the standardization phases (where we are now), so your development is quite helpful. I am interested in the status and the path forward.

I think this is a good topic for WebML WG meeting and probably for TPAC.

@anssiko @wchao1115 and @dontcallmedom , WDYT?

@anssiko
Member

anssiko commented Sep 30, 2021

Thanks @BruceDai for your efforts in converting the tests into w-p-t!

@dontcallmedom has expertise in w-p-t and can probably address your questions.

We've added the conformance testing of WebNN API to the TPAC agenda and will discuss this area more broadly at that meeting.

@dontcallmedom

@BruceDai the ultimate goal should be to submit the converted tests as a pull request to https://github.com/web-platform-tests/wpt in a newly created /webnn directory - presumably at that point, the polyfill itself would import them from there rather than have the tests duplicated here.

Let me know what specific guidance you might want to move forward in that direction.

@BruceDai
Collaborator

BruceDai commented Oct 9, 2021

Thanks @dontcallmedom @anssiko and @huningxin, and I'm sorry for the late reply because of the holiday.

presumably at that point, the polyfill itself would import them from there rather than have the tests duplicated here.

@dontcallmedom Yes, I quite agree with you.

I will first submit a PR of the IDL tests for the WebNN API, using webnn-polyfill.js, to WPT.

@wchao1115

wchao1115 commented Oct 10, 2021

Thanks @BruceDai

Aside from the choice of the tool to use, the test methodology is also important. The fundamental thing about floating-point math is that there is an inherent computational error built into the process, so the result could vary from operation to operation, or even from machine to machine. Typically for ML, we test the compute results by comparing them to certain baseline values, and if the result is "close enough" to the baseline, the comparison is considered passed. This is how the epsilon value (or the so-called tolerance value) of a function like assert_approx_equals could be used.

An important topic then becomes: what tolerance values should be used in the comparisons? Given that the computational error is cumulative, there is no single answer. For example, you could normally use a smaller tolerance value for an element-wise mul but would need a bigger value for an element-wise sqrt, just because of the greater accumulation of errors from the additional complexity, and an even bigger value for a highly complex operation such as convolution or gemm. So the answer is that it depends. An empirical estimate of a tolerance value can also be based on real experimentation on real hardware.

For DirectML, we chose the double-precision results from a standard CPU as our baseline values, or the so-called ideal results. We then define tolerance values from the ideal results for each operation we test. The comparison itself, however, is not done in absolute values, but rather in terms of ULPs.

The ULP, or unit of least precision, is an important concept in floating-point math. It is the distance between two consecutive floating-point values. When you define the tolerance in units of ULP, you no longer compare one floating-point value to another; rather, you measure the distance between the two representations and check whether it is greater than the maximum allowable distance.

Here is a sample implementation of such a comparison method. The same algorithm can also be extended for FP16 and even FP64.

#include <cstdint>  // int64_t, uint64_t
#include <cstdlib>  // std::abs
#include <new>      // std::launder

template<typename T>
int64_t GetBitwise(T value) {
    int64_t bitwiseValue = (value < T(0)) ? ~int64_t(0) : 0; // Extend sign.
    // Overwrite the low-order bytes with the raw floating-point bits
    // (assumes a little-endian layout).
    *std::launder(reinterpret_cast<T*>(&bitwiseValue)) = value;
    return bitwiseValue;
}

bool CompareUlp(float a, float b, uint64_t ulp) {
    // std::abs (not the C int-only abs) so the int64_t difference isn't truncated.
    return static_cast<uint64_t>(std::abs(GetBitwise(a) - GetBitwise(b))) > ulp;
}

A key benefit of ULP-based tolerances is that they are hardware agnostic. Some compute hardware emulates its floating-point results with fixed-point instructions; the ULP comparison would still work because the comparison remains relative to the native representation. There is also an opportunity to standardize a set of ULP tolerances across different operations.


@BruceDai
Collaborator

BruceDai commented Nov 2, 2021

Thanks @wchao1115.
We're following the ULP-based comparison for our tests. Regarding your comparison method,

bool CompareUlp(float a, float b, uint64_t ulp) {
    return static_cast<uint64_t>(std::abs(GetBitwise(a) - GetBitwise(b))) > ulp;
}

I have two following questions:

  1. Is there a typo in the relational operator of the return expression, i.e. should > be < here?
  2. Does the argument ulp actually mean the product of a scaling factor and the ULP of the given precision?

@wchao1115 PTAL, thanks.

@wchao1115

'>' is correct. The function returns true if the distance between a and b is greater than the specified ulp value. The ulp value simply means the acceptable distance between two floating-point values, in units of ULP.

@BruceDai
Collaborator

BruceDai commented Nov 3, 2021

@wchao1115 Thank you for your explanation.

@BruceDai
Collaborator

BruceDai commented Nov 5, 2021

Hi @wchao1115,

I have the following sample. You can see that I set ulp to 65536, which corresponds to an epsilon of 0.0009765625 (65536 * 2^-26, where 2^-26 is the ULP size for values in [0.125, 0.25)); then CompareUlp(a, b, ulp) returns false and CompareUlp(a1, b, ulp) returns true.

So are these statements right? Please correct me if I am wrong, thanks.

  1. Since the distance between a and b is smaller than the specified ulp value, a is approximately equal to b.
  2. Since the distance between a1 and b is greater than the specified ulp value, a1 is not approximately equal to b.

  const float a = 0.14768f;
  const float a1 = 0.14888f;
  const float b = 0.14775f; // baseline
  const float top = b + 0.0009765625;
  const float bottom = b - 0.0009765625;
  const uint64_t ulp = 65536; // acceptable distance
  std::cout << std::fabs(a - b) << std::endl;      // 7.00057e-05
  std::cout << std::fabs(a1 - b) << std::endl;     // 0.00113
  std::cout << GetBitwise(b) << std::endl;         // 1041714119
  std::cout << GetBitwise(a) << std::endl;         // 1041709421
  std::cout << GetBitwise(a1) << std::endl;        // 1041789952
  std::cout << static_cast<uint64_t>(std::abs(GetBitwise(a) - GetBitwise(b))) << std::endl;      // 4698
  std::cout << static_cast<uint64_t>(std::abs(GetBitwise(a1) - GetBitwise(b))) << std::endl;     // 75833
  std::cout << static_cast<uint64_t>(std::abs(GetBitwise(top) - GetBitwise(b))) << std::endl;    // 65536
  std::cout << static_cast<uint64_t>(std::abs(GetBitwise(bottom) - GetBitwise(b))) << std::endl; // 65536
  std::cout << CompareUlp(a, b, ulp) << std::endl;  // false
  std::cout << CompareUlp(a1, b, ulp) << std::endl; // true

@wchao1115

wchao1115 commented Nov 10, 2021

Your statements are correct. I must add that in practice the ulp value is going to be much smaller than the one you use here, and that it also varies with the computational complexity of the operations whose results are being compared; i.e. the ulp value for an element-wise mul is going to be narrower than that for an element-wise sqrt.

@BruceDai
Collaborator

Thanks @wchao1115

Since different baseline values have different acceptable distances, do you have any suggestions, such as a unified formula for computing the acceptable distance with respect to the baseline? Thanks.
