[Latency] DefaultLookupDict too slow #1300

Open
hannw opened this issue Aug 16, 2020 · 4 comments
Labels
performance Performance issues v0.x

Comments

@hannw

hannw commented Aug 16, 2020

Description

The __getitem__ method of DefaultLookupDict is too slow. Profiling on an AWS p3.16xlarge machine shows that each __getitem__ call costs about 2.4 microseconds, whereas a regular dictionary .get(key, defaultvalue) call on the same machine takes around 120 nanoseconds, so the current implementation is roughly 20 times slower than a plain dict get. This operation alone accounts for about 50% of all processing time in our data pipeline. Is there a way to speed it up?

Error Message

N/A

To Reproduce

Use Vocab to numericalize strings as usual.
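A minimal timing sketch of the comparison described above (illustrative only; the original profiling script was not posted, and the toy vocabulary mirrors the construction used in the benchmarks further down the thread):

import timeit
import gluonnlp as nlp

# Toy vocabulary standing in for the real one from the data pipeline.
v = nlp.Vocab({k: 1 for k in ['a', 'b', 'c']})
d = {'a': 1, 'b': 2, 'c': 3}

# Vocab.__getitem__ (the DefaultLookupDict path) vs. a plain dict .get with a default.
print(timeit.timeit(lambda: v['a'], number=1_000_000))         # seconds per 1M Vocab lookups
print(timeit.timeit(lambda: d.get('a', 0), number=1_000_000))  # seconds per 1M dict.get calls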

@hannw hannw added the bug Something isn't working label Aug 16, 2020
@sxjscience
Member

@hannw Could you provide the profiling scripts that arrive at these numbers? In the new version, we use a plain built-in dict for storing the mapping:

self._token_to_idx = dict()

Also, we are running a series of benchmarks to analyze the speed of GluonNLP: https://github.com/dmlc/gluon-nlp/tree/master/scripts/benchmarks. If you can provide your profiling scripts, it will be super helpful for us in speeding up GluonNLP.
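For reference, here is a sketch of what a dict-backed lookup with an unknown-token fallback amounts to (illustrative only; SimpleVocab is a hypothetical name, and the actual master-branch code may differ):

# Sketch of a dict-backed token-to-index lookup with an unknown-token fallback.
class SimpleVocab:
    def __init__(self, tokens, unknown_token='<unk>'):
        self._unknown_token = unknown_token
        self._token_to_idx = dict()  # plain built-in dict, as noted above
        if unknown_token is not None:
            self._token_to_idx[unknown_token] = 0
        for tok in tokens:
            self._token_to_idx.setdefault(tok, len(self._token_to_idx))

    def __getitem__(self, token):
        if self._unknown_token is not None:
            # out-of-vocabulary tokens fall back to the unknown token's index
            return self._token_to_idx.get(token, self._token_to_idx[self._unknown_token])
        # without an unknown token, a miss raises KeyError by design
        return self._token_to_idx[token]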

@sxjscience sxjscience added performance Performance issues and removed bug Something isn't working labels Aug 16, 2020
@hannw
Author

hannw commented Aug 16, 2020

@sxjscience for profiling, we use Python's built-in cProfile while running the training script, and snakeviz to visualize the breakdown.

We are currently using gluon-nlp 0.8.x, so maybe that's why the get is slow. Let us test the bleeding edge and see if it speeds things up.
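The workflow looks roughly like this (illustrative; the toy vocab and data stand in for our actual pipeline objects):

import cProfile
import gluonnlp as nlp

v = nlp.Vocab({k: 1 for k in ['a', 'b', 'c']})  # toy vocab standing in for the real one
batches = [['a', 'b', 'zzz']] * 100000          # toy data standing in for the pipeline input

def numericalize(vocab, batch):
    # the hot loop: one Vocab.__getitem__ call per token
    return [[vocab[tok] for tok in sentence] for sentence in batch]

cProfile.run('numericalize(v, batches)', 'lookup.prof')
# visualize the breakdown from a shell:
#   snakeviz lookup.prof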

@szha
Member

szha commented Aug 16, 2020

@hannw thanks for the interest. Note that the master branch is now used for the numpy-compatible version of GluonNLP (#1298), which relies on MXNet 2.0 nightly builds (available to developers at https://dist.mxnet.io/python).

@szha szha added the v0.x label Aug 16, 2020
@szha
Member

szha commented Sep 17, 2020

@hannw I think the above comparison is not apples to apples for our supported use cases. The Vocab class is designed to handle both cases, with and without an unknown token. If the unknown token is not set, a lookup miss should throw an error according to the definition of the class.

If you know beforehand that you always have an unknown token, a good option may be to use the built-in dictionary directly instead of Vocab, as sketched below.
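Concretely, something like the following (hypothetical token table for illustration):

# When an unknown token is guaranteed to exist, dict.get with the unknown
# token's index replaces the Vocab indirection entirely.
token_to_idx = {'<unk>': 0, 'a': 1, 'b': 2, 'c': 3}
unk_idx = token_to_idx['<unk>']
ids = [token_to_idx.get(tok, unk_idx) for tok in ['a', 'zzz', 'b']]  # -> [1, 0, 2]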

I did a comparison among the Vocab class on 0.x, on 0.8.3, and on master (1.x), and I don't see a significant difference, most likely because the new logic adds a condition check on whether the unknown token is set. I see mixed performance between the two implementations in the cases with and without unknown_token:

# Tests done on Python 3.7, OS X 10.15.6
# GluonNLP 0.x
# w/ unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']})
%timeit v['a']
530 ns ± 17.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit v['abc']
565 ns ± 24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
3.32 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
2.67 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# w/o unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']}, unknown_token=None)
%timeit v['a']
362 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
1.17 µs ± 28.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
582 µs ± 18.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# GluonNLP 0.8.3
# w/ unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']})
%timeit v['a']
530 ns ± 17.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit v['abc']
550 ns ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
3.2 µs ± 166 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
2.65 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# w/o unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']}, unknown_token=None)
%timeit v['a']
362 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
1.13 µs ± 14.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
618 µs ± 51.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# GluonNLP master
# w/ unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']})
%timeit v['a']
598 ns ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit v['abc']
646 ns ± 34.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
2.34 µs ± 200 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
1.34 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# w/o unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']}, unknown_token=None)
%timeit v['a']
641 ns ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
2.37 µs ± 89.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
1.34 ms ± 27.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
