Speed up varcon support #253

Closed
epage opened this issue May 19, 2021 · 1 comment · Fixed by #254

Comments

epage commented May 19, 2021

When there are no hits, it's only about a 10% cost. When there are hits, it can go up to 50%.

For "code"

check_file/FoundFiles/code
                        time:   [23.589 us 23.753 us 23.933 us]
                        thrpt:  [12.114 MiB/s 12.205 MiB/s 12.290 MiB/s]
check_file/Identifiers/code
                        time:   [26.211 us 26.350 us 26.498 us]
                        thrpt:  [10.941 MiB/s 11.003 MiB/s 11.061 MiB/s]
check_file/Words/code   time:   [28.795 us 28.913 us 29.046 us]
                        thrpt:  [9.9814 MiB/s 10.027 MiB/s 10.068 MiB/s]
check_file/Typos/code   time:   [32.651 us 32.788 us 32.934 us]
                        thrpt:  [8.8029 MiB/s 8.8421 MiB/s 8.8794 MiB/s]

and with varcon

check_file/Typos/code   time:   [35.860 us 36.021 us 36.187 us]
                        thrpt:  [8.0117 MiB/s 8.0486 MiB/s 8.0846 MiB/s]
                 change:
                        time:   [+8.6784% +9.7190% +10.748%] (p = 0.00 < 0.05)
                        thrpt:  [-9.7049% -8.8581% -7.9854%]
                        Performance has regressed.

For "corpus" (high token count compared to non-tokens, lots of corrections)

check_file/FoundFiles/corpus
                        time:   [53.392 us 53.820 us 54.261 us]
                        thrpt:  [10.464 GiB/s 10.550 GiB/s 10.635 GiB/s]
check_file/Identifiers/corpus
                        time:   [2.5148 ms 2.5232 ms 2.5327 ms]
                        thrpt:  [229.57 MiB/s 230.44 MiB/s 231.21 MiB/s]
check_file/Words/corpus time:   [6.5589 ms 6.5755 ms 6.5940 ms]
                        thrpt:  [88.177 MiB/s 88.425 MiB/s 88.649 MiB/s]
check_file/Typos/corpus time:   [17.806 ms 17.900 ms 18.008 ms]
                        thrpt:  [32.288 MiB/s 32.482 MiB/s 32.654 MiB/s]

and with varcon

check_file/Typos/corpus time:   [26.966 ms 27.215 ms 27.521 ms]
                        thrpt:  [21.127 MiB/s 21.365 MiB/s 21.562 MiB/s]
                 change:
                        time:   [+50.409% +52.035% +53.975%] (p = 0.00 < 0.05)
                        thrpt:  [-35.054% -34.226% -33.515%]
                        Performance has regressed.
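
As a sanity check on the percentages above, here is a minimal sketch that recomputes the overhead from the median times. The function names are made up for illustration. Recomputing from the printed medians will not necessarily reproduce Criterion's reported change to the last digit, so treat the results as approximate.

```rust
/// Relative change in time as a fraction (0.10 means +10%).
fn time_change(before: f64, after: f64) -> f64 {
    (after - before) / before
}

/// Throughput is inversely proportional to time, so a relative time
/// change of `t` corresponds to a throughput change of 1 / (1 + t) - 1.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}
```

With the medians above, `time_change(32.788, 36.021)` comes out at roughly +9.9% for "code" and `time_change(17.900, 27.215)` at roughly +52.0% for "corpus". Feeding the latter into `throughput_change` gives roughly -34.2%, which is why a 52% slowdown shows up as a 34% drop in throughput.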

epage commented May 19, 2021

I wonder if there is cost in doing two different hashings and lookups.

We might be able to speed this up by merging the two dictionaries.
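
A rough sketch of what merging could look like, assuming both tables are plain hash maps keyed by the word. The `Entry` type, its field names, and `merge` are hypothetical, and the crate's real data structures may differ. The point is only that a merged table lets each word be hashed and looked up once instead of twice.

```rust
use std::collections::HashMap;

/// Hypothetical merged entry: everything either table knows about a word.
#[derive(Debug, Clone, PartialEq, Default)]
struct Entry {
    /// Corrections from the misspelling dictionary, if any.
    corrections: Vec<&'static str>,
    /// Spellings from the variant table, if any.
    variants: Vec<&'static str>,
}

/// Merge two word-keyed tables so a lookup hashes the word only once.
fn merge(
    dict: &HashMap<&'static str, Vec<&'static str>>,
    vars: &HashMap<&'static str, Vec<&'static str>>,
) -> HashMap<&'static str, Entry> {
    let mut merged: HashMap<&'static str, Entry> = HashMap::new();
    for (word, corrections) in dict {
        merged.entry(*word).or_default().corrections = corrections.clone();
    }
    for (word, variants) in vars {
        merged.entry(*word).or_default().variants = variants.clone();
    }
    merged
}
```

The trade-off is that every lookup now pays for the larger combined table, even when variants are not needed at all.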

epage pushed a commit to epage/typos that referenced this issue May 19, 2021
Variant support slows us down by 10-50%.  I assume most people will run
with `en`, so most of this overhead is wasted.  So instead of
merging vars with dict, let's get a quick win by just skipping
vars when we don't need them.  If the assumptions behind this change
stop holding over time, or if a specific locale needs speeding up, we
can revisit this.

Before:
```
check_file/Typos/code   time:   [35.860 us 36.021 us 36.187 us]
                        thrpt:  [8.0117 MiB/s 8.0486 MiB/s 8.0846 MiB/s]
check_file/Typos/corpus time:   [26.966 ms 27.215 ms 27.521 ms]
                        thrpt:  [21.127 MiB/s 21.365 MiB/s 21.562 MiB/s]
```
After:
```
check_file/Typos/code   time:   [33.837 us 33.928 us 34.031 us]
                        thrpt:  [8.5191 MiB/s 8.5452 MiB/s 8.5680 MiB/s]
check_file/Typos/corpus time:   [17.521 ms 17.620 ms 17.730 ms]
                        thrpt:  [32.794 MiB/s 32.999 MiB/s 33.184 MiB/s]
```

This puts us in line with `--no-default-features --features dict`

Fixes crate-ci#253
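
The idea in the commit message can be sketched roughly as follows. This is not the actual patch: `Locale`, `Checker`, and `correct` are hypothetical names, and the sketch assumes that the generic `en` locale accepts every English variant, so the variant table never needs to be consulted for it.

```rust
use std::collections::HashMap;

/// Hypothetical locale setting; `En` is assumed to accept any variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Locale {
    En,
    EnUs,
    EnGb,
}

/// Hypothetical checker holding both tables.
struct Checker {
    locale: Locale,
    /// Misspelling -> correction.
    dict: HashMap<&'static str, &'static str>,
    /// Word -> preferred spelling per locale.
    vars: HashMap<&'static str, Vec<(Locale, &'static str)>>,
}

impl Checker {
    /// Returns a suggested replacement, or `None` if the word is fine.
    fn correct(&self, word: &str) -> Option<&'static str> {
        if let Some(fix) = self.dict.get(word) {
            return Some(*fix);
        }
        // Quick win: with the generic locale there is nothing to enforce,
        // so skip the second hash and lookup entirely.
        if self.locale == Locale::En {
            return None;
        }
        let variants = self.vars.get(word)?;
        variants
            .iter()
            .find(|(locale, _)| *locale == self.locale)
            .map(|(_, spelling)| *spelling)
            // A word already spelled the locale's way needs no correction.
            .filter(|spelling| *spelling != word)
    }
}
```

In this sketch only non-`en` locales pay for the variant lookup, which matches the trade-off described above: the common case gets the dict-only cost, and specific locales keep the old behavior.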