Remove html_entities dependency and use built-in entities module #434

Merged
philss merged 6 commits into main from ps-remove-html-entities-dep on Nov 3, 2022

@philss commented Nov 3, 2022

Most of the required code was already present because of the HTML tokenizer, so we can reuse it and remove the dependency.

This also makes both encoding and decoding slightly faster and reduces memory usage.
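As a rough illustration of what entity encoding on top of `String.replace/3` can look like, here is a minimal sketch. The module name `EntitySketch` and the set of escaped characters are assumptions made for this example; this is not the code from this PR and not Floki's API.

```elixir
defmodule EntitySketch do
  # Hypothetical helper, not Floki's implementation: escapes five
  # characters that are commonly escaped in HTML text and attributes.
  @replacements %{
    "&" => "&amp;",
    "<" => "&lt;",
    ">" => "&gt;",
    "\"" => "&quot;",
    "'" => "&#39;"
  }

  @patterns Map.keys(@replacements)

  def encode(text) when is_binary(text) do
    # With a list of patterns, String.replace/3 scans the input once and
    # calls the function for each match, so an "&" produced by an earlier
    # replacement is never escaped a second time.
    String.replace(text, @patterns, fn match -> Map.fetch!(@replacements, match) end)
  end
end
```

With this sketch, `EntitySketch.encode("<a href=\"x\">Tom & Jerry</a>")` returns `"&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;"`.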

Benchmarks are in the details section.

Operating System: Linux
CPU Information: AMD Ryzen 9 5950X 16-Core Processor
Number of Available Cores: 32
Available memory: 31.24 GB
Elixir 1.14.0
Erlang 25.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: big, medium, small
Estimated total run time: 1.40 min

Benchmarking Floki.Entities.encode/1 with input big ...
Benchmarking Floki.Entities.encode/1 with input medium ...
Benchmarking Floki.Entities.encode/1 with input small ...
Benchmarking String.replace/3 with input big ...
Benchmarking String.replace/3 with input medium ...
Benchmarking String.replace/3 with input small ...

With input big

Name ips average deviation median 99th %
String.replace/3 57.11 17.51 ms ±8.91% 17.45 ms 21.78 ms
Floki.Entities.encode/1 42.48 23.54 ms ±2.02% 23.48 ms 24.79 ms

Comparison:
String.replace/3 57.11
Floki.Entities.encode/1 42.48 - 1.34x slower +6.03 ms

Memory usage statistics:

Name Memory usage
String.replace/3 7.71 MB
Floki.Entities.encode/1 23.09 MB - 2.99x memory usage +15.38 MB

All measurements for memory usage were the same

With input medium

Name ips average deviation median 99th %
String.replace/3 182.24 5.49 ms ±7.38% 5.53 ms 6.29 ms
Floki.Entities.encode/1 130.31 7.67 ms ±1.55% 7.67 ms 8.02 ms

Comparison:
String.replace/3 182.24
Floki.Entities.encode/1 130.31 - 1.40x slower +2.19 ms

Memory usage statistics:

Name Memory usage
String.replace/3 2.80 MB
Floki.Entities.encode/1 7.68 MB - 2.74x memory usage +4.88 MB

All measurements for memory usage were the same

With input small

Name ips average deviation median 99th %
String.replace/3 839.76 1.19 ms ±15.04% 1.15 ms 1.70 ms
Floki.Entities.encode/1 395.46 2.53 ms ±15.48% 2.39 ms 3.47 ms

Comparison:
String.replace/3 839.76
Floki.Entities.encode/1 395.46 - 2.12x slower +1.34 ms

Memory usage statistics:

Name Memory usage
String.replace/3 0.63 MB
Floki.Entities.encode/1 2.20 MB - 3.47x memory usage +1.56 MB

All measurements for memory usage were the same
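
For reading the comparison lines, the arithmetic behind them can be checked by hand. The sketch below uses the figures reported for the big input; the module `BenchMath` and its function names are hypothetical helpers for this example, not part of the benchmarking tool.

```elixir
defmodule BenchMath do
  # Hypothetical helpers for checking the arithmetic in the report.

  # Iterations per second from an average run time in milliseconds.
  def ips_from_ms(average_ms), do: Float.round(1000 / average_ms, 2)

  # How many times slower (or larger) `other` is than `reference`.
  def ratio(reference, other), do: Float.round(other / reference, 2)

  # Absolute difference between two measurements in the same unit.
  def difference(reference, other), do: Float.round(other - reference, 2)
end

# Figures taken from the "With input big" results.
IO.inspect(BenchMath.ips_from_ms(17.51))        # ips for String.replace/3
IO.inspect(BenchMath.ips_from_ms(23.54))        # ips for Floki.Entities.encode/1
IO.inspect(BenchMath.ratio(17.51, 23.54))       # slowdown factor
IO.inspect(BenchMath.difference(17.51, 23.54))  # extra milliseconds per run
IO.inspect(BenchMath.ratio(7.71, 23.09))        # memory usage factor
```

These calls reproduce the reported 57.11 and 42.48 ips, the 1.34x slowdown, the +6.03 ms difference, and the 2.99x memory usage.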


Decoding below

With input —

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.32 M 300.89 ns ±2405.34% 271 ns 531 ns
HtmlEntities.decode/1 3.25 M 308.15 ns ±6311.72% 221 ns 832 ns

Comparison:
Floki.Entities.decode/1 3.32 M
HtmlEntities.decode/1 3.25 M - 1.02x slower +7.26 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 208 B - 1.18x memory usage +32 B

All measurements for memory usage were the same

With input 𝕒

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.23 M 309.72 ns ±2531.98% 280 ns 521 ns
HtmlEntities.decode/1 2.23 M 447.61 ns ±7545.19% 301 ns 601 ns

Comparison:
Floki.Entities.decode/1 3.23 M
HtmlEntities.decode/1 2.23 M - 1.45x slower +137.89 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 328 B - 1.86x memory usage +152 B

All measurements for memory usage were the same

With input ≎

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.24 M 308.68 ns ±2420.87% 280 ns 531 ns
HtmlEntities.decode/1 2.81 M 355.64 ns ±4513.98% 291 ns 611 ns

Comparison:
Floki.Entities.decode/1 3.24 M
HtmlEntities.decode/1 2.81 M - 1.15x slower +46.96 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 408 B - 2.32x memory usage +232 B

All measurements for memory usage were the same

With input ⪡̸

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.23 M 309.23 ns ±2440.85% 280 ns 531 ns
HtmlEntities.decode/1 1.55 M 644.61 ns ±4377.60% 491 ns 852 ns

Comparison:
Floki.Entities.decode/1 3.23 M
HtmlEntities.decode/1 1.55 M - 2.08x slower +335.38 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 848 B - 4.82x memory usage +672 B

All measurements for memory usage were the same

With input ∼

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.30 M 302.98 ns ±2522.47% 271 ns 511 ns
HtmlEntities.decode/1 2.80 M 357.08 ns ±6568.64% 240 ns 551 ns

Comparison:
Floki.Entities.decode/1 3.30 M
HtmlEntities.decode/1 2.80 M - 1.18x slower +54.10 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 368 B - 2.09x memory usage +192 B

All measurements for memory usage were the same

With input ≲

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.24 M 308.95 ns ±2618.17% 280 ns 511 ns
HtmlEntities.decode/1 2.25 M 445.31 ns ±7182.95% 301 ns 631 ns

Comparison:
Floki.Entities.decode/1 3.24 M
HtmlEntities.decode/1 2.25 M - 1.44x slower +136.36 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 328 B - 1.86x memory usage +152 B

All measurements for memory usage were the same

With input Ō

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.45 M 289.92 ns ±2371.12% 261 ns 501 ns
HtmlEntities.decode/1 2.55 M 391.78 ns ±6546.23% 280 ns 612 ns

Comparison:
Floki.Entities.decode/1 3.45 M
HtmlEntities.decode/1 2.55 M - 1.35x slower +101.86 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 368 B - 2.09x memory usage +192 B

All measurements for memory usage were the same

With input ⤙

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.26 M 306.44 ns ±2322.00% 280 ns 521 ns
HtmlEntities.decode/1 2.84 M 352.53 ns ±5562.04% 290 ns 592 ns

Comparison:
Floki.Entities.decode/1 3.26 M
HtmlEntities.decode/1 2.84 M - 1.15x slower +46.09 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 408 B - 2.32x memory usage +232 B

All measurements for memory usage were the same

With input ∈

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.21 M 311.91 ns ±2040.92% 281 ns 521 ns
HtmlEntities.decode/1 2.23 M 448.55 ns ±6086.67% 340 ns 681 ns

Comparison:
Floki.Entities.decode/1 3.21 M
HtmlEntities.decode/1 2.23 M - 1.44x slower +136.64 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 368 B - 2.09x memory usage +192 B

All measurements for memory usage were the same

With input ⋁

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.29 M 304.08 ns ±2613.94% 280 ns 511 ns
HtmlEntities.decode/1 2.65 M 377.69 ns ±8276.08% 251 ns 601 ns

Comparison:
Floki.Entities.decode/1 3.29 M
HtmlEntities.decode/1 2.65 M - 1.24x slower +73.61 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 288 B - 1.64x memory usage +112 B

All measurements for memory usage were the same

With input 𝓅

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.27 M 306.09 ns ±2506.43% 280 ns 511 ns
HtmlEntities.decode/1 2.34 M 426.49 ns ±7514.06% 271 ns 611 ns

Comparison:
Floki.Entities.decode/1 3.27 M
HtmlEntities.decode/1 2.34 M - 1.39x slower +120.40 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 328 B - 1.86x memory usage +152 B

All measurements for memory usage were the same

With input ⊐

Name ips average deviation median 99th %
Floki.Entities.decode/1 2.97 M 336.40 ns ±7376.81% 280 ns 521 ns
HtmlEntities.decode/1 2.47 M 405.26 ns ±6187.42% 310 ns 611 ns

Comparison:
Floki.Entities.decode/1 2.97 M
HtmlEntities.decode/1 2.47 M - 1.20x slower +68.86 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 488 B - 2.77x memory usage +312 B

All measurements for memory usage were the same

With input ⋄

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.12 M 320.19 ns ±6904.46% 271 ns 501 ns
HtmlEntities.decode/1 3.03 M 330.56 ns ±5781.01% 241 ns 561 ns

Comparison:
Floki.Entities.decode/1 3.12 M
HtmlEntities.decode/1 3.03 M - 1.03x slower +10.37 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 176 B
HtmlEntities.decode/1 448 B - 2.55x memory usage +272 B

All measurements for memory usage were the same

With input U+202B (right-to-left embedding, an invisible control character)

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.55 M 282.05 ns ±11560.17% 231 ns 411 ns
HtmlEntities.decode/1 2.29 M 436.62 ns ±5691.93% 331 ns 611 ns

Comparison:
Floki.Entities.decode/1 3.55 M
HtmlEntities.decode/1 2.29 M - 1.55x slower +154.57 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 320 B
HtmlEntities.decode/1 456 B - 1.43x memory usage +136 B

All measurements for memory usage were the same

With input ر

Name ips average deviation median 99th %
Floki.Entities.decode/1 4.07 M 245.82 ns ±7818.56% 201 ns 400 ns
HtmlEntities.decode/1 2.67 M 374.54 ns ±7490.65% 270 ns 521 ns

Comparison:
Floki.Entities.decode/1 4.07 M
HtmlEntities.decode/1 2.67 M - 1.52x slower +128.72 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 344 B
HtmlEntities.decode/1 432 B - 1.26x memory usage +88 B

All measurements for memory usage were the same

With input ق

Name ips average deviation median 99th %
Floki.Entities.decode/1 4.07 M 245.83 ns ±7820.50% 191 ns 391 ns
HtmlEntities.decode/1 2.57 M 388.39 ns ±7261.06% 280 ns 561 ns

Comparison:
Floki.Entities.decode/1 4.07 M
HtmlEntities.decode/1 2.57 M - 1.58x slower +142.56 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 344 B
HtmlEntities.decode/1 432 B - 1.26x memory usage +88 B

All measurements for memory usage were the same

With input م

Name ips average deviation median 99th %
Floki.Entities.decode/1 4.05 M 247.13 ns ±8069.65% 191 ns 391 ns
HtmlEntities.decode/1 2.36 M 423.65 ns ±7422.55% 301 ns 622 ns

Comparison:
Floki.Entities.decode/1 4.05 M
HtmlEntities.decode/1 2.36 M - 1.71x slower +176.52 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 344 B
HtmlEntities.decode/1 432 B - 1.26x memory usage +88 B

All measurements for memory usage were the same

With input ا

Name ips average deviation median 99th %
Floki.Entities.decode/1 4.07 M 245.71 ns ±7728.22% 191 ns 391 ns
HtmlEntities.decode/1 2.01 M 497.38 ns ±7149.31% 360 ns 692 ns

Comparison:
Floki.Entities.decode/1 4.07 M
HtmlEntities.decode/1 2.01 M - 2.02x slower +251.67 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 344 B
HtmlEntities.decode/1 432 B - 1.26x memory usage +88 B

All measurements for memory usage were the same

With input ل

Name ips average deviation median 99th %
Floki.Entities.decode/1 4.11 M 243.35 ns ±8042.63% 191 ns 400 ns
HtmlEntities.decode/1 2.41 M 414.67 ns ±7131.90% 300 ns 621 ns

Comparison:
Floki.Entities.decode/1 4.11 M
HtmlEntities.decode/1 2.41 M - 1.70x slower +171.31 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 344 B
HtmlEntities.decode/1 432 B - 1.26x memory usage +88 B

All measurements for memory usage were the same

With input ه

Name ips average deviation median 99th %
Floki.Entities.decode/1 4.08 M 245.15 ns ±7852.96% 191 ns 381 ns
HtmlEntities.decode/1 2.12 M 471.42 ns ±7213.75% 330 ns 661 ns

Comparison:
Floki.Entities.decode/1 4.08 M
HtmlEntities.decode/1 2.12 M - 1.92x slower +226.27 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 344 B
HtmlEntities.decode/1 432 B - 1.26x memory usage +88 B

All measurements for memory usage were the same

With input ت

Name ips average deviation median 99th %
Floki.Entities.decode/1 4.07 M 245.81 ns ±7827.84% 200 ns 371 ns
HtmlEntities.decode/1 2.06 M 485.96 ns ±6664.08% 351 ns 691 ns

Comparison:
Floki.Entities.decode/1 4.07 M
HtmlEntities.decode/1 2.06 M - 1.98x slower +240.16 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 344 B
HtmlEntities.decode/1 432 B - 1.26x memory usage +88 B

All measurements for memory usage were the same

With input ف

Name ips average deviation median 99th %
Floki.Entities.decode/1 4.08 M 245.22 ns ±7882.41% 191 ns 381 ns
HtmlEntities.decode/1 2.01 M 498.21 ns ±6553.74% 370 ns 701 ns

Comparison:
Floki.Entities.decode/1 4.08 M
HtmlEntities.decode/1 2.01 M - 2.03x slower +252.99 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 344 B
HtmlEntities.decode/1 432 B - 1.26x memory usage +88 B

All measurements for memory usage were the same

With input 2

Name ips average deviation median 99th %
Floki.Entities.decode/1 10.70 M 93.44 ns ±10796.54% 70 ns 161 ns
HtmlEntities.decode/1 3.45 M 289.76 ns ±8206.87% 200 ns 471 ns

Comparison:
Floki.Entities.decode/1 10.70 M
HtmlEntities.decode/1 3.45 M - 3.10x slower +196.32 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 40 B
HtmlEntities.decode/1 128 B - 3.20x memory usage +88 B

All measurements for memory usage were the same

With input U+202E (right-to-left override, an invisible control character)

Name ips average deviation median 99th %
Floki.Entities.decode/1 3.88 M 257.62 ns ±11750.96% 181 ns 401 ns
HtmlEntities.decode/1 1.96 M 510.24 ns ±5627.14% 390 ns 741 ns

Comparison:
Floki.Entities.decode/1 3.88 M
HtmlEntities.decode/1 1.96 M - 1.98x slower +252.61 ns

Memory usage statistics:

Name Memory usage
Floki.Entities.decode/1 320 B
HtmlEntities.decode/1 456 B - 1.43x memory usage +136 B

All measurements for memory usage were the same

benchmarks.zip

@philss merged commit 48168b3 into main Nov 3, 2022
@philss deleted the ps-remove-html-entities-dep branch November 3, 2022 17:41