Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim the fat #31

Open
parasyte opened this issue Jan 25, 2022 · 0 comments
Open

Trim the fat #31

parasyte opened this issue Jan 25, 2022 · 0 comments
Labels
bug Something isn't working

Comments

@parasyte
Copy link
Owner

parasyte commented Jan 25, 2022

0.3.0 was released today, and I was surprised by the zip file size:

File name 0.2.0 0.3.0
i686-pc-windows-msvc 1.74 MB 3.36 MB
x86_64-apple-darwin 1.7 MB 3.41 MB
x86_64-pc-windows-msvc 1.74 MB 3.36 MB

Clearly something has changed!

Note that these are compressed sizes reported on the GitHub releases page. The rest of this analysis will focus on uncompressed executable sizes. For reference, the size of the uncompressed x86_64 Windows exe is 4,410,880 bytes for 0.2.0, and 7,805,952 bytes for 0.3.0.


My first suspect was ureq, added in #11. This crate handles HTTP requests. I ran some high-level analysis with cargo-bloat and was able to verify that #11 increased the executable size by about 1.3 MB and added about 800 functions (after optimizations including inlining and dead code elimination). None of these new functions show up in the top-20 list provided by cargo-bloat, meaning they are all less than 23 KB.

The dependency tree for ureq points to some things that can likely be removed or replaced with a smaller alternative:

ureq v2.3.0
├── base64 v0.13.0
├── chunked_transfer v1.4.0
├── log v0.4.14
│   └── cfg-if v1.0.0
├── once_cell v1.8.0
├── rustls v0.20.0
│   ├── log v0.4.14 (*)
│   ├── ring v0.16.20
│   │   ├── spin v0.5.2
│   │   ├── untrusted v0.7.1
│   │   └── winapi v0.3.9
│   ├── sct v0.7.0
│   │   ├── ring v0.16.20 (*)
│   │   └── untrusted v0.7.1
│   └── webpki v0.22.0
│       ├── ring v0.16.20 (*)
│       └── untrusted v0.7.1
├── serde v1.0.130
│   └── serde_derive v1.0.130 (proc-macro)
│       ├── proc-macro2 v1.0.32
│       │   └── unicode-xid v0.2.2
│       ├── quote v1.0.10
│       │   └── proc-macro2 v1.0.32 (*)
│       └── syn v1.0.81
│           ├── proc-macro2 v1.0.32 (*)
│           ├── quote v1.0.10 (*)
│           └── unicode-xid v0.2.2
├── serde_json v1.0.69
│   ├── itoa v0.4.8
│   ├── ryu v1.0.5
│   └── serde v1.0.130 (*)
├── url v2.2.2
│   ├── form_urlencoded v1.0.1
│   │   ├── matches v0.1.9
│   │   └── percent-encoding v2.1.0
│   ├── idna v0.2.3
│   │   ├── matches v0.1.9
│   │   ├── unicode-bidi v0.3.7
│   │   └── unicode-normalization v0.1.19
│   │       └── tinyvec v1.5.0
│   │           └── tinyvec_macros v0.1.0
│   ├── matches v0.1.9
│   └── percent-encoding v2.1.0
├── webpki v0.22.0 (*)
└── webpki-roots v0.22.1
    └── webpki v0.22.0 (*)

Number of dependencies is not a great proxy for contribution to executable size, but rustls, serde, and url looks like the heaviest transitive dependencies. We need TLS, but ureq can use either rustls or native-tls under the hood, so we might get a smaller exe just by switching? That's one theory to test.

On the other hand, serde_json is not necessary at all! I use it for conveniently parsing the JSON response from the GitHub API. But JSON parsing can be done with other crates like the aptly-named json, which has zero additional dependencies (beyond std): https://crates.io/crates/json/0.12.4/dependencies Switching JSON parsing to the json crate may or may not reduce overall exe size. That's another theory to test. (Aside: serde is used by other crates, so getting rid of all of it is highly unlikely.)

We need to handle URLs, so I guess the url crate is strictly necessary. For my particular use-case, I don't need to care about IDNA at all. Maybe there is something I could do to trick ureq into not calling into the url crate, and LTO will trim all the useless IDNA stuff. That's theory three.

But that doesn't account for all 3,395,072 bytes added between the two releases.


My second suspicion was something added when updating dependencies in #28. cargo-bloat pointed out an additional 1.8 MB from about 550 new functions added by this specific PR. Combined, that's roughly 3.1 MB added to the exe size from two PRs. That falls into the ballpark.

This time, there are two differences of note in the dependency tree. The first is rfd (which handles dialog windows).

Used in 0.2.0:

rfd v0.5.1
├── raw-window-handle v0.3.3
│   └── libc v0.2.107
└── winapi v0.3.9

Used in 0.3.0:

rfd v0.6.4
├── raw-window-handle v0.4.2
│   └── cty v0.2.2
└── windows v0.30.0
    └── windows_x86_64_msvc v0.30.0

Could windows-rs be responsible? It doesn't seem like it, since that crate is almost entirely a flat hierarchy of leaf functions. In my experience, these kinds of bloat issues tend to be caused by deeply nested functions with code paths that will never run: rust-console/cargo-n64@8cd9123

The next suspect was ... ureq again! A minor update added support for gzip, which includes a small tree of its own with `flate2:

ureq v2.4.0
ureq v2.4.0
├── base64 v0.13.0
├── chunked_transfer v1.4.0
├── flate2 v1.0.22
│   ├── cfg-if v1.0.0
│   ├── crc32fast v1.3.1
│   │   └── cfg-if v1.0.0
│   ├── libc v0.2.113
│   └── miniz_oxide v0.4.4
│       └── adler v1.0.2
├── log v0.4.14
│   └── cfg-if v1.0.0
├── once_cell v1.9.0
├── rustls v0.20.2
│   ├── log v0.4.14 (*)
│   ├── ring v0.16.20
│   │   ├── spin v0.5.2
│   │   ├── untrusted v0.7.1
│   │   └── winapi v0.3.9
│   ├── sct v0.7.0
│   │   ├── ring v0.16.20 (*)
│   │   └── untrusted v0.7.1
│   └── webpki v0.22.0
│       ├── ring v0.16.20 (*)
│       └── untrusted v0.7.1
├── serde v1.0.135
│   └── serde_derive v1.0.135 (proc-macro)
│       ├── proc-macro2 v1.0.36
│       │   └── unicode-xid v0.2.2
│       ├── quote v1.0.15
│       │   └── proc-macro2 v1.0.36 (*)
│       └── syn v1.0.86
│           ├── proc-macro2 v1.0.36 (*)
│           ├── quote v1.0.15 (*)
│           └── unicode-xid v0.2.2
├── serde_json v1.0.78
│   ├── itoa v1.0.1
│   ├── ryu v1.0.9
│   └── serde v1.0.135 (*)
├── url v2.2.2
│   ├── form_urlencoded v1.0.1
│   │   ├── matches v0.1.9
│   │   └── percent-encoding v2.1.0
│   ├── idna v0.2.3
│   │   ├── matches v0.1.9
│   │   ├── unicode-bidi v0.3.7
│   │   └── unicode-normalization v0.1.19
│   │       └── tinyvec v1.5.1
│   │           └── tinyvec_macros v0.1.0
│   ├── matches v0.1.9
│   └── percent-encoding v2.1.0
├── webpki v0.22.0 (*)
└── webpki-roots v0.22.2
    └── webpki v0.22.0 (*)

gzip is totally optional (both in the HTTP spec, and as a feature flag supported by ureq) so we can just disable this and see how much it saves. That's the fourth (and final?) theory for where the bloat comes from.


Based on the above analysis, I have four things to check. I'm not planning to test these combinatorically, just sequentially. (For instance, step 3 is additive with steps 1 and 2; I don't intend to test what steps 1 and 3 look like while skipping step 2.) For each step, I keep the result that produces the smaller exe. Keep in mind, that may end up being exactly what we have in 0.3.0 if all four steps show that the status quo is optimal.

The basic plan is:

  1. Replace rustls with native-tls.
  2. Replace serde_json with json.
  3. Find a way to trim IDNA from URL parsing.
  4. Disable gzip support.

I'll start with a baseline, building the 0.3.0 tag as-is.

$ cargo build --release && ls -l target/release/cartunes.exe
[...snip...]
-rwxr-xr-x 2 jay 197609 7769088 Jan 24 20:19 target/release/cartunes.exe*

The file size is a little different than what's in the zip because I'm locally building with a recent nightly compiler (our CI uses the latest stable compiler) and I'm also using a different linker and build flags between my local machine and CI.

From this baseline, replace rustls with native-tls:

-rwxr-xr-x 2 jay 197609 7010304 Jan 24 21:36 target/release/cartunes.exe*

That satisfies my curiosity for whether native-tls produces smaller code than rustls. Yes, it does. About 750 KB worth of savings in my case.

Next, replace serde_json with json:

-rwxr-xr-x 2 jay 197609 7001088 Jan 24 21:40 target/release/cartunes.exe*

This theory only gave us a marginal win. About 9 KB. Well within the noise threshold, really. That result doesn't justify the additional code to maintain:

Abandoned patch
diff --git a/Cargo.toml b/Cargo.toml
index 50e062f..812ea78 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -25,6 +25,7 @@ epaint = { version = "0.16", default-features = false, features = ["single_threa
 font-loader = "0.11"
 human-sort = "0.2"
 hotwatch = "0.4"
+json = "0.12"
 kuchiki = "0.8"
 log = "0.4"
 native-tls = "0.2"
@@ -34,11 +35,10 @@ pollster = "0.2"
 raw-window-handle = "0.4"
 rfd = "0.6"
 semver = "1.0"
-serde = { version = "1.0", features = ["derive"] }
 thiserror = "1.0"
 toml_edit = "0.13"
 unicode-segmentation = "1.7"
-ureq = { version = "2.3", default-features = false, features = ["gzip", "json", "native-tls"] }
+ureq = { version = "2.3", default-features = false, features = ["gzip", "native-tls"] }
 walkdir = "2.3"
 webbrowser = "0.5"
 wgpu = "0.12"
diff --git a/src/updates.rs b/src/updates.rs
index 0c7fde2..65ed2a6 100644
--- a/src/updates.rs
+++ b/src/updates.rs
@@ -8,7 +8,6 @@ use crate::framework::UserEvent;
 use crate::timer::Timer;
 use log::error;
 use semver::Version;
-use serde::Deserialize;
 use std::any::Any;
 use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
 use std::sync::Arc;
@@ -39,6 +38,22 @@ pub(crate) enum Error {
     Persist(#[from] PersistError),
 }

+/// All the ways in which parsing the JSON response body can fail.
+#[derive(Debug, Error)]
+pub(crate) enum JsonParseError {
+    /// Invalid name.
+    #[error("Invalid release name")]
+    Name,
+
+    /// Invalid body.
+    #[error("Invalid release body")]
+    Body,
+
+    /// Invalid html_url.
+    #[error("Invalid release URL")]
+    Url,
+}
+
 /// How often to check for updates.
 #[derive(Debug, Copy, Clone, Eq, PartialEq)]
 pub(crate) enum UpdateFrequency {
@@ -75,7 +90,7 @@ struct UpdateCheckerThread {
 }

 /// Parsed API response body.
-#[derive(Debug, Deserialize)]
+#[derive(Debug)]
 pub(crate) struct ReleaseBody {
     name: String,
     body: String,
@@ -264,7 +279,7 @@ impl UpdateCheckerThread {
         };

         // Parse the response
-        let body: ReleaseBody = match res.into_json() {
+        let body_str = match res.into_string() {
             Ok(body) => body,
             Err(error) => {
                 error!("HTTP response error: {:?}", error);
@@ -272,6 +287,22 @@ impl UpdateCheckerThread {
             }
         };

+        let body_json = match json::parse(&body_str) {
+            Ok(body) => body,
+            Err(error) => {
+                error!("JSON parse error: {:?}", error);
+                return self.duration;
+            }
+        };
+
+        let body: ReleaseBody = match body_json.try_into() {
+            Ok(body) => body,
+            Err(error) => {
+                error!("JSON parse error: {:?}", error);
+                return self.duration;
+            }
+        };
+
         // Parse the version in the response
         let version = match Version::parse(&body.name) {
             Ok(version) => version,
@@ -317,3 +348,31 @@ impl UpdateCheckerThread {
         }
     }
 }
+
+impl TryFrom<json::JsonValue> for ReleaseBody {
+    type Error = JsonParseError;
+
+    fn try_from(mut value: json::JsonValue) -> Result<Self, Self::Error> {
+        let name = value
+            .remove("name")
+            .as_str()
+            .ok_or(Self::Error::Name)?
+            .to_string();
+        let body = value
+            .remove("body")
+            .as_str()
+            .ok_or(Self::Error::Body)?
+            .to_string();
+        let html_url = value
+            .remove("html_url")
+            .as_str()
+            .ok_or(Self::Error::Url)?
+            .to_string();
+
+        Ok(Self {
+            name,
+            body,
+            html_url,
+        })
+    }
+}

For this reason, I will continue using serde_json.

Third, find a way to trim IDNA from URL parsing. There is a merged but as-yet unreleased PR for url that makes idna optional: servo/rust-url#728

Patching ureq locally to disable this, we get:

-rwxr-xr-x 2 jay 197609 6682112 Jan 24 21:45 target/release/cartunes.exe*

Saving 328,192 bytes over step 1. This is roughly in line with observations from servo/rust-url#727. Unfortunately, until url is released with this feature flag, I have to eat the >300 KB bloat.

The final theory I had was that gzip was unnecessarily bloating the executable. Since I'm not going to be using results from steps 2 or 3, this is the "last chance" for a big win. I still have about 2.3 MB completely unaccounted for in these savings figures. Let's disable gzip and hope for the best:

-rwxr-xr-x 2 jay 197609 6953984 Jan 24 21:55 target/release/cartunes.exe*

That saves 56,320 bytes over step 1. It's a modest win, but not nearly what I was looking for.


Now I need to continue digging for the cause of those extra megabytes. Using cargo-bloat's JSON output and a little bash-fu with jql points out two other potential crates to look at:

  • std added 340 functions since the 0.2.0 release
  • inplace_it added 322 functions since the 0.2.0 release

That's odd, why did std usage grow so much? Unfortunately, build-std didn't seem to help, plus I ran into issues with panic = "abort" when I tried: rust-lang/cargo#7359 The additional std usage is responsible for about 124 KB of executable size. This is most likely from sockets and threads, but that's just a guess.

I have no idea what inplace_it is:

$ cargo tree -i inplace_it
inplace_it v0.3.3
└── wgpu-hal v0.12.4
    ├── wgpu v0.12.0
    │   ├── cartunes v0.3.0 (C:\Users\jay\projects\cartunes)
    │   └── egui_wgpu_backend v0.16.0
    │       └── cartunes v0.3.0 (C:\Users\jay\projects\cartunes)
    └── wgpu-core v0.12.2
        └── wgpu v0.12.0 (*)

It's used by wgpu. 🤷 And it appears to contribute directly to about 32 KB of executable size.

At this point, I still don't know what the last 2 MB is from. That's a crazy number to not stick out like a sore thumb.

@parasyte parasyte mentioned this issue Jan 25, 2022
3 tasks
@parasyte parasyte added the bug Something isn't working label Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant