Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reducing the size of the import library crates #1964

Closed
glandium opened this issue Aug 15, 2022 · 18 comments
Closed

Reducing the size of the import library crates #1964

glandium opened this issue Aug 15, 2022 · 18 comments
Labels
enhancement New feature or request

Comments

@glandium
Copy link
Contributor

(This issue is focused on import libraries, but I'll also open issues about other size aspects for in the coming days.)

Vendoring crates with large amounts of data is costly. More so when semver incompatible versions are released often and different crates in the ecosystem use different versions.

The first and most obvious issue is that the version for the import libraries should be separate from that of the other crates. There hasn't been changes to the crates/targets/*/lib files since 0.36.0, yet, there have been 0.37.0, 0.38.0, 0.39.0 since then. Even if there had been changes, semver compatible versions would have been fine. I would suggest decoupling the versions and to publish 0.36.2, 0.37.1 and 0.38.1 "dummy" versions that depend on 0.39.0.

The second issue is the multiplication of crates for new targets, when they are not strictly necessary (especially in the context of what's coming below). This is addressed in #1961.

A third issue is that depending on the tooling used, the import library size can vary (for instance, if I run the tool_gnu crate right now, I somewhat end up with smaller import libraries). Relatedly, even if the same size is produced, the content is different because of timestamps within the libraries.

The remaining issue is that of the size of the import libraries themselves.

My proposed plan of action is the following (I have working patches for those, although they need some cleanup and proper commit messages):

  • Remove the separate crates for gnullvm targets (Use the msvc target crates for gnullvm targets #1961)
  • Change tool_msvc to use the LLVM tools to produce the import libraries: it's faster, the resulting import libs are 25% smaller and deterministic. Bonus: it allows to build the import libs on any OS. I don't think there is any concern about using "non-native" import libraries with MSVC because that's effectively what raw-dylib support is doing already.
  • Change tool_gnu to generate more deterministic import libraries (working around https://sourceware.org/bugzilla/show_bug.cgi?id=29489 and some other sources of non-determinism).
  • Change tool_gnu to not require two different MINGw environments.
  • Tweak tool_gnu to reduce the size of the import libraries it produces (saving ~388K per importlib)
  • Merge tool_gnu and tool_msvc because at that point, they're 95% identical, and it is easier to run just one command to refresh all the import libraries.

It's possible to make the LLVM variants something like <100K lighter (IIRC), but that makes MSVC not work with them, so I don't think it's worth bothering having separate crates for that small of a difference.

BTW, while digging in the import libs, I found that the opposite problem of #1447 exists for libraries like windows.data.pdf (all libraries with a period in their name). Fixing it ends up being easier after the work above.

Further down the road, I'm planning to submit code to LLVM to produce MINGw-style import libraries (i.e. not using the short import form), incidentally, it should produce files about ~3MB smaller each if my math is right. (This will also allow rustc's raw-dylib support to stop requiring MINGw binutils).

Eventually, I assume the goal would be for windows/windows-sys to use raw-dylib, removing the need for the import library crates entirely, but the crates will still be desirable to have around for compatibility with older (now current) versions of the rust compiler for some while. I think at that point it would be desirable to replace those crates with code that generates the import libraries on demand when the compiler version is older than the one that stabilizes raw-dylibs (which seems it might be 1.65.0?). I have some sketch code that produces MINGw-style objects already, producing the full import libraries would not be much more effort code-wise, and would be fast.

@mati865
Copy link
Contributor

mati865 commented Aug 18, 2022

Remove the separate crates for gnullvm targets (#1961)

Yeah, that will work as long as nothing CRT specific lands in import libs.

Further down the road, I'm planning to submit code to LLVM to produce MINGw-style import libraries (i.e. not using the short import form), incidentally, it should produce files about ~3MB smaller each if my math is right. (This will also allow rustc's raw-dylib support to stop requiring MINGw binutils).

One of LLVM devs (Martin Storsjö) has WIP patch for LLVM to generate legacy import lib format (the one used by Binutils and ancient MSVC versions) but it has collected some dust. I wanted to rebase it but couldn't find time and motivation.
Newer Binutils versions supposedly should work better with short import lib format so the best way to go would be reporting/fixing remaining Binutils bugs so all Windows toolchains work with the same format.
Also the same Rustc raw-dylib code would work for all targets.

@riverar
Copy link
Collaborator

riverar commented Aug 18, 2022

BTW, while digging in the import libs, I found that the opposite problem of #1447 exists for libraries like windows.data.pdf (all libraries with a period in their name). Fixing it ends up being easier after the work above.

This is a known issue, fix is coming via upstream win32metadata microsoft/win32metadata#1023.

@riverar
Copy link
Collaborator

riverar commented Aug 18, 2022

  • Remove the separate crates for gnullvm targets (Use the msvc target crates for gnullvm targets #1961)
  • Change tool_msvc to use the LLVM tools to produce the import libraries: it's faster, the resulting import libs are 25% smaller and deterministic. Bonus: it allows to build the import libs on any OS. I don't think there is any concern about using "non-native" import libraries with MSVC because that's effectively what raw-dylib support is doing already.

So my intention was to keep the binutils and LLVM worlds explicitly separate and use only the tools native to those worlds. LLVM does not provide any guarantees (that I could find) that its compatibility of bintools artifacts is something it cares about and vice versa.

So this usage of LLVM tooling to generate artifacts for various non-LLVM-native targets (on a "welp, it should work" basis) concerns me about the possible increase in crate fragility. I'm also concerned that tooling for msvc may outpace LLVM and our reliance on LLVM hobble us in the future.

Q: Do you think my worries are unwarranted?
Q: Is there any prior/related art that would make me feel better about all this?

Sounds good.

  • Change tool_gnu to not require two different MINGw environments.

Q: Are you referring to the need to fire up separate 32-bit and 64-bit environments for lib generation?

  • Tweak tool_gnu to reduce the size of the import libraries it produces (saving ~388K per importlib)

Suspect this requires LLVM, which touches on my worries above.

  • Merge tool_gnu and tool_msvc because at that point, they're 95% identical, and it is easier to run just one command to refresh all the import libraries.

Suspect this requires LLVM, which touches on my worries above.

@kennykerr
Copy link
Collaborator

@dpaoliello has made tremendous progress on raw-dylib - I'm therefore reluctant to invest in a lot more churn with the pre-built libs. We should sync with Daniel on when we can expect raw-dylib and perhaps we can avoid a lot of extra work here.

@kennykerr
Copy link
Collaborator

I'd also like to hear from @mati865 who added the new gnu targets.

@kennykerr
Copy link
Collaborator

Ah I see @mati865 is already on this issue. Never mind. 😊

@kennykerr kennykerr added the enhancement New feature or request label Aug 22, 2022
@kennykerr
Copy link
Collaborator

What's the consensus on #1961 and the remaining issues here? Do folks want to merge these targets or leave them alone until raw-dylib arrives?

@mati865
Copy link
Contributor

mati865 commented Aug 23, 2022

If somebody can confirm there are no CRT specific differences (in other words: they do not pull other import libraries from SDK/mingw-w64 during creation) between import libraries of these two targets then I wouldn't mind if they were merged into one.

@kennykerr
Copy link
Collaborator

As long as by merge we mean using the much smaller llvm libs. 😁

@glandium
Copy link
Contributor Author

glandium commented Aug 25, 2022

So at this point, here is what I have in private branches:

  • Tweaks to tool_gnu to make it require "MSYS2 MinGW 64-bit" only, and the ability to generate both i686_gnu and x86_64_gnu in one run, as well as the ability to run the script on any OS with the mingw toolchain available (albeit, this generates slightly different libraries, more on that further below). (glandium@309e808)
  • A merge of tool_gnu and tool_gnullvm, because the main difference between them is the use of llvm--prefixed tools, and choosing between the prefixed and non-prefixed ones can be done in the same tool. (glandium@cc96728)
  • An adjustment to i686_gnu and x86_64_gnu setting all the ordinal/hints to 0 (glandium@fa118ec). The reason behind this is that for some reason (I haven't dug into why, but I suspect it's related to differences in qsort in the CRT), depending on the environment in which the tool_gnu tool is run, the resulting import libraries contain slightly different ordinal/hints, but in practice the values don't matter. An alternative would be to set the ordinals to monotonically increasing values, which is easy to do as well. Either way would yield consistent results between platforms and even at all, because at the moment neither running on Linux or Windows yields the same ordinal/hints as in the msvc short imports.
  • Work in progress to generate the import libraries without any external tool. (glandium@2048560, glandium@9d41d42). At the moment the code only handles the MSVC and gnullvm variants, but I already have code locally to generate the gnu variants as well. If anything, this highlights the differences between those, and would probably help convincing you using the gnullvm ones for MSVC would ultimately not be a problem (at least it helped convince myself ; also, raw-dylib support in rustc itself doesn't bother to use MSVC tooling, it relies on LLVM). Those differences are:
    • In the global symbol tables, MSVC only stores one reference to a given symbol, while LLVM stores references to similarly named symbols for each object it appears in (for instance, __NULL_IMPORT_DESCRIPTOR appears multiple times in the symbol table).
    • MSVC doesn't store an owner or group for archive members at all, and always uses 0 as mode.
    • MSVC stores an additional symbol table with a different format and order (and it handles things properly if only the first symbol table is present).
    • The format for long names of archive members differs.
    • MSVC sets a PointerToRelocation in some sections even when NumberOfRelocations is 0.
    • LLVM doesn't strip the library extension correctly (it removes exactly 4 characters when there's a dot in the name instead of removing everything after and including the last dot, like MSVC)
    • MSVC sets a value of 0xc0000040 for section symbols.
    • MSVC adds a debug section and a symbol indicating the compiler version.
    • MSVC fills the .idata$6 section to align its size to a multiple of 2. Since the section is marked with IMAGE_SCN_ALIGN_2BYTES, this is mostly zeal.
    • Symbols are not ordered the same way.
    • Members are not ordered the same way.

I don't think any of these differences above matter when linking the import libraries.

Ultimately, I think even with raw-dylib on the way to stabilization, it is not going to be desirable to rely on it unconditionally because of the impact on the minimum supported rust version, so I think it would be desirable to still reduce the overall size of the import library crates. The *_gnu ones were reduced in size in #1968 and could be reduced a little further with a dlltool command line tweak. The *_gnullvm ones could be used to replace the *_msvc ones.

All of them could actually be generated at build time from ~1000 slocs + a list of dlls/symbols that should be in the ballpark of 500K uncompressed.

@glandium
Copy link
Contributor Author

Oh, forgot to mention in the list of differences: LLVM sets all the ordinal/hints to 0.

@glandium
Copy link
Contributor Author

glandium commented Sep 9, 2022

I refreshed the commits mentioned in #1964 (comment) with rebased versions on top of current master, but they will obviously need another refresh after next metadata update.

Anyways, thoughts, @riverar, @kennykerr?

@kennykerr
Copy link
Collaborator

Sounds good to me. I'm hoping to get another Win32 metadata update in the next day or two and would like to then publish another release of the windows crate that should resolve most if not all of the lib issues. Do you think there's value in trying to get these changes in the same release?

@kennykerr
Copy link
Collaborator

I was also thinking of adding an umbrella windows-targets crate that would provide all the target dependencies in one place rather than having to maintain duplicate dependency lists.

@kennykerr
Copy link
Collaborator

And just keep in mind that I need @riverar to sign off or direct as needed as he's been driving the non-msvc lib support. I think there are still some concerns about non-standard tooling etc.

@riverar
Copy link
Collaborator

riverar commented Sep 9, 2022

Lets go for it @glandium. 👍 Happy to help test along the way. Start with the first three non-WIP branches perhaps?

@kennykerr
Copy link
Collaborator

Anything left to do here?

@kennykerr
Copy link
Collaborator

Closing for now. Feel free to keep the conversation going if there's more to discuss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants