Incorrect build time reporting #36

Closed
lheckemann opened this issue Nov 16, 2021 · 4 comments

Comments

@lheckemann
Contributor

$ colmena build
[INFO ] Enumerating nodes...
[INFO ] Selected all 10 nodes.
[...]
    netboot ✅ 0s Built "/nix/store/cw76vdvwbw93bdgrlwgzq83hxrqj28zv-nixos-system-netboot-21.05.4116.46251a79f75"
     indigo ✅ 0s Built "/nix/store/nyhsl87mfxxm3xd2zxz91cdjfly1q5aq-nixos-system-indigo-21.05.4116.46251a79f75"

Even though a kernel was built for indigo, taking >30min, they all say 0s. This doesn't seem like intended behaviour :)

That aside, very nice piece of work, I think this will do nicely as a nixops replacement ❤️

@zhaofengli
Owner

Now that console-rs/indicatif#325 is merged, we can use the git version and close this now.

@lheckemann
Contributor Author

..........o ✅ 2h Built "/nix/store/bbrq6v7bzvwp2gd226jprfyih2ipg2ij-nixos-system-...
..........o ✅ 2h Built "/nix/store/6h2fk3i113f923ribnilw4ypz5sqgjky-nixos-system-...
..........d ✅ 2h Built "/nix/store/a4jidxyalfa7f442jdhv85w3hppi0kck-nixos-system-...
..........o ✅ 2h Built "/nix/store/hsyp9szbzsmyab8pims4inb8m2m836im-nixos-system-...
..........e ✅ 2h Built "/nix/store/7ny4gicf404968z5jk5l2wbw6j3dgyx3-nixos-system-...
..........e ✅ 2h Built "/nix/store/s48rbjkw91pdh0kankqif3h6ckn4gahj-nixos-system-...
..........t ✅ 2h Built "/nix/store/050n5m5732lv1m48ixhjdqhhrbl13igx-nixos-system-...
..........e ✅ 2h Built "/nix/store/6ilkng1p2s1j16niwbf2l9rmv698hmmv-nixos-system-...
..........h ✅ 2h Built "/nix/store/rbshji0r9wbph0b5zq5pkiv56y2y3ind-nixos-system-...
..........l ✅ 2h Built "/nix/store/mmz5i8kz2727k1xqxcgczrfsskpvpx64-nixos-system-...

I'm still not sure if the times reported are correct, given that most of these should have finished within a minute or two while only two of them were blocked on kernel builds :/

@zhaofengli
Owner

This has to do with how Colmena builds the system profiles. Instead of evaluating each node individually, it writes a text file containing each node's system.build.toplevel and evaluates that instead, because in my experience calling nix-instantiate individually consumes much more memory in total (presumably because common nodes in the evaluation graphs can't be reused). When there are a lot of nodes, it splits the selected nodes into chunks and evaluates one chunk at a time, firing off evaluation for the next chunk while we build and deploy the nodes in the current one.
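
As a rough illustration of the chunking described above (a sketch only, not Colmena's actual code, which is written in Rust), the overlap between evaluating the next chunk and building the current one could look like the following. The function names, the chunk size, and the `evaluate` / `build_and_deploy` callbacks are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(nodes, size):
    """Split a list of node names into consecutive chunks of at most `size`."""
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


def run_pipeline(nodes, size, evaluate, build_and_deploy):
    """Evaluate chunk N+1 in the background while chunk N is built and deployed.

    `evaluate` and `build_and_deploy` are hypothetical callbacks standing in
    for the real evaluation and build/deploy steps.
    """
    chunks = chunked(nodes, size)
    results = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(evaluate, chunks[0])
        # The trailing None marks "no further chunk to evaluate".
        for next_chunk in chunks[1:] + [None]:
            evaluated = pending.result()
            if next_chunk is not None:
                # Start evaluating the next chunk before building this one.
                pending = pool.submit(evaluate, next_chunk)
            results.append(build_and_deploy(evaluated))
    return results
```

Only one evaluation runs at a time here, which matches the memory concern mentioned above; the actual degree of parallelism in Colmena may differ.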

Building that text file directly has an unfortunate consequence: some nodes can take much longer to build than others in the same chunk, and they bog down the deployment process (the nodes that are already built could be deployed while we wait for the others). That's the issue we are seeing here, and the build time is accurate in the sense that the nodes are actually stuck for that long 😛
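
A toy model of why every node in a chunk ends up reporting the same time: if the chunk is built as one unit, each node is "done" only when the slowest build in the chunk finishes. The durations below are made up for illustration and are not measurements from this issue.

```python
def chunk_reported_times(build_seconds):
    """Time each node appears to take when the whole chunk is built as one unit.

    Every node waits for the slowest build in the chunk.
    """
    slowest = max(build_seconds.values())
    return {node: slowest for node in build_seconds}


def individual_reported_times(build_seconds):
    """Time each node would report if its profile were built on its own."""
    return dict(build_seconds)


# Hypothetical durations in seconds: one quick node, one that needs a kernel build.
durations = {"netboot": 90, "indigo": 7200}
print(chunk_reported_times(durations))
print(individual_reported_times(durations))
```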

I think we can address that by doing nix-store -q --references on the derivation of the text file, and calling nix-build for the profile derivations individually. No work is duplicated since a lock is held in the Nix store for each derivation that is being built, and for cases like this the deployment can be much faster for nodes that don't require a kernel build.
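
A minimal sketch of that idea, under stated assumptions: `nix-store -q --references` is run on the text file's derivation, the profile derivations are picked out of its output, and one `nix-build` command is prepared per profile. The helper names are hypothetical, and so is the filter: it assumes profile derivations can be recognised by a `-nixos-system-` name and a `.drv` suffix, which is a guess based on the output paths shown earlier in this thread, not something confirmed here.

```python
import subprocess


def query_references(text_file_drv):
    """Return the raw output of `nix-store -q --references` for a derivation."""
    result = subprocess.run(
        ["nix-store", "-q", "--references", text_file_drv],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def profile_drvs(references_output):
    """Pick out the per-node profile derivations from the references listing.

    Assumption: profile derivations end in `.drv` and contain `-nixos-system-`.
    """
    paths = (line.strip() for line in references_output.splitlines())
    return [p for p in paths if p.endswith(".drv") and "-nixos-system-" in p]


def build_commands(drvs):
    """One nix-build invocation per profile derivation (no result symlink)."""
    return [["nix-build", drv, "--no-out-link"] for drv in drvs]
```

Each command could then be started as soon as its node is ready to build, so a node that does not need a kernel rebuild can move on to deployment without waiting for the rest of its chunk.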

@lheckemann
Contributor Author

Makes sense, thanks for the detailed response and new issue!


2 participants