Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regression: massive dependency tree on import #255

Closed
howardjohn opened this issue Sep 10, 2020 · 38 comments
Closed

regression: massive dependency tree on import #255

howardjohn opened this issue Sep 10, 2020 · 38 comments

Comments

@howardjohn
Copy link

Importing github.com/prometheus/common now causes import of 7 million lines of code since #242. Prior to that, only 1.5 MLOC were imported.

This includes:

  • aws-sdk-go and aws-sdk-go-v2, and google.golang.org/api at 1M each
  • the entireity of Envoy, Consul, Etcd, and Nats
  • many many more

The root of this is github.com/go-kit/kit ultimately. It imports a ton of stuff recently. Note that client_golang has not yet updated, so likely the ecosystem is not yet fully impacted unless they update prometheus/common directly.

We use it only for ~100 lines of logging code, so we can almost just replace it entirely. However, github.com/prometheus/common actually depends on an older version(s) of github.com/prometheus/common due to circular imports. The circular import can be removed if github.com/mwitkow/go-conntrack dependency is dropped. go-conntrack offers tracing and monitoring (using prometheus). Our usage is only the tracing. Therefor, we could remove this circular import with a fork of go-conntrack if desired

@brian-brazil
Copy link
Contributor

This repo is an internal library, so dependencies we need for Prometheus anyway aren't a problem per-se nor are issues that arise from 3rd parties importing it directly as such usage isn't supported.

However nothing in this repo should be depending on all of that stuff, and for example the entirety of Nats should be not getting compiled into Prometheus. We only depend on the logging bits of go-kit. Can you explain how this is happening?

Also, it is not generally appropriate to fork our direct dependencies over issues like this. It there's an issue let's tackle it more directly - and I believe you'll find the circular import (which isn't actually circular - as Go forbids those) is between client_golang and common, which we are aware of and which is much trickier to resolve.

@SuperQ
Copy link
Member

SuperQ commented Sep 10, 2020

I think this assessment is incorrect. The difference between what Go downloads and what is actually vendored is different.

If I check prior to #242 and run go mod vendor this is the result:

[prometheus/common] $ git checkout 546f1fd8d7df61d94633b254641f9f8f48248ada
...
[prometheus/common] $ go mod vendor
[prometheus/common] $ cloc vendor
     608 text files.
     588 unique files.
      64 files ignored.

3 errors:
Line count, exceeded timeout:  ./golang.org/x/net/idna/tables10.0.0.go
Line count, exceeded timeout:  ./golang.org/x/net/idna/tables11.0.0.go
Line count, exceeded timeout:  ./golang.org/x/net/idna/tables9.0.0.go

github.com/AlDanial/cloc v 1.82  T=12.88 s (42.2 files/s, 25743.1 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Go                              469          25917          29497         269287
Markdown                         18            641              0           1815
Assembly                         33            253            300           1558
Bourne Shell                      2             67            297            507
Bourne Again Shell                2             41             54            382
Perl                              1             25             22            218
YAML                             10             14              7            134
Protocol Buffers                  3             17            351             38
Dockerfile                        1              9              9             33
JSON                              1              0              0             25
C                                 1              8              7             24
make                              3              8             12             19
--------------------------------------------------------------------------------
SUM:                            544          27000          30556         274040
--------------------------------------------------------------------------------

If I pull master, and do the same go mod vendor this is the result:

[prometheus/common] $ git checkout master
...
[prometheus/common] $ go mod vendor
[prometheus/common] $ cloc vendor
     807 text files.
     783 unique files.                                          
      75 files ignored.

4 errors:
Line count, exceeded timeout:  ./golang.org/x/net/idna/tables10.0.0.go
Line count, exceeded timeout:  ./golang.org/x/net/idna/tables11.0.0.go
Line count, exceeded timeout:  ./golang.org/x/net/idna/tables12.00.go
Line count, exceeded timeout:  ./golang.org/x/net/idna/tables9.0.0.go

github.com/AlDanial/cloc v 1.82  T=16.90 s (43.4 files/s, 18886.8 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Go                              651          25543          32981         253614
Markdown                         20            755              0           2100
Assembly                         38            311            368           1740
Bourne Shell                      2             70            325            548
Bourne Again Shell                3             45             57            401
YAML                             12             20             12            166
make                              4             19             12             52
Dockerfile                        1              9              9             33
JSON                              1              0              0             25
C                                 1              8              7             24
--------------------------------------------------------------------------------
SUM:                            733          26780          33771         258703
--------------------------------------------------------------------------------

The net change for the current dependencies is smaller by ~15k LoC.

@howardjohn
Copy link
Author

This repo is an internal library, so dependencies we need for Prometheus anyway aren't a problem per-se nor are issues that arise from 3rd parties importing it directly as such usage isn't supported.

I am not sure its as simple as that? Anyone using client_golang is also using common, indirectly or otherwise. So this doesn't just impact Prometheus libraries importing it directly, it also impacts anyone importing any prometheus libraries (then anyone importing a library that imports prometheus, etc).

I have some concerns that once this propagates out to more and more downstream dependencies, the entire go ecosystem will end up importing this bloat. A huge amount of libraries are importing Prometheus, so if any single module in a dependency chain imports a Prometheus version with this dependency on go-kit v0.10, we will get all of these dependencies as well.

@brian-brazil
Copy link
Contributor

I had already poked @beorn7 about this on potential client_golang impact. As far we can tell and as illustrated by @SuperQ above, this shouldn't cause any problems in practice as we don't actually import or vendor any of this code. What is the concrete issue this is causing for you?

@bwplotka
Copy link
Member

bwplotka commented Sep 11, 2020

I have some concerns that once this propagates out to more and more downstream dependencies, the entire go ecosystem will end up importing this bloat.

While this is true, this is nothing irreversible. Also, the truth is that technically the bloat does not matter as if you don't import a package that imports that bloat, it will NOT end up as a dependency of yours for your module.

I don't want to speculate, would be nice to see the exact, other problems this situation can cause.

As per the solution, it's not that easy to remove https://github.com/go-kit/kit/tree/master/log dep, by forking, etc. The reason is that everyone depends on go-kit anyway, so there will be always some helper etc, using it. It would need to be a bigger upstream change. Can we at least try to ask go-kit to move the log to a separate sub-module first? Probably unlikely they will accept it, but it will at least highlight the problem on their side. There is also a road map: go-kit/kit#843 which tells us they plan to clean the packages a bit. (drop unnecessary ones)

@howardjohn
Copy link
Author

While this is true, this is nothing irreversible.

I disagree with this. Prometheus import is present in virtually every single go project's dependency tree. Once this propagates to a few key packages (prometheus/client_go being an obvious one, other common targets like grpc will follow) I don't see how it can be prevented - someone will always be importing the bloated version.

I think its fair to say that there is a difference between a direct dependency (ie shows up in vendor), indirect dependency (shows up in go.sum, must be downloaded, etc), and no dependency.

Making the claim that indirect dependency and no dependency are close enough that it doesn't really matter if we import a ton of bloat is a reasonable one for certain projects I think - and ultimately this is the Prometheus project's decision to make for their own projects. However, I encourage you to consider that by making this choice you are essentially forcing this decision on majority of the golang ecosystem. No project will be able to decide they want to avoid downloading GBs of dependencies if they want to depend on any project that imports Prometheus.

Personally, I don't think that is the correct decision, and will be very disappointed to bring in even more dependencies next time we update our Prometheus dependency.

@roidelapluie
Copy link
Member

roidelapluie commented Sep 15, 2020

Did you file an issue in go-kit? One of the issues in 0.10 was needed for prometheus.

@beorn7
Copy link
Member

beorn7 commented Sep 16, 2020

[…] someone will always be importing the bloated version.

Even if Prometheus would hold back from requiring recent versions of go-kit/kit, someone in the dependency tree will. The probability might be approaching 1 slightly more slowly, but go-kit/kit (or parts thereof) are popular enough to not really make a dent in the big picture.

github.com/prometheus/client_golang/prometheus has 14291 importers according to https://pkg.go.dev/ , github.com/go-kit/kit/log has 4090.

If this is a problem that can be solved, it needs to be solved in go-kit/kit.

But I do think this is a bit of a wild-goose chase. I think it's the nature of the Go dependency management as it has evolved that you will have to look at most of the popular open source projects written in Go to resolve the dependencies (and at most of their releases on top of that, see our little problem of pseudo-circular dependencies). You'll have half of the universe in your cache eventually, and I guess "that's fine" 🐶

@roidelapluie
Copy link
Member

And if that is really an issue, we should keep the vendors directories in our projects as it prevent users to have them in their cache.

@sagikazarmark
Copy link
Contributor

One of the major issues with Go modules is that even if a dependency is not part of the final artifact, Go will still download it (or parts of it, based on whether it has a go.mod file or not). This is particularly annoying if you use go mod download in a dockerfile for example to propagate cache: then it will download everything.

I agree that go-kit should probably improve the situation (by splitting up the monorepo or by introducing submodules)

@peplin
Copy link

peplin commented Nov 25, 2020

I'm a newcomer to Go, so unfortunately I can't offer any solutions, but I wanted to note that exploded dependency tree is also problematic in organizations where all direct and indirect dependencies must go through open source license review. So far in my review, all of the dependencies have permissive licenses, but it's a lot to ask of our lawyers.

The important metric for me is number of dependencies (list_dependencies is an internal tool to list all direct and indirect dependencies of a module):

$ list_dependencies github.com/prometheus/client_golang v1.7.1 | wc -l
116
$ list_dependencies github.com/prometheus/client_golang v1.8.0 | wc -l
230

I understand the nuance of dependency being in go.sum versus actually being compiled, but it doesn't make a difference for license review (since you're just 1 import away from using the code once it's in go.sum).

@sagikazarmark
Copy link
Contributor

@peplin you can strip down the number of licenses to review by using this tool: https://github.com/mitchellh/golicense

Under the hood it uses github.com/rsc/goversion which can tell you the import path of dependencies compiled into a binary.

I understand the nuance of dependency being in go.sum versus actually being compiled, but it doesn't make a difference for license review

I wonder why. If a piece of code never makes it to the binary why is it a subject of a license check?

since you're just 1 import away from using the code once it's in go.sum

Isn't that true if an import is not in go.sum?

@howardjohn
Copy link
Author

The reason I initially reported this was we were doing the same license checking @peplin. I agree with @sagikazarmark that its not really valid; we switched to go list -deps -test which I think golicense is using under the hood.

Despite licenses there is still some impact to these dependencies, although it is much smaller than I originally thought; basically just go.sum noise which is irrelevant, and go mod download is slower (we cache it in CI, so it hardly impacts us).

@mholt
Copy link

mholt commented Dec 30, 2020

We in the Caddy project would also like to see the dependency tree shrink considerably. It makes development more tedious, and every single line of that go.sum represents a new point of failure. All it takes is one mistake in the huge chain (a bad commit, network error, repository misconfiguration, go.mod misconfiguration, you name it) and the Go tooling stops working on the whole project, making it practically impossible to dev with.

@sagikazarmark
Copy link
Contributor

etcd 3.5 (properly adapting modules) should improve the situation considerably (which is a dependency of go-kit itself, see etcd-io/etcd#12498). But the ultimate solution for this particular issue would be extracting the logging library from the core go-kit repository.

@roidelapluie
Copy link
Member

This will be fixed upstream in go 1.17 golang/go#36460

@sagikazarmark
Copy link
Contributor

go-kit/kit#1055 might also help

@peterbourgon
Copy link

peterbourgon commented Apr 28, 2021

every single line of that go.sum represents...

Just a note to correct an unfortunately extremely common misconception: go.sum is a mostly-append-only log of checksums. It exists as a safety mechanism to guard against module corruption and malicious intermediaries. It has very little correlation to the dependency graph of your module, and shouldn't be used as a proxy for dep graph size or complexity. It should be checked in and otherwise ignored in almost all cases.

@sagikazarmark
Copy link
Contributor

That being said, go mod download is still a pain... (hopefully until lazy module loading lands)

@cpl
Copy link

cpl commented May 6, 2021

This is particularly annoying if you use go mod download in a dockerfile for example to propagate cache: then it will download everything.

This right here is my main issue with huge monorepos or dependency thirsty projects.

You can still have a single git repo codebase for multiple modules in Go.

I've also seen pgx (jackc/pgx#977) use go-kit for the logging bits of code. Pgx then is used by GORM, so dependency hell can easily propagate.

Everything below is just a rant

It is a shame to not see more developers take this (managing your dependencies) seriously. It adds unnecessary bloat and risks (what happens when a N-th degree, random dependency is removed and your build pipelines start failing?, how long will it take for the fix to propagate?; can you maintain or replace a dependency if it is no longer maintained?). If the code you need from a monolith repo is a manageable amount, and not likely to need updates, then just copy-it in your codebase.

@dims
Copy link

dims commented May 19, 2021

@gouthamve the go-kit especially drags in totally unnecessary stuff since a go.mod was added in their repo. this repo relies just on the go-kit/log but this causes the bucket-load of stuff into go.sum in this repo.

Oh! yes, see the kubernetes PR above of what we end up seeing, this is bad! please help fix!

@roidelapluie
Copy link
Member

We are monitoring the upstream situation and we will move to the independent logging when we feel the time is right, in cooperation with go kit maintainers.

@dims
Copy link

dims commented May 28, 2021

For the record - #302 (comment)

thanks @roidelapluie and @peterbourgon for your quick responses to close out the work in progress PR. Looking forward to seeing progress on this issue in due course.

@roidelapluie
Copy link
Member

The go-kit/log is stable, which means that pull requests are welcome.

@dims
Copy link

dims commented Jun 3, 2021

thanks @roidelapluie filed #304

@roidelapluie
Copy link
Member

closed by #304

@roidelapluie
Copy link
Member

released in v0.26.0

@dims
Copy link

dims commented Jun 3, 2021

thanks a ton @roidelapluie

@howardjohn
Copy link
Author

Thanks for the great work here! The work in this repo is done, but the whole ecosystem needs to update in order for this to have any impact on projects. For us in particular, contrib.go.opencensus.io/exporter/prometheus -> github.com/prometheus/statsd_exporter -> github.com/prometheus/client_golang (done already!).

I will send some PRs

howardjohn added a commit to howardjohn/statsd_exporter that referenced this issue Jun 3, 2021
howardjohn added a commit to howardjohn/statsd_exporter that referenced this issue Jun 3, 2021
Pulling in the same changes as done in
prometheus/common#255

Signed-off-by: John Howard <howardjohn@google.com>
@roidelapluie
Copy link
Member

Awesome, thanks @howardjohn !

@dims
Copy link

dims commented Jun 3, 2021

here's a list of repos that depend on prometheus/common >= v0.11.0 and <= v0.25.0 :
https://grep.app/search?q=github.com/prometheus/common%20%28v0.2%5B0-5%5D%7Cv0.1%5B1-9%5D%29&regexp=true&filter[path.pattern][0]=go.mod

@SuperQ
Copy link
Member

SuperQ commented Jun 3, 2021

Yup, I've been working on fixing a few places where we have some recursive dependencies like promethues/common and prometeus/client_golang.

@dims
Copy link

dims commented Jun 3, 2021

thanks @SuperQ !

@matthiasr
Copy link

statsd exporter 0.20.3 is out with this change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests