Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make classpath scanning optional #96

Open
dsyer opened this issue Oct 14, 2022 · 30 comments
Open

Make classpath scanning optional #96

dsyer opened this issue Oct 14, 2022 · 30 comments

Comments

@dsyer
Copy link

dsyer commented Oct 14, 2022

There's a discussion here spring-projects/spring-framework#27619 about how some common use cases do not require classpath scanning (specifically versionless URL path construction). Myself and @vpavic have both volunteered to contribute code here, but we need some guidance. One option would be to extract the "core core" into yet another Jar file (what would it be called?). Or we could make the classpath scanner an optional dependency here (use it to compile and test but not include it transitively by default). Or we could keep the dependency and simply make the classpath scanning lazy (only do it if it is needed).

UPDATE: added a third option.

@vpavic
Copy link

vpavic commented Oct 14, 2022

Regarding naming, if new library is the preferred route, maybe we could take some inspiration from the name of the component we have in Spring (WebJarsResourceResolver) and name it webjars-resource-resolver or simply webjars-resolver.

@dsyer
Copy link
Author

dsyer commented Oct 14, 2022

Actually, maybe we should start by asking a simple question: why does webjars need classpath scanning? What are the use cases that supports?

@vpavic
Copy link

vpavic commented Oct 15, 2022

I haven't inspected webjars-locator-core at depth, but from my experimentation in vpavic/spring-framework/tree/gh-27619, I suspect that resolving WebJars of Bower GitHub variant cannot be done as easily as for Classic and NPM variants and is probably (one of the things) reliant on classpath scanning.

This is because Bower GitHub variant does not have as static path to pom.properties as Classic and NPM variants do (META-INF/maven/org.webjars/{webjar}/pom.properties and META-INF/maven/org.webjars.npm/{webjar}/pom.properties, respectively) but is rather dependent on the source GitHub org and repo and is like META-INF/maven/org.webjars.bowergithub/{org}/{repo}/pom.properties. Similarly, the path to assets themselves (resources/webjar/...) does not contain version unlike with the other two variants.

You can observe these differences by taking a look at, for example, Bootstrap WebJar in different variants:

  • org.webjars:bootstrap:5.2.2
  • org.webjars.npm:bootstrap:5.2.2
  • org.webjars.bowergithub.twbs:bootstrap:5.2.2

@vpavic
Copy link

vpavic commented Oct 15, 2022

Another thing to consider, I originally learned about spring-projects/spring-framework#27619 after opening spring-projects/spring-boot#31560 because Spring Boot wasn't picking up webjars-locator-core updates due to versioning policy.

That led me to open #86, and I think that if what we're discussing here end up being part of webjars-locator-core, those concerns will remain and Spring Boot will continue to pick up updates at a very slow pace unless changes occur somewhere (either this repo adopts SemVer, or Spring Boot makes exception to its policy).

@dsyer
Copy link
Author

dsyer commented Oct 15, 2022

I suspect that we may only support “npm style” webjars, but that’s OK by me - if it’s documented it should be fine for everyone IMO.

If we want Spring Boot users to be able to take advantage, I think @vpavic’s argument suggests that the best way to implement this is to make the classpath scanning lazy in the existing library - only use it if direct lookup fails. I haven’t looked at the code yet but that seems like a feasible solution.

@vpavic
Copy link

vpavic commented Oct 15, 2022

I suspect that we may only support “npm style” webjars

Hmm, why do you think it's only NPM variant we can support?

The way I see it, Classic and NPM variant are basically equals in terms of the way they are resolved from the classpath. Bower GitHub seems like the only one that isn't straightforward (I'm ignoring Bower Original variant as webjars.org says it's deprecated).

@dsyer
Copy link
Author

dsyer commented Oct 15, 2022

You’re probably right. I suppose I should rephrase what I said as “I don’t care if we can only support NPM packages”. It’s a bonus if some other layouts work but I don’t think it’s worth a lot of additional effort.

@dsyer
Copy link
Author

dsyer commented Oct 21, 2022

Here's an attempt at a small step that delivers something useful: https://github.com/dsyer/webjars-locator-core/tree/version-locator. It adds a new utility WebJarVersionLocator with enough features to be used by Spring (and anyone else who simply wants versionless paths) but no classpath scanning. Add feedback here or on my fork.

@dreis2211
Copy link

dreis2211 commented Nov 27, 2022

After the interim rise of activity to resolve this there seems to be some loss of traction again.

I can't stretch enough how much of an impact the classpath scanning has. Especially on bigger projects and bigger test suites I see a major portion of allocations being caused by this. Which in turn triggers GC again, which has an impact on CPU and timings.

Here's a (large) project where 17% of allocations are spent on only classpath scanning during tests. I've shared one in the original Spring-Framework issue that even has up to 60%.

image

What's especially bad in the Spring case it's that it's doing the classpath scanning multiple times. (Basically once per registration inside ResourceHandlerRegistry.

I can workaround a few places where I have control over the code, but it would be really better if this would be solved for good in the underlying libraries.

Cheers,
Christoph

@jamesward
Copy link
Member

Sorry this took so long. I've been trying to wrap my head around everything that exists today and why. Here are some initial thoughts...

  1. Classpath Scanning:

    For user ergonomics we don't require a WebJar artifact ID or a full path to the file (after the version). So the user can just write: webJarAssetLocator.getFullPath("bootstrap.min.js") and they get the resolved artifact (WebJar and subdir), like: /META-INF/resources/webjars/bootstrap/3.1.1/js/bootstrap.min.js. This resolution requires doing some sort of classpath scanning. I think we used to manually get directory content listings but switched to using the Classgraph library. We could maybe switch to PathMatchingResourcePatternResolver in a Spring implementation. Does that work with GraalVM Native Image correctly?

    As @dsyer proposed, we could offer an API that requires a WebJar artifact ID and a full path but I think the DX will take a negative hit. Today there is a similar method (but it is implemented internally with Classgraph): public String getFullPathExact(@Nonnull final String webJarName, @Nullable final String exactPath)

    Having a library that just has that method would definitely be an option but that doesn't address the ergonomic issues. Maybe those issues aren't a problem for Spring as it seems maybe most users don't use the looser getFullPath(partialPath) method.

  2. Version Caching:

    Even if we get rid of scanning we will probably want to figure out some way to cache the version lookup as it is something we don't want to do every time a WebJar asset is resolved.

  3. Bower GitHub WebJars

    For a variety of "good reasons" we decided not to put the version in the Bower GitHub WebJars paths. Whatever change we make should be compatible with those WebJars or users will have a hard time.

  4. Spring & Versions in Paths:

    As we look at making changes to this in Spring we really need to look at a few other things which don't fit the usual WebJars paradigm. First, it is intentional that we put the version in the path so that good HTTP cache headers can be put on the WebJar requests. It looks like we break that with the way Spring currently removes the path. Second, for Play we add an easy config option to switch to using the WebJars CDN. Both of these could be accomplished with some HTML templating utils which we should investigate for doing in Spring.

Overall I think the best way to deal with all of this would be to do a lot more at build-time. If we have any opportunity to do that, I think that is the best place to invest the energy but requiring another build plugin is probably a non-starter so may not be an option.

If we just want to plug the current hole and ignore the DX & HTTP cache issues, we can take @dsyer's changes and publish a webjars-locator-core-base (or whatever) with the non-scanning logic but we will still need something to address the version caching and I'm not sure the best place or way to handle that.

@dreis2211
Copy link

dreis2211 commented Nov 27, 2022

Overall I think the best way to deal with all of this would be to do a lot more at build-time. If we have any opportunity to do that, I think that is the best place to invest the energy but requiring another build plugin is probably a non-starter so may not be an option.

I don't think this is practical in some cases. Consider for example using https://github.com/springdoc/springdoc-openapi, which at least for our projects is actually the reason why the WebJarsResourceResolver is added internally in Spring because this lib pulls in webjars-locator-core. With this lib we provide swagger docs. It's not uncommon that you provide swagger docs based on some sort of profile or configuration that's disabled/enabled in certain environments. I could imagine this would complicate things during build-time at the very least. Or you would need to act like things would be enabled every-time and ignore any user-side configuration during build-time. Maybe you have a more concrete picture already in your head where this wouldn't be an issue, but I wanted to let you know of the use-case...

@jamesward
Copy link
Member

Thanks @dreis2211 for the use case. It might be possible in that case to have the generated file in the springdoc-openapi artifact. But that does complicate things a bit. Question for you on the OpenAPI use case... Are you doing some templating that could handle the versioned paths or do your paths to WebJar contents have to be versionless?

@dreis2211
Copy link

dreis2211 commented Nov 27, 2022

We don't do any templating - that's all on the library end. And by default the library is accessing stuff in a versionless fashion. But you can set the springdoc.swagger-ui.version in which case it uses the versioned one. Check https://github.com/springdoc/springdoc-openapi/blob/master/springdoc-openapi-common/src/main/java/org/springdoc/ui/AbstractSwaggerWelcome.java#L240 . But that's impractical in terms of DX as you have to know which swagger-ui version is used inside springdoc... There is a reason after all for the versionless resolution.

In all fairness. The library also provides ways already to build the OpenAPI json at build-time. But to the best of my knowledge this doesn't build the complete HTML which uses the swagger-ui stuff in the end. It just builds up the data. I wanted to mention this for completeness reasons, because it might be possible to hook into this existing infrastructure.

The workaround posted by @dsyer in Spring Petclinic works for the Spring-Doc use-case as well and is what I'm mostly using these days to workaround the performance problems. Next to excluding webjars-locator-core in tests when possible (e.g if the Swagger page is not covered by tests). If I have access to the code, at least.

@dsyer
Copy link
Author

dsyer commented Nov 28, 2022

publish a webjars-locator-core-base (or whatever) with the non-scanning logic

Just for the sake of clarity: my proposal does not require a new jar (although that would make it easier to exclude the github dependency) - it's just a new utility API that you can use directly or through the old interface. A reasonable and conservative step in the right direction IMO would be just to include it in webjars-locator-core. It might even be possible to keep the interface of the existing WebjarResourceLocator and make the new utility purely an implementation detail, but that would require some major refactoring so I didn't want to propose it as a first step.

@jamesward
Copy link
Member

I was thinking a new artifact would be nice so we wouldn't have the transitive deps.

@dsyer
Copy link
Author

dsyer commented Nov 29, 2022

webjars-locator-lite?
webjars-locator-utils?

@jamesward
Copy link
Member

Either is good. I think before we move forward we need to think more about the version lookup caching piece. Any thoughts on the best way to handle that?

@mikele
Copy link

mikele commented Nov 15, 2023

Hi any new thoughts about how to solve this?

@jamesward
Copy link
Member

I think there is still a lot to be resolved around this and I haven't had time to solve any of it. :(

@dsyer
Copy link
Author

dsyer commented Nov 16, 2023

For clarity sake, I don’t know why we can’t move forward with my patch. There is no caching issue in practice (IIRC Spring does all the work already, and probably other frameworks do), and classpath scanning could be eliminated with no impact on user code.

@jamesward
Copy link
Member

There are still 4 unresolved issues:

  1. Partial paths are not supported
    It is nice to be able to do something like: WebJarVersionLocator.fullPath("jquery", "jquery.min.js") which in the NPM WebJar is actually in dist/jquery.min.js
    Omitting the dist/ is nice and removing the functionality has downsides even if we don't force users to use the new library.
  2. Version Caching
    My understanding of how this works is that a request comes in for like /webjars/jquery/jquery.min.js and to resolve that to the file in the classpath, it needs to call WebJarVersionLocator.fullPath which needs to call WebJarVersionLocator.webJarVersion which has to do a bunch of stuff to resolve the version. Same for using WebJarVersionLocator.fullPath in a template. The current WebJars Locator pre-caches these versions so they are just lookups in a Map. From what I can tell, that version resolution would have to happen on every request / call to fullPath.
  3. Bower GitHub WebJars are not supported
    We can't provide a new locator that doesn't work with Bower GitHub WebJars unless we want to officially deprecate Bower GitHub WebJars. I'm not totally opposed to doing that since I think Bower is pretty dead at this point but I want to check with Vaadin as this whole system was for them.
  4. Cache friendly request paths
    I still would like to work with the Spring folks to find a better way to support easily client-side cacheable HTTP paths. I think there is considerable value to including the version numbers in the paths and enabling easy CDN switching. We could do this later but it'd be nice to at least decide where we want to go so that we don't make it hard to get there down the road.

@dsyer
Copy link
Author

dsyer commented Nov 16, 2023

Thanks James. IMO 1-3 are all nice to have - they are optional features that no-one has to use and we don't have to take them away just to support a different (also optional) utility. 4 is a solved problem AFAIK - Spring (and other frameworks) create cacheable resource URIs from the full path, but the developer never has to know the version (except to put the right thing on the classpath). This has been the case for many years, and all we want to do here is make the classpath resource resolution more efficient. So I still think the most pragmatic approach is to simply include the classpath utility stuff in the next release of webjars and move on to some of the other things if anyone cares deeply enough. It can be a separate jar file or it can go in the core, from a purely technical point of view it doesn't matter, but the github dependency should be optional either way.

@jamesward
Copy link
Member

  1. I agree that we could do without it in a locator-lite library.
  2. I'd like to see some performance data before I'd call this nice to have. What is the current time per request now vs with the change?
  3. I've asked Vaadin for feedback on deprecating Bower GitHub WebJars.
  4. I'll need some more details on that, anything you can point me to?

@dsyer
Copy link
Author

dsyer commented Nov 16, 2023

\2. The local cache doesn't have to be provided by webjars (libraries that use webjars could implement it also), but if you want one in a static Map or something simple it would be totally trivial to add it and I could do that as part of my patch. Just ask.

\4. https://docs.spring.io/spring-framework/reference/web/webmvc/mvc-config/static-resources.html and https://docs.spring.io/spring-framework/reference/web/webmvc/mvc-config/static-resources.html (cache-busting explicitly mentioned there).

@jamesward
Copy link
Member

jamesward commented Nov 16, 2023

\2. Totally open to the cache being external to WebJars Locator. I'm assuming Spring already has some mechanisms for in-memory caches. If you think we can get that into Spring then I'm down but we will need to consider both the HTTP request and the template usage.
\4. I'll investigate this further but I think we should also consider the CDN swapping part as something that we should add to Spring down the road.

@dsyer
Copy link
Author

dsyer commented Nov 17, 2023

Spring has a CachingResourceResolver which can be enabled easily in a Spring Boot application I believe. There might still be some value in a local cache inside the webjars locator.

CDN swapping as in "select the right CDN for the geographical origin of the request?" Is that anything to do with webjars? Maybe I misundertood.

@jamesward
Copy link
Member

I've forked the changes:
https://github.com/webjars/webjars-locator-lite

I cut out all the overlap with webjars-locator-core.

And released this for testing:
org.webjars:webjars-locator-lite:0.0.1

As far as CDN stuff goes, in Play server-side templates you can do this:

@webJarsUtil.locate("bootstrap.min.css").css()

Which resolves to the local server's path by default, but with this setting:

webjars.use-cdn=true

The path changes to (dependent on the WebJar type and version) in the classpath:

https://cdn.jsdelivr.net/webjars/org.webjars/bootstrap/5.3.2/css/bootstrap.min.css

Let me know if this is headed in the right direction

@dsyer
Copy link
Author

dsyer commented Nov 19, 2023

That's the same code I posted, so it gets my thumbs up. We should take the CDN topic elsewhere (another issue or something).

@jamesward
Copy link
Member

Yup! Thank you. Let me know how integrating into Spring goes. For the CDN and caching topics, would you rather I file issues to facilitate discussion on a Spring repo or on the new lite repo?

@dsyer
Copy link
Author

dsyer commented Nov 19, 2023

would you rather I file issues to facilitate discussion on a Spring repo or on the new lite repo?

I don't mind. I don't really understand the requirement yet. We could do a "discussion" topic in the lite repo if you want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants