Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better Servlet PathMappings for Regex #7891

Closed
gregw opened this issue Apr 19, 2022 · 24 comments · Fixed by #7892, #7947 or #9090
Closed

Better Servlet PathMappings for Regex #7891

gregw opened this issue Apr 19, 2022 · 24 comments · Fixed by #7892, #7947 or #9090
Assignees
Labels
Bug For general bugs on Jetty side Sponsored This issue affects a user with a commercial support agreement Stale For auto-closed stale issues and pull requests

Comments

@gregw
Copy link
Contributor

gregw commented Apr 19, 2022

Jetty version(s)

= 10

Description

The path mappings for RegexPathSpec does not use groups for prefix matches to set servletPath and pathInfo.

@gregw gregw added the Bug For general bugs on Jetty side label Apr 19, 2022
@gregw gregw self-assigned this Apr 19, 2022
@gregw gregw added this to To do in Jetty 12.0.ALPHA1 via automation Apr 19, 2022
@gregw gregw added this to To do in Jetty 10.0.10/11.0.10 - 🧊 FROZEN 🥶 via automation Apr 19, 2022
@gregw gregw added the Sponsored This issue affects a user with a commercial support agreement label Apr 19, 2022
@joakime
Copy link
Contributor

joakime commented Apr 19, 2022

We could allow users to label their groups for what they want those sections to be when using more complex regex patterns.

Eg:

        Pattern pat = Pattern.compile("^(?<servlet>/([A-Za-z0-9]{3,})/([0-9]{3}))(?<pathInfo>/.*)?$");

        Matcher mat = pat.matcher("/mango/001/sku-888");
        if (mat.matches())
        {
            System.out.printf("Groups = %d%n", mat.groupCount());
            for (int i = 0; i < mat.groupCount(); i++)
            {
                System.out.printf("Group[%d] = '%s'%n", i, mat.group(i));
            }

            System.out.printf("Servlet - %s%n", mat.group("servlet"));
            System.out.printf("PathInfo - %s%n", mat.group("pathInfo"));
        }
        else
        {
            System.out.println("Not a match");
        }

Output:

Groups = 4
Group[0] = '/mango/001/sku-888'
Group[1] = '/mango/001'
Group[2] = 'mango'
Group[3] = '001'
Servlet - /mango/001
PathInfo - /sku-888

@jluehe
Copy link
Contributor

jluehe commented Apr 19, 2022

@gregw @joakime, this looks a little verbose to me. How about adopting a convention where there can be at most two groups per matched regex pattern, where the first group denotes the servletPath, and the (optional) second group, if present, denotes the pathInfo?

I've rewritten all our regex patterns for servlet mappings accordingly, and applied the following "hack" to ServletPathMapping, which has allowed me to restore our original mapping semantics. Of course this is just a hack, a proper fix would store the group(s) inside RegexPathSpec when it is first matched, so they would not have to be recalculated again:

diff --git a/jetty-server/src/main/java/org/eclipse/jetty/server/ServletPathMapping.java b/jetty-server/src/main/java/org/eclipse/jetty/server/ServletPathMapping.java
index c03607f542..6dcdd40cac 100644
--- a/jetty-server/src/main/java/org/eclipse/jetty/server/ServletPathMapping.java
+++ b/jetty-server/src/main/java/org/eclipse/jetty/server/ServletPathMapping.java
@@ -18,8 +18,12 @@ import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.MappingMatch;
 
 import org.eclipse.jetty.http.pathmap.PathSpec;
+import org.eclipse.jetty.http.pathmap.RegexPathSpec;
 import org.eclipse.jetty.http.pathmap.ServletPathSpec;
 
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
 /**
  * Implementation of HttpServletMapping.
  *
@@ -92,11 +96,16 @@ public class ServletPathMapping implements HttpServletMapping
         }
         else
         {
-            // TODO can we do better for RegexPathSpec
             _mappingMatch = null;
             _matchValue = "";
-            _servletPath = pathInContext;
-            _pathInfo = null;
+            final Pattern pattern = ((RegexPathSpec) pathSpec).getPattern();
+            final Matcher matcher = pattern.matcher(pathInContext);
+            final boolean matched = matcher.matches();
+            if (!matched) {
+                throw new AssertionError(pattern + " should have matched " + pathInContext);
+            }
+            _servletPath = matcher.group(1);
+            _pathInfo = (matcher.groupCount() == 2 ? matcher.group(2) : null);
         }
     }

Please let me know what you think. Thanks!

@joakime
Copy link
Contributor

joakime commented Apr 19, 2022

The first group is rarely the servletName

Some examples.

^/content/sku-[0-9]*/(.*)$
^/inv/([A-Za-z0-9]*)/sku/([0-9]*)/(.*)$
^/active/([^/]*)/([0-9]*)/(.*)$
.*\.set$

@gregw
Copy link
Contributor Author

gregw commented Apr 19, 2022

The other thing to consider with using group names is that it could be possible to use them to break the invariant that path = contextPath + servletPath + pathInfo which may have some unintended consequences elsewhere.

@jluehe is the linked PR workable? (other than it still does the match multiple times)

@gregw gregw linked a pull request Apr 19, 2022 that will close this issue
gregw added a commit that referenced this issue Apr 19, 2022
Fixed test

Signed-off-by: Greg Wilkins <gregw@webtide.com>
@jluehe
Copy link
Contributor

jluehe commented Apr 19, 2022

@gregw, I've tried out the linked PR and ran into an issue:

URI=https://localhost:6101/services/data/v56.0/connect/knowledge-migrations

Pattern: ^(./services/data[/|.]?)(.)$

Expected assignments:

_servletPath: /services/data/
_pathInfo: v56.0/connect/knowledge-migrations

Assignments with linked PR:

_servletPath: /services/data/v56.0/connect/knowledge-migrations
_pathInfo: null

@joakime
Copy link
Contributor

joakime commented Apr 19, 2022

That pattern is odd and would not match the provided URI path.

Try it yourself on regex101.com...

https://regex101.com/r/XfSIo7/1

The dot at the beginning would mean a match of any single character before the /services portion, which your input doesn't have.

Here's an slightly adjusted regex, with similar results.
The other parts of your regex (eg [/|.]?) are matching things I don't think you are intending.

https://regex101.com/r/O4dEsL/1

Check the "Match Information" panel on the right, the "Group 1" results would be your Servlet Name, and the "Group 2" results would be your Path Info.

@jluehe
Copy link
Contributor

jluehe commented Apr 19, 2022

Sorry, @joakime, the pattern I quoted was supposed to read: ^(.*/services/data[/|\.]?)(.*)$

I agree the leading .* could be omitted - it was supposed to more closely match the original pattern in our custom regex support, which used to look like /services/data(/|\.)*, but the implementation was using Matcher.find instead of Matcher.match, for some obscure historical reasons

@joakime
Copy link
Contributor

joakime commented Apr 19, 2022

The other thing to consider with using group names is that it could be possible to use them to break the invariant that path = contextPath + servletPath + pathInfo which may have some unintended consequences elsewhere.

I'm beginning to feel that the invariant you mention is irrelevant when using UriTemplatePathSpec or RegexPathSpec.
That's really just a ServletPathSpec behavior.

Or we should just take a page from Servlet url-pattern suffixes and just treat the whole input request (post context) to be the Servlet Path, and have no Path Info?

Example:

  • context path is /ctx
  • url-pattern is *.dump
  • request is /ctx/deep/in/the/heart/of/texas.dump

Results in :

  • The request.contextPath is "/ctx".
  • The request.servletPath is "/deep/in/the/heart/of/texas.dump"
  • The request.pathInfo is null

@joakime
Copy link
Contributor

joakime commented Apr 19, 2022

Sorry, @joakime, the pattern I quoted was supposed to read: ^(.*/services/data[/|\.]?)(.*)$

I agree the leading .* could be omitted - it was supposed to more closely match the original pattern in our custom regex support, which used to look like /services/data(/|\.)*, but the implementation was using Matcher.find instead of Matcher.match, for some obscure historical reasons

A regex pattern starting with .* is a suffix based pattern, similar to url-pattern of *.do, and as such has no ability to find the separation from Servlet Path to PathInfo.
That's how RegexPathSpec was written.

If we support the named groups, then you can override this behavior, by specifying the groups you want to be Servlet Path vs PathInfo.

@joakime
Copy link
Contributor

joakime commented Apr 19, 2022

We could also put the java.util.regex.MatchResult from the regex match into the request attributes, which would allow you to pull any information from the RegexPathSpec match for your own purposes.

@joakime
Copy link
Contributor

joakime commented Apr 19, 2022

Here's the example of your pattern but with named groups.

https://regex101.com/r/7O6M7Q/1

These named groups would allow you to specify where the important details separation should be.
Your data[/|\.]? would be matching on the character / or . 1 or 0 times.
This means your Servlet Name would contain the trailing slash or dot, perhaps you meant for that part of the match to be in the pathInfo section? (aka https://regex101.com/r/G1gmfw/1 )
Btw, you don't need the escape on the \. in the character list definition, the dot character is always a literal within the character list.

@jluehe
Copy link
Contributor

jluehe commented Apr 19, 2022

Thx, @joakime! I've compared with our original pattern and ran related functional tests, and as it turns out, the /or . is indeed expected to belong to the servletPath portion.

Still not sure what extra benefit the named groups would give us ...

Re: Btw, you don't need the escape on the . in the character list definition, the dot character is always a literal within the character list.

+1

@joakime
Copy link
Contributor

joakime commented Apr 19, 2022

Including the slash on the request.getServletPath() is an odd choice.

If we look at the normal url-pattern for prefix based matches on the Servlet spec, the separation works like this.

package jetty.pathspec;

import java.io.IOException;
import java.io.PrintWriter;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;
import org.eclipse.jetty.util.component.LifeCycle;

public class PrefixBehaviorDemo
{
    public static void main(String[] args) throws Exception
    {
        Server server = new Server(9999);

        ServletContextHandler contextHandler = new ServletContextHandler();
        contextHandler.setContextPath("/ctx");
        ServletHolder dumpHolder = new ServletHolder("dump", new HttpServlet()
        {
            @Override
            protected void service(HttpServletRequest req, HttpServletResponse resp) throws IOException
            {
                resp.setCharacterEncoding("utf-8");
                resp.setContentType("text/plain");
                PrintWriter out = resp.getWriter();
                out.printf("Request URI: %s%n", req.getRequestURI());
                out.printf("Request Context Path: %s%n", req.getContextPath());
                out.printf("Request Servlet Path: %s%n", req.getServletPath());
                out.printf("Request Servlet PathInfo: %s%n", req.getPathInfo());
            }
        });
        contextHandler.addServlet(dumpHolder, "/dump/*");
    
        server.setHandler(contextHandler);
        try
        {
            server.start();
            HttpClient httpClient = HttpClient.newHttpClient();

            demoRequest(httpClient, server.getURI().resolve("/ctx/dump/this"));
            demoRequest(httpClient, server.getURI().resolve("/ctx/dump/deep/in/the/heart/of/texas"));
        }
        finally
        {
            LifeCycle.stop(server);
        }
    }

    private static void demoRequest(HttpClient httpClient, URI path)
    {
        try
        {
            HttpRequest httpRequest = HttpRequest.newBuilder(path).GET().build();
            HttpResponse<String> httpResponse = httpClient.send(httpRequest, HttpResponse.BodyHandlers.ofString());
            System.out.printf("HTTP %s %s Response Status: %d%n", httpRequest.method(), httpRequest.uri(), httpResponse.statusCode());
            System.out.println(httpResponse.body());
        }
        catch (IOException | InterruptedException e)
        {
            e.printStackTrace();
        }
    }
}

A setup of ...

  • Context Path at /ctx
  • Servlet "dump" at url-pattern of /dump/*

Request URI: /ctx/dump/this

  • Request Context Path: "/ctx"
  • Request Servlet Path: "/dump"
  • Request Servlet PathInfo: "/this"

Request URI: /ctx/dump/deep/in/the/heart/of/texas

  • Request Context Path: "/ctx"
  • Request Servlet Path: "/dump"
  • Request Servlet PathInfo: "/deep/in/the/heart/of/texas"

I think it would make more sense for that slash (or dot) to belong to the pathInfo portion.

@jluehe
Copy link
Contributor

jluehe commented Apr 19, 2022

@joakime, I agree that would make more sense, and we can revisit later - for now, I just want to get the existing patterns working with Jetty 10 ...

Jetty 12.0.ALPHA1 automation moved this from To do to Done Apr 25, 2022
Jetty 10.0.10/11.0.10 - 🧊 FROZEN 🥶 automation moved this from To do to Done Apr 25, 2022
gregw added a commit that referenced this issue Apr 25, 2022
Fix 7891 regex pathInfo

+ Use the pathSpec methods to set servletPath and pathInfo when possible

Signed-off-by: Greg Wilkins <gregw@webtide.com>
gregw added a commit that referenced this issue Apr 25, 2022
Fix 7891 regex pathInfo

+ Use the pathSpec methods to set servletPath and pathInfo when possible

Signed-off-by: Greg Wilkins <gregw@webtide.com>
@gregw
Copy link
Contributor Author

gregw commented Apr 25, 2022

merged to 11, cherry-picked to 12

@jluehe
Copy link
Contributor

jluehe commented Apr 25, 2022

@gregw @joakime, I've tried out your patch efd9f26, and it does not seem to work for our use cases.

Example:
pathSpec=RegexPathSpec@df111421{^(/[A-Za-z0-9]{3})(/.)?$}
pathInContext=/0XA/e

Expectation:
_servletPath=/0XA
_pathInfo=/e

Actual result with patch:
_servletPath=/0XA/e
_pathInfo=null

This goes back to my earlier comment when I first tried out the patch ...

@joakime
Copy link
Contributor

joakime commented Apr 25, 2022

The current jetty-10.0.x HEAD has this fix currently.

Jetty 12.0.ALPHA1 automation moved this from Done to In progress Apr 25, 2022
Jetty 10.0.10/11.0.10 - 🧊 FROZEN 🥶 automation moved this from Done to In progress Apr 25, 2022
@joakime
Copy link
Contributor

joakime commented Apr 25, 2022

The regex pattern ^(/[A-Za-z0-9]{3})(/.)?$ of is identified as a signature of gl (glob + literal).
That's a SUFFIX_GLOB type of PathSpec mapping.
This is a glob followed by a suffix. (in url-pattern terms, that's the equivalent of *.do)

This categorization is important when resolving across all of the PathSpec's you have registered.
The lookup order across all defined PathSpec's is ...

  1. PathSpecGroup.ordinal (increasing)
    a. ROOT (only "" or null)
    b. EXACT (no glob, exact definitions. eg: /images or /css)
    c. MIDDLE_GLOB (glob in middle only. eg: /^downloads/[^/]*.zip$ or /a/{var}/c)
    d. PREFIX_GLOB (glob at end only. eg: /downloads/*, /api/*, ^/rest/.*$, /bookings/{guest-id}, ^/sku/[A-Z0-9]{9}$)
    e. SUFFIX_GLOB (glob at start only. eg: *.do, *.css, ^.*\.zip$)
    f. DEFAULT (only "/" or ^/$)
  2. PathSpec.specLength (decreasing)
  3. PathSpec.declaration (alphabetical)

The current implementation treats only PREFIX_GLOB as having the potential for pathInfo.

@jluehe
Copy link
Contributor

jluehe commented Apr 25, 2022

Thx, @joakime! Can the current implementation be enhanced to support our use case? I've enabled Jetty 10.0.9 in our dev branch, with my patch (hack) as mentioned above applied to it, and things are looking very promising so far! Alternatively, could the current implementation be changed to support custom pattern matching as shown in my hack?

@joakime
Copy link
Contributor

joakime commented Apr 25, 2022

I'm making some improvements to RegexPathSpec in regards to how often it does matching. (going from 3 per hit to 1 per hit)
Your changes from above are being evaluated.

@jluehe
Copy link
Contributor

jluehe commented Apr 25, 2022

Awesome, @joakime!

@joakime joakime moved this from In progress to Done in Jetty 10.0.10/11.0.10 - 🧊 FROZEN 🥶 Apr 27, 2022
@joakime joakime moved this from In progress to Done in Jetty 12.0.ALPHA1 Apr 27, 2022
@jluehe
Copy link
Contributor

jluehe commented Apr 29, 2022

Re: #7891 (comment), turns out the leading .* inside the servletPath group actually is intentional, to map something like /connect/communities/005x00AA/services/data/v56.0/connect/file

joakime added a commit that referenced this issue May 2, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 3, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 3, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 3, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
@joakime joakime linked a pull request May 3, 2022 that will close this issue
joakime added a commit that referenced this issue May 3, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 4, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 4, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 4, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
@joakime joakime changed the title Better ServletPathMapping for Regex Better Servlet PathMappings for Regex May 4, 2022
joakime added a commit that referenced this issue May 11, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 11, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 11, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 11, 2022
Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime added a commit that referenced this issue May 16, 2022
+ Implemented recalculations
+ Added more comments

Signed-off-by: Joakim Erdfelt <joakim.erdfelt@gmail.com>
joakime pushed a commit that referenced this issue Oct 6, 2022
Fix 7891 regex pathInfo

+ Use the pathSpec methods to set servletPath and pathInfo when possible

Signed-off-by: Greg Wilkins <gregw@webtide.com>
@github-actions
Copy link

github-actions bot commented May 5, 2023

This issue has been automatically marked as stale because it has been a
full year without activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@github-actions github-actions bot added the Stale For auto-closed stale issues and pull requests label May 5, 2023
@joakime
Copy link
Contributor

joakime commented May 5, 2023

Closing, merged in PR #7947

@joakime joakime closed this as completed May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side Sponsored This issue affects a user with a commercial support agreement Stale For auto-closed stale issues and pull requests
Projects
No open projects
3 participants