Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow colons in routes again #1657

Closed

Conversation

bodograumann
Copy link
Contributor

This fixes the issue that was brought up as tiangolo/fastapi#4892
Due to starlette stripping any :.* ending when compiling path patterns, it became impossible to use colons in them.
This issue was introduced in #1322, because now the port suffix was stripped from host names. This is not necessary when compiling Route and Mount patterns however.

My use case, like the one of the original reporter, comes from following the google api design guide.

Copy link
Member

@aminalaee aminalaee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aminalaee
Copy link
Member

And also thank you @bodograumann for the PR 👍

@aminalaee aminalaee mentioned this pull request May 31, 2022
@Kludex Kludex added the hold Don't merge it label May 31, 2022
@Kludex Kludex requested a review from a team May 31, 2022 10:34
florimondmanca
florimondmanca previously approved these changes May 31, 2022
Copy link
Member

@florimondmanca florimondmanca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's a nice one. Brings the previous behavior by default while keeping the fix on host domains.

@Kludex Kludex removed the hold Don't merge it label May 31, 2022
@florimondmanca
Copy link
Member

florimondmanca commented May 31, 2022

Actually...

We should note that since compile_path is public, strip_port would de facto public API with this PR. Do we want that? I don't think so.

So, should we _strip_port instead? Should Host do its own thing completely separate from compile_path? Is there another way we can detect we're parsing a host pattern rather than a path pattern?

I'm thinking of the latter.

In fact, compile_path()'s docstring mentions that path should start with a /, but it doesn't have to. This is something we only check in Route.__init__(), before calling compile_path(). On the contrary, host domains typically shouldn't start with a /.

So, maybe the behavior could be this, without needing a new flag?

# Leading '/', treat as path, don't strip :rest
compile_path("/files:upload_homework/")  # -> /files:upload_homework/

# No leading '/', treat as host, strip :port
compile_path("{subdomain}.example.org:3000")  # -> {subdomain}.example.org

Essentially, we'd do:

is_host = not path.startswith("/")  # {subdomain}.example.org:port vs /users:list
tail = path[idx:].split(":")[0] if is_host else path[idx:]
path_regex += re.escape(tail) + "$"

Symetrically to Route checking for path starting with '/', this would require adding a check in Host that host does not start with a '/'.

@bodograumann
Copy link
Contributor Author

bodograumann commented May 31, 2022

You bring up some really good points. I also thought that it was weird to use the same method for Host and Route in the first place, but did not want change too much.
As long as my use case is covered in the end, I’m happy.
I guess I also wouldn’t expose compile_path publicly at all. Now it’s already out there, it is probably too late to take it back⁉

Should I try to make your suggested changes, do you want to do them yourself, or should we wait for this to be discussed a bit more?

@florimondmanca
Copy link
Member

Curious what @Kludex or @aminalaee think about ^^?

@aminalaee
Copy link
Member

I like the idea @florimondmanca
Worth explaining this in the docstring as it is doing a few things at the same time.

@Kludex
Copy link
Sponsor Member

Kludex commented Jun 1, 2022

Curious what @Kludex or @aminalaee think about ^^?

I didn't like that parameter as well, that's when I added the hold label. 😅

I prefer what you suggested, yep.

@bodograumann
Copy link
Contributor Author

I implemented it according to your suggestion @florimondmanca 🚀

@Kludex Kludex added the hold Don't merge it label Jun 1, 2022
@@ -108,34 +108,47 @@ def replace_params(


def compile_path(
path: str,
pattern: str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, isn't this still considered backwards-incompatible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. I guess technically you are correct. I didn't think about keyword argument usage.
We are also on a 0.x version. So we could do breaking changes in 0.21 I guess.
How about defining an alias for now. Then we can also get rid of the inprecise compile_path name internally.

E.g. rename into def compile_pattern(pattern: str) and def compile_path(path: str): return compile_pattern(path) for backwards compatibility.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid breaking changes as much as possible. I also don't see much benefit on the renaming here. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed adding a parameter is a breaking change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho the method should not have been used for compiling host patterns in the first place, because a hostname is not a path.
The least we can do now is apply proper naming and documentation.

The renaming would allow keeping backward-compatibility for the old interface, while implementing a better on. Personally I wouldn’t regard this function as part of the public starlette api currently, because it is not documented anywhere and I would keep it that way. So the new function name would be _compile_pattern I guess.

Having said all that, I don’t feel too strongly about any of this. Just thought it would be an improvement to the code base. If you don’t care about correct naming, I can simply revert that part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I think this has come up a few times before, but the priority is to keep backwards-compatibility until there's a major release like from 1.x.x to 2.x.x

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only case when this would break is if people do compile_path(path=...).

It's possible to go for a deprecation, allowing this usage at runtime while keeping the new signature for type hints, with a decorator...

def _allow_deprecated_path_kwarg(func: Callable) -> Callable:
    @wraps(func)
    def wrapped(*args: Any, **kwargs: Any):
        if "path" in kwargs:
            ...  # Raise a deprecation warning
            return func(kwargs["path"])
        return func(*args, **kwargs)

    return wrapped

@_allow_deprecated_path_kwarg
def compile_path(pattern: str):
    ...

Edit: But not really convinced either this is something we should do, especially in this bugfix PR (see #1657 (comment)).

Copy link
Sponsor Member

@Kludex Kludex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing this, I still don't see why we need to change path by pattern. 😅

"""
is_host: bool = not pattern.startswith("/")
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is_host: bool = not pattern.startswith("/")
is_host = not pattern.startswith("/")


path_regex += re.escape(path[idx:].split(":")[0]) + "$"
path_format += path[idx:]
tail: str = pattern[idx:].split(":")[0] if is_host else pattern[idx:]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tail: str = pattern[idx:].split(":")[0] if is_host else pattern[idx:]
tail = pattern[idx:].split(":")[0] if is_host else pattern[idx:]

@florimondmanca
Copy link
Member

florimondmanca commented Jun 4, 2022

@Kludex From my POV, it is mainly a naming consistency. Kind of like HTTPX's "URLPattern", which was internal.

I'd personally be OK with keeping path for now, for the sake of focus.

We should note that actually the name of the function itself is compile_path. So if we accept the naming consistency concern, we should consider another deprecation route, which would be even more consistent: add compile_pattern() and make compile_path() call into it with a warning. This then begs the question, should compile_pattern be public, or private API? If private ("compile_path shouldn't have been public API anyway"), then what would be the way out for people currently relying on compile_path()? Etc.

This sounds like relevant discussion to have, but maybe not here, as this PR is about fixing colons in route names. So...?

@bodograumann WDYT?

@florimondmanca
Copy link
Member

Closing as we just merged #1675. Kudos to @bodograumann for helping out on this!

@ahopkins
Copy link
Member

ahopkins commented Sep 27, 2022

Am I missing something? By my read a colon is not a valid path character. I don't see any change to these in RFC updates to 3986.

https://www.rfc-editor.org/rfc/rfc3986#section-2.2

@bodograumann bodograumann deleted the fix-allow-colon-in-route branch September 27, 2022 20:40
@bodograumann
Copy link
Contributor Author

Starlette's path matching happens after the path has been decoded, so all characters are valid, they just have to be encoded when going over the wire, @ahopkins .
More specifically the spec says about reserved characters:

If data for a URI component would conflict with a reserved
character's purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.

From what I have seen, it is common practice to always encode the colon, even though it could theoretically be unambiguously interpreted in the path part of the URI.

@ahopkins
Copy link
Member

Got it. Yeah, agreed that encoded should be fine. Thanks for the clarification.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hold Don't merge it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants