Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extractor for user language #2198

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

frenetisch-applaudierend
Copy link

@frenetisch-applaudierend frenetisch-applaudierend commented Aug 28, 2023

This is a draft implementation for the feature request #2197 (Adding a UserLanguage extractor).

Before adding tests and documentation, I wanted to make sure that this is the right direction for this feature.

@frenetisch-applaudierend
Copy link
Author

frenetisch-applaudierend commented Aug 28, 2023

A few specific questions that popped up:

Should this be feature gated? It adds no additional dependencies, but I'm not sure exactly what the policy is.

Is the module layout ok? I tried to mimic what I saw in other parts, with creating a top-level module and re-export just the extractor from the extractor module.

I was thinking about the type for the user language code. Right now it is a String, since that's what's usually provided from the request and it does not constrain the supported values. However this means a user of the UserLanguage extractor has to care about handling all possible cases of upper/lowercase strings and locales with country ids (en-US vs en).

We could use unic_langid::LanguageIdentifier which would make this easier, but this means that only valid language identifiers would be supported (e.g. lang=German would not be supported anymore). Additionally this adds a new dependency which may not be desirable.

Another approach could be to make the extractor UserLanguage<T>, where T: TryFrom<String> with a default of String. This would allow users to provide their own type like LanguageIdentifier or a custom enum type with the supported languages. Though I'm not sure if it is worth the added complexity.

@frenetisch-applaudierend frenetisch-applaudierend marked this pull request as draft August 28, 2023 18:45
@frenetisch-applaudierend
Copy link
Author

Just wanted to check in on this. Do you have any comments on this, or is it fine for now and I should just continue?

@davidpdrsn
Copy link
Member

I haven't had time to look at it yet :/ I'd say you're free to continue. You can always ship it in its own crate if we decide it's not appropriate for axum. I think its an extractor that makes sense to have at least in some library somewhere.

@frenetisch-applaudierend
Copy link
Author

Sure, no worries, thanks for the feedback. I'll continue then to try and get it ready 👍

@frenetisch-applaudierend frenetisch-applaudierend marked this pull request as ready for review September 29, 2023 09:26
@frenetisch-applaudierend
Copy link
Author

I added some tests and documentation, as well as tried to be more consistent with other extractors.

Copy link

@JonahKr JonahKr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there @frenetisch-applaudierend 👋
Considering the structure of this Pull Request and the already existing Extractors I wanted to raise a few questions:
First of all the Header Source in my opinion makes sense.
You might additionally be able to contribute your work on the header to the headers library (https://github.com/hyperium/headers) so it can be extracted using the TypedHeader as well.

For the rest, I want to give some examples.

let router = Router::new().route("/post/:lang", get(posts));

// Path extractor
async fn posts(Path(lang): Path<String> {
// validate language and adjust query or smth
}

// Query Extractor
async fn posts(query: Query<SomeStruct>) {
let lang = query.x;
// validate language and adjust query or smth
}

//Proposed Extractor
async fn posts(lang: Extension<UserLanguage>) {
// validate language and adjust query or smth
}

The current benefit of your proposal is that you can add an Extension which can be added to every handler in the attached router and receive a standardized language struct following predefined sources.
While for the header this makes total sense with a standardized format and the quality value syntax, the added value of the other two sources is only limited since it doesn't alleviate any handling for the end user. The languages still need to be validated for them to be usable anyway, so why not just use what's already there?
I get what you are trying to achieve but in my opinion, the current design with the configuration and extension abstraction is either too much for what is currently implemented or might need some further development for the validation of languages following the IETF BCP 47 language tags or similar, at which point it might make sense to move this to an external crate altogether.

Let me know what you think!

@frenetisch-applaudierend
Copy link
Author

Hi @JonahKr

Thank you for your comments! I'll try to address then below.

First of all the Header Source in my opinion makes sense. You might additionally be able to contribute your work on the header to the headers library (https://github.com/hyperium/headers) so it can be extracted using the TypedHeader as well.

Good point, I'll look into that! That would also make the code here simpler, as some general logic can be extracted.

//Proposed Extractor
async fn posts(lang: Extension<UserLanguage>) {
// validate language and adjust query or smth
}

You might have misread the example a little here. Extension is only needed to configure the extractor, not to use it (and configuration is optional if the defaults fit your use case). So the handler would match the others as well:

// Proposed Extractor
async fn posts(lang: UserLanguage) {
    // Read preferred language and adjust query or smth
}

The current benefit of your proposal is that you can add an Extension which can be added to every handler in the attached router and receive a standardized language struct following predefined sources.

In my opinion, the main benefits of this proposal are readability and flexibility. Standardization in the current form is not (as you have noted).

Readability, because it should be immediately clear when seeing a UserLanguage extractor what the intention is. Comparatively, using a cookie extractor and maybe a header extractor and feeding them into a custom function needs more effort to understand what is going on.

Flexibility, because if you want to change where you read your user language from, it can be done from a single place. When using the existing extractors directly in your handlers you would have to change each handler.

Of course you can use the existing extractors to build your own extractor, which has the same benefits. But the premise here is that this need is universal enough to warrant it being part of the ecosystem "out of the box".

While for the header this makes total sense with a standardized format and the quality value syntax, the added value of the other two sources is only limited since it doesn't alleviate any handling for the end user. The languages still need to be validated for them to be usable anyway, so why not just use what's already there? I get what you are trying to achieve but in my opinion, the current design with the configuration and extension abstraction is either too much for what is currently implemented or might need some further development for the validation of languages following the IETF BCP 47 language tags or similar, at which point it might make sense to move this to an external crate altogether.

I think there are two different tasks here:

  • Getting a list of preferred languages for the given request
  • Making sense of this list and decide what to do with this information

This extractor only handles the first task, and the abstractions are built around this task. There is one extractor, one trait for the language source and one configuration to hold sources (plus some provided sources that in my opinion are universal enough to be shipped along). I don't feel this is going overboard, but maybe you could elaborate what you would see as an acceptable amount of abstraction, just for this task?

The second task is subject to the application itself. Even if you know that you get language tags as from the Accept-Language header (like de or de-DE), it's still not trivial to then map this over your supported languages. You would have to handle cases where you have to decide whether de should match de-CH, or de-AT should match de-DE etc. This logic is out of scope of this extractor in my opinion, especially considering that the translation tool of your choice might already implement that for you. You might also decide that for your application only having d and e as supported language ids is sufficient, and I feel that should be a supported use case as well.

I agree that it would be helpful if the language extracted would match some known specification. But as you mention, this would likely mean more complexity than is warranted. In the end I think retrieving the list of languages is enough beneficial to exist on its own.

Thanks again, and I hope I could address at least some of your questions and concerns?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants