Add extractor for user language #2198

frenetisch-applaudierend · 2023-08-28T18:44:28Z

This is a draft implementation for the feature request #2197 (Adding a UserLanguage extractor).

Before adding tests and documentation, I wanted to make sure that this is the right direction for this feature.

Include example showing basic usage

frenetisch-applaudierend · 2023-08-28T18:44:49Z

A few specific questions that popped up:

Should this be feature gated? It adds no additional dependencies, but I'm not sure exactly what the policy is.

Is the module layout ok? I tried to mimic what I saw in other parts by creating a top-level module and re-exporting just the extractor from the extractor module.

I was thinking about the type for the user language code. Right now it is a String, since that's what's usually provided from the request and it does not constrain the supported values. However this means a user of the UserLanguage extractor has to care about handling all possible cases of upper/lowercase strings and locales with country ids (en-US vs en).
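To illustrate the handling a `String`-based value pushes onto callers, here is a minimal sketch of the normalization they would likely write themselves. `split_language` is a hypothetical helper and not part of this PR; it only lowercases the input and separates the primary language from whatever follows it.

```rust
/// Hypothetical helper (not part of the PR): splits a raw language string
/// such as "en-US" or "EN_us" into a lowercase primary language and the
/// optional lowercase remainder (region, script, ...).
fn split_language(raw: &str) -> (String, Option<String>) {
    // Accept both "en-US" and "en_US" and ignore casing.
    let normalized = raw.trim().to_ascii_lowercase().replace('_', "-");
    let mut parts = normalized.splitn(2, '-');
    // `splitn` always yields at least one item, even for an empty input.
    let primary = parts.next().unwrap_or_default().to_owned();
    let rest = parts.next().map(str::to_owned);
    (primary, rest)
}
```

With this, `"en-US"`, `"EN-us"` and `"en_US"` all end up as the same pair, but every user of the extractor would have to repeat something like it.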

We could use unic_langid::LanguageIdentifier which would make this easier, but this means that only valid language identifiers would be supported (e.g. lang=German would not be supported anymore). Additionally this adds a new dependency which may not be desirable.

Another approach could be to make the extractor UserLanguage<T>, where T: TryFrom<String> with a default of String. This would allow users to provide their own type like LanguageIdentifier or a custom enum type with the supported languages. Though I'm not sure if it is worth the added complexity.
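A rough sketch of what the generic variant could look like, using only the standard library. The struct layout, the `from_raw` constructor and the `SupportedLanguage` enum are assumptions made for illustration; the real extractor in this PR may be shaped differently.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the extractor's output; `T` defaults to `String`.
#[derive(Debug, PartialEq)]
struct UserLanguage<T = String> {
    preferred: Vec<T>,
}

impl<T: TryFrom<String>> UserLanguage<T> {
    /// Converts the raw strings into `T`, silently dropping rejected values.
    fn from_raw(raw: Vec<String>) -> Self {
        let preferred = raw
            .into_iter()
            .filter_map(|value| T::try_from(value).ok())
            .collect();
        UserLanguage { preferred }
    }
}

/// Example of an application-defined type with a fixed set of languages.
#[derive(Debug, PartialEq)]
enum SupportedLanguage {
    English,
    German,
}

impl TryFrom<String> for SupportedLanguage {
    type Error = String;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        match value.to_ascii_lowercase().as_str() {
            "en" | "en-us" | "en-gb" => Ok(SupportedLanguage::English),
            "de" | "de-de" | "de-ch" | "de-at" => Ok(SupportedLanguage::German),
            // Hand the unrecognized input back as the error value.
            _ => Err(value),
        }
    }
}
```

Because `String` implements `TryFrom<String>` through the standard blanket impl, `UserLanguage` without a type argument keeps accepting arbitrary values such as `German`, while `UserLanguage<SupportedLanguage>` only keeps values the application knows.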

frenetisch-applaudierend · 2023-09-26T10:15:45Z

Just wanted to check in on this. Do you have any comments on this, or is it fine for now and I should just continue?

davidpdrsn · 2023-09-26T11:36:20Z

I haven't had time to look at it yet :/ I'd say you're free to continue. You can always ship it in its own crate if we decide it's not appropriate for axum. I think it's an extractor that makes sense to have at least in some library somewhere.

frenetisch-applaudierend · 2023-09-26T12:58:25Z

Sure, no worries, thanks for the feedback. I'll continue then to try and get it ready 👍

frenetisch-applaudierend · 2023-09-29T09:28:59Z

I added some tests and documentation, as well as tried to be more consistent with other extractors.

JonahKr

Hey there @frenetisch-applaudierend 👋
Considering the structure of this Pull Request and the already existing Extractors I wanted to raise a few questions:
First of all the Header Source in my opinion makes sense.
You might additionally be able to contribute your work on the header to the headers library (https://github.com/hyperium/headers) so it can be extracted using the TypedHeader as well.

For the rest, I want to give some examples.

let router = Router::new().route("/post/:lang", get(posts));

// Path extractor
async fn posts(Path(lang): Path<String>) {
    // validate language and adjust query or smth
}

// Query extractor
async fn posts(query: Query<SomeStruct>) {
    let lang = query.x;
    // validate language and adjust query or smth
}

// Proposed extractor
async fn posts(lang: Extension<UserLanguage>) {
    // validate language and adjust query or smth
}

The current benefit of your proposal is that you can add an Extension which can be added to every handler in the attached router and receive a standardized language struct following predefined sources.
While for the header this makes total sense with a standardized format and the quality value syntax, the added value of the other two sources is only limited since it doesn't alleviate any handling for the end user. The languages still need to be validated for them to be usable anyway, so why not just use what's already there?
I get what you are trying to achieve but in my opinion, the current design with the configuration and extension abstraction is either too much for what is currently implemented or might need some further development for the validation of languages following the IETF BCP 47 language tags or similar, at which point it might make sense to move this to an external crate altogether.
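For reference, the quality value handling that makes the header source worthwhile can be sketched roughly as follows. This is a simplified, standard-library-only illustration and not the code from this PR: `parse_accept_language` is a hypothetical name, only a lowercase `q=` parameter is recognized, and wildcard entries are dropped (the PR has a commit that ignores the wildcard as well).

```rust
/// Hypothetical sketch: parses an `Accept-Language` value into language tags
/// ordered by descending quality value. Wildcards (`*`) and entries whose
/// quality is zero or unparseable are dropped.
fn parse_accept_language(header: &str) -> Vec<String> {
    let mut entries: Vec<(String, f32)> = header
        .split(',')
        .filter_map(|item| {
            let mut parts = item.trim().split(';');
            let tag = parts.next()?.trim();
            if tag.is_empty() || tag == "*" {
                return None;
            }
            // A missing `q` parameter means the default quality of 1.0.
            let mut quality = 1.0_f32;
            for param in parts {
                if let Some(value) = param.trim().strip_prefix("q=") {
                    quality = value
                        .trim()
                        .parse::<f32>()
                        .ok()
                        .filter(|q| q.is_finite())
                        .unwrap_or(0.0);
                }
            }
            if quality <= 0.0 {
                None
            } else {
                Some((tag.to_owned(), quality))
            }
        })
        .collect();

    // The sort is stable, so entries with equal quality keep header order.
    entries.sort_by(|a, b| b.1.total_cmp(&a.1));
    entries.into_iter().map(|(tag, _)| tag).collect()
}
```

The result is still just a list of raw tags, so validating them against BCP 47 or against the application's supported languages remains a separate step.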

Let me know what you think!

frenetisch-applaudierend · 2024-01-07T20:26:37Z

Hi @JonahKr

Thank you for your comments! I'll try to address them below.

> First of all the Header Source in my opinion makes sense. You might additionally be able to contribute your work on the header to the headers library (https://github.com/hyperium/headers) so it can be extracted using the TypedHeader as well.

Good point, I'll look into that! That would also make the code here simpler, as some general logic can be extracted.

> // Proposed Extractor
> async fn posts(lang: Extension<UserLanguage>) {
>     // validate language and adjust query or smth
> }

You might have misread the example a little here. Extension is only needed to configure the extractor, not to use it (and configuration is optional if the defaults fit your use case). So the handler would match the others as well:

// Proposed Extractor
async fn posts(lang: UserLanguage) {
    // Read preferred language and adjust query or smth
}

> The current benefit of your proposal is that you can add an Extension which can be added to every handler in the attached router and receive a standardized language struct following predefined sources.

In my opinion, the main benefits of this proposal are readability and flexibility. Standardization, in its current form, is not one of them (as you have noted).

Readability, because it should be immediately clear when seeing a UserLanguage extractor what the intention is. Comparatively, using a cookie extractor and maybe a header extractor and feeding them into a custom function needs more effort to understand what is going on.

Flexibility, because if you want to change where you read your user language from, it can be done from a single place. When using the existing extractors directly in your handlers you would have to change each handler.

Of course you can use the existing extractors to build your own extractor, which has the same benefits. But the premise here is that this need is universal enough to warrant it being part of the ecosystem "out of the box".

> While for the header this makes total sense with a standardized format and the quality value syntax, the added value of the other two sources is only limited since it doesn't alleviate any handling for the end user. The languages still need to be validated for them to be usable anyway, so why not just use what's already there?
>
> I get what you are trying to achieve but in my opinion, the current design with the configuration and extension abstraction is either too much for what is currently implemented or might need some further development for the validation of languages following the IETF BCP 47 language tags or similar, at which point it might make sense to move this to an external crate altogether.

I think there are two different tasks here:

1. Getting a list of preferred languages for the given request
2. Making sense of this list and deciding what to do with this information

This extractor only handles the first task, and the abstractions are built around this task. There is one extractor, one trait for the language source and one configuration to hold sources (plus some provided sources that in my opinion are universal enough to be shipped along). I don't feel this is going overboard, but maybe you could elaborate what you would see as an acceptable amount of abstraction, just for this task?
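The three pieces mentioned above (a source trait, some provided sources, and a configuration holding them) can be sketched roughly like this. It is a standard-library-only model, not the PR's code: `Request` is a simplified stand-in for axum's request parts, and the method and field names are assumptions chosen for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an HTTP request (hypothetical).
struct Request {
    query: HashMap<String, String>,
    headers: HashMap<String, String>,
}

/// One way of reading preferred languages from a request.
trait UserLanguageSource {
    fn languages_from(&self, request: &Request) -> Vec<String>;
}

/// Reads a single language from a query parameter such as `?lang=de`.
struct QuerySource {
    name: String,
}

impl UserLanguageSource for QuerySource {
    fn languages_from(&self, request: &Request) -> Vec<String> {
        request.query.get(&self.name).cloned().into_iter().collect()
    }
}

/// Reads languages from `Accept-Language`, ignoring quality values here.
struct AcceptLanguageSource;

impl UserLanguageSource for AcceptLanguageSource {
    fn languages_from(&self, request: &Request) -> Vec<String> {
        request
            .headers
            .get("accept-language")
            .map(|value| {
                value
                    .split(',')
                    .filter_map(|item| item.split(';').next())
                    .map(str::trim)
                    .filter(|tag| !tag.is_empty() && *tag != "*")
                    .map(str::to_owned)
                    .collect()
            })
            .unwrap_or_default()
    }
}

/// Holds the sources in priority order; changing where the language comes
/// from means changing only this list.
struct Config {
    sources: Vec<Box<dyn UserLanguageSource>>,
}

impl Config {
    fn preferred_languages(&self, request: &Request) -> Vec<String> {
        self.sources
            .iter()
            .flat_map(|source| source.languages_from(request))
            .collect()
    }
}
```

Handlers would only ever see the resulting list, so swapping the query parameter for a cookie or a path segment would not touch them.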

The second task is up to the application itself. Even if you know that you get language tags like those in the Accept-Language header (such as de or de-DE), it's still not trivial to map them onto your supported languages. You would have to decide whether de should match de-CH, or whether de-AT should match de-DE, etc. This logic is out of scope for this extractor in my opinion, especially considering that the translation tool of your choice might already implement it for you. You might also decide that for your application only having d and e as supported language ids is sufficient, and I feel that should be a supported use case as well.
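To show why this second task is application-specific, here is one possible matching policy as a minimal sketch. `pick_language` is a hypothetical function, and the rule it applies (exact match first, then fall back to the primary language subtag) is only one of several reasonable choices.

```rust
/// Hypothetical policy: for each preferred tag, try a case-insensitive exact
/// match first, then any supported language sharing the primary subtag.
fn pick_language<'a>(preferred: &[&str], supported: &[&'a str]) -> Option<&'a str> {
    // "de-AT" -> "de"; a tag without a subtag is returned unchanged.
    fn primary(tag: &str) -> &str {
        tag.split('-').next().unwrap_or(tag)
    }

    for wanted in preferred {
        let exact = supported
            .iter()
            .copied()
            .find(|candidate| candidate.eq_ignore_ascii_case(wanted));
        if exact.is_some() {
            return exact;
        }

        // Under this policy de-AT is served de-DE if that is all we have.
        let same_primary = supported
            .iter()
            .copied()
            .find(|candidate| primary(candidate).eq_ignore_ascii_case(primary(wanted)));
        if same_primary.is_some() {
            return same_primary;
        }
    }
    None
}
```

An application that must not serve de-DE to a de-AT user would simply drop the fallback step, which is why this decision does not belong in the extractor.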

I agree that it would be helpful if the extracted language matched some known specification. But as you mention, this would likely mean more complexity than is warranted. In the end I think retrieving the list of languages is beneficial enough to exist on its own.

Thanks again, and I hope I could address at least some of your questions and concerns?

frenetisch-applaudierend added 6 commits August 27, 2023 18:52

Add api for UserLanguage

4577e7e

Include example showing basic usage

Add a first implementation to read user language

2eb08a1

Extract user lang sources

ca79cfb

Add possibility to configure

e2c538b

Improve UserLanguage example a bit

55232bd

Reorganize user_lang modules

d4ce66b

frenetisch-applaudierend marked this pull request as draft August 28, 2023 18:45

frenetisch-applaudierend added 11 commits September 28, 2023 19:00

Move modules to be more consistent with the rest

c40f4bf

Add documentation to the lang module

4e75f78

Add documentation for the config module

d659285

Rename UserLanguageConfig/Builder

2ccc8ac

Add docs for UserLanguageSource

02b1f96

Add docs for PathSource

b640fa8

Add docs for query and accept header sources

0004c64

Add some tests for user language extractor

0d68fff

Add tests for query and path source

a89879a

Add test for header source

a1fe1fd

Remove unintentionally checked-in files

f247202

frenetisch-applaudierend marked this pull request as ready for review September 29, 2023 09:26

Ignore wildcard language in accept source

49bcd59

JonahKr reviewed Jan 3, 2024


frenetisch-applaudierend added 3 commits January 7, 2024 21:35

Merge upstream

3cafa1d

Fix CI issues

cfed2b4

Replace usages of (&str).to_string() with to_owned()

fc60aa6

frenetisch-applaudierend added 4 commits January 7, 2024 22:02

Use query source conditionally based on feature flag

604c25e

Address more clippy issues

b51c672

Use crate export instead of re-export

c596e0d

Fix clippy hint in example

cdf9a72
