i18n #1096

Open
jannesiera opened this issue Nov 15, 2021 · 15 comments

Comments

@jannesiera

It has been mentioned before briefly in another issue but then it seems it wasn't discussed any further.
#1095 (comment)

My use case for internationalization is pretty much the same. Is this something we can achieve with the current version of Charge? If not, what would need to happen to make this possible?

@brandonweiss
Owner

My gut says it’s almost certainly possible to make an i18n site with Charge as-is, but that doesn’t necessarily mean it’d be easy to do it 😅

Can you start by explaining exactly how you’d like to see i18n work? URL structure, string/key storage, string/key consumption, etc. Example code would be especially helpful!

From there I can help explain how you might do i18n with Charge as-is, as well as figure out what conveniences it might make sense to add.

Thanks!

@jannesiera
Author

The folder structure should basically be similar to what was mentioned in the original ticket.

E.g. given two locales "en" and "fr"

index.html
en/index.html
fr/index.html

For the translations themselves, I'm not looking for too much help. What matters most is that I can output my components in those folders with certain settings. For now I have translations by key value pair in a json file, and have a settings json file with the 'current' language. A simple function will then look up the correct translation by a key and current language.

I have a few utility scripts that will update the settings json with a certain language and I run them before I serve/build charge. This is a current workaround so I can at least output the site in a certain language, but I'd have to run the build for every language individually. Additionally, I'd be missing the correct folder structure (I'd have to move the files by hand in the correct folder before deploying) which breaks links, etc while I'm serving in development mode. I also can't change language on the fly when serving since I can only serve one language at a time.

All of this could be solved if I could (1) output multiple files from a single page and (2) control where those files are output. The documentation mentions 'dynamic pages' but doesn't say what they are, only that they are 'coming soon'.

Hope you can help me out here, the current setup kind of works but really isn't ideal for development purposes.

@brandonweiss
Owner

Alright, let’s see how far we can get with things as-is. I’ll caveat that I’m winging this here a bit—I haven’t actually tried any of this.

The first problem to solve is how to share pages. We obviously don’t want to make index.html and then have to completely duplicate the markup to make fr/index.html. Let’s say we have a data file like this:

// data/translations.json
{
  "en": {
    "greeting": "Hello"
  },
  "fr": {
    "greeting": "Bonjour"
  }
}

Then we could make a page like this.

// index.html.jsx

export default ({ language = "en", data }) => {
  return <p>{data.translations[language].greeting}</p>
}

I believe that should work 🤞🏼 And that might also allow us to do this.

// fr/index.html.jsx

import Index from "../index.html.jsx"

export default ({ data }) => {
  return <Index language="fr" data={data} />
}

Give that a try and tell me if that works or not! I realize it’s not super elegant, but it’s a start. You could improve it by extracting a function to make the translation lookup a little more seamless, something like translate("greeting", language).
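
The translate("greeting", language) helper suggested above could look roughly like the sketch below. It is only an illustration and not part of Charge: the translations object is inlined so the snippet runs on its own (in a page it would come from the data prop), and the optional third argument and the fall-back to the key itself are assumptions.

```javascript
// Hypothetical helper, not part of Charge.
// Inlined copy of data/translations.json; a real page would read it from the `data` prop.
const translations = {
  en: { greeting: "Hello" },
  fr: { greeting: "Bonjour" },
}

// Look up `key` for `language`. Falls back to the key itself when the
// language or the key is missing, so gaps stay visible in the output.
const translate = (key, language, table = translations) => {
  const strings = table[language]
  if (strings && Object.prototype.hasOwnProperty.call(strings, key)) {
    return strings[key]
  }
  return key
}

console.log(translate("greeting", "fr")) // Bonjour
```

Inside the page component this would replace data.translations[language].greeting with translate("greeting", language, data.translations).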

The next step might be to codify this somehow. Like maybe if there is a translations file in the data directory, it could look at the top-level keys (en, fr, et cetera) and automatically create the permutations of the pages. Although to do that would require enforcing a specific URL structure, and I honestly don’t know if that makes sense because I don’t have a lot of experience with i18n. Is putting the language in the path like this the best? What about subdomains? Or something else?

@brandonweiss
Owner

Actually I just realized that because you’d be importing the first page, that page becomes a component and won’t be outputted, so you’d only wind up with en/index.html and fr/index.html. That said, I think perhaps that makes more sense. I can’t quite imagine a scenario where you’d want index.html and en/index.html to both exist. Search engines frown on duplicate content, so you’d probably have to set a canonical meta tag to avoid being penalized. Maybe there’s a use-case I’m missing, but it seems like you’d want to create a redirect rule in whatever host you’re using to redirect URLs without a language to the same URL but with whatever the default language is.

@jannesiera
Author

Thanks for getting back to me.

This was the first approach I considered, but it causes a proliferation of files. Now for every page, I need to create another file for every language with some meaningless boilerplate. This means that, for my initial example of two languages, if I have 20 pages I end up with 60 files. That's really painful when 40 of those pages don't do anything. Especially when something changes (since you'd have to keep everything in sync).

The translations themselves are not an issue, I have a similar approach with a translations json and that works fine. It's automatically generating the correct permutations in the correct folder without the need to create these pages manually which is an issue. To be honest, there is nothing in the API that I saw that could make this possible. Only more control over how and where pages are outputted seems to fix the issue elegantly.

From what I've seen putting the language in a path is a pretty decent way to go, and that's what I'm choosing to do with my own site currently. Subdomains could be an alternative but they have their own pros and cons. For a simple static site path seems the best solution.

@brandonweiss
Owner

Yeah, that’s fair. Okay let’s see how this could potentially work.

Let’s say there’s a canonical place we put translations, sort of like I mentioned in a previous comment.

// data/translations.json
{
  "en": {
    "greeting": "Hello"
  },
  "fr": {
    "greeting": "Bonjour"
  }
}

We could extract the language keys from this in order to determine what languages are available. This does seem to be the most common way to handle translations, although I’ve always found this structure a bit annoying. To me it makes much more sense to group around the translation key, like this:

// data/translations.json
{
  "greeting": {
    "en": "Hello",
    "fr": "Bonjour"
  }
}

That way it’s much easier to add/remove translation keys. But then it’s harder to know exactly what languages are available. You’d have to randomly grab the first translation key and then extract the language keys from it. That feels sort of inelegant, but it technically works.
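
As a rough sketch of that trade-off, here is how the available languages could be read from each layout. The function names are made up for illustration. For the key-grouped layout the sketch collects the union of languages across all keys instead of grabbing the first key, which is a small variation on what is described above, so that one incomplete key does not hide a language.

```javascript
// Layout A: grouped by language.
const byLanguage = {
  en: { greeting: "Hello" },
  fr: { greeting: "Bonjour" },
}

// Layout B: grouped by translation key.
const byKey = {
  greeting: { en: "Hello", fr: "Bonjour" },
}

// Layout A: the languages are simply the top-level keys.
const languagesFromByLanguage = (translations) => Object.keys(translations)

// Layout B: collect the union of languages across every key, so a key
// that is missing one language does not hide that language.
const languagesFromByKey = (translations) => {
  const found = new Set()
  for (const perLanguage of Object.values(translations)) {
    for (const language of Object.keys(perLanguage)) found.add(language)
  }
  return [...found]
}

console.log(languagesFromByLanguage(byLanguage), languagesFromByKey(byKey))
```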

From there, every page could be generated with a language prefix (e.g. en/index.html). I think we’d have to not generate a “default” language (e.g. index.html), because there wouldn’t really be a good way to specify a default language. I mean we could technically infer it by which language key is first, but that feels really un-obvious and confusing. As I mentioned earlier, I also can’t really see the use-case for generating the same content at two different paths. What do you think?

To get the translations we could automatically pass in a language prop to the page, that would then let you lookup the correct translation in the data file, like I demonstrated earlier.

Although alternatively we could potentially do something more clever… we could pass in a translations prop that is already scoped to the language for that page, so you could just do {translations.greeting}. That feels pretty nice.
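
A minimal sketch of what "already scoped" could mean, assuming the key-grouped layout. scopeTranslations is a hypothetical build-time helper, not an existing Charge API, and the farewell entry is only there to make the example less trivial.

```javascript
// Hypothetical build-time helper, not an existing Charge API.
// Given key-grouped translations, keep only the strings for one language.
const scopeTranslations = (translations, language) => {
  const scoped = {}
  for (const [key, perLanguage] of Object.entries(translations)) {
    if (language in perLanguage) scoped[key] = perLanguage[language]
  }
  return scoped
}

// Illustrative data in the key-grouped layout.
const translations = {
  greeting: { en: "Hello", fr: "Bonjour" },
  farewell: { en: "Goodbye", fr: "Au revoir" },
}

// Props a page under fr/ might receive.
const pageProps = {
  language: "fr",
  translations: scopeTranslations(translations, "fr"),
}

console.log(pageProps.translations.greeting) // Bonjour
```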

What do you think?

@jannesiera
Author

That would work, but it doesn't allow for much flexibility. I was thinking something like this:

Allow a page to optionally export a 'basePath' property in its exported metadata. By default, this value is an empty path (''). When a value is provided, ChargeJs outputs the file under the given basePath. Optionally, an array of basePaths can be given, in which case ChargeJs outputs the file multiple times, each time under a different base path. When ChargeJs generates the file, the current basePath is provided on the props object.

Some examples, given a file structure in the source folder:

index.html.jsx
portfolio.html.jsx
blog/post1.html.jsx
blog/post2.html.jsx

When basePath is '' the output will have the same structure in the target folder:

index.html.jsx
portfolio.html.jsx
blog/post1.html.jsx
blog/post2.html.jsx

Now, let's assume the basePath is set (in every file's metadata!) to 'dist' the output in the target folder will look like this:

dist/index.html.jsx
dist/portfolio.html.jsx
dist/blog/post1.html.jsx
dist/blog/post2.html.jsx

Or, let's assume the basePath is set (in every file's metadata!) to [ 'en', 'fr' ] the output in the target folder will look like this:

en/index.html.jsx
en/portfolio.html.jsx
en/blog/post1.html.jsx
en/blog/post2.html.jsx
fr/index.html.jsx
fr/portfolio.html.jsx
fr/blog/post1.html.jsx
fr/blog/post2.html.jsx

Or, let's assume the basePath is set (in every file's metadata!) to [ '', 'en', 'fr' ] the output in the target folder will look like this:

index.html.jsx
portfolio.html.jsx
blog/post1.html.jsx
blog/post2.html.jsx
en/index.html.jsx
en/portfolio.html.jsx
en/blog/post1.html.jsx
en/blog/post2.html.jsx
fr/index.html.jsx
fr/portfolio.html.jsx
fr/blog/post1.html.jsx
fr/blog/post2.html.jsx

You could also mix and match. Let's say we only have our blog in English, but we want to show the index and portfolio pages in multiple languages. We could set the basePath in every post file to just 'en', while for the portfolio and index pages we set it to [ 'en', 'fr' ]. Then we get the following output:

en/index.html.jsx
en/portfolio.html.jsx
en/blog/post1.html.jsx
en/blog/post2.html.jsx
fr/index.html.jsx
fr/portfolio.html.jsx

Given this feature creating translated pages becomes trivial. We have even more control over how and where we want to pull translations from. (e.g. you want to define some translations in markdown, or you manage them in an external tool).
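
To make the proposal concrete, here is a small sketch of how the expansion could work. Nothing here exists in ChargeJs today: outputPaths and the pages array are made up for illustration, and the file names are kept exactly as in the listings above.

```javascript
// Hypothetical sketch of the proposed basePath behavior; not a ChargeJs feature.
// `basePath` may be omitted, a single string, or an array of strings.
const outputPaths = (sourcePath, basePath = "") => {
  const bases = Array.isArray(basePath) ? basePath : [basePath]
  // An empty base keeps the file where it is; otherwise prefix it.
  return bases.map((base) => (base === "" ? sourcePath : `${base}/${sourcePath}`))
}

// Per-file metadata for the mix-and-match example.
const pages = [
  { source: "index.html.jsx", basePath: ["en", "fr"] },
  { source: "portfolio.html.jsx", basePath: ["en", "fr"] },
  { source: "blog/post1.html.jsx", basePath: "en" },
  { source: "blog/post2.html.jsx", basePath: "en" },
]

const outputs = pages.flatMap((page) => outputPaths(page.source, page.basePath))
console.log(outputs.join("\n"))
```

This yields the same six paths as the mix-and-match listing, grouped by source file rather than by language.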

Is this something you think could work?

@brandonweiss
Owner

I think I sort of understand what you’re getting at, although I’m not entirely sure how it would work? Can you give me an example of how basePath would be set?

@jannesiera
Author

Under the section "meta" (https://charge.js.org/pages) of the docs it mentions exporting a meta object from a page. So I would suggest doing something like this:

export const meta = {
  basePath: [ 'en', 'fr' ],
  // Optional other data
  date: "2019/01/28",
  title: "First post!"
}

@brandonweiss
Owner

brandonweiss commented Nov 20, 2021

Oh, okay, so the meta export is really only for MDX (Markdown) pages, because of the nature of how Markdown works: there’s no other way to communicate relevant information up to a layout higher in the tree.

I guess I should have called out in my explanation that I think Markdown is something different that doesn’t seem to fall under normal i18n usage. Like you wouldn’t put that prose content in a translations file, right? You would just duplicate the page and write the prose in a different language. Which is something you can do now without any i18n-specific additions. The URL structure might be something like this:

/ (index)
/archive (archive)
/posts/title (post in default language)
/posts/title-de (post in German)

To me I think where the i18n stuff comes in is in making it possible for the index and the archive pages to be in multiple languages. So then the question becomes, is there any value in allowing those pages to be translated on a per-page basis. Like how common or useful would it be for the index page to be in English, French, and German, but the archive to only be in English? I’m having a hard time imagining a scenario where that would be desirable? It seems like if you’re going to go to the trouble to translate some of your strings into a different language, you’re probably going to go to the trouble to translate all your strings into a different language.

What do you think?

@jannesiera
Author

For the record, I'm currently not using any MDX pages. It wasn't clear to me that the meta section in the docs only applied to MDX.

> It seems like if you’re going to go to the trouble to translate some of your strings into a different language, you’re probably going to go to the trouble to translate all your strings into a different language.

While I personally don't have a need for it at this time, one possible use case could be only making a blog available for a certain language or even having a different blog content for different languages (as you might be working with more localized content creators).

I'm mostly trying to think a little bit further than just i18n. I like that ChargeJs is simple and doesn't need configuration to get started. But I'm definitely running into an issue here because I have no control over where pages get outputted, and no way to dynamically output a page multiple times with different parameters. My proposal has 2 benefits:

  1. You can use basePath for more use cases than just i18n. There might be other scenarios where you need more control over where and how many times you want to output a file.
  2. It is independent of what actual tools you want to use to do i18n. It's very simple for me to create a JSON with translation strings and write a function that pulls the correct translation. Someone else might want to do it a little differently though or pull translation data from somewhere else completely.

That said, 'basePath' might not be powerful enough for some cases. I'm trying to think of the most general/powerful approach that is still conceptually simple. One option could be a 'parameters' array that causes a page to be generated multiple times, paired with a callback function that decides which path each of those renders is written to.

Let's assume we take the idea of meta and make it work with .jsx files. We could define something like this:

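export const meta = {
  // One page render per entry; each entry is that run's metadata.
  runs: [
    { lang: 'en' },
    { lang: 'fr' }
  ],
  // Returns the output location for a given run.
  path: (path, name, run) => run.lang + '/' + path + name
}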

Runs would define how many times the page 'generation' gets executed. For each run, you can specify your own metadata. Path will determine where the file will be outputted. So in this case we create an additional base folder for the language and nest the path in there. In other use cases, we might want to just change the file name. Additionally, the run metadata will be provided on the props that the component receives as well.
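
Roughly, I imagine the build step consuming this as in the sketch below. planRenders is a made-up name and not something ChargeJs has. The sketch assumes that path is either empty or ends with a slash, which is what the callback above implies, and it leaves out the export keyword so the snippet runs on its own.

```javascript
// The meta object from the proposal (hypothetical; not a ChargeJs feature).
const meta = {
  runs: [{ lang: "en" }, { lang: "fr" }],
  path: (path, name, run) => run.lang + "/" + path + name,
}

// Hypothetical build step: one render per run, each with its own output
// location and with the run's metadata merged into the page props.
// Assumes `path` is either "" or ends with "/".
const planRenders = (meta, path, name, props = {}) =>
  meta.runs.map((run) => ({
    outputPath: meta.path(path, name, run),
    props: { ...props, ...run },
  }))

console.log(planRenders(meta, "blog/", "post1.html"))
```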

Now I could create my own combinations. Let's take an example where I have a multilingual (English and French) documentation website where for some pages I have different code examples for the programming languages that my library supports. So I want permutations of my page where I generate two pages per language.

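export const meta = {
  // Every combination of language and programming language, listed explicitly.
  runs: [
    { lang: 'en', programmingLang: 'js' },
    { lang: 'en', programmingLang: 'python' },
    { lang: 'fr', programmingLang: 'js' },
    { lang: 'fr', programmingLang: 'python' }
  ],
  // Nests the programming language under the page's path.
  path: (path, name, run) => run.lang + '/' + path + run.programmingLang + '/' + name
}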

What's your take on this? Hope we can find something that's both simple to understand and use, but also powerful enough so that I don't need to turn to another static site generator.

@jannesiera
Author

@brandonweiss have you had a chance to take a look at my proposal?

@brandonweiss
Owner

@jannesiera Hey, sorry, I did take a quick look at it but didn’t have time to respond—I’ll try to get to it shortly!

@jannesiera
Author

@brandonweiss happy new year! When do you think you could have another look at this?

@brandonweiss
Owner

Hey! Thanks for your patience 🙏🏼 And I appreciate your thoughtful write-up!

Okay, so regarding blog posts—yeah, that’s a great example of where content might be unevenly translated. Especially if the translations are user-contributed, it’s unlikely someone is going to spend the time to translate every post into their language of choice. But that said, it’s also unlikely that any technical person would write a blog post in JSX/HTML. They’re almost certainly going to write it in MDX/Markdown, which is sort of outside the scope of most i18n systems because it’s almost entirely prose. If you wanted to translate a blog post like that you would just duplicate the file, change the name, and write it in the new language.

As for a generic/abstract way to generate multiple files and/or change the path… so conceptually I like the thinking. I like when simple abstractions are created that can be used in multiple ways to achieve different goals. That said, I do think there is sort of a line somewhere where something becomes almost too abstract. Charge is actually a good example, because the entire reason I created it was that tools like Gatsby are almost incomprehensible because they tried to be a tool that could do literally anything. By adding some conventions and limitations, I’ve made it so Charge can’t do literally anything, but it can do most things, and it can do them (I think) much more simply than Gatsby or something like it can.

So to get more concrete, I look at the runs/path syntax and I think okay, there’s two ways I could conceivably use this:

  1. To do something that’s a bit of a one-off
  2. To do something that’s repetitive

Maybe a good example for #1 would be a landing page with a few different variations. I think in that case I’d probably extract the content into components and use those components in a few different pages. It feels like that already provides the needed flexibility? Being able to achieve the same thing in one file with the variations defined in the meta object seems like just a different way of achieving mostly the same thing.

A good example for #2 would be something like i18n. If I were going to use that syntax to do i18n, I’d have to duplicate and adjust it in every JSX file. If I had to do that I’d think that feels a bit kludgey, and I’d probably want to extract that into some sort of function so I can keep things consistent. And at that point it starts to feel like something that is a system and might make sense to be codified inside Charge.

As for your example where you have multiple permutations, that feels almost like a combination of #1 and #2. Like is someone really going to generate separate pages for each programming language? That feels like a stretch. If I wanted code blocks to support multiple programming languages, I think a more common way to do that would be to make a code block component that takes a snippet for each language, and then there’d be some sort of tabbed interface that lets you switch between them.

So to loop back around, I think it definitely makes sense to add i18n functionality; that’s a very concrete thing that you need (and lots of other people need). But I’d rather approach a potentially more abstract solution separately. If someone comes along with a really clear use-case and there’s no good way to solve it without enabling that flexibility, then that’s definitely worth considering, but otherwise it feels a bit too theoretical at the moment.

Does that seem good? If so, do you want to have a go at an implementation? If not, I can take a stab. I don’t think it’ll be all that much work (famous last words 😅).
