Not SEO-friendly for mandarin(zh) and hindi(hi) #12

Open
sakulstra opened this issue Dec 28, 2019 · 19 comments
Comments

@sakulstra

console.log(slug("鳄梨"))
""

console.log(slug("एवोकाडो"))
""
@sakulstra
Author

For Hindi, we could just extend the current map with https://github.com/cocur/slugify/blob/master/Resources/rules/hindi.json

I'm wondering, though, whether adding Chinese pinyin replacements from https://github.com/cocur/slugify/blob/master/Resources/rules/chinese.json would be outside the scope of this package (the file is 100 kB on its own).
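
To make the idea concrete, here is a minimal sketch of a charmap-based slug function. It is a toy stand-in, not this package's actual implementation: `toySlug` and `zhSample` are hypothetical names, and the two pinyin entries are illustrative rather than copied from the linked JSON file. It shows why unmapped characters end up as an empty string and how extending the map changes that.

```javascript
// Toy charmap-based slug function (an assumption for illustration,
// NOT the real node-slug algorithm).
function toySlug(input, charmap = {}) {
  let out = "";
  for (const ch of input) {
    if (Object.prototype.hasOwnProperty.call(charmap, ch)) {
      out += charmap[ch]; // mapped character: use its replacement
    } else if (/[A-Za-z0-9]/.test(ch)) {
      out += ch; // keep ASCII letters and digits
    } else if (/\s/.test(ch)) {
      out += " "; // whitespace becomes a separator later
    }
    // every other character is silently dropped
  }
  return out.trim().toLowerCase().replace(/\s+/g, "-");
}

// Illustrative entries only; a real pinyin map has thousands of keys.
const zhSample = { "鳄": "e", "梨": "li" };

console.log(toySlug("鳄梨"));           // nothing is mapped, so nothing survives
console.log(toySlug("鳄梨", zhSample)); // mapped characters are transliterated
```

With an empty map the result is the empty string, which matches the behavior reported above; with the sample map the same input produces a readable slug.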

@Trott
Owner

Trott commented Dec 28, 2019

Maybe we can essentially solve all character sets at once by falling back to what https://github.com/npm/unique-slug does: create a hash for values that return an empty string on the existing algorithm.

@Trott
Owner

Trott commented Dec 28, 2019

> Maybe we can essentially solve all character sets at once by falling back to what https://github.com/npm/unique-slug does: create a hash for values that return an empty string on the existing algorithm.

Oh, right, but that goes against the minimalist dependencies philosophy. But perhaps fall back to some other deterministic algorithm.

@Trott
Owner

Trott commented Dec 28, 2019

Base64 as a fallback is probably not ideal, but certainly better than nothing. Might go that route.
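
A rough sketch of what such a fallback could look like, assuming Node.js (for `Buffer`). The names `slugWithFallback` and `asciiOnlySlug` are hypothetical, and `asciiOnlySlug` is only a stand-in for the existing algorithm; this is not necessarily how the linked pull request does it.

```javascript
// Stand-in for the existing algorithm (an assumption): keeps ASCII
// letters and digits, turns everything else into "-" separators.
const asciiOnlySlug = (s) =>
  s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

// Hypothetical wrapper: use the normal slug when it is non-empty,
// otherwise fall back to URL-safe Base64 of the UTF-8 bytes.
function slugWithFallback(input, slugFn) {
  const result = slugFn(input);
  if (result !== "") return result;
  return Buffer.from(input, "utf8")
    .toString("base64")
    .replace(/\+/g, "-") // "+" is not URL-safe
    .replace(/\//g, "_") // "/" would be read as a path separator
    .replace(/=+$/, ""); // padding is not needed for a slug
}

console.log(slugWithFallback("Hello World", asciiOnlySlug));
console.log(slugWithFallback("鳄梨", asciiOnlySlug));
```

The fallback is deterministic and needs no extra dependency, but the result is not human-readable, which is the trade-off discussed in the rest of this thread.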

@Trott
Owner

Trott commented Dec 28, 2019

Hopefully this would work? #13

@sakulstra
Author

sakulstra commented Dec 28, 2019

@Trott I think that's a good idea and might be what the majority of users would prefer? Idk.

For me personally it wouldn't solve the problem though 😅

Let me add some context:
I'm currently using node-slug to generate SEO-friendly, readable URLs in all sorts of languages on the server side.

Apparently Baidu (at least) is capable of interpreting pinyin, which is why pinyin slugs would suit me better than the proposed hashed slugs. Generally speaking, when using this on the server side to generate "pretty URLs", I guess you would always prefer character mappings over hashed words.


As I understand it (sorry, I missed this before I opened this issue), I could just manually extend the charmap with the JSON files I linked in the initial message and I should be good to go.

It might be a good idea to ship these mappings with the repo?

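// Proposed opt-in API; a "slug/zh" module does not exist yet
import zh from "slug/zh"

// charmap is a plain object, so merge the new entries into it
Object.assign(slug.charmap, zh);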

This way users could decide that they don't need to support Arabic, but might want to support Hindi.

@Trott
Owner

Trott commented Dec 28, 2019

I guess as a temporary solution, we can go with that pull request, and you can still provide your own charmap to solve your use case?

I like the idea of dynamically loading the needed charmaps, maybe with a default of loading all the ones that wouldn't be enormous, so no one has to change their code, but people who want to enable zh charmapping can do so easily.
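
One way this could be shaped, as a minimal sketch only: small charmaps are merged in by default, and large ones are merged only when the caller asks for them. Every name here (`charmapRegistry`, `DEFAULT_LOCALES`, `createSlugger`) is hypothetical, and the `zh` entry is a two-character stand-in for a map that would really contain thousands of keys.

```javascript
// Hypothetical registry of per-language charmaps.
const charmapRegistry = {
  de: { "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss" }, // small: on by default
  zh: { "鳄": "e", "梨": "li" }, // stand-in for a very large map: opt-in only
};

// Locales that are cheap enough to load for everyone (assumption).
const DEFAULT_LOCALES = ["de"];

// Build a slug function from the default maps plus any opt-in locales.
function createSlugger(extraLocales = []) {
  const charmap = {};
  for (const locale of [...DEFAULT_LOCALES, ...extraLocales]) {
    Object.assign(charmap, charmapRegistry[locale]);
  }
  return (input) => {
    let out = "";
    for (const ch of input.toLowerCase()) {
      out += Object.prototype.hasOwnProperty.call(charmap, ch) ? charmap[ch] : ch;
    }
    // Anything still outside a-z0-9 becomes a separator.
    return out.replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  };
}

const slugDefault = createSlugger();      // existing callers change nothing
const slugWithZh = createSlugger(["zh"]); // opt in to the large map

console.log(slugDefault("Grüße"));
console.log(slugWithZh("鳄梨"));
```

In a real package the large maps would more likely live in separate files and be pulled in with `import` or `require` on demand, so that users who never opt in do not pay for them.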

@sakulstra
Author

Sorry for the delay. Yep, your PR sounds fine, but it should probably be marked as breaking, since implementations that do custom handling based on slug(sth) returning an empty string will break.

> I like the idea of dynamically loading the needed charmaps, maybe with a default of loading all the ones that wouldn't be enormous, so no one has to change their code, but people who want to enable zh charmapping can do so easily.

I'll try to find some time on one of the coming weekends to work on a PR/proposal.

@Trott changed the title from "Doesn't support mandarin(zh) and hindi(hi)" to "Not SEO-friendly for mandarin(zh) and hindi(hi)" on Apr 28, 2020
@atulmy

atulmy commented Apr 30, 2020

Thanks for the library. Would be great if it can support languages like Hindi, etc.

@Trott
Owner

Trott commented May 10, 2020

I'm rethinking the approach here and thinking it makes sense to support Hindi characters (and a number of others) by default. I expect to have something out this month. Stay tuned....

@Trott
Owner

Trott commented May 18, 2020

Forgive my ignorance, but I'm having trouble using https://github.com/cocur/slugify/blob/master/Resources/rules/hindi.json as a reference. It's not aligning with what I'm seeing in https://en.wikipedia.org/wiki/Devanagari_transliteration. For example, the former transliterates फ़ as Fi but the latter suggests fa. Would I be correct to guess that the latter is more standard?

@Trott
Owner

Trott commented May 18, 2020

I think I may be getting thrown off by the underdot and multibyte characters.
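
For anyone else hitting the same confusion, a small sketch of one possible cause (this is a guess at the issue, not a confirmed diagnosis): Unicode normalization can turn what looks like a single character into several code points. The standard `String.prototype.normalize` makes this visible; the variable names are just for illustration.

```javascript
// U+1E0D is Latin "d with dot below", as used in romanized Devanagari.
const underdot = "\u1E0D";
// NFD splits it into "d" followed by U+0323 COMBINING DOT BELOW.
const decomposed = underdot.normalize("NFD");
// Removing combining marks leaves the plain ASCII letter.
const stripped = decomposed.replace(/\p{M}/gu, "");

// U+095E (DEVANAGARI LETTER FA) canonically decomposes into
// U+092B (PHA) followed by U+093C (NUKTA).
const fa = "\u095E";
const faParts = [...fa.normalize("NFD")].map(
  (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(decomposed.length); // number of UTF-16 code units after NFD
console.log(stripped);
console.log(faParts.join(" "));
```

A map keyed on the single code point U+095E would therefore miss input that stores the same letter as base consonant plus nukta, so lookups may need to handle both forms.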

@Trott
Owner

Trott commented May 18, 2020

OK, I think I got Hindi support working acceptably in #55. Does this seem correct or at least not entirely wrong?

console.log(slug("एवोकाडो"))
// evakada

@sakulstra
Author

Argh, sorry, I completely forgot about this as I stopped working on the private project that used it 😅

Here are the Hindi and Mandarin files in case they are of any help: https://gist.github.com/sakulstra/02b391dccfb6896047c5bc0b89aca41d

@Trott
Owner

Trott commented May 18, 2020

Hindi is now supported out-of-the-box on master branch and will be in 3.0.0. Mandarin is (obviously) more challenging since it would add thousands of characters. Still thinking about how best to enable easy opt-in for such things.

@Trott
Owner

Trott commented May 20, 2020

I just published 3.0.0, so if you upgrade, you'll have Hindi support.

@whoafridi

Any plans for the Bengali (বাংলা) language? Or what is the procedure for adding বাংলা support? @Trott

@Trott
Owner

Trott commented Jan 28, 2024

> any plan for Bengali ( বাংলা ) language ? or what is the procedure to update it with বাংলা language? @Trott

If I understand correctly (and I might not!), adding Bengali would (given the current slug algorithm at least) require adding hundreds of characters to multicharmap. I would expect that to have a performance impact. Making such a thing opt-in would be a good thing, I think. But that kind of plugin-module opt-in feature is something that is in the (stalled) beta branch.

So, I'd say no plans any time soon, but if someone wanted to do the work to make it possible, I wouldn't discourage it.
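
To illustrate why multi-character entries mean extra work per input position, here is a minimal sketch of a greedy longest-match lookup. It is not the package's actual code: `transliterate` is a hypothetical name, and the map entries are illustrative only, not a complete or authoritative Bengali romanization.

```javascript
// Illustrative entries (assumptions): a bare consonant and the same
// consonant followed by a vowel sign, which needs its own mapping.
const charmap = { "\u0995": "ka" };            // ক
const multicharmap = { "\u0995\u09BF": "ki" }; // ক + ি

function transliterate(input) {
  const maxLen = Math.max(...Object.keys(multicharmap).map((k) => k.length));
  let out = "";
  let i = 0;
  while (i < input.length) {
    let matched = false;
    // Try the longest multi-character key first; this inner loop is
    // the extra per-position work that single-character maps avoid.
    for (let len = Math.min(maxLen, input.length - i); len >= 2; len--) {
      const chunk = input.slice(i, i + len);
      if (Object.prototype.hasOwnProperty.call(multicharmap, chunk)) {
        out += multicharmap[chunk];
        i += len;
        matched = true;
        break;
      }
    }
    if (matched) continue;
    // No multi-character match: fall back to the single-character map.
    const ch = input[i];
    out += Object.prototype.hasOwnProperty.call(charmap, ch) ? charmap[ch] : ch;
    i += 1;
  }
  return out;
}

console.log(transliterate("\u0995"));
console.log(transliterate("\u0995\u09BF"));
```

In this sketch the cost grows with the longest key rather than with the number of entries, so the real impact would depend on how the actual algorithm scans `multicharmap`.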

@whoafridi

@Trott Okay, great.
