Experimental JavaScript project that converts French sentences into French SMS-style sentences.
It should produce shorter sentences that are still readable, even though some of the vocabulary may be familiar only to younger people 😄
npm install french-to-sms
const frenchToSms = require('french-to-sms');
frenchToSms("coucou");
// => "cc"
frenchToSms("Bonjour tout le monde ! J'espère que vous allez bien ! Moi la patate !");
// => "bjr tt lmond ! jspr k vs allé b1 ! mwa la patate !"
frenchToSms("S'il vous plaît, pouvez-vous faire moins de bruit ? Merci.");
// => "svp, pouvé vs fR - 2 brui ? marci."
You can test the algorithm out on this demo page.
The algorithm behind this project is based on a custom-made glossary.
It applies the character replacements defined in the glossary one by one.
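To illustrate the idea of applying replacements one by one, here is a minimal sketch. It is not the project's actual implementation: the glossary format (ordered pairs), the sample entries, and the applyRules helper are all assumptions made for this example.

```javascript
// Hypothetical mini-glossary: ordered [search, replacement] pairs.
// These entries are illustrative and are not copied from the real glossary.
const sampleGlossary = [
  ['coucou', 'cc'],
  ['pp', 'p'],
  ['qu', 'k'],
];

// Apply every rule in order; each rule works on the output of the previous one.
function applyRules(text, rules) {
  return rules.reduce(
    (current, [search, replacement]) => current.split(search).join(replacement),
    text.toLowerCase()
  );
}

console.log(applyRules('Coucou', sampleGlossary)); // "cc"
console.log(applyRules("j'apprends", sampleGlossary)); // "j'aprends"
```

Because each rule sees the result of the previous ones, the order of the entries matters in a design like this.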
In its current state, the glossary should shorten a good number of French words and sentences fairly correctly. It was built from scratch by more or less reverse engineering French SMS language and how it can be constructed.
The glossary is divided into five distinct replacement categories:
- anywhere: replacements in this category are performed anywhere within the input text. Useful for general rules, e.g. double consonants are often unnecessary: "apprends" => "aprends".
- endOfWords: replacements in this category are performed only at the end of words. Useful for general rules at the end of words, e.g. the "e" in words ending with "e" is often silent, so we can get rid of it: "pomme" => "pomm".
- startOfWords: replacements in this category are performed only at the start of words. Useful for general rules at the start of words, e.g. the "h" is often silent, so we can get rid of it: "haricot" => "aricot".
- wholeWords: replacements in this category are performed only if they exactly match a whole word. Useful for words that need a specific conversion that does not follow general rules, e.g. "monsieur" => "mr".
- endOfWordsFollowedByASpace: replacements in this category are performed only at the end of words that are followed by a space. Useful to replace the space as well, e.g. "je" can often be contracted with what follows it: "je suis" => "jsuis".
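One way to picture the five categories is as five ways of anchoring a search string in a regular expression. The sketch below is an assumption about how this could be done, not the project's code: patternBuilders, replaceInCategory, and the word-edge definitions are hypothetical names introduced for illustration. Word edges are defined with Unicode letter classes because the built-in \b does not treat accented letters as word characters.

```javascript
// Escape regex metacharacters so glossary strings are matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// A word edge is a position not adjacent to a letter or digit on that side.
const WORD_START = '(?<![\\p{L}\\p{N}])';
const WORD_END = '(?![\\p{L}\\p{N}])';

// Hypothetical mapping from category name to a pattern builder.
const patternBuilders = {
  anywhere: (s) => escapeRegExp(s),
  endOfWords: (s) => escapeRegExp(s) + WORD_END,
  startOfWords: (s) => WORD_START + escapeRegExp(s),
  wholeWords: (s) => WORD_START + escapeRegExp(s) + WORD_END,
  // The trailing space is part of the match, so it is replaced as well.
  endOfWordsFollowedByASpace: (s) => escapeRegExp(s) + ' ',
};

function replaceInCategory(text, category, search, replacement) {
  const pattern = new RegExp(patternBuilders[category](search), 'gu');
  // A function replacement avoids special handling of "$" in the replacement.
  return text.replace(pattern, () => replacement);
}

console.log(replaceInCategory('apprends', 'anywhere', 'pp', 'p')); // "aprends"
console.log(replaceInCategory('pomme', 'endOfWords', 'e', '')); // "pomm"
console.log(replaceInCategory('haricot', 'startOfWords', 'h', '')); // "aricot"
console.log(replaceInCategory('monsieur', 'wholeWords', 'monsieur', 'mr')); // "mr"
console.log(replaceInCategory('je suis', 'endOfWordsFollowedByASpace', 'je', 'j')); // "jsuis"
```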
The glossary supports three types of actions:
- replace: to replace some characters with other characters
- disable_modification: to prevent some characters from being replaced
- enable_modification: to allow some characters to be replaced again
By default, the whole input text is subject to replacements. However, some characters can be temporarily protected from replacement.
For instance, we may want to replace every occurrence of "si" by "6", as it is a good SMS equivalent ("sinon" would become "6non", "aussi" would become "au6").
But some sequences like "sin" often sound like "zin", so replacing "si" by "6" there would be misread ("usine" would become "u6ne").
So we may want to disable replacements on "sin" while we replace all "si" occurrences by "6", then re-enable further replacements on "sin".
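The protect-then-replace sequence can be sketched by splitting the text into segments and tagging the protected ones. This is only an illustration under assumptions: the action object shape, the runActions helper, and the segment representation are hypothetical. Only the three action names come from the description above.

```javascript
// Run a list of actions over the text. The text is kept as segments of the
// form { text, lockedBy }, where lockedBy is null for modifiable segments.
function runActions(text, actions) {
  let segments = [{ text, lockedBy: null }];
  for (const action of actions) {
    switch (action.type) {
      case 'disable_modification':
        // Cut unlocked segments around each match and lock the matches.
        segments = segments.flatMap((segment) => {
          if (segment.lockedBy !== null) return [segment];
          const pieces = [];
          segment.text.split(action.search).forEach((part, index) => {
            if (index > 0) pieces.push({ text: action.search, lockedBy: action.search });
            if (part !== '') pieces.push({ text: part, lockedBy: null });
          });
          return pieces;
        });
        break;
      case 'replace':
        // Only unlocked segments are rewritten.
        segments = segments.map((segment) =>
          segment.lockedBy !== null
            ? segment
            : { ...segment, text: segment.text.split(action.search).join(action.replacement) }
        );
        break;
      case 'enable_modification': {
        // Unlock matching segments and merge neighbouring unlocked segments.
        const merged = [];
        for (const segment of segments) {
          const lockedBy = segment.lockedBy === action.search ? null : segment.lockedBy;
          const last = merged[merged.length - 1];
          if (last && last.lockedBy === null && lockedBy === null) {
            last.text += segment.text;
          } else {
            merged.push({ text: segment.text, lockedBy });
          }
        }
        segments = merged;
        break;
      }
      default:
        throw new Error('Unknown action type: ' + action.type);
    }
  }
  return segments.map((segment) => segment.text).join('');
}

// Hypothetical action list; the "ss" rule stands in for the double-consonant rule.
const sampleActions = [
  { type: 'replace', search: 'ss', replacement: 's' },
  { type: 'disable_modification', search: 'sin' },
  { type: 'replace', search: 'si', replacement: '6' },
  { type: 'enable_modification', search: 'sin' },
];

console.log(runActions('aussi', sampleActions)); // "au6"
console.log(runActions('usine', sampleActions)); // "usine"
```

Note that a lock as broad as "sin" in this sketch would also protect "sinon", so a real rule would presumably need to be more specific.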
If you would like to improve the glossary, feel free to open a pull request containing your changes to the glossary, along with test fixtures covering what you improved.