Unicode 15.0 #237

Merged
merged 22 commits into from Oct 31, 2022
797f1c2
Handle translations for emoji that end with `\uFE0F` ("variant emoji_…
cvzi Sep 22, 2022
1a5751b
Add missing translations for emoji with/out `\uFE0F`
cvzi Sep 22, 2022
c7a48b8
`\uFE0F` can also appear inside a sequence not just as a suffix
cvzi Sep 23, 2022
f7cca5a
Run get_codes_from_unicode_emoji_data_files.py to update `EMOJI_DATA`
cvzi Sep 23, 2022
54610e0
Add Farsi/Persian to Readme
cvzi Sep 28, 2022
2fa37c4
Add Farsi/Persian to Github Pages
cvzi Sep 28, 2022
70dd641
Do not skip flag aliases
cvzi Sep 28, 2022
6f45e9c
Run `get_codes_from_unicode_emoji_data_files.py` to update `EMOJI_DATA`
cvzi Sep 28, 2022
370eee5
Include all aliases from Github API
cvzi Sep 28, 2022
70ece3f
Run get_codes_from_unicode_emoji_data_files.py to update `EMOJI_DATA`
cvzi Sep 28, 2022
83df586
Update to Unicode 15.0.0
cvzi Sep 28, 2022
3de5f99
Update `EMOJI_DATA` to Unicode 15.0.0
cvzi Sep 28, 2022
a19fdd6
Remove duplicate aliases
cvzi Sep 28, 2022
ca30c16
Update requirements
cvzi Sep 28, 2022
da2f769
Fix "sphinx warnings reference target not found" #216
cvzi Sep 28, 2022
c5ac514
Scraping from emojiterra https://github.com/carpedm20/emoji/issues/23…
AliNajafi1998 Sep 28, 2022
cdd6c51
New tests
cvzi Sep 30, 2022
ed5b967
Add missing translations from emojiterra
cvzi Oct 1, 2022
fff013b
Update `EMOJI_DATA` with translations from emojiterra
cvzi Oct 1, 2022
5c19561
Normalize emoji name to NFKC to find it in EMOJI_DATA
cvzi Oct 5, 2022
5e57840
Change docs of emojize()
cvzi Oct 5, 2022
1d0acd3
Add README.md - how to add a new language
cvzi Oct 5, 2022
2 changes: 1 addition & 1 deletion README.rst
@@ -32,7 +32,7 @@ both the full list and aliases.

By default, the language is English (``language='en'``) but also supported languages are:

-Spanish (``'es'``), Portuguese (``'pt'``), Italian (``'it'``), French (``'fr'``), German (``'de'``)
+Spanish (``'es'``), Portuguese (``'pt'``), Italian (``'it'``), French (``'fr'``), German (``'de'``), Farsi/Persian (``'fa'``)


.. code-block:: python
7 changes: 7 additions & 0 deletions docs/README.md
@@ -23,6 +23,13 @@ pip install -r requirements.txt
make html
```

Check for warnings:

```bash
make clean
sphinx-build -n -T -b html . _build
```

Test code in code blocks:

```bash
6 changes: 2 additions & 4 deletions docs/api.rst
@@ -7,8 +7,8 @@ API Reference
:noindex:


-+--------------------------------------------------------------------------------------------+
-| Table of Contents |
+-----------------------------+--------------------------------------------------------------+
+| Table of Contents | |
++=============================+==============================================================+
| **Functions:** | |
+-----------------------------+--------------------------------------------------------------+
@@ -28,8 +28,6 @@ API Reference
+-----------------------------+--------------------------------------------------------------+
| :func:`version` | Find Unicode/Emoji version of an emoji |
+-----------------------------+--------------------------------------------------------------+
-| :func:`get_emoji_regexp` | Returns compiled regular expression that matches all emojis |
-+-----------------------------+--------------------------------------------------------------+
| **Module variables:** | |
+-----------------------------+--------------------------------------------------------------+
| :data:`EMOJI_DATA` | Dict of all emoji |
8 changes: 3 additions & 5 deletions docs/index.rst
@@ -54,7 +54,7 @@ Languages

By default, the language is English (``language='en'``) but also supported languages are:

-Spanish (``'es'``), Portuguese (``'pt'``), Italian (``'it'``), French (``'fr'``), German (``'de'``)
+Spanish (``'es'``), Portuguese (``'pt'``), Italian (``'it'``), French (``'fr'``), German (``'de'``), Farsi/Persian (``'fa'``)

.. doctest::

@@ -313,8 +313,8 @@ Reference documentation of all functions and properties in the module:

api

-+--------------------------------------------------------------------------------------------+
-| API Reference |
+-----------------------------+--------------------------------------------------------------+
+| API Reference | |
++=============================+==============================================================+
| **Functions:** | |
+-----------------------------+--------------------------------------------------------------+
@@ -334,8 +334,6 @@ Reference documentation of all functions and properties in the module:
+-----------------------------+--------------------------------------------------------------+
| :func:`version` | Find Unicode/Emoji version of an emoji |
+-----------------------------+--------------------------------------------------------------+
-| :func:`get_emoji_regexp` | Returns compiled regular expression that matches all emojis |
-+-----------------------------+--------------------------------------------------------------+
| **Module variables:** | |
+-----------------------------+--------------------------------------------------------------+
| :data:`EMOJI_DATA` | Dict of all emoji |
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,2 +1,2 @@
-sphinx>=4.4.0
+sphinx>=5.2.2
alabaster>=0.7.12
21 changes: 16 additions & 5 deletions emoji/core.py
@@ -9,6 +9,8 @@

"""

+import sys
+import unicodedata
import re

from emoji import unicode_codes
@@ -21,6 +23,14 @@

_SEARCH_TREE = None
_DEFAULT_DELIMITER = ':'
+_EMOJI_NAME_PATTERN = u'\\w\\-&.’”“()!#*+?–,/«»\u0300\u0301\u0302\u0303\u0308\u030a\u0327\u064b\u064e\u064f\u0650\u0653\u0654'
+_PY2 = sys.version_info[0] == 2
+
+
+def _normalize(form, s):
+    if _PY2:
+        s = unicode(s)
+    return unicodedata.normalize(form, s)


def emojize(
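A minimal sketch of why the `_normalize` helper in this hunk is useful: a name typed by a user can render identically to a stored name while differing code point by code point, so a plain dictionary lookup would miss it. The sketch uses only the standard library and assumes Python 3, so the `_PY2` branch is left out. `normalize_name` and the sample strings are illustrative and not taken from the library.

```python
import unicodedata


def normalize_name(name, form="NFKC"):
    """Return `name` in the given Unicode normalization form (Python 3 only)."""
    return unicodedata.normalize(form, name)


# "e" followed by U+0301 COMBINING ACUTE ACCENT versus precomposed U+00E9.
decomposed = "cafe\u0301"
precomposed = "caf\u00e9"

# The two strings render the same but are not equal code point by code point.
strings_differ = decomposed != precomposed

# After NFKC normalization both collapse to the precomposed form.
same_after_nfkc = normalize_name(decomposed) == normalize_name(precomposed)

# NFKC also folds compatibility characters, e.g. fullwidth "A" (U+FF21) to "A".
folded = normalize_name("\uff21")
```

Normalizing the looked-up name (rather than every dictionary key at lookup time) keeps the cost proportional to the input string, provided the stored keys are already in the same form.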
@@ -47,7 +57,8 @@ def emojize(

     :param string: String contains emoji names.
     :param delimiters: (optional) Use delimiters other than _DEFAULT_DELIMITER. Each delimiter
-        should contain at least one character that is not part of a-zA-Z0-9 and ``_-–&.’”“()!?#*+,/\``
+        should contain at least one character that is not part of a-zA-Z0-9 and ``_-&.()!?#*+,``.
+        See ``emoji.core._EMOJI_NAME_PATTERN`` for the regular expression of unsafe characters.
     :param variant: (optional) Choose variation selector between "base"(None), VS-15 ("text_type") and VS-16 ("emoji_type")
     :param language: Choose language of emoji name: language code 'es', 'de', etc. or 'alias'
         to use English aliases
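The delimiter rule in the docstring above can be sketched as a small check. This is only an illustration: `is_safe_delimiter` is a hypothetical helper that does not exist in the library, and `_NAME_CHARS` is a simplified stand-in built from the characters listed in the docstring, whereas the real `_EMOJI_NAME_PATTERN` contains more characters.

```python
import re

# Simplified character class for emoji names, mirroring the docstring above.
# Assumption: the real class in emoji.core._EMOJI_NAME_PATTERN is larger.
_NAME_CHARS = re.compile(r"[\w\-&.()!?#*+,]", flags=re.UNICODE)


def is_safe_delimiter(delimiter):
    """True if at least one character of `delimiter` cannot occur in a name."""
    return any(_NAME_CHARS.fullmatch(ch) is None for ch in delimiter)


colon_ok = is_safe_delimiter(":")         # ":" is not a name character
underscores_ok = is_safe_delimiter("__")  # "_" is matched by \w
```

A delimiter made only of name characters would be swallowed by the name part of the regular expression, which is why at least one character outside the class is needed.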
@@ -78,12 +89,12 @@ def emojize(
     else:
         language_pack = unicode_codes.get_emoji_unicode_dict(language)

-    pattern = re.compile(u'(%s[\\w\\-&.’”“()!#*+?–,/ًٌٍَُِّْؤئيإأآةك‌ٔء«»]+%s)' %
-                         (re.escape(delimiters[0]), re.escape(delimiters[1])), flags=re.UNICODE)
+    pattern = re.compile(u'(%s[%s]+%s)' %
+                         (re.escape(delimiters[0]), _EMOJI_NAME_PATTERN, re.escape(delimiters[1])), flags=re.UNICODE)

     def replace(match):
-        mg = match.group(1)[len(delimiters[0]):-len(delimiters[1])]
-        emj = language_pack.get(_DEFAULT_DELIMITER + mg + _DEFAULT_DELIMITER)
+        name = match.group(1)[len(delimiters[0]):-len(delimiters[1])]
+        emj = language_pack.get(_DEFAULT_DELIMITER + _normalize('NFKC', name) + _DEFAULT_DELIMITER)
         if emj is None:
             return match.group(1)

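The lookup flow changed in this hunk (match a delimited name, normalize it to NFKC, look it up, leave unknown names untouched) can be sketched in isolation as follows. Everything here is an assumption made for illustration: `LANGUAGE_PACK` is a made-up two-entry dictionary rather than data from `EMOJI_DATA`, `NAME_CHARS` is a much smaller character class than `_EMOJI_NAME_PATTERN`, and `mini_emojize` is not the library's `emojize()`. Python 3 is assumed.

```python
import re
import unicodedata

# Hypothetical two-entry language pack; the real one is built from EMOJI_DATA.
LANGUAGE_PACK = {
    ":thumbs_up:": "\U0001F44D",
    ":caf\u00e9:": "\u2615",  # made-up name, only to exercise normalization
}

# Simplified name character class (assumption): word characters, "-" and the
# combining acute accent U+0301, which \w does not match on its own.
NAME_CHARS = "\\w\\-\u0301"


def mini_emojize(string, delimiters=(":", ":")):
    """Replace delimited names found in LANGUAGE_PACK; keep the rest as is."""
    pattern = re.compile(
        "(%s[%s]+%s)"
        % (re.escape(delimiters[0]), NAME_CHARS, re.escape(delimiters[1]))
    )

    def replace(match):
        # Strip the delimiters, then normalize before the dictionary lookup.
        name = match.group(1)[len(delimiters[0]):-len(delimiters[1])]
        key = ":" + unicodedata.normalize("NFKC", name) + ":"
        # Unknown names are returned unchanged, as in the diff above.
        return LANGUAGE_PACK.get(key, match.group(1))

    return pattern.sub(replace, string)
```

Called as `mini_emojize(":cafe\u0301:")`, the decomposed name matches only because U+0301 is listed in the character class, and it is found only because the name is normalized before the lookup. That mirrors the two changes in this hunk: combining marks in the pattern and NFKC normalization in `replace`.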