`unicode_age`

A package for determining which version of the Unicode standard introduced a given codepoint.

This package's version X.Y.Z tracks Unicode version X.Y, with Z reserved as a release counter for updates unrelated to the Unicode version.

Example usage

>>> import unicode_age
>>> codept = ord("\N{SNAKE}")  # added in Unicode 6.0
>>> print(unicode_age.version(codept))
(6, 0)

Rationale

Before writing this module, I was parsing `DerivedAge.txt` into a `list[int | None]`, but that approach consumes an atrocious amount of memory (10 MB) for what it is. The representation used here consumes a few hundred times less memory (~30 KB), and it was kinda fun to write besides :)
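
To illustrate why storing spans is so much smaller than storing one entry per codepoint, here is a minimal pure-Python sketch of a span lookup. It is not this package's implementation (which is Cython backed by a generated header). The `SPANS` table and the `lookup_version` name are hypothetical, the table holds only a few sample ranges rather than the real span boundaries, and returning `None` for uncovered codepoints is an assumption made for the sketch.

```python
from bisect import bisect_right

# Hypothetical sample of (first, last, version) spans, sorted by first codepoint.
# These are illustrative ranges only, not the spans found in DerivedAge.txt.
SPANS = [
    (0x0000, 0x007F, (1, 1)),    # ASCII has been present since Unicode 1.1
    (0x20AC, 0x20AC, (2, 1)),    # EURO SIGN
    (0x1F40D, 0x1F40D, (6, 0)),  # SNAKE
]

# Parallel list of span starts so bisect can search on plain integers.
STARTS = [first for first, _last, _version in SPANS]


def lookup_version(codepoint):
    """Return the (major, minor) version for codepoint, or None if no span covers it."""
    # Find the last span whose start is <= codepoint.
    idx = bisect_right(STARTS, codepoint) - 1
    if idx < 0:
        return None
    _first, last, version = SPANS[idx]
    # The codepoint may fall in a gap after that span ends.
    return version if codepoint <= last else None


print(lookup_version(ord("\N{SNAKE}")))
```

Each lookup is a binary search over the span starts, so memory scales with the number of spans rather than with the size of the codepoint space.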

Updating

The script `makeunicode_age.py` consumes `DerivedAge.txt`, produces the header file that holds the backing data for this module, and fills in the number of spans in the Cython template. To build against another version of the Unicode Character Database, you should be able to replace `DerivedAge.txt` and re-run the script.
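
For readers unfamiliar with the input file, the sketch below shows roughly how lines in the `DerivedAge.txt` format could be turned into spans. It is not the code in `makeunicode_age.py`: the function name `parse_derived_age` and the `SAMPLE` excerpt are hypothetical, and the sketch stops short of generating any header or template output.

```python
# Illustrative excerpt written in the DerivedAge.txt line format:
# "<first>[..<last>] ; <major>.<minor> # comment"
SAMPLE = """\
# comment lines and blank lines are ignored

0000..001F    ; 1.1 #  [32] <control-0000>..<control-001F>
20AC          ; 2.1 #       EURO SIGN
"""


def parse_derived_age(lines):
    """Parse lines into a list of (first, last, (major, minor)) tuples."""
    spans = []
    for line in lines:
        # Drop the trailing comment, then skip lines with no data.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        cp_range, age = (field.strip() for field in data.split(";"))
        # A single codepoint has no "..", so last falls back to first.
        first, _, last = cp_range.partition("..")
        major, minor = age.split(".")
        spans.append((int(first, 16), int(last or first, 16), (int(major), int(minor))))
    return spans


for span in parse_derived_age(SAMPLE.splitlines()):
    print(span)
```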
