GitHub - mloar/charset: Simon Tatham's charset library

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 99 Commits
LICENCE		LICENCE
Makefile		Makefile
NTMakefile		NTMakefile
README		README
big5enc.c		big5enc.c
big5set.c		big5set.c
charset.h		charset.h
cns11643.c		cns11643.c
confuse.c		confuse.c
cp949.c		cp949.c
cstable.c		cstable.c
emacsenc.c		emacsenc.c
enum.c		enum.c
euc.c		euc.c
fromucs.c		fromucs.c
gb2312.c		gb2312.c
hz.c		hz.c
internal.h		internal.h
iso2022.c		iso2022.c
iso2022s.c		iso2022s.c
istate.c		istate.c
jisx0208.c		jisx0208.c
jisx0212.c		jisx0212.c
ksx1001.c		ksx1001.c
locale.c		locale.c
localenc.c		localenc.c
macenc.c		macenc.c
mimeenc.c		mimeenc.c
sbcs.c		sbcs.c
sbcs.dat		sbcs.dat
sbcsgen.pl		sbcsgen.pl
shiftjis.c		shiftjis.c
slookup.c		slookup.c
superset.c		superset.c
test.c		test.c
toucs.c		toucs.c
utf16.c		utf16.c
utf7.c		utf7.c
utf8.c		utf8.c
xenc.c		xenc.c

Repository files navigation

This subdirectory contains a general character-set conversion
library, used in Timber, and available for use in other software if
it should happen to be useful.

I intend to use this same library in other programs at some future
date. (A cut-down version of it is already in use in some ports of
PuTTY.) It is therefore a _strong_ design goal that this library
should remain perfectly general, and not tied to particulars of
Timber. It must not reference any code outside its own subdirectory;
it should not have Timber-specific helper routines added to it
unless they can be documented in a general manner which might make
them useful in other circumstances as well.

There are some multibyte character encodings which this library does
not currently support. Those that I know of are:

 - Johab. There is no reason why we _shouldn't_ support this, but it
   wasn't immediately necessary at the time I did the initial
   coding. If anyone needs it, it shouldn't be too hard. The Unicode
   mapping table for the encoding is available at
   http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/JOHAB.TXT

 - ISO-2022-JP-1 (RFC 2237), and ISO-2022-JP-2 (RFC 1554). These
   should be even easier if required - we already have the ISO 2022
   machinery in place, and support all the underlying character
   sets.

 - ISO-2022-CN and ISO-2022-CN-EXT (RFC 1922). These are a little tricky
   as they allow use of both GB2312 (simplified Chinese) and CNS 11643
   (traditional Chinese), so we may need some way to specify which to
   prefer.

 - The Hong Kong (HKSCS) extension to Big5. Again, mapping tables
   are available in the Unihan database.

 - Other Big Five extensions, which I don't have mapping tables for
   at all.