pythongh-101180: Fix a bug where iso2022_jp_3 and iso2022_jp_2004 codecs read out of bounds

The iso2022_jp_3 and iso2022_jp_2004 codecs read out of bounds when encoding
a Unicode combining character sequence.

This bug causes the following error:
$ python3 -c "print('\u304b\u309a'.encode('iso2022_jp_2004'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'iso2022_jp_2004' codec can't encode character '\u309a' in position 1: illegal multibyte sequence

This commit fixes the out-of-bounds read.
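
The reproduction above can be turned into a small check. This is only a sketch: the helper name `round_trips` and the constant `SAMPLE` are hypothetical and not part of the commit, and the result for `SAMPLE` depends on whether the interpreter running it includes this fix.

```python
def round_trips(text, codec):
    """Return True if text survives an encode/decode round trip with codec."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        # An affected build raises UnicodeEncodeError for SAMPLE here.
        return False


# U+304B (HIRAGANA LETTER KA) followed by U+309A (COMBINING KATAKANA-HIRAGANA
# SEMI-VOICED SOUND MARK): the sequence from the commit message.
SAMPLE = "\u304b\u309a"

for name in ("iso2022_jp_3", "iso2022_jp_2004"):
    print(name, round_trips(SAMPLE, name))
```

On a build with the fix, both codecs are expected to report a successful round trip; a `False` result suggests the interpreter is still affected.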
moriyama committed Nov 3, 2023
1 parent ccc8caa commit 13286d1
Showing 1 changed file with 6 additions and 3 deletions.
9 changes: 6 additions & 3 deletions Modules/cjkcodecs/_codecs_iso2022.c
@@ -207,8 +207,9 @@ ENCODER(iso2022)

         encoded = MAP_UNMAPPABLE;
         for (dsg = CONFIG_DESIGNATIONS; dsg->mark; dsg++) {
+            Py_UCS4 buf[2] = {c, 0};
             Py_ssize_t length = 1;
-            encoded = dsg->encoder(codec, &c, &length);
+            encoded = dsg->encoder(codec, buf, &length);
             if (encoded == MAP_MULTIPLE_AVAIL) {
                 /* this implementation won't work for pair
                  * of non-bmp characters. */
@@ -217,9 +218,11 @@ ENCODER(iso2022)
                         return MBERR_TOOFEW;
                     length = -1;
                 }
-                else
+                else {
+                    buf[1] = INCHAR2;
                     length = 2;
-                encoded = dsg->encoder(codec, &c, &length);
+                }
+                encoded = dsg->encoder(codec, buf, &length);
                 if (encoded != MAP_UNMAPPABLE) {
                     insize = length;
                     break;
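
The control flow of the patched loop body can be modelled in a few lines of Python. This is a rough sketch, not the CPython implementation: `toy_encoder`, `encode_at`, and the two lookup tables are hypothetical stand-ins, the code values are illustrative only, and the end-of-input conditions hidden between the two hunks are assumed. The point it shows is that the encoder reads `buf[1]` whenever `length` is 2, so the caller has to hand it a real two-element buffer instead of the address of a single character.

```python
MAP_UNMAPPABLE = 0xFFFF      # stand-in sentinel values
MAP_MULTIPLE_AVAIL = 0xFFFE

# Illustrative tables; the values are placeholders, not the real mappings.
SINGLE = {0x304B: 0x242B}
PAIRS = {(0x304B, 0x309A): 0x2477}
PAIR_STARTERS = {base for base, _ in PAIRS}


def toy_encoder(buf, length):
    """Model of dsg->encoder(codec, buf, &length): returns (code, consumed)."""
    if length == 1 and buf[0] in PAIR_STARTERS:
        return MAP_MULTIPLE_AVAIL, 1          # caller should retry with a pair
    if length == 2:
        code = PAIRS.get((buf[0], buf[1]))    # this is the read of buf[1]
        if code is not None:
            return code, 2
    # length == -1 (flush), or no pair matched: encode the first char alone
    return SINGLE.get(buf[0], MAP_UNMAPPABLE), 1


def encode_at(text, pos, flush=True):
    """Model of the patched loop body; None plays the role of MBERR_TOOFEW."""
    buf = [ord(text[pos]), 0]                 # Py_UCS4 buf[2] = {c, 0};
    code, length = toy_encoder(buf, 1)
    if code == MAP_MULTIPLE_AVAIL:
        if len(text) - pos < 2:               # assumed end-of-input check
            if not flush:
                return None
            length = -1
        else:
            buf[1] = ord(text[pos + 1])       # buf[1] = INCHAR2;
            length = 2
        code, length = toy_encoder(buf, length)
    return code, length


print(encode_at("\u304b\u309a", 0))
```

Before the patch, the equivalent of `buf` was a pointer to the single variable `c`, so the `length == 2` branch read whatever happened to follow `c` in memory.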
