path: root/sapi/phpdbg/phpdbg_print.c
author: Alex Dowad <alexinbeijing@gmail.com> 2020-10-11 16:24:18 +0200
committer: Alex Dowad <alexinbeijing@gmail.com> 2021-01-15 08:26:46 +0200
commit: 5e5243ab650ceeff3febdeeee39373174acd8387 (patch)
tree: 4ae4023407c6e9346a5e5f153de8c065690fe2b7 /sapi/phpdbg/phpdbg_print.c
parent: 6e9c8386cb51be711435b203e68efe099c51b84a (diff)
download: php-git-5e5243ab650ceeff3febdeeee39373174acd8387.tar.gz
CP5022{0,1,2}: convert Unicode codepoints in 'user' area (0xE000-E757) correctly
Unicode has a range of 'private' codepoints which individual applications can use for their own purposes. When inventing CP932, Microsoft mapped these 'private' or 'user' codepoints to twenty new rows added to the JIS X 0208 character table. (JIS X 0208 is based on a 94x94 table; MS used rows 95-114 for private characters.)

`mbfl_filt_conv_wchar_jis_ms` converted these private codepoints to rows 85-94 rather than 95-114. The code included a link to a document on the OpenGroup web site, dating back to 1996 [1], which proposed mapping private codepoints to these rows. However, that is not consistent with what mbstring does when converting CP5022x to Unicode.

There seems to be a dearth of information on CP5022x on the web. However, I did find one (Japanese-language) page on CP50221, which states that it maps kuten codes 0x7F21-0x927E to the 'private' Unicode codepoints [2].

As a side note, using rows higher than 95 does seem to defeat one purpose of using an ISO-2022-JP variant: ISO-2022-JP was specifically designed to be "7-bit clean", but once you go beyond row 95, the ku codes are 0x80 and up, so 8 bits are needed.

[1] https://web.archive.org/web/20000229180004/http://www.opengroup.or.jp/jvc/cde/ucs-conv.html
[2] https://www.wdic.org/w/WDIC/Microsoft%20Windows%20Codepage%20%3A%2050221
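
For illustration only, the arithmetic behind the mapping described in this message can be sketched as follows. This is not the code from the patch: `pua_to_kuten` and `kuten_to_bytes` are hypothetical names, and the sketch assumes the private codepoints 0xE000-0xE757 fill rows 95-114 in order, 94 cells per row, which is what the 0x7F21-0x927E range in [2] implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of mbstring): map a private-use codepoint
 * to a 1-based (ku, ten) pair, assuming rows 95-114 are filled in order.
 * Returns 1 on success, 0 if cp is outside 0xE000-0xE757. */
static int pua_to_kuten(unsigned int cp, unsigned int *ku, unsigned int *ten)
{
    assert(ku != NULL && ten != NULL);

    if (cp < 0xE000 || cp > 0xE757) {
        return 0;
    }

    unsigned int offset = cp - 0xE000; /* 0 .. 1879 (20 rows * 94 cells) */
    *ku  = 95 + offset / 94;           /* rows 95 .. 114 */
    *ten = 1 + offset % 94;            /* cells 1 .. 94 */
    return 1;
}

/* Hypothetical helper: pack (ku, ten) into the two-byte code by adding
 * 0x20 to each, as is conventional for JIS X 0208 style tables. */
static unsigned int kuten_to_bytes(unsigned int ku, unsigned int ten)
{
    return ((ku + 0x20) << 8) | (ten + 0x20);
}
```

Under these assumptions, 0xE000 lands on row 95, cell 1 (bytes 0x7F21) and 0xE757 on row 114, cell 94 (bytes 0x927E). Any codepoint at or past 0xE05E falls in row 96 or later, where the first byte is 0x80 or higher; that is the loss of 7-bit cleanliness noted in the side note.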
Diffstat (limited to 'sapi/phpdbg/phpdbg_print.c')
0 files changed, 0 insertions, 0 deletions