ISO-IR-165





The CCITT Chinese Primary Set[2] is a multi-byte graphic character set for Chinese communications created for the Consultative Committee on International Telephone and Telegraph (CCITT) in 1992.[3] It is defined in ITU T.101, annex C, which codifies Data Syntax 2 Videotex.[2] It is registered with the ISO-IR registry for use with ISO/IEC 2022 as ISO-IR-165,[4] and encodable in the ISO-2022-CN-EXT code version.[1]

CCITT Chinese set (ISO-IR 165)
MIME / IANA: iso-ir-165
Alias(es): CN-GB-ISOIR165 (EUC form)[1]
Language(s): Simplified Chinese, English, Russian (partial support: Greek, Japanese)
Standard: ITU T.101, annex C
Definitions: ISO-IR 165
Extends: GB 2312
Encoding formats: ISO-2022-CN-EXT, Videotex Data Syntax 2
Succeeded by: GB 18030

It is an extended modification of GB/T 2312-80, and corresponds to the union of the mainland Chinese GB standards GB 6345.1-86 and GB 8565.2-88, with some further modification and extensions. A subset of the GB 6345.1 extensions are incorporated into GB 18030, while GB 8565.2 serves as the mainland Chinese source reference for certain CJK Unified Ideographs.

    GB 6345.1


    GB 6345.1-86 (32 × 32 Dot Matrix Font Set of Chinese Ideographs for Information Interchange) includes both a corrigendum and an extension for GB 2312.[3] The corrigendum alters the following two characters:

    Alterations made to existing GB 2312 characters by GB 6345.1

    Row-cell  EUC     GB 2312 (unamended)[5]  GB 6345.1   Notes
    03-71     0xA3E7  looped g                open g (ɡ)  [a]
    79-81     0xEFF1  鍾                      锺          [b]

    a. Corresponds to U+FF47 ｇ FULLWIDTH LATIN SMALL LETTER G in Unicode; however, the amended reference glyph can also correspond to U+0261 ɡ LATIN SMALL LETTER SCRIPT G. See below for how U+0261 is typically mapped to/from GB 6345.1, versus how it is mapped to/from ISO-IR-165. GB 18030 swaps this one back to the original[5] looped glyph.[6]
    b. The unamended reference glyph is a Traditional Chinese character corresponding to U+937E 鍾. The character in question is usually replaced with 钟 (U+949F, also the simplification of 鐘) in Simplified Chinese except in names of persons; the amended glyph is an alternate simplified form corresponding to U+953A 锺.

    Deployed implementations incorporating GB 2312, such as Windows code page 936, generally follow these corrections in mapping 79-81 to U+953A.[7]
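
The Row-cell and EUC columns used in the tables of this article are related by fixed offsets: adding 0x20 to a row or cell number gives the 7-bit form, and adding a further 0x80 to each byte gives the EUC form (see the note on reference [13]). The following is a minimal Python sketch of that arithmetic; the function names are illustrative only and do not come from any standard or library.

```python
from typing import Tuple


def kuten_to_7bit(row: int, cell: int) -> bytes:
    """Convert a row-cell (kuten) pair to its 7-bit two-byte form."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")
    return bytes([row + 0x20, cell + 0x20])


def kuten_to_euc(row: int, cell: int) -> bytes:
    """Convert a row-cell pair to EUC form (7-bit form plus 0x80 per byte)."""
    return bytes(b + 0x80 for b in kuten_to_7bit(row, cell))


def euc_to_kuten(code: bytes) -> Tuple[int, int]:
    """Convert a two-byte EUC code back to its row-cell pair."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError("expected two bytes, each in the range 0xA1..0xFE")
    return code[0] - 0xA0, code[1] - 0xA0


if __name__ == "__main__":
    # Row-cell values taken from the tables in this article.
    for row, cell in [(3, 71), (79, 81), (8, 32)]:
        euc = kuten_to_euc(row, cell)
        print(f"{row:02d}-{cell:02d} -> 0x{euc.hex().upper()}")
```

For example, row-cell 03-71 becomes 0x2367 in 7-bit form and 0xA3E7 in EUC form, matching the EUC column of the table.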

    The extension adds half-width ISO 646-CN characters in row 10 (in addition to the existing full-width characters in row 3) and extends the set of 26 non-ASCII pinyin characters in row 8 with six additional such characters. These GB 6345.1 extensions are also incorporated into GB/T 12345, the Traditional Chinese counterpart to GB 2312, in addition to 29 vertical presentation forms in row 6.[3][8]

    The later GB/T 6345.1-2010, published in 2011, officially adds half-width forms of the 32 pinyin characters of row 8 (including the six new additions) in row 11.[9] This addition is not featured in GB 18030.[6]

    The six additional pinyin characters from GB 6345.1 and the vertical presentation forms from GB 12345 — but not the half-width forms — are included in the classic Mac OS encoding for Simplified Chinese (a modification of EUC-CN),[10] and also as two-byte codes in GB 18030.[6] The additional pinyin characters are as follows:[10]

    Extensions made by GB 6345.1 to GB 2312 row 8

    Row-cell  EUC     Character[10][6]                           Notes
    08-27     0xA8BB  U+0251 ɑ LATIN SMALL LETTER ALPHA
    08-28     0xA8BC  U+1E3F ḿ LATIN SMALL LETTER M WITH ACUTE   [a]
    08-29     0xA8BD  U+0144 ń LATIN SMALL LETTER N WITH ACUTE
    08-30     0xA8BE  U+0148 ň LATIN SMALL LETTER N WITH CARON
    08-31     0xA8BF  U+01F9 ǹ LATIN SMALL LETTER N WITH GRAVE   [b]
    08-32     0xA8C0  U+0261 ɡ LATIN SMALL LETTER SCRIPT G       [c]

    a. Mapped to the Private Use Area U+E7C7 by Windows code page 936[11] and the first (2000) edition of GB 18030; this was amended by the 2005 edition.[6]
    b. This composed character was added in Unicode 3.0. Prior to this, this character was mapped to its composition sequence (i.e. U+006E U+0300) by Apple.[10] This change predates the stabilisation of Unicode normalisation forms, which was introduced in Unicode 3.1.[12] It is mapped to U+E7C8 by Windows code page 936.[11]
    c. The reference glyph matches the unamended reference glyph for 03-71 (see above) in being a looped g, in spite of being typically mapped to U+0261. Mappings used for ISO-IR-165 differ (see below). GB 18030 swaps 03-71 back to the looped g, and makes this one the open g.[6]

    These extensions and modifications to GB 2312 were first introduced in GB 5007.1-85 in 1985.
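
The code points listed for the six row 8 additions, and the relationship between the composed U+01F9 and the sequence U+006E U+0300 mentioned in the notes, can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch: the dictionary names `ROW8_EXTENSIONS` and `CP936_PRIVATE_USE` and the helper `describe` are made up for this example, and the data is copied from the table and notes above rather than from any codec.

```python
import unicodedata

# Row-cell -> character, as listed in the row 8 extension table.
ROW8_EXTENSIONS = {
    (8, 27): "\u0251",
    (8, 28): "\u1e3f",
    (8, 29): "\u0144",
    (8, 30): "\u0148",
    (8, 31): "\u01f9",
    (8, 32): "\u0261",
}

# Private Use Area mappings attributed to Windows code page 936 in the notes.
CP936_PRIVATE_USE = {
    (8, 28): "\ue7c7",
    (8, 31): "\ue7c8",
}


def describe(ch: str) -> str:
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for (row, cell), ch in sorted(ROW8_EXTENSIONS.items()):
        print(f"{row:02d}-{cell:02d}  {describe(ch)}")

    # U+01F9 decomposes canonically to 'n' followed by a combining grave.
    composed = ROW8_EXTENSIONS[(8, 31)]
    decomposed = unicodedata.normalize("NFD", composed)
    print([f"U+{ord(c):04X}" for c in decomposed])

    # The code page 936 targets fall in the Private Use category 'Co'.
    for ch in CP936_PRIVATE_USE.values():
        print(f"U+{ord(ch):04X}", unicodedata.category(ch))
```

Under current normalisation rules the two spellings of 08-31 are canonically equivalent, so a converter that normalises its output to NFC yields U+01F9 whichever form the mapping table uses.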

    GB 8565.2


    GB 8565.2-88 (Information Processing - Coded Character Sets for Text Communication - Part 2: Graphic Characters) defines an extension for GB 2312, adding 705 characters in rows 13–15 and 90–94, of which 69 (all in row 15) are non-hanzi. It includes the GB 2312 corrections from GB 6345.1, but not its extensions.[3]

    The Unihan database references GB 8565.2 as the mainland Chinese source of several hanzi included in Unicode. Its Unihan source abbreviation is G8.[2]

    CCITT changes


    ISO-IR-165 incorporates the GB 2312 extensions from both GB 6345.1-86 and GB 8565.2-88.[3] Additionally, it adds 161 further characters (including 139 hanzi, identified as “general Chinese characters and variants”).[3][4] These CCITT hanzi extensions have on occasion been mistaken for standard GB 8565.2 characters, including in previous revisions of the Unihan database.[2] In total the set contains 8446 characters.

    A number of patterned semigraphic characters are included in row 6.[4] This collides with the vertical presentation forms included in other extensions such as Mac OS Simplified Chinese[10] and GB 18030.[6]

    The GB 6345.1 corrections to GB 2312 are applied, but two Unicode mappings are reversed compared to other encodings which include GB 2312 with GB 6345.1 extensions. The table below shows the mappings and their corresponding reference glyphs, with GB 18030 included for comparison:

    Row-cell  EUC     GB 2312 glyph    GB 6345.1   GB 6345.1    ISO-IR-165  ISO-IR-165   GB 18030    GB 18030
                      (unamended)[5]   glyph[9]    mapping[10]  glyph[4]    mapping[13]  glyph[6]    mapping[6]
    03-71     0xA3E7  looped g         open g (ɡ)  U+FF47       open g (ɡ)  U+0261       looped g    U+FF47
    08-32     0xA8C0  (absent)         looped g    U+0261       looped g    U+FF47       open g (ɡ)  U+0261
    79-81     0xEFF1  鍾               锺          U+953A       锺          U+953A       锺          U+953A
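
The practical effect of the two reversed mappings in the table above can be sketched as a small lookup comparison in Python. The structure `MAPPINGS` and the helpers `differing_codes` and `transcode` are hypothetical names introduced for this illustration; the data covers only the three EUC codes from the table and is not a complete conversion table.

```python
# EUC code -> Unicode character, per mapping column of the table above.
MAPPINGS = {
    "GB 6345.1": {
        b"\xa3\xe7": "\uff47",  # 03-71 -> FULLWIDTH LATIN SMALL LETTER G
        b"\xa8\xc0": "\u0261",  # 08-32 -> LATIN SMALL LETTER SCRIPT G
        b"\xef\xf1": "\u953a",  # 79-81
    },
    "ISO-IR-165": {
        b"\xa3\xe7": "\u0261",  # reversed relative to GB 6345.1
        b"\xa8\xc0": "\uff47",  # reversed relative to GB 6345.1
        b"\xef\xf1": "\u953a",
    },
    "GB 18030": {
        b"\xa3\xe7": "\uff47",
        b"\xa8\xc0": "\u0261",
        b"\xef\xf1": "\u953a",
    },
}


def differing_codes(first: str, second: str) -> list:
    """Return the EUC codes whose Unicode mapping differs between two columns."""
    a, b = MAPPINGS[first], MAPPINGS[second]
    return sorted(code for code in a if a[code] != b.get(code))


def transcode(code: bytes, source: str, target: str) -> bytes:
    """Re-encode one code from `source` to `target` by way of Unicode."""
    char = MAPPINGS[source][code]
    reverse = {ch: c for c, ch in MAPPINGS[target].items()}
    return reverse[char]


if __name__ == "__main__":
    print(differing_codes("GB 6345.1", "ISO-IR-165"))
    print(transcode(b"\xa3\xe7", "ISO-IR-165", "GB 18030").hex())
```

Going through Unicode, a character stored at 03-71 in ISO-IR-165 lands at 08-32 in a GB 18030-style table and vice versa, while 79-81 is unaffected.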

    References

    1. Zhu, HF.; Hu, DY.; Wang, ZG.; Kao, TC.; Chang, WCH.; Crispin, M. (1996). "Chinese Character Encoding for Internet Messages". Requests for Comments. IETF. doi:10.17487/rfc1922. RFC 1922.
    2. Chung, Jaemin (2018-01-24). "Pseudo-G8 characters" (PDF). ISO/IEC JTC 1/SC 2/WG 2/IRG N2276.
    3. Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 94–111. ISBN 978-0-596-51447-1.
    4. CCITT (1992-07-13). Codes of the Chinese graphic character set for communication (PDF). ITSCJ/IPSJ. ISO-IR-165.
    5. China Association for Standardization. Coded Chinese Graphic Character Set for Information Interchange (PDF). ITSCJ/IPSJ. ISO-IR-58.
    6. Standardization Administration of China (SAC) (2005-11-18). GB 18030-2005: Information Technology - Chinese coded character set.
    7. Steele, Shawn (2000). "cp936 to Unicode table". Microsoft, Unicode Consortium.
    8. Lunde, Ken (1998). Appendix F: GB/T 12345 (PDF). O'Reilly Media. ISBN 9781565922242.
    9. Standardization Administration of China (SAC) (2011-01-10). GB/T 6345.1-2010 信息技术 汉字编码字符集(基本集) 32点阵字型 第1部分宋体 [Information technology: Chinese ideogram coded character set (basic set), 32-dot matrix font, Part 1: Songti] (in Chinese). China.
    10. "Map (external version) from Mac OS Chinese Simplified encoding to Unicode 3.0 and later". Apple, Inc.
    11. Microsoft. "CODEPAGE 936: PRC GBK (XGB) - ANSI, OEM". Unicode Consortium.
    12. "Unicode Character Encoding Stability Policies". Unicode Consortium. 2017-06-23.
    13. Viswanadha, Raghuram (2000-08-30). "Unicode to ISO-IR-165 table". International Components for Unicode. IBM. (Note: codes are listed in the source in 7-bit form: add 0x80 to each byte for EUC form, or subtract 0x20 for kuten form)

    Retrieved from "https://en.wikipedia.org/w/index.php?title=ISO-IR-165&oldid=1211169203"
     



    Last edited on 1 March 2024, at 05:31  




