Chinese character encoding

From Wikipedia, the free encyclopedia

In computing, Chinese character encodings can be used to represent text written in the CJK languages—Chinese, Japanese, Korean—and (rarely) obsolete Vietnamese, all of which use Chinese characters. Several general-purpose character encodings accommodate Chinese characters, and some of them were developed specifically for Chinese.

In addition to Unicode (with its set of CJK Unified Ideographs), local encoding systems exist. The two primary "legacy" local systems are the Chinese Guobiao (or GB, "national standard") system, used in Mainland China and Singapore, and the (mainly) Taiwanese Big5 system, used in Taiwan, Hong Kong and Macau. Guobiao text is usually displayed using simplified characters and Big5 text using traditional characters. There is, however, no mandated connection between an encoding system and the font used to display its characters; font and encoding are usually tied together only for practical reasons.

The issue of which encoding to use can also have political implications, as GB is the official standard of the People's Republic of China and Big5 is a de facto standard of Taiwan.

In contrast to the situation with Japanese, there has been relatively little overt opposition to Unicode, which solves many of the issues involved with GB and Big5. Unicode is widely regarded as politically neutral, has good support for both simplified and traditional characters, and can be easily converted to and from GB and Big5. Furthermore, Unicode is not limited to Chinese, since it contains character codes for (nearly) every language.
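As a rough illustration of conversion through Unicode, the sketch below uses Python's standard `gbk` and `big5` codecs: bytes are decoded from one legacy encoding into Unicode text and re-encoded into the other. The helper name `transcode` is hypothetical. The sample text was chosen because its characters exist in both repertoires; text using characters found in only one of them would raise an error.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes by passing through Unicode text (a Python str)."""
    return data.decode(source).encode(target)


sample = "中文"  # characters present in both the GBK and Big5 repertoires
gbk_bytes = sample.encode("gbk")

# GBK -> Unicode -> Big5, then back again.
big5_bytes = transcode(gbk_bytes, "gbk", "big5")
round_trip = transcode(big5_bytes, "big5", "gbk")
```

The same characters have different byte values in the two encodings, so Unicode acts as the pivot between them.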

Guobiao


The Guobiao (GB) line of character encodings starts with the Simplified Chinese charset GB 2312, published in 1980. Two encoding schemes existed for GB 2312: the commonly used EUC-CN, an 8-bit encoding that uses one or two bytes per character, and HZ,^[1] a 7-bit encoding used for Usenet posts.^[2]^: 94 A traditional variant called GB/T 12345 was published in 1990.
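The relationship between the two schemes can be sketched with Python's standard codecs, assuming the codec named `gb2312` (which implements EUC-CN) and the codec named `hz`. The helper `hz_payload_to_euc` is a hypothetical name used only for this illustration: it sets the high bit on the 7-bit bytes found between HZ's `~{` and `~}` escape markers, which yields the EUC-CN bytes for the same characters.

```python
text = "中文"

euc_cn = text.encode("gb2312")  # 8-bit form: two bytes per hanzi, high bit set
hz = text.encode("hz")          # 7-bit form: two-byte runs wrapped in ~{ ... ~}


def hz_payload_to_euc(payload: bytes) -> bytes:
    """Set the high bit on each byte of an HZ two-byte run."""
    return bytes(b | 0x80 for b in payload)


# Every byte of the HZ form fits in 7 bits, which suited 7-bit mail and news transports.
is_seven_bit = all(b < 0x80 for b in hz)
```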

In 1993, the EUC-CN form was extended into GBK, which covers all CJK ideographs of Unicode 1.1 and abandons the ISO 2022 model. As a result, GBK includes Traditional Chinese characters in addition to the simplified ones in GB 2312.^[3] GBK gained popularity through the widespread Code page 936 implementation found in Microsoft Windows 95.
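A small check of this extension, again using Python's standard `gb2312` and `gbk` codecs as stand-ins for the standards themselves: a simplified character keeps the same bytes in both encodings, while its traditional counterpart can be encoded only in GBK. The helper `can_encode` is a hypothetical name.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified, traditional = "国", "國"

# GBK extends EUC-CN, so GB 2312 characters keep their byte values.
same_bytes_in_gbk = simplified.encode("gb2312") == simplified.encode("gbk")

# The traditional form is outside GB 2312 but inside GBK.
traditional_in_gb2312 = can_encode(traditional, "gb2312")
traditional_in_gbk = can_encode(traditional, "gbk")
```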

In 2000, GB 18030 was published as GBK's successor. The new encoding adds a four-byte form that encodes all Unicode code points not already covered by the one- and two-byte forms, making GB 18030 a Unicode Transformation Format (UTF).^[4] In 2005, a revised edition of GB 18030 was published, adding reference glyphs for scripts used by ethnic minorities in China, as well as glyphs from CJK Unified Ideographs Extension B, following updates to Unicode.
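The variable-width design can be observed with Python's standard `gb18030` codec. The sketch below measures the encoded length of an ASCII letter, a common hanzi, and U+20000 (the first code point of CJK Unified Ideographs Extension B); the helper name `gb18030_length` is hypothetical.

```python
def gb18030_length(ch: str) -> int:
    """Number of bytes GB 18030 uses for a single character."""
    return len(ch.encode("gb18030"))


ascii_letter = "A"
common_hanzi = "中"
extension_b = "\U00020000"  # first ideograph of Extension B, outside the BMP

lengths = {
    "ascii": gb18030_length(ascii_letter),
    "hanzi": gb18030_length(common_hanzi),
    "extension_b": gb18030_length(extension_b),
}

# The four-byte form round-trips, as expected of a Unicode transformation format.
round_trip = extension_b.encode("gb18030").decode("gb18030")
```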

Adobe-GB1 is the corresponding PostScript charset for GB encodings.

Big5


The Big5 family of character encodings starts with the initial definition by the consortium of five companies in Taiwan that developed it.^[5] It is a double-byte character set (DBCS) somewhat similar to Shift JIS, often combined with a single-byte character set such as ASCII. Quite a few vendor extensions and official extensions exist, of which ETEN, HKSCS (Hong Kong) and Big5-2003 (as a part of CNS 11643 by Taiwan) are the best known.^[6] Adobe-CNS1 is the PostScript charset corresponding to the Big5 family of encodings.
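To show how a double-byte set mixes with ASCII in one byte stream, the following sketch splits Big5 data into one-byte and two-byte units. It assumes that any byte in the range 0x81 to 0xFE is a lead byte starting a two-byte character, which is a simplification that ignores the differences between Big5 variants; `split_big5` is a hypothetical helper name.

```python
def split_big5(data: bytes) -> list[bytes]:
    """Split a Big5 byte stream into single-byte and double-byte units.

    Assumption: bytes 0x81-0xFE are lead bytes; everything else is one byte.
    """
    units = []
    i = 0
    while i < len(data):
        width = 2 if 0x81 <= data[i] <= 0xFE else 1
        units.append(data[i:i + width])
        i += width
    return units


mixed = "A中B文".encode("big5")
units = split_big5(mixed)
decoded = [u.decode("big5") for u in units]
```

Because trail bytes can fall in the ASCII range, a stream like this has to be scanned from a known character boundary rather than from an arbitrary offset.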

Conversion


Before GBK, which includes both traditional and simplified characters, conversion between Traditional Chinese and Simplified Chinese charsets was complicated by the need to transcribe text between the two variants of Chinese, as each charset covers many of the other's characters only in its own variant. Conversion between traditional and simplified Chinese is itself problematic, because simplification in some cases merged two or more distinct traditional characters into a single simplified form. The traditional-to-simplified (many-to-one) conversion is technically simple. The opposite conversion is one-to-many: information lost when text was converted to GB 2312 cannot be recovered mechanically, so when traditional glyphs are assigned to simplified glyphs, some choices will inevitably be wrong in some usages. Simplified-to-traditional conversion therefore often requires usage context or common phrase lists to resolve conflicts. This is less of a problem with newer standards such as GBK, GB 18030 and Unicode, which have separate code points for simplified and traditional characters.^[citation needed]
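The asymmetry can be illustrated with a toy converter. The tables below are hypothetical and tiny: they cover only the simplified character 发, which corresponds to both 發 (as in "develop") and 髮 ("hair"). Traditional-to-simplified conversion is a per-character lookup, while the reverse direction consults a phrase list first and falls back to a default character mapping. This is a minimal sketch of the phrase-list approach, not a description of any particular conversion tool.

```python
# Toy tables for illustration only; real converters use far larger ones.
T2S = {"發": "发", "髮": "发", "頭": "头"}
S2T_DEFAULT = {"发": "發", "头": "頭"}          # most common choice per character
S2T_PHRASES = {"头发": "頭髮", "理发": "理髮"}  # phrases that override the default


def to_simplified(text: str) -> str:
    """Many-to-one: a plain per-character lookup is enough."""
    return "".join(T2S.get(ch, ch) for ch in text)


def to_traditional(text: str) -> str:
    """One-to-many: try the longest known phrase first, then the default."""
    longest = max(len(p) for p in S2T_PHRASES)
    out = []
    i = 0
    while i < len(text):
        for n in range(longest, 1, -1):
            chunk = text[i:i + n]
            if chunk in S2T_PHRASES:
                out.append(S2T_PHRASES[chunk])
                i += n
                break
        else:  # no phrase matched at this position
            out.append(S2T_DEFAULT.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

With these tables, `to_traditional("头发")` picks 髮 through the phrase list, whereas `to_traditional("发展")` falls back to the default 發.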

Another issue is that many encoding systems are missing characters. The missing characters are often literary and uncommon in ordinary text, but they become a problem because people's names often contain them. Examples include the Taiwanese politician Wang Chien-shien, whose name contains the character xuān (煊), which is missing from some character sets, and former Chinese premier Zhu Rongji, whose róng (镕) character is not in GB 2312. The newest GB standard, GB 18030, has the complete character repertoire of Unicode 4.0, including the Unihan extensions in the Supplementary Ideographic Plane.^[2]^: 105
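Whether a given character is representable can be probed by attempting to encode it. The sketch below does this with Python's standard codecs, using the róng (镕) character mentioned above; the function name `supporting_encodings` and the default list of codecs are choices made for this example.

```python
def supporting_encodings(ch, encodings=("gb2312", "gbk", "gb18030", "utf-8")):
    """Return the encodings from the list that can represent the character."""
    supported = []
    for enc in encodings:
        try:
            ch.encode(enc)
        except UnicodeEncodeError:
            continue  # the character is missing from this repertoire
        supported.append(enc)
    return supported


rong = "镕"
rong_support = supporting_encodings(rong)
```

A check like this can be run over a list of names before choosing a legacy encoding for export.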

References


[1] RFC 1843.

[2] Lunde, Ken (December 2008). CJKV Information Processing. O'Reilly Media, Inc. ISBN 978-0-596-51447-1. Retrieved 2016-09-11.

[3] "GB18030-2000 - The New Chinese National Standard - GB 18030". 2012-08-25. Archived from the original on 2012-08-25. Retrieved 2016-10-13.

[4] Authoritative mapping table between GB18030-2000 and Unicode. ICU – International Components for Unicode. 2001-02-21. Retrieved 2016-10-13.

[5] "[chinese mac] Character Sets". chinesemac.org. Retrieved 2016-10-13.

[6] "Big5 Variants in Mozilla: Mozilla 系列與 Big5 中文字碼". moztw.org. Retrieved 2016-10-13.

External links


Chinese, Japanese and Korean computing

Encodings

Chinese	ISO-2022-CN CNS 11643 Big5 HKSCS GB 18030 GBK GB 2312 GB/T 12345 HZ ISO-IR-165 CCCII
Japanese	ISO-2022-JP JIS JIS X 0201 JIS X 0208 JIS X 0212 JIS X 0213 Shift-JIS
Korean	ISO-2022-KR KS X 1001 KS X 1002 KPS 9566 GB 12052
International	EUC ISO/IEC 2022 Unicode CJK Unified Ideographs Han unification


Retrieved from "https://en.wikipedia.org/w/index.php?title=Chinese_character_encoding&oldid=1210960770". This page was last edited on 29 February 2024, at 04:28 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply.