Chinese character encoding












From Wikipedia, the free encyclopedia
 


In computing, Chinese character encodings are used to represent text written in the CJK languages (Chinese, Japanese, and Korean) and, rarely, obsolete Vietnamese, all of which use Chinese characters. Several general-purpose character encodings accommodate Chinese characters, and some were developed specifically for Chinese.

In addition to Unicode (with its set of CJK Unified Ideographs), local encoding systems exist. The two primary "legacy" local systems are the Chinese Guobiao (GB, "national standard") system, used in mainland China and Singapore, and the mainly Taiwanese Big5 system, used in Taiwan, Hong Kong and Macau. Guobiao text is usually displayed in simplified characters and Big5 text in traditional characters. However, no connection is mandated between the encoding system and the font used to display the characters; font and encoding are usually tied together only for practical reasons.

The issue of which encoding to use can also have political implications, as GB is the official standard of the People's Republic of China and Big5 is a de facto standard of Taiwan.

In contrast to the situation with Japanese, there has been relatively little overt opposition to Unicode, which solves many of the issues involved with GB and Big5. Unicode is widely regarded as politically neutral, has good support for both simplified and traditional characters, and can be easily converted to and from GB and Big5. Furthermore, Unicode is not limited to Chinese, since it contains character codes for nearly every language.
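As a minimal sketch of that conversion, the following Python example re-encodes text by decoding it to Unicode first. The helper name `transcode` is hypothetical; the codec names `big5`, `gb18030` and `utf-8` are the ones shipped with Python's standard library.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes by passing through Unicode text (hypothetical helper)."""
    return data.decode(source).encode(target)


# The same two characters, stored in three different encodings.
big5_bytes = "中文".encode("big5")
gb_bytes = transcode(big5_bytes, "big5", "gb18030")
utf8_bytes = transcode(gb_bytes, "gb18030", "utf-8")

print(big5_bytes.hex(), gb_bytes.hex(), utf8_bytes.hex())
```

The byte sequences differ between encodings, but each decodes to the same Unicode text. Note that this changes only the encoding; it does not turn traditional characters into simplified ones or vice versa.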

Guobiao


The Guobiao (GB) line of character encodings starts with the Simplified Chinese character set GB 2312, published in 1980. Two encoding schemes existed for GB 2312: the commonly used EUC-CN, an 8-bit encoding that uses one or two bytes per character, and HZ,[1] a 7-bit encoding used for Usenet posts.[2]: 94  A traditional variant called GB/T 12345 was published in 1990.
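The two schemes can be compared with Python's built-in `gb2312` (EUC-CN) and `hz` codecs. This is an illustrative sketch only; the variable names are arbitrary.

```python
text = "中文"

euc_cn = text.encode("gb2312")  # EUC-CN: two bytes per Chinese character, high bit set
hz = text.encode("hz")          # HZ: 7-bit bytes wrapped in ~{ ... ~} escape sequences
mixed = "A中".encode("gb2312")  # ASCII stays one byte, so lengths vary per character

# Clearing the high bit of each EUC-CN byte gives the 7-bit form that HZ carries.
seven_bit = bytes(b & 0x7F for b in euc_cn)

print(euc_cn, hz, seven_bit, len(mixed))
```

Because every HZ byte is below 0x80, such text could pass through channels that were not 8-bit clean.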

In 1993, the EUC-CN form was extended into GBK to include all Unicode 1.1 CJK ideographs, abandoning the ISO 2022 model. As a result, GBK includes Traditional Chinese characters in addition to the simplified ones in GB 2312.[3] GBK gained popularity through the widespread Code page 936 implementation found in Microsoft Windows 95.

In 2000, GB 18030 was published as GBK's successor. The new encoding includes a four-byte UTF (Unicode transformation format) which encodes all Unicode code points not previously encoded.[4] In 2005, a revised GB 18030 was published that contains reference glyphs for scripts used by ethnic minorities in China, as well as glyphs from CJK Unified Ideographs Extension B, following updates to Unicode.
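The difference in coverage can be checked with Python's standard codecs, where `cp936` is an alias of `gbk`. The helper `can_encode` below is a hypothetical name used only for this sketch.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# 汉 is the simplified form and 漢 the traditional form of the same character.
for codec in ("gb2312", "gbk", "cp936"):
    print(codec, can_encode("汉", codec), can_encode("漢", codec))
```

Characters already in GB 2312 keep their EUC-CN byte values in GBK, which is what made GBK a drop-in extension.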
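A short sketch using Python's built-in `gb18030` codec shows the variable-length forms. The sample characters are chosen only for illustration: an ASCII letter, a common Chinese character, an Arabic letter, and U+20000 from CJK Unified Ideographs Extension B.

```python
samples = ["A", "中", "ب", "\U00020000"]

# Map each sample character to the number of bytes GB 18030 needs for it.
lengths = {ch: len(ch.encode("gb18030")) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s): {ch.encode('gb18030').hex()}")
```

Characters inherited from GBK keep their one- or two-byte codes, while the rest of Unicode is reached through the four-byte form.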

Adobe-GB1 is the corresponding PostScript charset for GB encodings.

Big5


The Big5 family of character encodings starts with the initial definition by the consortium of five companies in Taiwan that developed it.[5] It is a double-byte character set (DBCS) somewhat similar to Shift JIS, often combined with a single-byte character set such as ASCII. Many vendor extensions as well as official extensions exist, of which ETEN, HKSCS (Hong Kong) and Big5-2003 (part of Taiwan's CNS 11643) are the best known.[6] Adobe-CNS1 is the PostScript charset corresponding to the Big5 family of encodings.
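The mixed single-byte and double-byte layout can be seen with Python's built-in `big5` codec. This is only a sketch. The second part illustrates one respect in which Big5 resembles Shift JIS: the second byte of a character may fall in the ASCII range.

```python
text = "中文abc"
encoded = text.encode("big5")

# Chinese characters take two bytes each; ASCII letters stay one byte each.
print(len(text), len(encoded), encoded)

# The trail byte of 功 is 0x5C, the ASCII backslash, which byte-oriented
# software that is unaware of Big5 may misread as an escape character.
trail = "功".encode("big5")
print(trail.hex())
```

Python also ships a separate `big5hkscs` codec for the Hong Kong extension mentioned above.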

Conversion


Before GBK, which includes both traditional and simplified characters, conversion between Traditional Chinese and Simplified Chinese charsets was complicated by the need to transcribe text between the two variants of Chinese, because each charset covers many of the other's characters only in its own variant. Conversion between traditional and simplified Chinese is itself usually problematic, because the simplification of some traditional forms merged two or more different characters into one simplified form. Traditional-to-simplified (many-to-one) conversion is technically simple. Converting to GB 2312, however, often loses information, so the opposite conversion is one-to-many: when traditional characters are assigned to simplified ones, some choices will inevitably be wrong in some usages. Simplified-to-traditional conversion therefore often requires usage context or common-phrase lists to resolve conflicts. This is less of a problem with newer standards such as GBK, GB 18030 and Unicode, which have separate code points for simplified and traditional characters.[citation needed]
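The asymmetry can be sketched in Python with deliberately tiny tables. All table contents and function names below are hypothetical illustrations; real converters rely on far larger character and phrase lists.

```python
# Many-to-one: both 發 (to emit) and 髮 (hair) simplify to 发.
TRAD_TO_SIMP = {"發": "发", "髮": "发", "頭": "头"}

# One-to-many: a phrase list resolves what a single character cannot.
PHRASES_SIMP_TO_TRAD = {"头发": "頭髮", "发展": "發展"}
DEFAULT_SIMP_TO_TRAD = {"发": "發", "头": "頭"}  # fallback guess per character


def to_simplified(text: str) -> str:
    """Character-by-character lookup is enough in this direction."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def to_traditional(text: str) -> str:
    """Prefer the longest matching phrase, then fall back to a per-character guess."""
    longest = max(len(p) for p in PHRASES_SIMP_TO_TRAD)
    out, i = [], 0
    while i < len(text):
        for n in range(longest, 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES_SIMP_TO_TRAD:
                out.append(PHRASES_SIMP_TO_TRAD[chunk])
                i += n
                break
        else:
            out.append(DEFAULT_SIMP_TO_TRAD.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(to_simplified("頭髮"), to_traditional("头发"), to_traditional("发展"))
```

Without the phrase entry for 头发, the fallback would produce 頭發, which is the kind of wrong choice described above.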

Another issue is that many encoding systems are missing characters. Although the missing characters are often literary and uncommon in ordinary text, this becomes a problem because people's names often contain them. Examples include the Taiwanese politician Wang Chien-shien, whose name contains a xuān (煊) character that is not in some character systems, and former Chinese premier Zhu Rongji, whose róng (镕) character is not in GB 2312. The newest GB standard, GB 18030, has the complete character repertoire of Unicode 4.0, including the Unihan extensions in the Supplementary Ideographic Plane.[2]: 105
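A small Python sketch can test which encodings are able to represent a given name. It uses the name 朱镕基 (Zhu Rongji), whose middle character is 镕 (U+9555). The function name `supporting_codecs` is hypothetical, and the results reflect the codec tables bundled with Python.

```python
def supporting_codecs(text, codecs=("gb2312", "gbk", "gb18030")):
    """Return the codecs from the list that can encode every character of text."""
    supported = []
    for codec in codecs:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this codec
        supported.append(codec)
    return supported


print(supporting_codecs("朱镕基"))
print(supporting_codecs("朱"))
```

A check of this kind can be run before saving text in a legacy encoding, so that missing characters are reported rather than silently replaced.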


References

  1. RFC 1843.
  2. Lunde, Ken (December 2008). CJKV Information Processing. O'Reilly Media, Inc. ISBN 978-0-596-51447-1. Retrieved 11 September 2016.
  3. "GB18030-2000 - The New Chinese National Standard - GB 18030". 2012-08-25. Archived from the original on 2012-08-25. Retrieved 2016-10-13.
  4. Authoritative mapping table between GB18030-2000 and Unicode. ICU – International Components for Unicode. 2001-02-21. Accessed 2016-10-13.
  5. "[chinese mac] Character Sets". chinesemac.org. Retrieved 2016-10-13.
  6. "Big5 Variants in Mozilla: Mozilla 系列與 Big5 中文字碼". moztw.org. Retrieved 2016-10-13.
    This page was last edited on 29 February 2024, at 04:28 (UTC).
