UTF-1
From Wikipedia, the free encyclopedia

MIME / IANA: ISO-10646-UTF-1
Language(s): International
Current status: Obscure, of mainly historical interest
Classification: Unicode Transformation Format, extended ASCII, variable-width encoding
Extends: US-ASCII
Transforms / Encodes: ISO/IEC 10646 (Unicode)
Succeeded by: UTF-8

UTF-1 is a method of transforming ISO/IEC 10646/Unicode into a stream of bytes. Its design does not provide self-synchronization, which makes searching for substrings and recovering from errors difficult. It reuses the ASCII printing characters in multi-byte encodings, making it unsuited for some uses (for instance, Unix filenames cannot contain the byte value used for the forward slash). UTF-1 is also slow to encode and decode because it uses division and multiplication by a number that is not a power of 2. Because of these issues, it did not gain acceptance and was quickly replaced by UTF-8.
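The two byte-level problems described above can be seen in sequences taken from the comparison table in this article. This minimal Python sketch hard-codes those bytes rather than assuming any encoder; the constant and helper names are illustrative only.

```python
# UTF-1 byte sequences copied from the comparison table in this article.
UTF1_D7FF = bytes([0xF7, 0x2F, 0xC3])  # U+D7FF
UTF1_4016 = bytes([0xF6, 0x21, 0x21])  # U+4016

SLASH = 0x2F  # byte value of the ASCII "/" character


def contains_slash_byte(data: bytes) -> bool:
    """Return True if the raw byte 0x2F occurs anywhere in `data`."""
    return SLASH in data


# A non-ASCII character whose UTF-1 form contains the "/" byte, so a
# byte-oriented path parser would split a filename at that point.
print(contains_slash_byte(UTF1_D7FF))  # True

# No self-synchronization: trailing bytes reuse printable ASCII values, so a
# reader that starts in the middle of U+4016 sees two valid "!" characters.
print(UTF1_4016[1:].decode("ascii"))  # !!

# For contrast, every byte of a multi-byte UTF-8 sequence is 0x80 or higher.
utf8 = "\ud7ff".encode("utf-8")
print(utf8.hex(" "))  # ed 9f bf
print(all(b >= 0x80 for b in utf8))  # True
```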

Design

Similar to UTF-8, UTF-1 is a variable-width encoding that is backward-compatible with ASCII. Every Unicode code point is represented by either a single byte or a sequence of two, three, or five bytes. All ASCII code points are encoded as a single byte, as are the code points U+0080 through U+009F.
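A minimal sketch of how the sequence length follows from the code point alone. The range boundaries are read off the comparison table below (U+4015 is the last two-byte code point, U+4016 the first three-byte one, and U+38E2E the first five-byte one); `utf1_length` is a hypothetical helper, not part of any library.

```python
def utf1_length(cp: int) -> int:
    """Number of bytes UTF-1 uses for code point `cp` (0..0x7FFFFFFF).

    Boundaries are taken from the comparison table in this article.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if cp <= 0x9F:      # ASCII plus C1 controls: the byte itself
        return 1
    if cp <= 0x4015:    # U+00A0 .. U+4015
        return 2
    if cp <= 0x38E2D:   # U+4016 .. U+38E2D
        return 3
    return 5            # U+38E2E .. U+7FFFFFFF (there is no four-byte form)


for cp in (0x41, 0x9F, 0xA0, 0x4016, 0x10000, 0x10FFFF):
    print(f"U+{cp:04X}: {utf1_length(cp)} byte(s)")
```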

UTF-1 does not use the C0 and C1 control codes or the space character in multi-byte encodings: a byte in the range 0x00–0x20 or 0x7F–0x9F always stands for the corresponding code point. This design, with 66 protected characters, was intended to be compatible with ISO/IEC 2022.
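The 66 protected byte values can be enumerated directly from the two ranges above, leaving 190 values for multi-byte sequences. In this sketch the ordering of `USABLE` (0x21–0x7E first, then 0xA0–0xFF) is an assumption about how digits map to bytes; it is consistent with the table rows U+0100 (A1 21), U+015D (A1 7E), U+015E (A1 A0), and U+01BD (A1 FF).

```python
# Byte values UTF-1 never uses inside a multi-byte sequence:
# C0 controls plus space (0x00-0x20), and DEL plus C1 controls (0x7F-0x9F).
PROTECTED = set(range(0x00, 0x21)) | set(range(0x7F, 0xA0))

# Every other byte value can appear in a multi-byte sequence, so each byte
# there can carry one of 190 "digits".
USABLE = [b for b in range(256) if b not in PROTECTED]

print(len(PROTECTED), len(USABLE))  # 66 190
print([hex(b) for b in (USABLE[0], USABLE[93], USABLE[94], USABLE[-1])])
# ['0x21', '0x7e', '0xa0', '0xff']
```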

UTF-1 uses "modulo 190" arithmetic (256 − 66 = 190). For comparison, UTF-8 protects all 128 ASCII characters and needs one bit for this, plus a second bit to make it self-synchronizing, resulting in "modulo 64" arithmetic (8 − 2 = 6; 2⁶ = 64). BOCU-1 protects only the minimal set required for MIME compatibility (0x00, 0x07–0x0F, 0x1A–0x1B, and 0x20), resulting in "modulo 243" arithmetic (256 − 13 = 243).
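The radix figures in this paragraph are simple counts. The following Python sketch recomputes them from the byte ranges listed above; the UTF-8 line assumes the usual layout in which each continuation byte carries 6 payload bits. `count_bytes` is an illustrative helper.

```python
def count_bytes(ranges):
    """Number of byte values covered by a list of inclusive (lo, hi) ranges."""
    return sum(hi - lo + 1 for lo, hi in ranges)


utf1_protected = count_bytes([(0x00, 0x20), (0x7F, 0x9F)])
bocu1_protected = count_bytes([(0x00, 0x00), (0x07, 0x0F), (0x1A, 0x1B), (0x20, 0x20)])

utf1_radix = 256 - utf1_protected    # values left for multi-byte digits
bocu1_radix = 256 - bocu1_protected
utf8_radix = 2 ** (8 - 2)            # 8 bits minus the 2 reserved bits

print(utf1_protected, utf1_radix)    # 66 190
print(bocu1_protected, bocu1_radix)  # 13 243
print(utf8_radix)                    # 64
```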

Code point  UTF-8              UTF-1
U+007F      7F                 7F
U+0080      C2 80              80
U+009F      C2 9F              9F
U+00A0      C2 A0              A0 A0
U+00BF      C2 BF              A0 BF
U+00C0      C3 80              A0 C0
U+00FF      C3 BF              A0 FF
U+0100      C4 80              A1 21
U+015D      C5 9D              A1 7E
U+015E      C5 9E              A1 A0
U+01BD      C6 BD              A1 FF
U+01BE      C6 BE              A2 21
U+07FF      DF BF              AA 72
U+0800      E0 A0 80           AA 73
U+0FFF      E0 BF BF           B5 48
U+1000      E1 80 80           B5 49
U+4015      E4 80 95           F5 FF
U+4016      E4 80 96           F6 21 21
U+D7FF      ED 9F BF           F7 2F C3
U+E000      EE 80 80           F7 3A 79
U+F8FF      EF A3 BF           F7 5C 3C
U+FDD0      EF B7 90           F7 62 BA
U+FDEF      EF B7 AF           F7 62 D9
U+FEFF      EF BB BF           F7 64 4C
U+FFFD      EF BF BD           F7 65 AD
U+FFFE      EF BF BE           F7 65 AE
U+FFFF      EF BF BF           F7 65 AF
U+10000     F0 90 80 80        F7 65 B0
U+38E2D     F0 B8 B8 AD        FB FF FF
U+38E2E     F0 B8 B8 AE        FC 21 21 21 21
U+FFFFF     F3 BF BF BF        FC 21 37 B2 7A
U+100000    F4 80 80 80        FC 21 37 B2 7B
U+10FFFF    F4 8F BF BF        FC 21 39 6E 6C
U+7FFFFFFF  FD BF BF BF BF BF  FD BD 2B B9 40
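The article does not spell out the UTF-1 algorithm itself. The following Python sketch is a reconstruction under stated assumptions: a base-190 digit d is written as byte 0x21 + d for d < 94 and 0x42 + d otherwise, and the two-, three-, and five-byte forms count up from U+0100, U+4016, and U+38E2E respectively. With those assumptions it matches the table rows it was checked against; the function names are hypothetical. The repeated integer division by 190 is the arithmetic the introduction describes as slow.

```python
def _t(d: int) -> int:
    """Map a base-190 digit (0..189) to a non-protected byte.

    Assumed mapping: 0..93 -> 0x21..0x7E and 94..189 -> 0xA0..0xFF.
    """
    return d + 0x21 if d < 94 else d + 0x42


def utf1_encode_cp(cp: int) -> bytes:
    """Encode a single code point in 0..0x7FFFFFFF as UTF-1."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if cp < 0xA0:    # one byte, equal to the code point
        return bytes([cp])
    if cp < 0x100:   # 0xA0 prefix, then the code point as a byte
        return bytes([0xA0, cp])
    if cp < 0x4016:  # two bytes: lead 0xA1..0xF5, then one digit
        y = cp - 0x100
        return bytes([0xA1 + y // 190, _t(y % 190)])
    if cp < 0x38E2E:  # three bytes: lead 0xF6..0xFB, then two digits
        y = cp - 0x4016
        return bytes([0xF6 + y // 190**2, _t(y // 190 % 190), _t(y % 190)])
    y = cp - 0x38E2E  # five bytes: lead 0xFC or 0xFD, then four digits
    return bytes([0xFC + y // 190**4,
                  _t(y // 190**3 % 190), _t(y // 190**2 % 190),
                  _t(y // 190 % 190), _t(y % 190)])


def utf1_encode(text: str) -> bytes:
    """Encode a whole string one code point at a time."""
    return b"".join(utf1_encode_cp(ord(ch)) for ch in text)


# Spot-check some rows of the comparison table.
for cp in (0x00C0, 0x01BE, 0xFEFF, 0x10FFFF, 0x7FFFFFFF):
    print(f"U+{cp:04X}", utf1_encode_cp(cp).hex(" ").upper())
```

For example, under this reconstruction `utf1_encode("A\u00e9")` yields the bytes 41 A0 E9.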

Although modern Unicode ends at U+10FFFF, both UTF-1 and UTF-8 were designed to encode the complete 31 bits of the original Universal Character Set (UCS-4); the last entry in the table shows that original final code point.
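Decoding reverses the same arithmetic. This sketch relies on the same reconstructed digit mapping (an assumption, not quoted from the standard), rejects values above U+7FFFFFFF, the 31-bit limit described above, and must scan from the start of the data, because a trailing byte such as 0x21 cannot be told apart from the ASCII character "!". The helper names are hypothetical.

```python
from typing import List


def _u(b: int) -> int:
    """Inverse digit mapping: 0x21..0x7E -> 0..93, 0xA0..0xFF -> 94..189."""
    if 0x21 <= b <= 0x7E:
        return b - 0x21
    if 0xA0 <= b <= 0xFF:
        return b - 0x42
    raise ValueError(f"protected byte {b:#04x} inside a multi-byte sequence")


def _take(data: bytes, i: int, n: int) -> bytes:
    """Return the n bytes starting at offset i, or fail if they are missing."""
    if i + n > len(data):
        raise ValueError(f"truncated sequence at offset {i}")
    return data[i:i + n]


def utf1_decode(data: bytes) -> List[int]:
    """Decode UTF-1 bytes to code points, scanning from the first byte."""
    out: List[int] = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0xA0:      # one byte: the code point itself
            cp, n = lead, 1
        elif lead == 0xA0:   # 0xA0 prefix, then U+00A0..U+00FF as a byte
            cp, n = _take(data, i, 2)[1], 2
            if cp < 0xA0:
                raise ValueError(f"invalid byte after 0xA0 at offset {i}")
        elif lead <= 0xF5:   # two bytes
            s = _take(data, i, 2)
            cp, n = 0x100 + (lead - 0xA1) * 190 + _u(s[1]), 2
        elif lead <= 0xFB:   # three bytes
            s = _take(data, i, 3)
            cp = 0x4016 + (lead - 0xF6) * 190**2 + _u(s[1]) * 190 + _u(s[2])
            n = 3
        else:                # five bytes: accumulate four base-190 digits
            s = _take(data, i, 5)
            y = lead - 0xFC
            for b in s[1:]:
                y = y * 190 + _u(b)
            cp, n = 0x38E2E + y, 5
        if cp > 0x7FFFFFFF:
            raise ValueError(f"value beyond the 31-bit range at offset {i}")
        out.append(cp)
        i += n
    return out


print([hex(cp) for cp in utf1_decode(bytes([0x41, 0xA0, 0xE9, 0xF7, 0x2F, 0xC3]))])
# ['0x41', '0xe9', '0xd7ff']
```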

    Retrieved from "https://en.wikipedia.org/w/index.php?title=UTF-1&oldid=1184594709"

    This page was last edited on 11 November 2023, at 11:46 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


