
About UTF-8

Published: 2007-7-04 12:06 | Author: admin



Annex B: UTF-8


UTF-8 compaction mode is principally designed to support data systems with 8-bit communications paths. It has the clear advantage that the character addresses U+0000 hex to U+007F hex, corresponding to ASCII (and ISO 646:1991) values 00 hex to 7F hex, are represented by single octets of the same value. It is straightforward both to generate and to parse, and it produces reasonable compaction.
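The single-octet property can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `utf8_octets` is made up for this example, and it relies on Python's built-in UTF-8 codec rather than on anything defined in this annex.

```python
def utf8_octets(text):
    """Return the UTF-8 encoding of text as a list of octet values."""
    return list(text.encode("utf-8"))


# For addresses U+0000 to U+007F, each octet equals the character address.
sample = "Az~"
octets = utf8_octets(sample)
code_points = [ord(ch) for ch in sample]
print(octets == code_points)
```

Any address above U+007F produces more than one octet, so the equality holds only for pure ASCII input.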


Input and output of up to 21-bit Unicode 3 character addresses for all 1 114 112 characters on the 17 Code Planes 0 through 16 can be cumbersome in normal byte-oriented data systems. In Table B.1, the length of the binary data representation of characters to be encoded (ignoring leading zero bits) determines how many UTF-8 bytes are required.


Table B.1: UTF-8 byte sequences for Unicode character addresses

Data type and length                             Unicode address (binary format)  1st Byte  2nd Byte  3rd Byte  4th Byte
Up to 7 bits, encoded as 7-bit ASCII or ISO 646  00000000 0xxxxxxx                0xxxxxxx
8 to 11 bits                                     00000yyy yyxxxxxx                110yyyyy  10xxxxxx
16 bits (BMP)                                    zzzzyyyy yyxxxxxx                1110zzzz  10yyyyyy  10xxxxxx
21 bits, Code Planes 1-16                        000uuuuu zzzzyyyy yyxxxxxx       11110uuu  10uuzzzz  10yyyyyy  10xxxxxx
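The bit layouts of Table B.1 translate fairly directly into shifts and masks. The following is a minimal sketch, not a reference implementation: the function name `encode_code_point` is hypothetical, and the sketch only applies the bit patterns (it does not, for instance, reject surrogate addresses).

```python
def encode_code_point(cp):
    """Encode one Unicode address using the bit layouts of Table B.1."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("address outside Code Planes 0 through 16")
    bits = cp.bit_length()  # length ignoring leading zero bits
    if bits <= 7:
        # 0xxxxxxx
        return bytes([cp])
    if bits <= 11:
        # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if bits <= 16:
        # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# U+20AC needs 14 bits, so it falls in the three-byte row.
print(encode_code_point(0x20AC).hex())
```

Branching on the bit length mirrors the wording of the annex, which selects the row by the length of the binary representation rather than by address range.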


During decoding, the number of bytes in each UTF-8 byte sequence can be immediately determined from the first byte of each sequence.
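As a sketch of that determination, the leading bits of the first byte can be compared against the four prefixes in Table B.1. The function name `sequence_length` is an assumption made for this example. It checks only the prefix pattern, so a byte such as C0 is classified as a two-byte lead here even though Table B.2 excludes it.

```python
def sequence_length(first_byte):
    """Return the UTF-8 sequence length implied by the first byte's prefix."""
    if first_byte >> 7 == 0b0:
        return 1  # 0xxxxxxx
    if first_byte >> 5 == 0b110:
        return 2  # 110yyyyy
    if first_byte >> 4 == 0b1110:
        return 3  # 1110zzzz
    if first_byte >> 3 == 0b11110:
        return 4  # 11110uuu
    # 10xxxxxx is a trailing byte; 11111xxx matches no row of Table B.1.
    raise ValueError("not a valid first byte: 0x%02X" % first_byte)


data = "a\u00e9\u20ac".encode("utf-8")
print(sequence_length(data[0]), sequence_length(data[1]), sequence_length(data[3]))
```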


Legal UTF-8 byte sequences shall conform to Unicode Technical Report 27, as summarized in Table B.2.






Table B.2: Unicode address ranges for legal UTF-8 byte sequences

Unicode address range   1st Byte  2nd Byte  3rd Byte  4th Byte
U+0000 to U+007F        00...7F
U+0080 to U+07FF        C2...DF   80...BF
U+0800 to U+0FFF        E0        A0...BF   80...BF
U+1000 to U+FFFF        E1...EF   80...BF   80...BF
U+10000 to U+3FFFF      F0        90...BF   80...BF   80...BF
U+40000 to U+FFFFF      F1...F3   80...BF   80...BF   80...BF
U+100000 to U+10FFFF    F4        80...8F   80...BF   80...BF
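A legality check along the lines of Table B.2 might look like the sketch below. The names `LEGAL_RANGES` and `is_legal_utf8` are hypothetical. The second byte after F4 is limited to 80...8F so that no address exceeds U+10FFFF. The sketch follows the table as summarized here; later versions of the Unicode Standard additionally exclude the surrogate range (ED A0...BF), which this code does not check.

```python
# One entry per row of Table B.2:
# (first-byte range, second-byte range, sequence length).
# Third and fourth bytes are always in 80...BF.
LEGAL_RANGES = [
    ((0x00, 0x7F), None, 1),
    ((0xC2, 0xDF), (0x80, 0xBF), 2),
    ((0xE0, 0xE0), (0xA0, 0xBF), 3),
    ((0xE1, 0xEF), (0x80, 0xBF), 3),
    ((0xF0, 0xF0), (0x90, 0xBF), 4),
    ((0xF1, 0xF3), (0x80, 0xBF), 4),
    ((0xF4, 0xF4), (0x80, 0x8F), 4),
]


def is_legal_utf8(data):
    """Return True if data consists only of sequences listed in Table B.2."""
    i = 0
    while i < len(data):
        first = data[i]
        for (low, high), second_range, length in LEGAL_RANGES:
            if low <= first <= high:
                break
        else:
            return False  # first byte appears in no row (80...C1, F5...FF)
        seq = data[i:i + length]
        if len(seq) < length:
            return False  # sequence is truncated
        if length >= 2 and not second_range[0] <= seq[1] <= second_range[1]:
            return False  # second byte outside the range allowed for this row
        if any(not 0x80 <= b <= 0xBF for b in seq[2:]):
            return False  # third or fourth byte outside 80...BF
        i += length
    return True


print(is_legal_utf8("h\u00e9llo \u20ac".encode("utf-8")))
print(is_legal_utf8(b"\xc0\x80"))
```

Restricting the second byte per row is what rules out overlong forms such as C0 80 or E0 80 80, which match the bit patterns of Table B.1 but encode an address that fits in a shorter sequence.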



Article source: 领测软件测试网 https://www.ltesting.net/

