About UTF-8

Published: 2007-07-04


Annex B: UTF-8

UTF-8 compaction mode is principally designed to support data systems with 8-bit communications paths. It has the clear advantage that the character addresses U+0000 to U+007F, corresponding to ASCII (and ISO 646:1991) values 00 to 7F hex, are represented by single octets of the same value. It is straightforward both to generate and to parse, and it produces reasonable compaction.
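As a quick illustration of the single-octet property, the following Python sketch uses only the built-in `str.encode` to check that every address from U+0000 to U+007F encodes to one byte of the same value. The function name `ascii_is_identity` is made up for this example.

```python
def ascii_is_identity() -> bool:
    """Return True if every address U+0000..U+007F encodes to one identical octet."""
    for cp in range(0x80):
        if chr(cp).encode("utf-8") != bytes([cp]):
            return False
    return True


print(ascii_is_identity())          # True
print("A".encode("utf-8").hex())    # 41, the same value as ASCII 'A'
```

The first address above this range, U+0080, already needs two octets.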

Input and output of up to 21-bit Unicode 3 character addresses for all 1 114 112 characters on the 17 Code Planes 0 through 16 can be cumbersome in normal byte-oriented data systems. In Table B.1, the length of the binary data representation of characters to be encoded (ignoring leading zero bits) determines how many UTF-8 bytes are required.

Table B.1: UTF-8 byte sequences for Unicode character addresses

Data type and length	Unicode address (binary format)	1st Byte	2nd Byte	3rd Byte	4th Byte
Up to 7 bits, encoded as 7-bit ASCII or ISO 646	000000000xxxxxxx	0xxxxxxx
8 to 11 bits	00000yyyyyxxxxxx	110yyyyy	10xxxxxx
12 to 16 bits (BMP)	zzzzyyyyyyxxxxxx	1110zzzz	10yyyyyy	10xxxxxx
17 to 21 bits, Code Planes 1-16	000uuuuuzzzzyyyy yyxxxxxx	11110uuu	10uuzzzz	10yyyyyy	10xxxxxx
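The bit layout of Table B.1 can be sketched in Python as follows. This is a minimal illustration, not a reference encoder: the name `utf8_encode` is hypothetical, and the sketch assumes the input is a single integer address. It deliberately does no more than the table describes, so it does not reject the surrogate range U+D800 to U+DFFF.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode address using the bit layout of Table B.1."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("address outside Code Planes 0 through 16")
    bits = cp.bit_length()  # length ignoring leading zero bits
    if bits <= 7:           # 0xxxxxxx
        return bytes([cp])
    if bits <= 11:          # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if bits <= 16:          # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(utf8_encode(0x00E9).hex())   # c3a9
print(utf8_encode(0x20AC).hex())   # e282ac
print(utf8_encode(0x1F600).hex())  # f09f9880
```

For addresses outside the surrogate range, the result should match Python's built-in `chr(cp).encode("utf-8")`.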

During decoding, the number of bytes in each UTF-8 byte sequence can be immediately determined from the first byte of each sequence.
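A minimal Python sketch of this property is shown here. The helper names `sequence_length` and `split_sequences` are invented for the example. The sketch only inspects the leading bits of each first byte; it does not check the following bytes or whether the sequence is legal.

```python
from typing import List


def sequence_length(first_byte: int) -> int:
    """Return the UTF-8 sequence length implied by the first byte's leading bits."""
    if first_byte & 0x80 == 0x00:   # 0xxxxxxx
        return 1
    if first_byte & 0xE0 == 0xC0:   # 110yyyyy
        return 2
    if first_byte & 0xF0 == 0xE0:   # 1110zzzz
        return 3
    if first_byte & 0xF8 == 0xF0:   # 11110uuu
        return 4
    raise ValueError("not a first byte: 0x%02X" % first_byte)


def split_sequences(data: bytes) -> List[bytes]:
    """Cut a byte string into per-character sequences using only first bytes."""
    out = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        out.append(data[i:i + n])
        i += n
    return out


sample = "a\u00e9\u20ac\U0001F600".encode("utf-8")
print([len(s) for s in split_sequences(sample)])  # [1, 2, 3, 4]
```

Because a byte of the form 10xxxxxx can never start a sequence, a decoder that lands in the middle of a character can also resynchronize by skipping forward to the next byte that is not of that form.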

Legal UTF-8 byte sequences shall conform to Unicode Technical Report 27 as summarized in Table B.2.

Table B.2: Unicode address ranges for legal UTF-8 byte sequences

Unicode address range	1st Byte	2nd Byte	3rd Byte	4th Byte
U+0000 to U+007F	00...7F
U+0080 to U+07FF	C2...DF	80...BF
U+0800 to U+0FFF	E0	A0...BF	80...BF
U+1000 to U+FFFF	E1...EF	80...BF	80...BF
U+10000 to U+3FFFF	F0	90...BF	80...BF	80...BF
U+40000 to U+FFFFF	F1...F3	80...BF	80...BF	80...BF
U+100000 to U+10FFFF	F4	80...8F	80...BF	80...BF
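The byte ranges of Table B.2 lend themselves to a table-driven check. The following Python sketch is one possible way to write it; `LEGAL_SEQUENCES` and `is_legal_utf8` are hypothetical names. Two assumptions are worth stating. First, the second byte after F4 is limited to 80...8F here, because U+10FFFF encodes as F4 8F BF BF and anything higher would lie beyond Code Plane 16. Second, the ranges as listed do not exclude the surrogate addresses U+D800 to U+DFFF, so this sketch accepts them, whereas stricter decoders such as Python's built-in one reject them.

```python
# Each entry: (first-byte range, allowed range for each following byte).
LEGAL_SEQUENCES = [
    ((0x00, 0x7F), []),
    ((0xC2, 0xDF), [(0x80, 0xBF)]),
    ((0xE0, 0xE0), [(0xA0, 0xBF), (0x80, 0xBF)]),
    ((0xE1, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF0, 0xF0), [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF1, 0xF3), [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    ((0xF4, 0xF4), [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]


def _tail_ranges(first_byte):
    """Return the ranges for the following bytes, or None for an illegal first byte."""
    for (lo, hi), tail in LEGAL_SEQUENCES:
        if lo <= first_byte <= hi:
            return tail
    return None


def is_legal_utf8(data: bytes) -> bool:
    """Check that data consists only of byte sequences allowed by the ranges above."""
    i = 0
    while i < len(data):
        tail = _tail_ranges(data[i])
        if tail is None:
            return False  # C0, C1, F5..FF, or a stray 80..BF byte
        following = data[i + 1:i + 1 + len(tail)]
        if len(following) != len(tail):
            return False  # sequence cut off at end of input
        for byte, (lo, hi) in zip(following, tail):
            if not lo <= byte <= hi:
                return False
        i += 1 + len(tail)
    return True


print(is_legal_utf8(b"\xe2\x82\xac"))      # True: U+20AC
print(is_legal_utf8(b"\xc0\x80"))          # False: overlong form of U+0000
print(is_legal_utf8(b"\xf4\x90\x80\x80"))  # False: beyond U+10FFFF
```

The restricted second-byte ranges after E0, F0, and F4 are what rule out overlong encodings and addresses above U+10FFFF.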

Original source: http://www.ltesting.net
