About UTF-8

Published: 2007-05-26

Annex B UTF-8

UTF-8 compaction mode is principally designed to support data systems with 8-bit communications paths. It has the clear advantage that the character addresses U+0000 to U+007F, which correspond to the ASCII (and ISO 646:1991) values 00 to 7F hex, are represented by single octets of the same value. It is straightforward both to generate and to parse, and it produces reasonable compaction.
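The single-octet property can be checked with a short sketch that relies on Python's built-in `str.encode`. The helper name `is_single_octet_identity` is illustrative only and does not come from the annex.

```python
def is_single_octet_identity(code_point: int) -> bool:
    """Return True if the UTF-8 form is one octet equal to the address itself."""
    encoded = chr(code_point).encode("utf-8")
    return len(encoded) == 1 and encoded[0] == code_point


# Every address from U+0000 to U+007F maps to one octet of the same value.
ascii_ok = all(is_single_octet_identity(cp) for cp in range(0x00, 0x80))

# U+0080 is the first address that needs more than one octet.
first_multibyte = chr(0x80).encode("utf-8")
```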

Unicode 3 character addresses are up to 21 bits long and cover all 1 114 112 character addresses on the 17 Code Planes 0 through 16, so handling them directly can be cumbersome in normal byte-oriented data systems. In Table B.1, the length of the binary representation of the character address to be encoded (ignoring leading zero bits) determines how many UTF-8 bytes are required.

Table B.1: UTF-8 byte sequences for Unicode character addresses

Data type and length	Unicode address (binary format)	1st Byte	2nd Byte	3rd Byte	4th Byte
Up to 7 bits, encoded as 7-bit ASCII or ISO 646	00000000 0xxxxxxx	0xxxxxxx
8 to 11 bits	00000yyy yyxxxxxx	110yyyyy	10xxxxxx
12 to 16 bits (BMP)	zzzzyyyy yyxxxxxx	1110zzzz	10yyyyyy	10xxxxxx
17 to 21 bits, Code Planes 1-16	000uuuuu zzzzyyyy yyxxxxxx	11110uuu	10uuzzzz	10yyyyyy	10xxxxxx
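The bit layout in Table B.1 can be sketched as a small encoder. This is a minimal illustration, not a reference implementation: the function name `utf8_encode` is an assumption, and the sketch only checks the upper bound U+10FFFF (it does not reject surrogate addresses).

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one character address using the bit layout of Table B.1."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("address outside Code Planes 0 through 16")

    # Significant length of the address, ignoring leading zero bits.
    bits = code_point.bit_length()

    if bits <= 7:    # 0xxxxxxx
        return bytes([code_point])
    if bits <= 11:   # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if bits <= 16:   # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

For example, `utf8_encode(0x20AC)` yields the three bytes E2 82 AC, which matches what Python's built-in `"\u20ac".encode("utf-8")` produces.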

During decoding, the number of bytes in each UTF-8 byte sequence can be immediately determined from the first byte of each sequence.
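A minimal sketch of that first-byte test follows. The function name `sequence_length` is hypothetical. The check looks only at the leading bit pattern, so it also accepts first bytes such as C0 or F5 that are not legal in UTF-8.

```python
def sequence_length(first_byte: int) -> int:
    """Return the UTF-8 sequence length implied by the first byte's bit pattern."""
    if first_byte < 0x80:              # 0xxxxxxx
        return 1
    if (first_byte & 0xE0) == 0xC0:    # 110yyyyy
        return 2
    if (first_byte & 0xF0) == 0xE0:    # 1110zzzz
        return 3
    if (first_byte & 0xF8) == 0xF0:    # 11110uuu
        return 4
    # 10xxxxxx is a continuation byte; 11111xxx is not used in UTF-8.
    raise ValueError(f"not a valid first byte: {first_byte:#04x}")
```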

Legal UTF-8 byte sequences shall conform to Unicode Technical Report 27 as summarized in Table B.2.

Table B.2: Unicode address ranges for legal UTF-8 byte sequences

Unicode address range	1st Byte	2nd Byte	3rd Byte	4th Byte
U+0000 to U+007F	00…7F
U+0080 to U+07FF	C2…DF	80…BF
U+0800 to U+0FFF	E0	A0…BF	80…BF
U+1000 to U+FFFF	E1…EF	80…BF	80…BF
U+10000 to U+3FFFF	F0	90…BF	80…BF	80…BF
U+40000 to U+FFFFF	F1…F3	80…BF	80…BF	80…BF
U+100000 to U+10FFFF	F4	80…8F	80…BF	80…BF
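The ranges in Table B.2 can be turned into a table-driven validity check, sketched below. The names `LEGAL_RANGES` and `is_legal_utf8` are illustrative. The sketch limits the second byte after F4 to 80…8F, because any larger value would decode to an address above U+10FFFF. It follows the table only, so it does not apply the surrogate exclusion (ED A0…BF) that later versions of the Unicode Standard add.

```python
# (first-byte low, first-byte high, second-byte low, second-byte high, length)
LEGAL_RANGES = [
    (0x00, 0x7F, None, None, 1),
    (0xC2, 0xDF, 0x80, 0xBF, 2),
    (0xE0, 0xE0, 0xA0, 0xBF, 3),
    (0xE1, 0xEF, 0x80, 0xBF, 3),
    (0xF0, 0xF0, 0x90, 0xBF, 4),
    (0xF1, 0xF3, 0x80, 0xBF, 4),
    (0xF4, 0xF4, 0x80, 0x8F, 4),
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if data consists only of byte sequences allowed by the table."""
    i = 0
    while i < len(data):
        # Find the table row whose first-byte range contains data[i].
        for lo1, hi1, lo2, hi2, length in LEGAL_RANGES:
            if lo1 <= data[i] <= hi1:
                break
        else:
            return False  # C0, C1, F5..FF, or a stray continuation byte

        seq = data[i:i + length]
        if len(seq) != length:
            return False  # sequence truncated at end of input
        if length >= 2 and not lo2 <= seq[1] <= hi2:
            return False  # second byte outside the row's restricted range
        if any(not 0x80 <= b <= 0xBF for b in seq[2:]):
            return False  # third and fourth bytes must be 80..BF
        i += length
    return True
```

The restricted second-byte ranges are what reject overlong forms such as C0 80 or E0 80 80, which the bit patterns alone would accept.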

Reposted from: http://www.ltesting.net
