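Some programming languages, such as Perl and Java, have public domain modules for performing language conversions on text.
A somewhat simpler case is converting a number to its spelled-out form, as needed for checks and legal contracts. A trick for this has been around since the early days of Oracle and is generally used as follows:
select to_char(to_date(12345,'J'),'Jsp') from dual;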
Twelve Thousand Three Hundred Forty-Five
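The TO_DATE function uses the Julian date format to convert the number to a date. TO_CHAR then takes that date and formats it again, this time as a string holding the spelled-out version of the Julian day number. The trick has some limitations, however.
First, the latest valid date in Oracle falls in the year 9999, so the largest usable value is 5373484, and the smallest is 1, which falls in 4712 BC. Second, because there is no year "zero", the value 0 is not a valid Julian date, so you cannot produce the text "zero" without an extra DECODE or CASE expression. The third major limitation is that the trick ignores your NLS settings: whatever language you are using, the number is always spelled out in American English. Even simple operations, such as spelling out the day of the month, have the same problem. For example, try to generate the Spanish phrase "Cinco de Mayo":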
alter session set nls_language = 'SPANISH';
select to_char(to_date('0505','MMDD'),'Ddspth Month') from dual;
Fifth Mayo
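The first two limits described above (no "zero", and the range 1 to 5373484) can be worked around with a CASE expression; the language limit cannot. The query below is a minimal sketch, not a definitive implementation: the inline view and the column names n and spelled are made up for illustration, and it assumes n is always a whole number.

```sql
-- Sketch only: the inline view and the names n / spelled are illustrative.
select n,
       case
         when n = 0 then 'Zero'                      -- Julian day 0 is not valid
         when n between 1 and 5373484
           then to_char(to_date(n,'J'),'Jsp')        -- the classic trick
         else null                                   -- outside the Julian range
       end as spelled
  from (select 0 as n from dual
        union all
        select 12345 from dual
        union all
        select 5373485 from dual);
```

CASE stops at the first branch that matches, so TO_DATE is never called with a value it would reject. Negative numbers and anything above 5373484 simply come back as null here.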
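One way around these limits is to drive the spelling from data: a table of words, a table of rules, and a small package that reads them.
REM -- create a table of base words and exceptions
create table numwords
(
    lang    varchar2(2),
    num     integer,
    word    varchar2(30),
    constraint numwords_pk primary key (lang,num)
);
REM -- create a table of rules (templates) for composing larger numbers
create table numrules
(
    lang    varchar2(2),
    seq     integer,
    p1      integer,
    p2      integer,
    temp0   varchar2(30),
    temp    varchar2(30),
    constraint numrules_pk primary key (lang,seq)
);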
create or replace package genword
as
    function get_word(n number) return varchar2;
    function cardinal(n number) return varchar2;
end genword;
/
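create or replace package body genword
as
    -- look up a base word or exception for the session language
    function get_word(n number) return varchar2
    is
        l_word numwords.word%type;
    begin
        select word into l_word from numwords
         where lang = sys_context('userenv','lang') and num = n;
        return l_word;
    exception
        when no_data_found then
            return null;
    end get_word;
    -- spell out n using the words and rules for the session language
    function cardinal(n number) return varchar2
    is
        p number;           -- power
        t varchar2(30);     -- template
        v number;           -- lower portion
        l_word numwords.word%type;
    begin
        if n < 0 then
            l_word := get_word(-1);
            if l_word is null then
                return null;
            end if;
            return l_word||' '||cardinal(-n);
        end if;
        l_word := get_word(n);
        if l_word is not null then
            return l_word;
        end if;
        for row in
        (
            select * from numrules
             where lang = sys_context('userenv','lang')
             order by seq
        )
        loop
            if length(n) <= row.p1 + row.p2 then
                p := power(10,row.p2);
                v := mod(n,p);
                if row.seq = 0 then
                    if n < 20 then
                        return replace(row.temp0,'~2',cardinal(v));
                    end if;
                else
                    if v = 0 then
                        return replace(row.temp0,'~1',cardinal(n/p));
                    else
                        -- the source listing is cut off here; splitting n into
                        -- (n-v) and v is a reconstruction, not the confirmed original
                        return replace(replace(nvl(row.temp,'~1 ~2'),
                                    '~1',cardinal(n-v)),
                                    '~2',cardinal(v));
                    end if;
                end if;
            end if;
        end loop;
        -- no rule matched (number too large for the loaded rules);
        -- returning null here is an assumption, the source shows no final return
        return null;
    end cardinal;
end genword;
/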
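To see what a single rule row does, the arithmetic and the template substitution can be tried directly in SQL. The sketch below hard-codes n = 345 and a rule with p2 = 2 and temp0 = '~1 hundred' (no temp), like the "hundred" rule loaded for American English. The literal strings stand in for the results of the recursive calls, and splitting the number at n - v is an assumption about how the package composes the upper part.

```sql
-- Sketch only: values are hard-coded for n = 345 and the "hundred" rule.
select power(10, 2)                         as p,           -- 100
       mod(345, power(10, 2))               as v,           -- 45, the lower portion
       345 - mod(345, power(10, 2))         as upper_part,  -- 300
       -- temp is null for this rule, so the default template '~1 ~2' applies
       replace(
         replace(nvl(cast(null as varchar2(30)), '~1 ~2'),
                 '~1', 'three hundred'),
         '~2', 'forty-five')                as spelled      -- three hundred forty-five
  from dual;
```

The upper part (300) has a lower portion of zero, so on its own it goes through temp0 and becomes "three hundred"; the lower part (45) is handled by the tens rule in the same way.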
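Finally, here is some data I collected for English and German. I also copied the data from American English to British English and used the terms "thousand million" and "million million" in place of the American "billion" and "trillion", which are a common source of confusion outside the United States. This data is enough to generate the spelled-out version of every integer from -999,999,999,999 to 999,999,999,999, including zero.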
REM -- American English
insert into numwords values ('US',-1,'negative');
insert into numwords values ('US',0,'zero');
insert into numwords values ('US',1,'one');
insert into numwords values ('US',2,'two');
insert into numwords values ('US',3,'three');
insert into numwords values ('US',4,'four');
insert into numwords values ('US',5,'five');
insert into numwords values ('US',6,'six');
insert into numwords values ('US',7,'seven');
insert into numwords values ('US',8,'eight');
insert into numwords values ('US',9,'nine');
insert into numwords values ('US',10,'ten');
insert into numwords values ('US',11,'eleven');
insert into numwords values ('US',12,'twelve');
insert into numwords values ('US',13,'thirteen');
insert into numwords values ('US',15,'fifteen');
insert into numwords values ('US',18,'eighteen');
insert into numwords values ('US',20,'twenty');
insert into numwords values ('US',30,'thirty');
insert into numwords values ('US',40,'forty');
insert into numwords values ('US',50,'fifty');
insert into numwords values ('US',80,'eighty');
insert into numwords select 'GB',num,word from numwords where lang = 'US';
insert into numrules values ('US',0,1,1,'~2teen',null);
insert into numrules values ('US',1,1,1,'~1ty','~1-~2');
insert into numrules values ('US',2,1,2,'~1 hundred',null);
insert into numrules values ('US',3,3,3,'~1 thousand',null);
insert into numrules values ('US',4,3,6,'~1 million',null);
insert into numrules select 'GB',seq,p1,p2,temp0,temp
from numrules where lang = 'US';
insert into numrules values ('US',5,3,9,'~1 billion',null);
insert into numrules values ('GB',5,3,9,'~1 thousand million',null);
insert into numrules values ('US',6,3,12,'~1 trillion',null);
insert into numrules values ('GB',6,3,12,'~1 million million',null);
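With the American and British rows loaded, the package can be exercised as sketched below. This assumes the genword package compiled as intended and that the session languages AMERICAN and ENGLISH map to the abbreviations US and GB returned by sys_context('userenv','lang'); the results in the comments are what the rules above should produce under those assumptions.

```sql
-- Sketch only: assumes genword compiled and the US/GB rows above are loaded.
alter session set nls_language = 'AMERICAN';   -- sys_context lang: US

select genword.cardinal(0)     from dual;  -- zero
select genword.cardinal(14)    from dual;  -- fourteen
select genword.cardinal(12345) from dual;  -- twelve thousand three hundred forty-five

alter session set nls_language = 'ENGLISH';    -- sys_context lang: GB

select genword.cardinal(-2000000000) from dual;  -- negative two thousand million
```

Because the language is read from the session on every call, the same function call yields different text after each ALTER SESSION, with no change to the calling code.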
REM - German
insert into numwords values ('D',-1,'negativ');
insert into numwords values ('D',0,'null');
insert into numwords values ('D',1,'eins');
insert into numwords values ('D',2,'zwei');
insert into numwords values ('D',3,'drei');
insert into numwords values ('D',4,'vier');
insert into numwords values ('D',5,unistr('f\00FCnf'));
insert into numwords values ('D',6,'sechs');
insert into numwords values ('D',7,'sieben');
insert into numwords values ('D',8,'acht');
insert into numwords values ('D',9,'neun');