最近在帮助用户恢复数据库时遇到了一则罕见的归档日志损坏案例。
在进行归档recover时,数据库报错,提示归档日志损坏:
***
Corrupt block seq: 37288 blocknum=1.
Bad header found during deleting archived log
Data in bad block - seq:810559520. bno:170473264. time:707406346
beg:21280 cks:21061
calculated check value: 9226
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
Reread of seq=37288, blocknum=1, file=/ARCH/arch_1_37288_632509987.dbf, found same corrupt data
***
信息比较详细,说37288号归档日志Header损坏,无法读取数据。
提一个小问题:如果你遇到了这样的错误?会怎样思考?
如果这个归档日志损坏了,其实我们仍然有办法跳过去,继续尝试恢复其他日志,但是客户数据重要,不能容忍不一致性,这时候就只能放弃部分数据,由前台重新提交数据了。这在业务上可以实现,也就不是大问题了。
好了,问题是为什么日志会损坏?是如何损坏的?
我首先要做的就是,看看日志文件的内容,通过最简单的命令将日志文件中的内容输出出来:
strings arch_1_37288_632509987.dbf > log.txt
然后检查生成的这个日志文件,我们就发现了问题。
在这个归档日志文件中,被写入了大量的跟踪文件内容,其中开头部分就是一个跟踪文件的全部信息。
这时一种我从来没有遇到过的现象,也就是说,当操作系统在写出跟踪文件时,错误的覆盖掉了已经存在的归档文件,最后导致归档日志损坏,非常奇妙,从所未见。
最后我的判断是,这个故障应当是操作系统在写出时出现了问题,存在文件的空间仍然被认为是可写的,这样就导致了写冲突,出现这类问题,应当立即检查硬件,看看是否是硬件问题导致了如此严重的异常。
Dump file /ADMIN/bdump/erp_p007_19216.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /DBMS/erp/erpdb/10g
eygle.com
2.6.9-34.ELhugemem
#1 SMP Fri Feb 24 17:04:34 EST 2006
i686
Instance name: erp
Redo thread mounted by this instance: 1
Oracle process number: 22
Unix process pid: 19216, image: oracle@eygle.com (P007)
*** SERVICE NAME:() 2010-11-10 10:37:26.247
*** SESSION ID:(2184.1) 2010-11-10 10:37:26.247
*** 2010-11-10 10:37:26.247
KCRP: blocks claimed = 61, eliminated = 0
----- Recovery Hash Table Statistics ---------
Hash table buckets = 32768
Longest hash chain = 1
Average hash chain = 61/61 = 1.0
Max compares per lookup = 0
Avg compares per lookup = 0/61 = 0.0
----------------------------------------------
----- Recovery Hash Table Statistics ---------
Hash table buckets = 32768
Longest hash chain = 1
Average hash chain = 61/61 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 1426/1426 = 1.0
----------------------------------------------
\\GPAYMENTdxn
AP_CHECKS
Q(xn
.1=N
\\Gxn
.1=N
^0e
^0e!
^0e"
^0e#
^0e$
^0e%
^0e&
^0e\'
eygle.com!/
^0e(
^0e)
^0e*
^0e+
^0e+
^0e&
^ij1
R0:b
Q(xn
PaymentsN
a\'VND
Userxn
AP_INVOICE_PAYMENTS
105273
5406105305-20101020-003
3001CASH CLEARING
CREATED
Dump file /ADMIN/bdump/erp_p002_19206.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /DBMS/erp/erpdb/10g
Linux
eygle.com
2.6.9-34.ELhugemem
#1 SMP Fri Feb 24 17:04:34 EST 2006
i686
Instance name: erp
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 19206, image: oracle@eygle.com (P002)
*** SERVICE NAME:() 2010-11-10 10:37:26.263
*** SESSION ID:(2187.1) 2010-11-10 10:37:26.263
*** 2010-11-10 10:37:26.263
原文转自:http://blogread.cn/it/article/3277