In-Depth Analysis of the DistCp Tool (4)

Published: 2014-06-25. Source: 淘测试. Author: 凡提. Tags: software testing.

{
  // open src file
  in = srcstat.getPath().getFileSystem(job).open(srcstat.getPath());
  reporter.incrCounter(Counter.BYTESEXPECTED, srcstat.getLen());
  // open tmp file
  out = create(tmpfile, reporter, srcstat);
  // copy file
  for (int cbread; (cbread = in.read(buffer)) >= 0; ) {
    out.write(buffer, 0, cbread);
    cbcopied += cbread;
    reporter.setStatus(
        String.format("%.2f ", cbcopied * 100.0 / srcstat.getLen())
        + absdst + " [ " +
        StringUtils.humanReadableInt(cbcopied) + " / " +
        StringUtils.humanReadableInt(srcstat.getLen()) + " ]");
  }
} finally {
  checkAndClose(in);
  checkAndClose(out);
}

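Once the Mappers have finished, the server-side part of DistCp is complete. Control then returns to the client, which performs some cleanup work, such as synchronizing owner and permission information. This step has some problems of its own, which are analyzed together with the others below.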

Problem Analysis

DistCp has three major problems. They are examined one by one below.

1. The job fails, and map tasks report "DFS Read: java.io.IOException: Could not obtain block"

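This happens because the file "_distcp_src_files" is created with the system default replication factor. For example, if hadoop-site.xml sets dfs.replication=3, then _distcp_src_files has 3 replicas once it is created. When the number of maps is very large, so large that it exceeds the maximum load the datanodes holding those three replicas can serve, some map tasks fail to obtain the block.

For DistCp, the "-i" option (ignore failures) should generally never be used. With it set, this problem is masked, and the result is a copy that is missing part of the data once it finishes.

A better approach is to raise the replication factor of _distcp_src_files automatically after the total number of maps has been computed. The access capacity then rises along with it, and the problem above no longer occurs. The community already has a simple fix for this, which sets the replication factor directly to a fairly high value.

In general, on a cluster with limited compute resources, too many map tasks do not make the copy any more efficient, so the -m option can be used to set a reasonable number of maps. As a rule of thumb, watch ganglia: once bytes_in and bytes_out reach their upper limit, the map count is high enough.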
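The idea of scaling the replication factor of _distcp_src_files with the number of maps can be sketched roughly as follows. This is only an illustration, not the community fix and not code from DistCp: the class SrcListReplication, the method chooseReplication, and the parameter readersPerReplica (an assumed, cluster-specific limit on how many map tasks one replica can serve at once) are all hypothetical.

```java
// Sketch only: all names here are hypothetical, not part of DistCp or Hadoop.
class SrcListReplication {

    /**
     * Picks a replication factor for the source-list file so that each replica
     * serves at most readersPerReplica concurrent map tasks.
     * readersPerReplica is an assumed limit that depends on the cluster.
     */
    static short chooseReplication(int numMaps, int readersPerReplica,
                                   short defaultReplication, int numDataNodes) {
        if (numMaps < 0 || readersPerReplica <= 0
                || defaultReplication <= 0 || numDataNodes <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Ceiling division: replicas needed to spread numMaps readers.
        long needed = ((long) numMaps + readersPerReplica - 1) / readersPerReplica;
        // Never go below the configured default (e.g. dfs.replication=3).
        long wanted = Math.max(needed, defaultReplication);
        // A block cannot have more replicas than there are datanodes.
        long capped = Math.min(wanted, numDataNodes);
        return (short) Math.min(capped, Short.MAX_VALUE);
    }

    public static void main(String[] args) {
        // 2000 maps, assumed 256 readers per replica, default 3, 50 datanodes.
        short r = chooseReplication(2000, 256, (short) 3, 50);
        System.out.println("replication = " + r);
    }
}
```

In a real client, the computed value would presumably be applied to the path of _distcp_src_files with Hadoop's FileSystem.setReplication(Path, short) after the map count is known and before the job is submitted.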

Originally published at: http://www.taobaotest.com/blogs/2516
