It's easy to make mistakes when testing software or planning a
testing effort. Some mistakes are made so often, so repeatedly, by so
many different people, that they deserve the label Classic Mistake.
在测试软件或制订测试工作计划时很容易犯一些错误。有些错误经常被许多不同的人一而再、再而三地犯,应该被列为典型错误。
Classic mistakes cluster usefully into five groups, which I've called "themes":
典型错误可以有效地分为五组,我把这些组称为“主题”。
· The Role of Testing: who does the testing team serve, and how does it do that?
· 测试的作用:谁承担测试小组的责任,如何做?
· Planning the Testing Effort: how should the whole team's work be organized?
· 制订测试工作计划:应该如何组织整个小组的工作?
· Personnel Issues: who should test?
· 人员问题:谁应该做测试?
· The Tester at Work: designing, writing, and maintaining individual tests.
· 工作中的测试员:设计、编写和维护各测试。
· Technology Rampant: quick technological fixes for hard problems.
· 过度使用技术:艰难问题的快速技术修复
I have two goals for this paper. First, it should identify the mistakes, put them in context, describe why they're mistakes, and suggest alternatives. Because the context of one mistake is usually prior mistakes, the paper is written in a narrative style rather than as a list that can be read in any order. Second, the paper should be a handy checklist of mistakes. For that reason, the classic mistakes are printed in a larger bold font when they appear in the text, and they're also summarized at the end.
本文有两个目标。第一,应当识别错误,将它们放到具体环境中,描述它们为什么是错误,并给出替代方法的建议。因为一个错误的具体环境通常是先决错 误,所以本文将以叙事的方式而不是以可以按任意顺序阅读的列表方式来描述。第二,本文应该是一个便于查看的错误列表。因为这个原因,文章中出现的典型错误 都以大号粗体字印刷,并在文章的结尾处汇总。
Although many of these mistakes apply to all types of software projects, my specific focus is the testing of commercial software products, not custom software or software that is safety critical or mission critical.
虽然这些错误很多都适用于所有类型的软件项目,但我的重点将放在商用软件产品的测试上,而不是定制软件或者是高度安全或关键任务的软件测试上。
This paper is essentially a series of bug reports for the testing process. You may think some of them are features, not bugs. You may disagree with the severities I assign. You may want more information to help in debugging, or want to volunteer information of your own. Any decent bug reporting system will treat the original bug report as the first part of a conversation. So should it be with this paper. Therefore, follow this link for an ongoing discussion of this topic.
本文主要是测试过程的一系列错误报告。你可能认为它们中的部分属于特性问题而不是 bug。你可能不赞成我设定的严重性级别。你可能需要更多的信息以用于帮助排除错误,或者希望提供你自己的信息。任何设计良好的错误报告系统都将原始的错 误报告当作是对话的起始部分。本文也是这样,所以,可以按照链接参加这个主题的讨论。
Theme One: The Role of Testing
主题一:测试的作用
A first major mistake people make is thinking that the testing team is responsible for assuring quality. This role, often assigned to the first testing team in an organization, makes it the last defense, the barrier between the development team (accused of producing bad quality) and the customer (who must be protected from them). It's characterized by a testing team (often called the "Quality Assurance Group") that has formal authority to prevent shipment of the product. That in itself is a disheartening task: the testing team can't improve quality, only enforce a minimal level. Worse, that authority is usually more apparent than real. Discovering that, together with the perverse incentives of telling developers that quality is someone else's job, leads to testing teams and testers who are disillusioned, cynical, and view themselves as victims. We've learned from Deming and others that products are better and cheaper to produce when everyone, at every stage in development, is responsible for the quality of their work ([Deming86], [Ishikawa85]).
人们犯的第一个主要错误是认为测试小组应当负责质量保证。这个角色常常分配给组织中的第一测试小组,将它作为最后的防御,成为开发小组(被指责为产 生低劣质量)和客户(必须受到保护以远离低劣质量)的一个屏障。它的特征是测试小组(常称为“质量保证组”)表面上具有阻止产品发货的权力。这本身是一个 令人沮丧的任务:测试小组不能提高质量,只能强制一个最低水平。更糟糕的是,这种权力常常是看上去比实际的重要。如果发现这一点,再加上有违常理地暗示开 发人员质量是别人的事情,导致测试小组和测试员感到失望、愤事嫉俗、感觉自己是受害者。我们从Deming 和其他人的工作可以得知:如果每个人都在开发的各个阶段对他们的工作质量负责,则产品会又好又便宜([Deming86],[Ishikawa85])。
In practice, whatever the formal role, most organizations believe that the purpose of testing is to find bugs. This is a less pernicious definition than the previous one, but it's missing a key word. When I talk to programmers and development managers about testers, one key sentence keeps coming up: "Testers aren't finding the important bugs." Sometimes that's just griping, sometimes it's because the programmers have a skewed sense of what's important, but I regret to say that all too often it's valid criticism. Too many bug reports from testers are minor or irrelevant, and too many important bugs are missed.
实际上,不管表面上的作用是什么,大多数组织都相信测试的目的是发现 bug。这个定义的危害比前一个定义的危害要小,但是忽略了一个关键词。当我同程序员和开发经理谈到测试员的时候,不时听到一个关键的句子:测试员找不到 重要的 bug。有时候这种说法只是一种抱怨,有时候是因为程序员对于什么是正确的感觉不对,但我很遗憾地说,它们经常是有效的批评。测试员的太多的bug 报告是微小的、不相关的,而有太多重要的错误都被遗漏了。
What's an important bug? Important to whom? To a first approximation, the answer must be "to customers". Almost everyone will nod their head upon hearing this definition, but do they mean it? Here's a test of your organization's maturity. Suppose your product is a system that accepts email requests for service. As soon as a request is received, it sends a reply that says "your request of 5/12/97 was accepted and its reference ID is NIC-051297-3". A tester who sends in many requests per day finds she has difficulty keeping track of which request goes with which ID. She wishes that the original request were appended to the acknowledgement. Furthermore, she realizes that some customers will also generate many requests per day, so would also appreciate this feature. Would she:
什么是重要的 bug?对谁而言是重要的?直观的估计,答案肯定是“对于客户”。听到这个定义,几乎每个人都会点头称是,但他们确实这样认为吗?这里要测试一下你们组织 的成熟度。假设你们的产品是一个接受电子邮件请求服务的系统。当收到请求时,它马上发送一个“您在97年5月12日发送的请求已经受理,参考ID是NIC -051297-3”的答复。一个每天发送很多请求的测试员发现要分清楚哪个请求与哪个ID对应是非常困难的。她希望最初的请求能够附加在确认邮件的后 面。并且,她意识到某些可户可能每天也会产生很多请求,所以会高度评价这个功能的。那么她将:
1. file a bug report documenting a usability problem, with the expectation that it will be assigned a reasonably high priority (because the fix is clearly useful to everyone, important to some users, and easy to do)?
写一个 bug 报告,记录一个可用性问题,希望能够分配一个合理的高优先级(因为这个修复很明显对每个人都很用,对有部分用户来说还非常重要,并且也容易修改)?
2. file a bug report with the expectation that it will be assigned "enhancement request" priority and disappear forever into the bug database?
写一个 bug 报告,希望它被分配为“功能提升请求”优先级并永远从 bug 数据库中消失?
3. file a bug report that yields a "works as designed" resolution code, perhaps with an email "nastygram" from a programmer or the development manager?
写一个 bug 报告,产生一个“按设计工作”解决码,可能还加上一个来自程序员或开发经理的“不同意”电子邮件?
4. not bother with a bug report because it would end up in cases (2) or (3)?
不打算费事去写 bug 报告,因为它将以情况(2)或(3)结束?
If usability problems are not considered valid bugs, your project defines the testing task too narrowly. Testers are restricted to checking whether the product does what was intended, not whether what was intended is useful. Customers do not care about the distinction, and testers shouldn't either.
如果可用性问题不认为是有效的 bug,那么你们的项目将测试任务定义得太狭窄了。测试员严格限制为检查产品是否按预期工作,而不管这种预期是否有效。客户不关心这个区别,测试员也不应该关心。
Testers are often the only people in the organization who use the system as heavily as an expert. They notice usability problems that experts will see. (Formal usability testing almost invariably concentrates on novice users.) Expert customers often don't report usability problems, because they've been trained to know it's not worth their time. Instead, they wait (in vain, perhaps) for a more usable product and switch to it. Testers can prevent that lost revenue.
测试员经常是组织中唯一像专家一样大量使用系统的人。他们会注意到专家会看到的可用性问题。(形式上的可用性测试几乎不可避免地集中于没有经验的用 户。)专家客户常常不会报告可用性问题,因为他们已经被训练的知道不值得花时间去这样做。相反,他们(也许是徒劳地)等待下一个可用的产品然后切换过去。 测试员可以避免这个损失。
While defining the purpose of testing as "finding bugs important to customers" is a step forward, it's more restrictive than I like. It means that there is no focus on an estimate of quality (and on the quality of that estimate). Consider these two situations for a product with five subsystems.
将测试的目的定义为“找到对用户重要的 bug ”是向前进了一步,但与我所喜欢定义相比仍有限制。这意味着没有集中于质量评估(以及这种评估的质量)。考虑一下测试含有五个子系统的产品的两种情况。
1. 100 bugs are found in subsystem 1 before release. (For simplicity, assume that all bugs are of the highest priority.) No bugs are found in the other subsystems. After release, no bugs are reported in subsystem 1, but 12 bugs are found in each of the other subsystems.
在发布前,在子系统1中找到了100个bug 。(为了简单起见,假设所有的 bug 都是最高级别的。)在其他子系统中没有发现 bug 。在发布后,在子系统1中没有报告 bug ,但在其他每个子系统中都报告了12个 bug 。
2. Before release, 50 bugs are found in subsystem 1. 6 bugs are found in each of the other subsystems. After release, 50 bugs are found in subsystem 1 and 6 bugs in each of the other subsystems.
在发布前,在子系统1中找到了50个 bug 。在其他每个子系统中都找到了6个 bug 。在发布后,在子系统1中报告了50个 bug ,在其他每个子系统中都报告了6个 bug。
From the "find important bugs" standpoint, the first testing effort was superior. It found 100 bugs before release, whereas the second found only 74. But I think you can make a strong case that the second effort is more useful in practical terms. Let me restate the two situations in terms of what a test manager might say before release:
从“找到重要 bug”的观点看,第1种测试情况较为理想。在发布前找到了100个 bug ,但是第2种情况中只找到74个。但我想你们可能会提出一个有力的理由认为第2中测试在实际中更有用。让我以产品发版前测试经理可能说些什么来重新描述一下两种测试情况:
1. "We have tested subsystem 1 very thoroughly, and we believe we've found almost all of the priority 1 bugs. Unfortunately, we don't know anything about the bugginess of the remaining five subsystems."
“我们全面测试了子系统1,我们相信已经找出了几乎所有优先级为1的 bug。不幸的是,我们对其他五个子系统的的 bug 一无所知。”
2. "We've tested all subsystems moderately thoroughly. Subsystem 1 is still very buggy. The other subsystems are about 1/10th as buggy, though we're sure bugs remain."
“我们比较全面地测试了所有的子系统。子系统1仍旧有不少 bug。其他子系统虽然还有 bug,但只有子系统1的 bug 的十分之一。”
This is, admittedly, an extreme example, but it demonstrates an important point. The project manager has a tough decision: would it be better to hold on to the product for more work, or should it be shipped now? Many factors - all rough estimates of possible futures - have to be weighed: Will a competitor beat us to release and tie up the market? Will dropping an unfinished feature to make it into a particular magazine's special "Java Development Environments" issue cause us to suffer in the review? Will critical customer X be more annoyed by a schedule slip or by a shaky product? Will the product be buggy enough that profits will be eaten up by support costs or, worse, a recall?
必须承认,这是一个极端的例子,但是证明了一个重要的观点。项目经理有一个艰难的决定:是延迟产品交付,再工作一段时间,还是现在就交付使用?许多 因素 ——都是一些大致的评估——都必须予以权衡:竞争对手会抢先发布产品并占领市场吗?如果丢掉一个未完工的功能部件会使得某一个杂志的 “Java 开发环境” 特别期刊的评论中对我们造成损害吗?关键客户 X 对产品延期和劣质产品哪一个更感到烦恼?产品是否有很多 bug,以至于支持成本会吃掉利润,或者更糟糕的是将产品召回?
The testing team will serve the project manager better if it concentrates first on providing estimates of product bugginess (reducing uncertainty), then on finding more of the bugs that are estimated to be there. That affects test planning, the topic of the next theme.
如果测试小组首先集中于产品错误的估计(减少不确定性),然后再找到更多的错误,他们会更好地服务于项目经理。这会影响测试计划。测试计划将在下个主题中论述。
It also affects status reporting. Test managers often err by reporting bug data without putting it into context. Without context, project management tends to focus on one graph: