<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss version="2.0"><channel><title>Developsense Blog</title><link>http://www.developsense.com/blog</link><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/DevelopsenseBlog" /><description>DevelopSense Blog</description><language>en</language><lastBuildDate>Sat, 12 May 2012 13:57:37 PDT</lastBuildDate><generator>WordPress http://wordpress.org/</generator><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="developsense/jdav" /><feedburner:browserFriendly xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">This is an XML content feed. It is intended to be viewed in a newsreader or syndicated to another site.</feedburner:browserFriendly><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="developsenseblog" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item><title>Oracles and The Right Answer</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/Ebnyh3xWsiw/</link><category>Uncategorized</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Tue, 08 May 2012 08:58:23 PDT</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1245</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p><em>In which <a href="http://www.developsense.com/blog/2012/04/heuristics-for-understanding-heuristics/" title="Heuristics for Understanding Heuristics">the conversation about heuristics and oracles</a> continues&#8230;</em></p>
<p>Tony&#8217;s brow furrowed as he spoke.  &#8220;No oracle comes with a guarantee that it&#8217;s giving you the right answer. That&#8217;s what you said. But surely there are some oracles that <em>are</em> reliable,&#8221; he said. &#8220;What about pure math?&#8221;</p>
<p>&#8220;<em>Pure</em> math? All right. Here&#8217;s an example: what&#8217;s 61 plus 45?&#8221;</p>
<p>&#8220;Duh. 106.&#8221;</p>
<p>&#8220;Well,&#8221; I said, &#8220;for many computer systems prior to the year 2000, if you added 45 to the year 61, you&#8217;d get 6. That is, if you looked at a printout or a screen, you&#8217;d expect to see &#8216;06&#8217; in the year field.  And for those systems, <em>that would have been the right answer</em>.&#8221;</p>
<p>&#8220;But that was wrong! Y2K was a problem. They called it &#8216;the Y2K <em>problem</em>&#8217;, didn&#8217;t they?&#8221;</p>
<p>&#8220;True,&#8221; I said. &#8220;But until the late &#8217;90s, it <em>wasn&#8217;t</em> a problem&mdash;or to be more accurate, people didn&#8217;t perceive it as a problem.  On the contrary, it was a <em>solution</em> to a problem:  memory and storage were expensive.  You could work around the &#8220;problem&#8221; with a combination of clever code and trust that people would interpret the output appropriately.  Remember, <a href="http://www.developsense.com/blog/2012/04/problems-with-problems/" title="Problems with Problems" target="_blank"><strong>a problem is a problem to some person at some time.</strong></a>  Programmers and designers in the 1960s had one set of problems to solve, and programmers at the end of the &#8217;90s had another set.  The point is that one oracle (regular math) would give you one right answer, and another oracle (what the programmers and designers wanted) would give you another.  Listen:  <strong>no oracle can give you <em>the</em> right answer</strong>.  An oracle can give you <em>a</em> right answer&mdash;a plausible answer that might be right for its context. But changing the context can flip that right answer into a wrong one&mdash;or a wrong answer into a right one.&#8221;</p>
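<p>As an aside, the two oracles in that exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the modulo-100 rule is a simplification of how a two-digit year field might have behaved, not a description of any particular system.</p>
<pre><code class="language-python"># Hypothetical sketch: two oracles that disagree about 61 + 45.

def regular_math_oracle(year, years_to_add):
    # Ordinary arithmetic: the answer most people expect.
    return year + years_to_add


def two_digit_year_oracle(year, years_to_add):
    # Assumed design intent of a two-digit year field: keep only the last two digits.
    return (year + years_to_add) % 100


def format_year_field(value):
    # Render a value as a two-character year field, as on a printout or screen.
    return f"{value:02d}"


print(regular_math_oracle(61, 45))                       # prints 106
print(format_year_field(two_digit_year_oracle(61, 45)))  # prints 06
</code></pre>
<p>Neither function is &#8220;the&#8221; right one. Which result counts as right depends on which context, and whose desires, you are checking against.</p>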
<p>&#8220;Oracles are heuristic,&#8221; I continued.  &#8220;There&#8217;s this terrific book, <em><A HREF="http://www.amazon.com/Discussion-Method-Conducting-Engineering-Technology/dp/0195155998">Discussion of the Method</A></em>, by Billy Vaughan Koen. He&#8217;s an engineer, but he&#8217;s also a philosopher of engineering. In the book, he makes the argument that <em>all</em> decision-making, <em>all</em> problem-solving is heuristic.&#8221;</p>
<p>Tony looked quizzical.  &#8220;Wait&#8230; Even algorithms?  &#8216;Algorithm&#8217; is the opposite of &#8216;heuristic&#8217;&mdash;didn&#8217;t you say that?&#8221;</p>
<p>&#8220;Not exactly. Algorithms are robust; they tend to produce very reliable results.  But Koen says that even algorithms are heuristic.  After all, if you apply an algorithm in the wrong way, to solve the wrong problem, or in the wrong context, it will fail.&#8221;</p>
<p>&#8220;Aaargh,&#8221; Tony said. &#8220;Where does that leave us?  How can we ever know when a program&#8217;s correct?&#8221;</p>
<p>&#8220;That&#8217;s the interesting part,&#8221; I said. &#8220;We can&#8217;t.  <strong>A program can appear to be working in all kinds of ways, but the program and your oracles can fool you.</strong> Think of a calculator program.  Yep: one plus one gives the answer &#8216;2&#8217;.  That looks correct to you, right?&#8221;</p>
<p>&#8220;Sure.&#8221;</p>
<p>&#8220;And yet if the calculator is in binary mode, the answer should be &#8216;10&#8217;.  <strong>You might be applying the wrong oracle for a given problem.</strong>  Even if the program isn&#8217;t in binary mode and &#8216;2&#8217; is right, the program could be tying up the processor so your machine is unusable.  Or the program gives you the right answer&mdash;in white text on a white background. Or the program clobbers the contents of the clipboard. And you don&#8217;t notice these things unless you&#8217;re looking for them, or unless you happen to stumble across them. <strong>That is, there might be a problem for which you don&#8217;t have an oracle.</strong>&#8221;</p>
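<p>The calculator example can be sketched in Python as well. Everything here is assumed for illustration: the function names and the <code>mode</code> parameter are hypothetical, not part of any real calculator program.</p>
<pre><code class="language-python"># Hypothetical sketch: the same displayed result judged by mode-dependent oracles.

def expected_display(a, b, mode="decimal"):
    # What the display should show for a + b in the given mode.
    total = a + b
    if mode == "binary":
        return format(total, "b")  # base-2 digits, e.g. 2 becomes "10"
    return str(total)


def looks_right(displayed, a, b, mode="decimal"):
    # The oracle: compare what we see with what this mode leads us to expect.
    return displayed == expected_display(a, b, mode)


print(looks_right("2", 1, 1))                  # prints True
print(looks_right("2", 1, 1, mode="binary"))   # prints False
print(looks_right("10", 1, 1, mode="binary"))  # prints True
</code></pre>
<p>Note what this oracle cannot tell you. It checks the displayed digits and nothing else, so it stays silent about processor load, text colour, or the state of the clipboard.</p>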
<p>&#8220;So <em>nothing</em> can tell us that a program&#8217;s working right?  We can&#8217;t ever tell whether a program is giving us the right answer?&#8221; Tony asked doubtfully. &#8220;That doesn&#8217;t sound&#8230; right.&#8221;</p>
<p>&#8220;Working right, yes, but only in the sense that it appears to be fulfilling some requirement to some degree. <em>A</em> right answer, yes, but <em>the</em> right answer only in context, and not a complete answer. Correctness is a human notion, and things are only correct in some context. <strong>As testers, we can&#8217;t know <em>for sure</em> the deep truth about any observation.</strong> Any right answer that we see in computer software is only right <em>for now</em>, <em>this time</em>, <em>for some purpose</em>, <em>on this machine</em>.  <strong>We can&#8217;t reliably project our observations into the future.</strong> We can use an oracle to give us a strong inference that the answer will be the same next time, but we don&#8217;t get a guarantee.  What we see might be right based on what we&#8217;re observing, but there&#8217;s all this stuff that we&#8217;re not observing, too. <A HREF="http://kaner.com/wp-content/uploads/2010/08/ExploratoryTestAutomationHoffman.pdf">Cem Kaner and Doug Hoffman describe that stuff really thoroughly</A>. You&#8217;ve heard that complete testing is impossible, right?&#8221;</p>
<p>&#8220;Of course.&#8221;</p>
<p>&#8220;Well, part of that is the coverage problem; we can&#8217;t test every possible input to a program in a finite amount of time.  But part of it is the oracle problem, too. <strong>We can&#8217;t see a problem unless we have an oracle for that problem: that is, a principle or mechanism for recognizing that problem.</strong>  All our oracles are heuristic, fallible&mdash;and in software, the potential for problems is limitless.&#8221;</p>
<p>&#8220;So how do we get around that?&#8221; Tony asked.</p>
<p>&#8220;The first thing is to recognize that <strong>oracles don&#8217;t give us <em>the</em> right answer, but every oracle may be able to point us to some problem.</strong>  Over the years that we&#8217;ve studied oracles, we&#8217;ve come up with a bunch of principles and mechanisms for them, and we keep discovering more. <strong>Since there are infinite numbers of possible problems, we need a wide variety and diversity of oracles to spot them.</strong>  But there is one principle that seems to prevail overall.&#8221;</p>
<p>&#8220;What?&#8221;</p>
<p>&#8220;It seems to us that <strong>oracles are founded on the idea of <em>consistency</em></strong>.&#8221;</p>
<p><em>(to be continued)</em></p>
]]></content:encoded><description>In which the conversation about heuristics and oracles continues&amp;#8230; Tony&amp;#8217;s brow furrowed as he spoke. &amp;#8220;No oracle comes with a guarantee that it&amp;#8217;s giving you the right answer. That&amp;#8217;s what you said. But surely there are some oracles that are reliable,&amp;#8221; he said. &amp;#8220;What about pure math?&amp;#8221; &amp;#8220;Pure math? All right. Here&amp;#8217;s an example: what&amp;#8217;s [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">1</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/05/oracles-cant-guarantee/</feedburner:origLink></item><item><title>All Oracles Are Heuristic</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/r9rR2vifxng/</link><category>Uncategorized</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Wed, 25 Apr 2012 18:54:30 PDT</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1237</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p><em>In which <a href="http://www.developsense.com/blog/2012/04/heuristics-for-understanding-heuristics/" title="Heuristics for Understanding Heuristics">the conversation about heuristics and oracles</a> continues&#8230;</em></p>
<p>&#8220;So what&#8217;s the difference,&#8221; I asked my tester friend Tony, &#8220;between an oracle and a heuristic?&#8221;</p>
<p>&#8220;Hmm.  Well, I&#8217;ve read the <a href="http://www.developsense.com/courses.html" title="Rapid Software Testing">Rapid Testing</a> stuff, and you and James keep saying <strong>an oracle is a principle or mechanism by which we recognize a problem.</strong>&#8221;</p>
<p>&#8220;Yes,&#8221; I said.  &#8220;That&#8217;s what we call an oracle.  What&#8217;s the difference between that and a heuristic?&#8221;</p>
<p>&#8220;An oracle helps us recognize a problem, but it&#8217;s not a method for <em>solving</em> a problem, or for making a decision.&#8221;  He suddenly paused. </p>
<p>&#8220;Wait,&#8221; he said. &#8220;There&#8217;s that question you say testers should always be asking&mdash;<em>Is there a problem here?</em> An oracle <em>does</em> help us make a decision:  it helps us to decide whether there&#8217;s a problem in the product we&#8217;re testing.  And oracles can fail, too. So an oracle&#8217;s not different from a heuristic; an oracle <em>is</em> a heuristic.  They&#8217;re the same.&#8221;</p>
<p>&#8220;Okay,&#8221; I said.  &#8220;But that&#8217;s like saying &#8216;an iPhone isn&#8217;t different from a smartphone; an iPhone <em>is</em> a smartphone.  They&#8217;re the same.&#8217;&#8221;</p>
<p>&#8220;But?  But <em>what</em>?  What&#8217;s the problem with that?  Aren&#8217;t all iPhones smartphones?&#8221;</p>
<p>&#8220;Well, I&#8217;d say so,&#8221; I replied.  &#8220;But let me ask you:  are all smartphones iPhones?&#8221;</p>
<p>He paused for a second.  &#8220;Oooh.  <strong>Oracles are heuristic, but not all heuristics are oracles.</strong>  An oracle <em>is</em> a heuristic, but it&#8217;s <em>a specific kind</em> of heuristic.  Okay, let me see if I&#8217;ve got this:  tossing a coin is a heuristic for making a decision.  A heuristic <em>approach</em> for making a decision, I mean.  You&#8217;d use the Coin Toss heuristic in some contexts&mdash;random decisions, or unimportant decisions, or&#8230; or intractable decisions, or decisions that you want to be fair. The approach can fail. It might not be a fair coin. Or it might be a high-stakes decision that shouldn&#8217;t be left to chance. So the Coin Toss heuristic might work, but it can fail.&#8221;</p>
<p>&#8220;Right,&#8221; I said.  &#8220;Tossing a coin is a heuristic approach for making a decision.&#8221;</p>
<p>&#8220;But it&#8217;s <em>not</em> an oracle,&#8221; Tony said, &#8220;because tossing a coin doesn&#8217;t help us to recognize a problem. So tossing a coin is a heuristic, but it&#8217;s not an oracle.&#8221;</p>
<p>&#8220;All right.  What does an oracle do for us?&#8221;</p>
<p>Tony started confidently.  &#8220;An oracle is something that gives us the right answer, so that we can compare it to the result the product gives us. If there&#8217;s a difference between the oracle&#8217;s answer and the product&#8217;s result, there&#8217;s a problem.  If the product&#8217;s answer is the same as the oracle&#8217;s answer, then there&#8217;s no problem.&#8221;</p>
<p>&#8220;Are you sure about that?&#8221; I asked.  &#8220;Is a specification an oracle?&#8221;</p>
<p>&#8220;Yes.  The specification tells us how the product is supposed to behave.&#8221;</p>
<p>&#8220;And how reliable are the specifications where you work?&#8221;</p>
<p>Tony paused, and then he grinned. &#8220;Okay. They suck, to be honest with you,&#8221; he said.  &#8220;They&#8217;re ambiguous.  They&#8217;re unclear.  They&#8217;re incomplete; they usually miss a bunch of requirements. They contradict each other, sometimes on the same page. So we have to talk about them a lot to clear them up&mdash;and then when we sort things out, the job of updating the written spec usually gets left for last, if it ever happens at all.&#8221;</p>
<p>&#8220;Still,&#8221; I said, &#8220;if you see an inconsistency between the spec and the product, you at least <em>suspect</em> a problem, don&#8217;t you?&#8221;</p>
<p>&#8220;Well, yeah.  When the spec and the product disagree, there&#8217;s usually a problem somewhere&mdash;either with the product, or with the spec.  Or both.  When we&#8217;re not sure, the program manager is usually the one who clears things up. Sometimes the programmers fix the product. Sometimes the product turns out to be right, and it&#8217;s the spec that&#8217;s wrong&mdash;but then we know at least the BAs ought to fix the spec, even if they don&#8217;t get around to it right away.&#8221;</p>
<p>&#8220;So if you use a specification as an oracle, it&#8217;s <em>somewhat</em> reliable, but it&#8217;s not <em>guaranteed</em> to be right.  What does that sound like?&#8221;</p>
<p>He paused again.  &#8220;It&#8217;s a heuristic.  An oracle is a special kind of heuristic.  <strong>An oracle is a <em>heuristic</em> principle or mechanism by which we recognize a problem.</strong>&#8221;</p>
<p>&#8220;That&#8217;s the way I like to say it these days, yes,&#8221; I replied.  &#8220;For one thing, having the word &#8216;heuristic&#8217; in the definition of &#8216;oracle&#8217; seems to help people recognize that there&#8217;s <em>some</em> kind of distinction to be made between heuristics and oracles.  But for another, I think it&#8217;s important to emphasize that oracles help us to learn things.  And that, since they&#8217;re heuristics, <strong>oracles are fallible and context-dependent</strong>.  No oracle comes with a guarantee that it&#8217;s giving you the right answer.  <strong>An oracle can only point you to a possible problem.</strong>&#8221;</p>
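<p>One way to picture a specification used as a heuristic oracle is a check that reports a <em>possible</em> problem rather than a verdict. The Python below is a minimal sketch under that assumption. The <code>Observation</code> class, the <code>spec_oracle</code> function, and the sample behaviours are all hypothetical.</p>
<pre><code class="language-python"># Hypothetical sketch: a specification used as a fallible oracle.
from dataclasses import dataclass


@dataclass
class Observation:
    consistent: bool  # did the product agree with the spec?
    note: str         # what a tester might reasonably conclude


def spec_oracle(spec_says, product_does):
    # Compare the documented behaviour with the observed behaviour.
    if spec_says == product_does:
        # Agreement is not proof: the spec itself may be wrong or incomplete.
        return Observation(True, "No problem recognized by this oracle.")
    # Disagreement does not say which side is wrong, only where to look.
    return Observation(False, "Possible problem in the product, the spec, or both.")


result = spec_oracle("rounds to 2 decimal places", "rounds to 3 decimal places")
print(result.consistent)  # prints False
print(result.note)        # prints Possible problem in the product, the spec, or both.
</code></pre>
<p>The design choice worth noticing is the wording of the notes. Neither branch claims that the product is right or wrong, which is all a heuristic oracle can honestly offer.</p>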
<p>Tony&#8217;s brow furrowed again.</p>
<p><em>To be continued&#8230;</em></p>
]]></content:encoded><description>In which the conversation about heuristics and oracles continues&amp;#8230; &amp;#8220;So what&amp;#8217;s the difference,&amp;#8221; I asked my tester friend Tony, &amp;#8220;between an oracle and a heuristic?&amp;#8221; &amp;#8220;Hmm. Well, I&amp;#8217;ve read the Rapid Testing stuff, and you and James keep saying an oracle is a principle or mechanism by which we recognize a problem.&amp;#8221; &amp;#8220;Yes,&amp;#8221; I said. [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/04/all-oracles-are-heuristic/</feedburner:origLink></item><item><title>Heuristics for Understanding Heuristics</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/a0LV_zk8tjU/</link><category>Uncategorized</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Fri, 20 Apr 2012 08:24:28 PDT</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1224</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p><em>This conversation is fictitious, but it&#8217;s also representative of several chats that I&#8217;ve had with testers over the last few weeks.</em></p>
<p>Tony, a tester friend, approached me recently, and told me that he was having trouble understanding <em>heuristics</em> and <em>oracles</em>.  I have a heuristic approach for solving the problem of people not understanding a word: </p>
<p><strong>Give &rsquo;em a definition.</strong></p>
<p>So, I told him:</p>
<p><strong>A heuristic is a fallible method for solving a problem or making a decision.</strong></p>
<p>After I tried the &#8220;Give &rsquo;em a definition&#8221; heuristic, I tested to see if Tony seemed to understand.  His eyes were a little glazed over. I applied a heuristic for making the decision, <em>did he get it?</em>:</p>
<p><strong>When someone&#8217;s eyes glaze over, they don&#8217;t get it.</strong></p>
<p>Heuristics aren&#8217;t guaranteed to work.  For example, sometimes the general &#8220;Give &rsquo;em a definition&#8221; heuristic solves the problem of people not understanding something, and sometimes it doesn&#8217;t. In the latter case, I apply another heuristic:</p>
<p><strong>Give &rsquo;em an explanation.</strong></p>
<p>So I told him:</p>
<p>&#8220;When you <em>know</em> how to solve a problem, you might follow a rule.  When you&#8217;re not so sure about how to solve the problem, following a rule won&#8217;t help you.  Not knowing how to solve a problem means not knowing which rule to apply, or whether there&#8217;s a rule at all.  When you&#8217;re in uncertain conditions, or dealing with imperfect or incomplete information, you apply heuristics&mdash;methods that might work, or that might fail.</p>
<p>&#8220;As an adjective, &#8216;heuristic&#8217; means &#8216;serving to discover&#8217; or &#8216;helping to learn&#8217;.  When Archimedes realized that things that sink displace their volume of water, and things that float displace their mass, he ran naked through the streets of Syracuse yelling, &#8216;Eureka!&#8217; or &#8216;I&#8217;ve discovered it!&#8217; &#8216;Eureka&#8217; and &#8216;heuristic&#8217; come from the same root word in Greek.&#8221;</p>
<p>Tony was listening thoughtfully, but his brow was still furrowed.  So I applied another teaching heuristic:</p>
<p><strong>Give &rsquo;em something to compare.</strong></p>
<p>I said, &#8220;Here&#8217;s one way of understanding heuristics:  compare &#8216;heuristic&#8217; with &#8216;algorithm&#8217;.  An algorithm is a method for solving a problem that&#8217;s <em>guaranteed</em> to have a right answer.  So an algorithm is like a <em>rule</em> that you follow; a heuristic is like a <em>rule of thumb</em> that you apply.  Rules of thumb usually work, but not always.&#8221;</p>
<p>Sometimes providing a comparable idea solves the problem of understanding something, and sometimes it doesn&#8217;t.  Tony nodded, but still looked a little puzzled.  I wasn&#8217;t sure I had solved the problem, so I applied a new heuristic: </p>
<p><strong>Point &rsquo;em to a book.</strong>  </p>
<p>I suggested that he read George Polya&#8217;s book <em>How to Solve It</em>.  &#8220;In that book, Polya presents a set of ideas and questions you can ask yourself that can help you to solve math problems.&#8221;</p>
<p>&#8220;Wait&#8230; I thought you always solved math problems with algorithms,&#8221; Tony said.</p>
<p>&#8220;That&#8217;s when you <em>know</em> how to solve the problem.  When you don&#8217;t, Polya&#8217;s suggestions&mdash;heuristics&mdash;can get you started.  They don&#8217;t always work, but they tend to be pretty powerful, and when one doesn&#8217;t work, you try another one.  You never know which questions or ideas will help you solve the problem most quickly.  So you practice this cycle: apply a heuristic, and if you&#8217;re still stuck, try another one.  After a while, you develop judgement and skill, which is what you need to apply heuristics well.  Polya talks about that a lot.  He also emphasizes just how much heuristics are fallible and context-dependent.&#8221;</p>
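<p>That cycle of applying a heuristic, and trying another if you are still stuck, can be sketched as a simple loop. This is an illustration only: the two heuristic functions are hypothetical stand-ins with canned results, and a real heuristic needs the judgement and skill that the loop leaves out.</p>
<pre><code class="language-python"># Hypothetical sketch: try heuristics in turn until one helps, or none do.

def solve_with_heuristics(problem, heuristics):
    # Return (heuristic name, result) for the first heuristic that helps, else None.
    for heuristic in heuristics:
        result = heuristic(problem)
        if result is not None:
            return heuristic.__name__, result
    return None  # every heuristic failed this time; nothing was guaranteed


def try_a_simpler_case(problem):
    # Stand-in for a heuristic that does not help with this problem.
    return None


def work_backwards(problem):
    # Stand-in for a heuristic that happens to help with this problem.
    return "worked backwards from the goal of " + problem


print(solve_with_heuristics("the puzzle", [try_a_simpler_case, work_backwards]))
print(solve_with_heuristics("the puzzle", [try_a_simpler_case]))  # prints None
</code></pre>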
<p>Mind you, neither Tony nor I had a copy of Polya&#8217;s book right handy, and Tony wanted to understand &#8220;heuristics&#8221; better <em>now</em>. The &#8220;point &rsquo;em to a book&#8221; heuristic had failed this time, even though it might have worked in a different context.  So I tried yet another heuristic to solve the problem: </p>
<p><strong>Point &rsquo;em to <em>another</em> book.</strong></p>
<p>I suggested that he read <em>Gut Feelings</em> by Gerd Gigerenzer.  &#8220;In that book, Gigerenzer emphasizes that heuristics tend to be fast and frugal (that is, quick and inexpensive).  That&#8217;s important, he says:  humans need heuristics because they&#8217;re typically dealing with <em>bounded rationality</em>.&#8221;</p>
<p>Uh-oh.  Tony&#8217;s eyes had glazed over again at the mention of &#8220;bounded rationality&#8221;.  So I applied a heuristic: </p>
<p><strong>Even when it&#8217;s a deep concept, a fast and frugal explanation might do.</strong>  </p>
<p>After all, Polya says that a heuristic isn&#8217;t intended to be perfect, that it&#8217;s provisional and context-dependent.  So in order to provide a quick understanding of &#8220;bounded rationality&#8221;, I said, &#8220;In a nutshell, bounded rationality is a situation when you have incomplete knowledge, imperfect understanding, and limited time.&#8221;</p>
<p>He grinned, and said, &#8220;What, like when you&#8217;re testing?  Like most of the time <em>in life</em>?&#8221;</p>
<p>&#8220;Yes.  Billy Vaughan Koen, in another book, <em>Discussion of the Method</em>, says that the engineering method is &#8216;to cause the best change in a poorly understood situation within the available resources.&#8217;&#8221;</p>
<p>&#8220;So he&#8217;s saying that engineers apply heuristics?&#8221; Tony asked.  &#8220;I guess that makes sense, since engineers solve problems in ways that usually work, but sometimes there are failures.&#8221;</p>
<p>He seemed to be getting it.  But I wanted to test that, so I applied a heuristic for making the decision, &#8220;Does he get it?&#8221;:  </p>
<p><strong>Ask the student to provide an example.</strong></p>
<p>So I said, &#8220;I think you might have it.  But can you provide me with an example of a heuristic?&#8221;</p>
<p>He said, &#8220;Okay.  I think so.&#8221;  He paused. &#8220;Here&#8217;s a heuristic for solving the problem of opening a door:  <strong>&#8216;Pull on the handle; push on the plate.&#8217;</strong>  That&#8217;s what you do when you get to a door, right?  It&#8217;s a heuristic that <em>usually</em> works.  Well&#8230; it might fail.  It could be one of those annoying doors that have handles on both sides, where you have to push the handle <em>or</em> pull the handle to open the door.  It might be one of those doors that opens both ways, like the doors for restaurant kitchens, so there&#8217;s no handle.  The door might not even <em>have</em> a handle or a plate; it might have a knob.  In that case, you apply another heuristic:  &#8216;Turn the knob&#8217;.  That&#8217;s a solution for the problem of opening a door that doesn&#8217;t have a handle or a plate.  But <em>that</em> heuristic might fail too.  The door might be locked, even though the knob turns.  It might be one of those fancy doors that have dead-bolt locks and knobs that don&#8217;t turn.  It might not have a knob at all; it might have one of those old-fashioned latches.  So none of those heuristics <em>guarantees</em> a solution, but each one might help to solve the problem of getting through the door.&#8221;</p>
<p>&#8220;Great! I think you&#8217;ve got it.&#8221;</p>
<p>&#8220;To be precise about it,&#8221; he said, &#8220;you can&#8217;t be sure, so you&#8217;re <em>applying heuristics that help you to make the decision</em> that I get it.&#8221;</p>
<p>I laughed.  &#8220;Right. So what&#8217;s the difference,&#8221; I asked, &#8220;between an oracle and a heuristic?&#8221;</p>
<p>He paused.</p>
<p><em>(to be continued&#8230;)</em></p>
]]></content:encoded><description>This conversation is fictitious, but it&amp;#8217;s also representative of several chats that I&amp;#8217;ve had with testers over the last few weeks. Tony, a tester friend, approached me recently, and told me that he was having trouble understanding heuristics and oracles. I have a heuristic approach for solving the problem of people not understanding a word: [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">9</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/04/heuristics-for-understanding-heuristics/</feedburner:origLink></item><item><title>Problems with Problems</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/3rrqkj7DvYc/</link><category>Uncategorized</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Tue, 24 Apr 2012 22:17:20 PDT</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1218</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>People sometimes seem to struggle with a concept that&#8217;s central to testing, the concept of &#8220;oracle&#8221;.  In the three-day <A HREF="http://www.developsense.com/courses.html">Rapid Software Testing</A> class, we define an oracle as </p>
<p><strong>a principle or mechanism by which we recognize a problem.</strong></p>
<p>Sometimes I like to emphasize that oracles are fallible and context-dependent.  When that&#8217;s so, I say that an oracle is</p>
<p><strong>a <em>heuristic</em> principle or mechanism by which we recognize a problem.</strong></p>
<p>That means that an oracle can work <em>but might fail</em> in helping us to recognize a problem.  A heuristic is something that helps us learn; it&#8217;s a fallible method for solving a problem or making a decision.  So an oracle, as a special kind of heuristic, helps us to make a particular decision in answer to the question, &#8220;Is there a problem here?&#8221;  If the answer is Yes, it&#8217;s because an oracle is enabling us to recognize a problem.</p>
<p>For some time, I&#8217;ve been surprised at how tricky a concept this seems to be for some people.  Possibly, I thought, it was because people simply weren&#8217;t used to thinking about oracles; for many it&#8217;s a new word and a new concept.  Maybe the problem was with &#8220;heuristic&#8221;, or &#8220;principle&#8221;, or &#8220;mechanism&#8221;.  Recently, though, I began to wonder:  perhaps the difficulty comes from the fact that people aren&#8217;t used to thinking deeply about <em>problems</em>.  I tested that idea by asking some people, and I found that they struggled in describing how they thought of problems.  So what is a problem, anyway?</p>
<p>Here are two ways of thinking about problems that I&#8217;ve found useful.</p>
<p>1) A problem is &#8220;<strong>a difference between what is perceived and what is desired.</strong>&#8221; (<em>Dewey, J. (1933), How We Think: A Restatement of the Relation of Reflective Thinking to the Educative Process</em>) I first learned this definition of &#8220;problem&#8221; in a session led by Don Gray at the <A HREF="http://www.ayeconference.com">Amplifying Your Effectiveness Conference</A> a few years ago, and I believe it shows up in several places in <A HREF="http://www.geraldmweinberg.com">Jerry Weinberg</A>&#8217;s work.</p>
<p>2) A problem is &#8220;<strong>an undesirable situation that is significant to and maybe solvable by some agent, though probably with some difficulty.</strong>&#8221; (<em>G.F. Smith, &#8220;Towards a Heuristic Theory of Problem Structuring&#8221;, quoted in Weick, Karl E. </em>Sensemaking in Organizations<em>. Sage Publications, Inc, 1995</em>.) </p>
<p>Both definitions, I think, point to the same thing:  problems are based on desires, perceptions, and situations.  But both definitions have a minor problem for me, which is that they do not make explicit something I think to be very important:  desires, perceptions, and situations are centred around <em>people</em> and <em>context</em>.  Problems are subject to <a href="http://www.developsense.com/blog/2010/09/done-the-relative-rule-and-the-unsettling-rule/" title="Done, The Relative Rule, and The Unsettling Rule" target="_blank">the Relative Rule</a>:  </p>
<p>For any abstract X, X is X <em>to some person</em>, <em>at some time</em>.</p>
<p>People often talk of problems (bugs, defects, issues, and so forth) as though they were attributes of a product or of a situation.  But problems aren&#8217;t attributes as such.  They&#8217;re <em>relationships</em> between some person(s), the product, the system that includes them, and the situation that encompasses it all.  A problem is a problem <em>for some person</em>, <em>at some time</em>. Differences, perceptions, and desires are like that too; they&#8217;re relative.  If I were to expand out Dewey&#8217;s notion of &#8220;problem&#8221;, it would look like this:</p>
<p><em>A problem is a difference (according to some person) between what is perceived (by some person) and what is desired (by some person) (all at some point in time).</em></p>
<p>There are several implications here.  </p>
<p><UL><LI>The problem might be in the difference, or in the perception, or in the desire.</LI><br />
<LI>Different people will have different notions of differences, different perceptions, different desires. </LI><br />
<LI>A problem may also be influenced by timing; something that&#8217;s a problem today might not be a problem tomorrow, and something that isn&#8217;t a problem today might be a catastrophe tomorrow.</LI><br />
<LI>Since, as testers, we&#8217;re in the business of finding potential problems, we must be alert to the enormous variety of people who may be affected by the product and the project.</LI><br />
<LI>Our mission as testers typically includes the task of identifying possible problems.  As such, we should resist the temptation to dismiss something as &#8220;not a problem&#8221;, because&#8230;</LI><br />
<LI>Something that we might dismiss as a trivial problem for some people might be a serious problem for other people and&#8230;</LI><br />
<LI>Something that we might see as a serious problem in our perception might appear as a trivial problem for other people so&#8230;</LI><br />
<LI>We&#8217;re obliged as testers to identify a possible problem in terms of its most serious consequences for people who might matter to the product owner. However&#8230;</LI><br />
<LI>We&#8217;re not in the business of deciding whether something is a problem or not, so for a given problem, our testing clients or the project owners decide on its ultimate significance, and on their response to it.</LI></UL></p>
<p>So the bottom line is that problems are slippery.  A problem is not a thing in a product or in the world, but a relationship between some situation and some person.</p>
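<p>For readers who think in code, here is a minimal Python sketch of a problem as a relationship rather than an attribute. It assumes a deliberately crude model in which a difference is simple inequality between what is perceived and what is desired. The <code>Viewpoint</code> class and the sample people, years, and values are hypothetical.</p>
<pre><code class="language-python"># Hypothetical sketch: the same output is a problem for one person and not another.
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewpoint:
    person: str     # who is looking
    when: int       # the year in which they are looking
    perceived: str  # what they see
    desired: str    # what they want to see

    def sees_problem(self):
        # Crude stand-in for "a difference between what is perceived and what is desired".
        return self.perceived != self.desired


views = [
    Viewpoint("programmer saving storage", 1965, "06", "06"),
    Viewpoint("operator reading a report", 1999, "06", "2006"),
]

who_sees_a_problem = [(v.person, v.when) for v in views if v.sees_problem()]
print(who_sees_a_problem)  # prints [('operator reading a report', 1999)]
</code></pre>
<p>The product&#8217;s output is identical in both rows. Only the person, the time, and the desire differ, and that is enough to change whether there is a problem.</p>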
<p>Smith&#8217;s emphasis on an ability to solve a problem may matter too.  People often accept things that they perceive as beyond anyone&#8217;s ability to solve.  For example, I can&#8217;t do anything about a meteor hitting my computer in the next few minutes, so I&#8217;m not going to treat the possibility that it might happen as a problem.  Yet again, I should be careful to suppress any of my own prejudices that nothing can be done with respect to a given problem.  That&#8217;s a decision for those who build and those who own the product, since they may have information and more resources of which I am unaware.     </p>
<p>One of the fundamental questions of testing is &#8220;Is there a problem here?&#8221;  Oracles are <A HREF="http://www.developsense.com/blog/2007/06/mcluhan-thinking-for-testers/">media, in the McLuhan sense</A>.  Media are tools, extensions of ourselves that enhance, enable, accelerate, or intensify our capacity to do things.  McLuhan pointed out, though, that media are agnostic about what they extend.  If our concept of &#8220;problem&#8221; is limited, our oracles will extend and accelerate our ability to recognize a limited set of problems.  In other words, with too narrow an idea of what a problem might be, oracles may only help us to recognize too narrow a set of possible problems.</p>
<p>So:  one key to understanding and applying oracles skilfully is to recognize the richness and diversity of what we might mean by <em>problem</em>.  What we&#8217;re testing is not simply source code or a collection of compiled object code files.  We&#8217;re testing products or services that are <em>systems</em>, sets of things in meaningful relationship to each other.  Those systems are related to other systems.  Products don&#8217;t stand on their own.  Every product is part of a system that includes people who build it, people who support it, people who maintain it, people who buy it, people who use it directly, and people who are affected by it.  That&#8217;s an incomplete list of people who might experience problems related to the product or system.  Try asking these questions:</p>
<p><UL><LI>What are the elements of the system that I&#8217;m testing?</LI><br />
<LI>Who might have a relationship to that system, or to its elements?</LI><br />
<LI>Who <em>else</em> might have a relationship to it?</LI><br />
<LI>What desires might they have that are related to the system?</LI><br />
<LI>What aspects of the system might provide value to those people by fulfilling their desires?  What might threaten that value by dashing their desires?</LI><br />
<LI>How might people perceive the system?  What might they see, hear, smell, touch, taste, or feel as they use it or interact with it?</LI><br />
<LI>What might influence, magnify, sharpen, or distort their perceptions?</LI><br />
<LI>What differences might they experience between their perceptions and their desires?</LI><br />
<LI>Could someone, or something&mdash;some change in the situation&mdash;help them to solve or otherwise deal with such differences?</LI><br />
<LI>What might change in the system over time?  How might people&#8217;s perceptions, desires, or notions of differences change over time?</LI><br />
<LI>What ideas, tools, or conversations might help me to recognize perceptions, desires, and differences?</LI></UL></p>
<p>Reflecting on these questions periodically, even briefly, may help you to expand your notion of what a problem is.  That in turn may extend your ability to use oracles to recognize potential problems in the system you&#8217;re testing.</p>
<p><em>This blog post was inspired and sharpened by conversations with <A HREF="http://mavericktester.com/">Anne-Marie Charrett</A>, <A HREF="http://rhythmoftesting.blogspot.ca/">Peter Walen</A>, and <A HREF="http://www.satisfice.com">James Bach</A>.  Thanks to them.</em></p>
]]></content:encoded><description>People sometimes seem to struggle with a concept that&amp;#8217;s central to testing, the concept of &amp;#8220;oracle&amp;#8221;. In the three-day Rapid Software Testing class, we define an oracle as a principle or mechanism by which we recognize a problem. Sometimes I like to emphasize that oracles are fallible and context-dependent. When that&amp;#8217;s so, I say that [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/04/problems-with-problems/</feedburner:origLink></item><item><title>Three Conferences of Note</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/8Vi7DUYJBfA/</link><category>Uncategorized</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Sat, 24 Mar 2012 13:54:43 PDT</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1207</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>A strong, active, local testing community is a wonderful thing. Kitchener-Waterloo, a hub of software and software-related companies small and large, has such a community, and an active association in the <a href="http://www.kwsqa.org" title="Kitchener Waterloo Software Quality Association" target="_blank">KWSQA</a>.  Each year, the KWSQA presents the <a href="http://kwsqa.org/conference/" title="Targeting Quality Conference" target="_blank">Targeting Quality conference</a>, which I&#8217;ve had the pleasure of participating in for a few years.  Alas, this year I can&#8217;t make it&mdash;but maybe you can.  This year&#8217;s presenters and workshop leaders are Henrik Andersson, Paul Carvalho, Fiona Charles, Selena Delesie, Adam Goucher, John Hazel, Paul Holland, Griffin Jones, Jason Little, Kevin Malley, Neil Price-Jones, and Declan Whelan.  
The conference happens April 23-24 in Waterloo, Ontario&mdash;within easy reach of Toronto.</p>
<p>A strong, active, national testing community is also a wonderful thing. Sweden, for example, is a hotbed of committed, energetic, and thoughtful testers, who this year inaugurate the <a href="http://lets-test.com/" title="Let's Test conference" target="_blank">Let&#8217;s Test</a> conference which runs May 7-9 in Stockholm, Sweden.  While the founding community is local, the scope is global, with a slate of speakers and tutorial presenters coming from Australia, Belgium, Canada, Denmark, Estonia, Germany (dang&mdash;broke the alphabetic run; where are the French?), Japan, the Netherlands, Romania, South Africa, Sweden, the United Kingdom, and the United States.  I&#8217;m honoured to be giving a keynote and a half-day tutorial at Let&#8217;s Test.  Each session&#8217;s timeslot consists of 2/3  presentation and 1/3 facilitated discussion, clearly inspired by the <a href="http://www.associationforsoftwaretesting.org/conference/" title="CAST Protocol" target="_blank">CAST protocol</a>.</p>
<p>And speaking of CAST&#8230; A strong, active, worldwide testing community is a wonderful thing too.  This year, the Conference for the <a href="http://www.associationforsoftwaretesting.org" title="Association for Software Testing" target="_blank">Association for Software Testing</a> (<a href="http://www.associationforsoftwaretesting.org/conference/cast-2012/" title="CAST 2012" target="_blank">CAST 2012</a>) will be held July 16-18 in San Jose, CA.  I&#8217;ll be there for that one too.  Doug Hoffman (conference chair) and Fiona Charles (program chair,  along with Anne-Marie Charrett, Sherry Heinze, Johan Jonasson on the program committee, plus Matt Heusser and Pete Walen managing the Emerging Topics track) have put together a terrific agenda.  There are too many compelling speakers, sessions, and workshops to list here, but <a href="http://www.associationforsoftwaretesting.org/conference/cast-2012/schedule/" title="Schedule for CAST 2012" target="_blank">have a look at the program yourself</a>.</p>
]]></content:encoded><description>A strong, active, local testing community is a wonderful thing. Kitchener-Waterloo, a hub of software and software-related companies small and large, has such a community, and an active association in the KWSQA. Each year, the KWSQA presents the Targeting Quality conference, which I&amp;#8217;ve had the pleasure of participating in for a few years. Alas, this [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">0</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/03/three-conferences/</feedburner:origLink></item><item><title>I Might Be Wrong (But Not For Me)</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/yyQHzEpVKMw/</link><category>Context</category><category>Tester Skill</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Mon, 02 Apr 2012 12:54:12 PDT</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1198</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p><em>Jerry Weinberg tells a story (yes, it&#8217;s me; I&#8217;m telling </em>yet another<em> Jerry Weinberg story) of meeting an old friend who looked distraught. </p>
<p>&#8220;What&#8217;s the matter?&#8221; Jerry asked.</p>
<p>The fellow replied, &#8220;Well, I&#8217;m kind of shellshocked. My wife just left me.&#8221; </p>
<p>&#8220;Was that a surprise?&#8221;</p>
<p>&#8220;Yes, it really was,&#8221; the fellow said. &#8220;I mean, we had had some problems, but I thought they were all settled.&#8221; </p>
<p>Jerry paused for a moment. Then he said, &#8220;nothing is ever settled.&#8221;</p>
<p>Several years after hearing that story I recognized its power as a general systems law. Obviously, I didn&#8217;t discover it, but I did name it. I call it &#8220;The Unsettling Rule&#8221;:  <strong>Nothing is ever settled.</strong></em></p>
<p>In <A HREF="http://www.amazon.com/Lessons-Learned-Software-Testing-Kaner/dp/0471081124">Lessons Learned in Software Testing</A> by Kaner, Bach, and Pettichord, Lesson 145 is &#8220;Use the IEEE Standard 829 for Test Documentation&#8221;.  Lesson 146, on the facing page, is &#8220;Don&#8217;t Use the IEEE Standard 829&#8221;. When the book was published, some reviewers said &#8220;What&#8217;s the problem with these guys? They can&#8217;t even get it together to tell a consistent story!&#8221;  Others, including me, thought that this pair of pages in particular was wonderful.  It underscored the degree to which issues in the world of software testing are not settled, the degree to which our craft is a long dialogue in which there are many voices to be heard, many options to be discussed, and many contexts to be considered.</p>
<p>The difference between the context-driven school (or approach; there&#8217;s now apparently disagreement about whether it&#8217;s a school or an approach!) and other schools/approaches is that these disagreements can get aired in public.  There are some fundamental principles on which we agree, and there are some other things on which we don&#8217;t agree. Whatever else happens, in this community, we try to make sure that there&#8217;s no <em>fake</em> consensus.  This is alarming and disturbing, sometimes, to some people, and it can be stressful to the participants.  But when it comes up, it&#8217;s a hallmark of our community that we try to deal with it.  It helps to keep us sharp, and it helps to keep us honest.</p>
<p>Recently I wrote <a href="http://www.developsense.com/blog/2012/02/why-pass-vs-fail-rates-are-unethical/" title="Why Pass vs. Fail Rates Are Unethical (Test Reporting Part 1)">a blog post</a> in which I took the position that the often-used pass-vs.-fail ratio is an invalid and misleading measurement.  To summarize the post, I said, &#8220;At best, if everyone ignores it entirely, it’s simply playing with numbers. Otherwise, producing a pass/fail ratio is irresponsible, unethical, and unprofessional&#8230; The ratio of passing test cases to failing test cases is at best irrelevant, and more often a systemic means of self- and organizational deception. Reducing the product story to a number means reducing its relationship with people to a number. By extension, that means reducing people to numbers too. So to irresponsible, unethical, and unprofessional, we can add unscientific and inhumane.&#8221;</p>
<p>I recognize that, coming from someone who claims to be context-driven, that&#8217;s pretty extreme stuff.  Yet, in its form, it&#8217;s consistent with one of those pages or the other in <em>Lessons Learned in Software Testing</em> (with some omissions, which I&#8217;ll address shortly). It is also consistent with a set of principles that James Bach and I espouse as part of our <A HREF="http://www.developsense.com/courses.html">Rapid Software Testing</A> class:</p>
<p><em><strong>We will not knowingly or negligently mislead our clients and colleagues.</strong> This ethical premise drives a lot of the structure of Rapid Software Testing. Testers are frequently the target of well-meaning but unreasonable or ignorant requests by their clients. We may be asked to suppress bad news, to create test documentation that we have no intention of using, or to produce invalid metrics to measure progress. We must politely but firmly resist such requests unless, in our judgment, they serve the better interests of our clients. At minimum we must advise our clients of the impact of any task or mode of working that prevents us from testing, or creates a false impression of the testing.</em></p>
<p>To me, that statement is both in tension with <em>and</em> consistent with several of the principles of the context-driven school, the first and second (&#8220;The value of any practice depends on its context&#8221; and &#8220;There are good practices in context, but there are no best practices&#8221;) and the seventh (&#8220;Only through judgment and skill, exercised cooperatively throughout the entire project, are we able to do the right things at the right times to effectively test our products.&#8221;)</p>
<p>Pass-vs.-fail ratios, to me, fly in the face of one of the &#8220;principles in action&#8221; listed at <A HREF="http://www.context-driven-testing.com">http://www.context-driven-testing.com</A>:  &#8220;Metrics that are not valid are dangerous.&#8221;  </p>
<p><a href="http://context-driven-testing.com/?p=38" title="Contexts Differ" target="_blank">Cem Kaner disagrees</a> with the position expressed in my post.  It seems to me that Cem&#8217;s disagreement hangs on the degree of danger and our reactions to it.  I hold that in practical contexts, pass-vs.-fail ratios are so dangerous that for almost all cases, they cross over the line into &#8220;unethical&#8221;, like giving the car keys to someone who is obviously drunk, or like planting land mines near a community well, even though in some rare contexts, such things could be done in good faith and without harm. Cem&#8217;s position seems to be (and I welcome correction, if it&#8217;s warranted) that although pass-vs.-fail ratios are exemplary of <em>dangerous</em> metrics, they&#8217;re not <em>unethical</em>.</p>
<p>Let&#8217;s start with two points that I&#8217;d like to make about the &#8220;unethical&#8221; label. One is that my ethical sense is personal, and so are the views posted on my blog.  Although I&#8217;m happy when other people share them, unless otherwise stated, I don&#8217;t represent the view of any community, including my own.  I don&#8217;t make claims to universal ethics.  Second, Cem refers to &#8220;using the accusation of unethical as a way of shutting down discussion of whether an idea (unethical!) was any good or not.&#8221;  I&#8217;m not using it that way.  I have no intention whatsoever of shutting down debate (as if I could in any case!).  Unless claimed otherwise, I am stating personal principles; not Right and Wrong, but right and wrong <em>for me</em>. I don&#8217;t know  of any agency (other than society) who can make claims of Right or Wrong, and even then claims seem always context-specific.</p>
<p>Whether pass-vs.-fail ratios are wrong or Wrong, they&#8217;re certainly wrong <em>for me</em>, wrong enough that I&#8217;m uncomfortable with using them on the job.  I&#8217;m sufficiently uncomfortable that I&#8217;m usually going to decline to provide them, just as I would not accept a job in which I was obliged to shoot people.  Other people might choose to become mercenaries or to go to war for their countries; I&#8217;d be a conscientious objector.  That wrongness is relative too, of course.  It&#8217;s subject to the Relative Rule; that any abstract X is X to some person, at some time. I can only warrant my own ethical stance <em>for the moment</em>.  My position on some issues has changed over the years, courtesy of some pleasant and unpleasant experiences.  I&#8217;m not currently aware of things that might cause my stand to change in the future, but I have to leave the possibility open.</p>
<p>So, is providing pass vs. fail rates <em>unethical</em>?  On reflection, I have to say reluctantly, yeah, I think so; not absolutely, but in most practical circumstances.  For me, the crucial test is in the last of Cem&#8217;s questions about ethics:  &#8220;Are you helping someone else lie, cheat, steal, intimidate, or cause harm?&#8221;  My answer is that I see a great deal of risk&mdash;and admittedly risk is only <em>potential</em> harm&mdash;that I will be aiding the client in some form of oppression or deception, either to himself or to his superiors. (The latter is a situation that I have been in before, with pass-vs.-fail ratios at the centre of the story in a project associated with a $33 million loss.)  Most of the time, providing pass-vs.-fail ratios is a test activity that I would stop immediately, using the <A HREF="http://www.developsense.com/blog/2009/10/when-do-we-stop-testing-one-more-sure/">&#8220;mission rejected&#8221; stopping heuristic</A> (one that I hadn&#8217;t noted until Cem himself pointed it out).  </p>
<p>Cem doesn&#8217;t provide any contexts in which pass-vs.-fail ratios might be useful, but as a context-driven tester, it&#8217;s my obligation to accept his critique and his challenge, and consider some contexts in which I <em>might</em> use them.  (This is the omission from my post that I mentioned above, and it&#8217;s the way that the controversy was handled in Lessons Learned:  with a serving of context.)  I present them in order from the least plausible to the most plausible.  </p>
<p><strong>&#8220;Your daughter will die&#8221; or &#8220;we&#8217;ll shoot this dog.&#8221;</strong>  If someone employs a threat of harm to some person or being or something of value, I have to evaluate the relative damage afforded by providing the measure or not. </p>
<p><strong>When mandated by force of law.</strong>  If I were on the witness stand, and a lawyer asked me, &#8220;What were the pass-vs.-fail ratios at release time for this project,&#8221; I&#8217;d be required by law to respond.  I can imagine how it would likely play out, too:  &#8220;92.7%, but I&#8217;d also like to make it clear that&mdash;&#8221; &#8220;No further questions, Your Honour.&#8221;</p>
<p><strong>If I provided the data with all of the appropriate disclaimers AND I was sure that the disclaimer would be heard.</strong>  If the client (and the client&#8217;s client, and so forth) were to relay the data <em>and</em> the disclaimer reliably to the point where the data would be used, I might be persuaded to provide the data.  But I&#8217;d have to weigh that against the risk that I was wrong about the disclaimer being heard. Moreover, in my professional judgement, it would be wasting my client(s)&#8217;s time.</p>
<p><strong>As a placebo.</strong>  I might give a pass vs. fail ratio long enough to convince my client that it&#8217;s not helpful or necessary, while doing other things to test well and provide her with other forms of reliable information.  I&#8217;d remain pretty uncomfortable with dispensing the sugar pills, though, and would work at ways of getting around it.  </p>
<p><strong>In the course of demonstrating that pass-vs.-fail ratios are a bad idea.</strong>  In some contexts, pass-vs.-fail ratios provide what Kirk and Miller call <em>quixotic reliability</em>.  That is, the measurement seems to correlate with other measurements of the state of the project.  I might provide pass-vs.-fail ratios long enough to show a divergence between that data and other measures of project or product health.</p>
<p><strong>If I were aware that the person receiving the data was in possession of all the contextual information that I believe they needed to put it to appropriate and non-harmful use.</strong>  We use this in one of the exercises in our class, based on a bug from an actual product.  We present a very specific set of tests that are the same in every material way but for two variables.  The total domain space to put these variables in combination is a set with 2304 elements.  When used in a test that covers all of these elements, 510 provide a &#8220;fail&#8221; result.  All of the test cases are of the same kind, and our students know that those test cases are comparable for the purposes that they&#8217;re considering.  In that case, <em>that kind</em> of ratio in <em>that kind</em> of context has some value in describing <em>that kind</em> of coverage.  So there might be some pedagogical or rhetorical value to reporting a pass-vs.-fail ratio there.  Interestingly, the root of the problem is a data type problem in a single line of code.  That helps to illuminate the discussion of &#8220;one bug or 510?&#8221; which in turn illuminates how bug counts and failure counts aren&#8217;t well correlated.  It also helps to illuminate opportunity cost in paying overmuch attention to this problem when there are many other things that we <em>might</em> test.</p>
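<p>For readers who like to see the arithmetic, here is a minimal sketch of the numbers in that exercise.  The variable names are mine, invented for illustration; only the figures (2304 combinations, 510 failures, one faulty line of code) come from the exercise itself.</p>

```python
# Figures from the classroom exercise described above.
total_cases = 2304      # every combination of the two variables
failing_cases = 510     # combinations that produced a "fail" result
passing_cases = total_cases - failing_cases

fail_rate = failing_cases / total_cases
pass_rate = passing_cases / total_cases

# In the exercise, all of the failures trace back to a single
# data type problem in one line of code.
root_causes = 1

print(f"passing: {passing_cases}, failing: {failing_cases}")
print(f"pass rate: {pass_rate:.1%}, fail rate: {fail_rate:.1%}")
print(f"failures per root cause: {failing_cases // root_causes}")
```

<p>The ratio works out to roughly 78% passing and 22% failing, yet on its own that figure says nothing about whether there is one bug or 510; that depends on context the number does not carry.</p>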
<p>To me, the real challenge is in coming up with a case in which this invalid, dangerous metric <em>in its most common applications</em> might be used for good.  In the contexts where they&#8217;re commonly discussed and used&mdash;overwhelmingly commonly, in my view&mdash;pass-vs.-fail ratios are used to express the quality of testing, the health of the project, or the readiness of the product. In those contexts, the risk of misuse, whether intentional or inadvertent, is high&mdash;like placing a loaded gun with the safety off in a crowded subway car.  As I&#8217;ve heard Cem say before, &#8220;I&#8217;d like to call them an Industry Worst Practice, but being context-driven, I can&#8217;t.&#8221;  Once again, Cem has reminded me of why I can&#8217;t commit to the &#8220;unethical&#8221; charge absolutely and in all cases.  He&#8217;s provided me with a challenge and an opportunity to sharpen my analysis, and I thank him for that.</p>
<hr />
<p><strong>Postscript, March 28, 2012:</strong>  In private correspondence and conversation, Cem suggested a different interpretation of a paragraph from <a href="http://www.developsense.com/blog/2012/02/why-pass-vs-fail-rates-are-unethical/" title="Why Pass vs. Fail Rates Are Unethical (Test Reporting Part 1)" target="_blank">this post</a> that I quoted above to provide context for this post.  In order to ward off that interpretation, here&#8217;s how I might write that paragraph today:  </p>
<p>&#8220;The ratio of passing test cases to failing test cases is at best irrelevant, and more often a systemic means of self- and organizational deception. Reducing the product story to <em>this invalid</em> number <em>without additional information</em> means reducing the product&#8217;s relationship with people to  <em>this invalid</em> number. By extension <em>when this invalid number is being used to evaluate people</em>, that means reducing people to <em>this invalid</em> number too.  So to irresponsible, unethical, and unprofessional, <em>in this case</em> we could add unscientific and inhumane.&#8221;</p>
<p>To be clear: these two posts have not been a blanket condemnation of all measurement, but of a particular metric that fails spectacularly when subjected to the tests of construct validity and reasonable and foreseeable side effects in Kaner and Bond&#8217;s <a href="http://www.kaner.com/pdfs/metrics2004.pdf" title="Software Engineering Metrics:  What Do They Measure and How Do We Know?" target="_blank">Software Engineering Metrics:  What Do They Measure and How Do We Know?</a>.  Pass vs. fail is not an imperfect metric; this is a metric that has no discernable construct validity to me (or even to Cem).  I&#8217;ve both experienced and seen pain and systematic deception with this metric at the centre of it.  In this, it&#8217;s not like imperfect financial figures that are generated by legitimate companies subject to scrutiny by regulators, by auditors, by shareholders, and by markets.  It&#8217;s more like financial forecasting data dreamed up by Bernie Madoff.  I don&#8217;t mind dealing with imperfect but plausibly valid information; that&#8217;s all a tester ever gets to do, really.  But if Bernie Madoff were to ask me to lend my credibility to his models, data, or business practices, I&#8217;d feel personally bound to decline that particular request.</p>
]]></content:encoded><description>Jerry Weinberg tells a story (yes, it&amp;#8217;s me; I&amp;#8217;m telling yet another Jerry Weinberg story) of meeting an old friend who looked distraught. &amp;#8220;What&amp;#8217;s the matter?&amp;#8221; Jerry asked. The fellow replied, &amp;#8220;Well, I&amp;#8217;m kind of shellshocked. My wife just left me.&amp;#8221; &amp;#8220;Was that a surprise?&amp;#8221; &amp;#8220;Yes, it really was,&amp;#8221; the fellow said. &amp;#8220;I mean, we [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/03/i-might-be-wrong/</feedburner:origLink></item><item><title>Delivering the News (Test Reporting Part 3)</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/diZRQFcfNBE/</link><category>Accountability</category><category>Management</category><category>Test Framing</category><category>Testing Story</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Thu, 01 Mar 2012 16:58:53 PST</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1181</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>In <a href="http://www.developsense.com/blog/2012/02/braiding-the-stories/" title="Braiding The Stories (Test Reporting Part 2)" target="_blank">the last post in this series</a>, I noted some potentially useful structural similarities between bug reports (whether oral or written) and newspaper reports.  This time, I&#8217;ll delve into that a little more.</p>
<p><strong>To our clients, investigative problem reports are usually the most important part of the product story.</strong> The most respected newspapers don&#8217;t earn their reputations by reprinting press releases; they earn their reputations through investigative journalism.  As testers (or, heaven help us, <a href="http://www.developsense.com/blog/2010/05/testers-get-out-of-the-quality-assurance-business/" title="Testers:  Get Out of the Quality Assurance Business" target="_blank">quality assurance</a> people), we tend to be chartered to look for problems, and to investigate them in ways that are most helpful to programmers, managers, and our other clients.  A failing test on its own tells us little, and a failing check even less; as I pointed out <A HREF="http://www.developsense.com/blog/2012/02/why-pass-vs-fail-rates-are-unethical/">here</A>, a failing test is only an allegation of a problem. Investigation and study of a failing test is likely to inform us of something more useful: whether someone will perceive a problem that threatens the value of the product. I&#8217;ll talk more about the nature of problems in a later post, but for now, think of a product problem in terms of a perceived absence of or threat to some dimension of quality. (See the <A HREF="http://www.satisfice.com/tools/satisfice-tsm-4p.pdf">Heuristic Test Strategy Model</A> for one list of quality criteria; see <A HREF="http://thetesteye.com/posters/TheTestEye_SoftwareQualityCharacteristics.pdf">Software Quality Characteristics</A>, by Rikard Edgren, Henrik Emilsson and Martin Jansson for another.) Since the manager&#8217;s goal is generally to release a product at her desired level of quality, problems that could threaten that goal are likely to be interesting and important.  Or, as they say in the newspaper business, &#8220;if it bleeds, it leads&#8221;.</p>
<p><strong>Potential showstoppers are usually the most important stories.</strong> In the 1990s, I was a technical support person, a tester, a program manager, and a programmer for a mass-market, commercial shrink-wrap software company.  Since we had millions of customers, even minor problems could have a big impact on technical support and on the reputation of our products in the market.  The market was enormous, hardware and software were even less standardized than they are now, and we worked under a great deal of time pressure. Classifying and prioritizing problems was contentious. One of the important classification questions was &#8220;What should we consider a showstopper?&#8221;  One of the senior programmers came up with an answer that I&#8217;ve used ever since:</p>
<p><em>Showstopper (n.):  Something that makes more sense to fix than to ship.</em></p>
<p>(I talked about showstoppers <A HREF="http://www.developsense.com/blog/2010/08/469/">here</A>.)  In a development project, a showstopper&mdash;any threat to the timely release of the project&mdash;is a page-one, above-the-fold story.</p>
<p><strong>There&#8217;s always one story that leads.</strong>  The most important threat to a timely, successful release may be a single problem, or it may be a collection of problems&mdash;what Ian Mitroff calls a <em>mess</em>. Do we have a problem, or a couple of problems, or a mess?  No matter what the answer, there&#8217;s only so much space on the front page above the fold.  Will you have one headline, or two, or three?  What will that headline say?  What will the lead paragraph of each story look like?  Does the lead paragraph cover the five Ws&mdash;who, what, where, when, and why?  If not, are those questions answered shortly?  Might there be a reasonable reason not to answer them?</p>
<p><strong>There&#8217;s only one front page, and there&#8217;s almost always more than one story on it.</strong>  Our clients need to be able to absorb the lead story and the other front-page stories quickly, so we need to be able to provide headlines, lead paragraphs, and details in appropriate proportions.  See <A HREF="http://www.satisfice.com/articles/mpim-report.pdf">an example front page here</A>, with details that follow.  </p>
<p>Very infrequently, serious newspapers give their entire front page to a story. In those cases, it&#8217;s usually an overwhelmingly important story, or one that threatens the newspaper or journalism itself.</p>
<p><strong>The most compelling stories are those that have an impact on people.</strong> Although product problems are often technical in nature, the &#8220;making sense&#8221; part of the showstopper decision is focused on the business.  Testers must be able to connect technical problems with business risk.  Problems related to technical correctness are often easy to describe, but they might not be important. The skill of <A HREF="http://www.associationforsoftwaretesting.org/training/courses/bug-advocacy/">bug advocacy</A>&mdash;making sure that the customer is aware of the best possible motivations for fixing the bug&mdash;depends on your ability to report the bug in terms of its most significant effect on the business. Ben Simo has a lovely way to sum this up.  Early in his career, when Ben was trying to advocate a bug fix, his project manager said, &#8220;Revenue is king. Liability is queen. Tell me how it impacts them.&#8221;</p>
<p><strong>The number of stories usually isn&#8217;t as important as the significance of the stories.</strong> This is another way in which test reports can be like newspapers.  We don&#8217;t usually evaluate the quality of a newspaper by the <em>number</em> of stories in it. Instead, we look at the <em>significance, relevance, and credibility</em> of the stories.  </p>
<p><strong>It may take time to distinguish between a breaking story and a major story.</strong>  Sometimes the news cycle doesn&#8217;t afford time for investigation, even though the story might be important.  Information gets passed around the project at various moments during the test and development cycle.  Sometimes a discovery happens just before a meeting.  Smart reporters know to balance urgency and restraint when there&#8217;s a breaking story.  When I worked in commercial mass-market software in the 1990s, we sometimes discovered a terrible-looking problem a couple of hours before release. Such discoveries would trigger <em>arousal</em> (no, not sexual friskiness, but arousal in the psychological sense of being suddenly snapped awake and alert to danger).  All of a sudden, we&#8217;d be noticing all kinds of things that we hadn&#8217;t noticed before, and most of them were non-problems of one kind or another.  We were biased by fear.  We called it the &#8220;snakes on everything&#8221; moment. When reporting, testers need to take stock of the emotional factors surrounding them, and report cautiously and accurately.  An hour from now, an allegation might be an important story, or it might be nothing.</p>
<p><strong>Non-problems aren&#8217;t news.</strong> There&#8217;s a pattern of stories in the first section of the newspaper: they&#8217;re mostly stories about problems, and there&#8217;s a reason for that: problems compel attention.  Our emotional systems evolved to help keep us out of trouble. Problems or threats trigger arousal.  Things that are going well are nice to hear about, but they don&#8217;t engage emotions in the same way as problems do.  In a software development project, non-problems have relatively little significance for project managers.  Routine daily successes don&#8217;t threaten the project, and therefore need less attention.</p>
<p><strong>Numbers, like pictures, are illustrations, not the whole story.</strong>  A qualitative report is not quantity-free; after all, identifying the presence or absence of something involves counting to one, and the degree of some attribute of interest can be illustrated by number.  But just as a pictorial illustration isn&#8217;t the item it depicts, a numerical illustration isn&#8217;t the story it might help to describe.  A picture looks at a part of a scene through a particular lens; a number focuses on one attribute using a particular metric. Each one may emphasize some observations at the expense of other observations.  Each one may crop out detail.  Each one may magnify or distort.  </p>
<p><strong>Since the product and testing stories are multi-dimensional, be prepared to show the dimensions.</strong>  Newspaper reports always have a bias, but reporters and editors often attempt to manage the bias by providing alternative sources of information, and alternative interpretations.  A story of any length often includes multiple stories, or multiple threads of the main story. When tables of data are appropriate, newspapers print tables (think stock quotes in the business section, or box scores or line scores in sports). Products, coverage, quality, and problems are all multi-dimensional, multi-variate, and qualitative. Where there&#8217;s a mass of data, consider using tables such as <A HREF="http://www.satisfice.com/presentations/dashboard.pdf">dashboards</A> or <A HREF="http://www.satisfice.com/sbtm/index.shtml">coverage tables</A>.  Pin numbers to reliable measurements (see the slip charts, the detailed impact case methods, and the subjective impact methods in Weinberg&#8217;s <A HREF="http://www.amazon.com/Quality-Software-Management-First-Order-Measurement/dp/0932633242">Quality Software Management, Volume 2:  First Order Measurement</A>; and pay attention to validity&mdash;see Kirk and Miller&#8217;s <A HREF="http://www.amazon.com/Reliability-Validity-Qualitative-Research-Methods/dp/0803924704">Reliability and Validity in Qualitative Research</A> and Shadish, Cook, and Campbell&#8217;s <A HREF="http://www.amazon.com/Experimental-Quasi-Experimental-Designs-Generalized-Inference/dp/0395615569">Experimental and Quasi-Experimental Designs for Generalized Causal Inference</A>).</p>
<p><strong>Describe your coverage.</strong> Boris Beizer described coverage as &#8220;any metric of test completeness with respect to a test selection criterion&#8221;.  That suggests that it is possible to quantify coverage if you have a quantifiable test selection criterion.  For example, if a single-digit field accepts any digit from 0 to 9, one could select 10 tests and claim complete coverage based on that criterion.  Mind, that data coverage doesn&#8217;t account for flow or sequence coverage; suppose that a bug was triggered only when a 7 replaced a 3 in that field. Since the overall number of possible tests is infinite, test selection criteria are based on models. In practical terms, this means that overall test coverage is some finite number over an infinite number.  If you report that accurately, you&#8217;re stuck with a number that remains asymptotically close to zero. Instead, focus on a qualitative evaluation: put coverage on an ordinal scale.  Level 0 means &#8220;We know nothing about this area of the product.&#8221;  Use Level 1 to say &#8220;We have done smoke or sanity testing; at this point, we&#8217;ve determined whether the product is even stable enough for serious testing.&#8221;  Level 2 means &#8220;We&#8217;ve tested the common, the core, the critical, the happy path; our testing has been focused on <em>can</em> it work.&#8221;  Level 3 means &#8220;We&#8217;ve tested the harsh, the complex, the challenging, the extreme, the exceptional; if there were a serious problem, we&#8217;d probably know about it by now.&#8221;  In this system, the numbers are barely more than labels for qualitative evaluation, so don&#8217;t be tempted to do serious math with them.</p>
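<p>To make the single-digit example concrete, here is a minimal sketch in Python. The function names and the choice of &#8220;ordered pairs of consecutive entries&#8221; as a sequence criterion are assumptions made for illustration, not a standard metric. It shows how the same ten tests can score as complete under one selection criterion and nearly empty under another.</p>
<pre><code class="language-python">from itertools import product

DIGITS = range(10)  # the field accepts any single digit, 0 to 9


def value_coverage(tests):
    """Fraction of the ten digits entered at least once (data coverage)."""
    return len(set(tests).intersection(DIGITS)) / len(DIGITS)


def transition_coverage(tests):
    """Fraction of ordered (previous, next) digit pairs seen in consecutive entries.

    This is one hypothetical sequence criterion; many others are possible.
    """
    all_pairs = set(product(DIGITS, repeat=2))
    seen = set(zip(tests, tests[1:]))
    return len(seen.intersection(all_pairs)) / len(all_pairs)


tests = list(range(10))  # enter 0, 1, ..., 9 once each, in that order

print(value_coverage(tests))                 # 1.0
print(transition_coverage(tests))            # 0.09
print((3, 7) in set(zip(tests, tests[1:])))  # False: a 7 never replaced a 3
</code></pre>
<p>Each number is only as meaningful as the model behind it, and neither model says anything about timing, platform, or the other dimensions that the tests did not touch.</p>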
]]></content:encoded><description>In the last post in this series, I noted some potentially useful structual similarities between bug reports (whether oral or written) and newspaper reports. This time, I&amp;#8217;ll delve into that a little more. To our clients, investigative problem reports are usually the most important part of the product story. The most respected newspapers don&amp;#8217;t earn [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">4</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/02/delivering-the-news-test-reporting-part-3/</feedburner:origLink></item><item><title>Braiding The Stories (Test Reporting Part 2)</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/QWCedmEejx8/</link><category>Accountability</category><category>Management</category><category>Test Framing</category><category>Testing Story</category><category>Uncategorized</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Wed, 07 Mar 2012 15:42:54 PST</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1171</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p><em>We were in the middle of a testing exercise at the <a href="http://www.ayeconference.com/" title="Amplifying Your Effectiveness" target="_blank">Amplifying Your Effectiveness conference</a> in 2005.  I was assisting <a href="http://www.satisfice.com" title="James Bach" target="_blank">James Bach</a> in a workshop that he was leading on testing.  He presented the group with a mysterious application written by <a href="http://www.workroom-productions.com" title="James Lyndsay" target="_blank">James Lyndsay</a>&mdash;an early version of one of the <a href="http://www.workroom-productions.com/black_box_machines.html" title="Black Box Test Machines" target="_blank">Black Box Test Machines</a>.  
&#8220;How many test cases would you need to test this application?&#8221; he asked.</p>
<p>Just then <a href="http://www.geraldmweinberg.com/Site/Home.html" title="Jerry Weinberg's Web Site" target="_blank">Jerry Weinberg</a> wandered into the room. &#8220;Ah! Jerry Weinberg!&#8221; said James. &#8220;One of the greatest testing experts in the world! <strong>He&#8217;ll</strong> know the answer to this one.  How many test cases would you need to test this application, Jerry?&#8221;</p>
<p>Jerry looked at the screen for a moment.  &#8220;Three,&#8221; he said, firmly and decisively.  </p>
<p>James knew to play along.  &#8220;<strong>Three?!</strong>&#8221;, he said, in a feigned combination of amazement, uncertainty, and curiosity. &#8220;How do you <strong>know</strong> it&#8217;s three?  Is it really <strong>three</strong>, Jerry?&#8221;</p>
<p>&#8220;Yes,&#8221; said Jerry. &#8220;Three.&#8221; He paused, and then said drily, &#8220;Why? Were you expecting some other number?&#8221;</em></p>
<p>In <a href="http://www.developsense.com/blog/2012/02/why-pass-vs-fail-rates-are-unethical/" title="Why Pass vs. Fail Rates Are Unethical" target="_blank">yesterday&#8217;s post</a>, I was harshly critical of pass vs. fail ratios, a very problematic yet startlingly common way of estimating the state of the product and the project.  When I point out the mischief of pass vs. fail ratios, some people object. &#8220;In the real world,&#8221; they say, &#8220;we have to report pass vs. fail ratios to our managers, because that&#8217;s what they want.&#8221;  Yet bogus reporting is antithetical to the &#8220;real world&#8221;.  Pass vs. fail ratios come from the <em>fake</em> world, a world where numbers have magical properties to soothe troubled and uncertain souls.  Still, there&#8217;s no question that managers want <em>something</em>. It&#8217;s our mandate to give them something of value. </p>
<p>Some people say that managers want numbers because they want to know that we&#8217;re measuring. I&#8217;ve found two ways of thinking about measurement that have been very useful to me. One is the definition from Kaner and Bond&#8217;s splendid paper &#8220;<A HREF="http://www.kaner.com/pdfs/metrics2004.pdf">Software Engineering Metrics: What Do They Measure and How Do We Know</A>?&#8221;: &#8220;Measurement is the empirical, objective assignment of numbers, according to a rule derived from a model or theory, to attributes of objects or events with the intent of describing them.&#8221;  I think that&#8217;s a superb definition of quantitative measurement, and the paper includes a set of probing questions to test the validity of a quantitative measurement.  Pass vs. fail ratios fall down badly when they&#8217;re subjected to those tests.  </p>
<p>Jerry Weinberg offers another definition of measurement that I think is more in line with what managers really want:  &#8220;Measurement is the art and science of making reliable (and significant) observations.&#8221;  (The main part of the definition comes from <A HREF="http://www.amazon.com/Quality-Software-Management-First-Order-Measurement/dp/0932633242">Quality Software Management, Vol. 2: First-Order Measurement</A>; the parenthetical comes from recent correspondence over Twitter.)  That&#8217;s a more general, inclusive definition.  It incorporates Kaner and Bond&#8217;s notion of quantitative measurement, but it&#8217;s more welcoming to qualitative, first-order approaches.  First-order measurement, as Jerry describes it, provides answers to questions like &#8220;What seems to be happening? and What should I do now?&#8221;  It entails a minimum of fuss, and tends to be direct, unobtrusive, inexpensive, and qualitative, leading either to immediate action or a decision to seek more information. It&#8217;s a common, misleading, and often expensive mistake in software development to leap over first-order measurement and reporting in favour of second-order&mdash;less direct, more quantified, more abstract, and based on more elaborate and vulnerable models.</p>
<p>My experience, as a tester, a programmer, a program manager, and a consultant, tells me that to manage a project well, you need a good deal of immediate and significant information.  &#8220;Immediate&#8221; here doesn&#8217;t only mean timely; it also means <em>unmediated</em>, without a bunch of stuff getting in between you and the observation.  In particular, managers need to know about problems that threaten the value of the product and the on-time, successful completion of the project.  That knowledge requires more than abstract data; it requires <em>information</em>.  So, as testers, how can we inform the decision-makers?  <strong>We must learn to describe and report on the product, our testing, and the quality of our testing.</strong>  This involves constructing, editing, narrating, and justifying a story in three lines that weave around each other like a braid.  Each line is its own story.</p>
<p><strong>Tell the product story.</strong>  The product story is a qualitative report on how the product <em>can</em> work, how it fails, and how it <em>might</em> fail in ways that matter to our clients.  &#8220;Working&#8221;, &#8220;failure&#8221;, and &#8220;what matters&#8221; are all qualitative evaluations.  Quality is value to some person; in a business setting, quality is value to some person who matters to the business.  A qualitative report about a product requires us to relate the nature of the product, the people who matter, and the presence or absence of value, risks, and problems for those people. Qualitative information makes it possible for our clients to make informed decisions about quality.</p>
<p><strong>To make the product story credible, tell the testing story.</strong> The testing story is about how we configured, operated, observed, and evaluated the product; what we actually did and what we actually saw. The testing story gives <em>warrant</em> to the product story; it helps our clients understand why they should believe and trust the product story we&#8217;re giving. The testing story is centred around the coverage that we obtained and the oracles that we applied.  Coverage is the extent to which we&#8217;ve tested the program; it&#8217;s about where we&#8217;ve looked and how we&#8217;ve looked, and it&#8217;s also about what&#8217;s uncovered&mdash;where we might not have looked yet, and where we don&#8217;t intend to look.  Oracles are central to evaluation; they&#8217;re the principles and mechanisms that allow us to recognize a problem.  The product story will likely feature problems in the product; the testing story, where necessary, includes an account of how we knew they were problems, for whom they would be problems, and inferences about how serious the problems might be.  We can make inferences about the significance of problems, but not ultimate conclusions, since the decision of what matters and what constitutes a problem lies with the product owner.  The product story and our clients&#8217; reactions to it will influence the ongoing testing story, and vice versa.</p>
<p><strong>To make the testing story credible, tell a story about the quality of the testing.</strong>  Just as the product story needs warrant, so too does the testing story.  To tell a story about the quality of testing requires us to describe why the testing we&#8217;ve done has been good enough, and why the testing we haven&#8217;t done hasn&#8217;t been so important so far.  The quality-of-testing story includes details on what made testing harder or slower, what made the product more or less testable, what the risks and costs of testing are, and what we might need or recommend in order to provide better, more accurate, more timely information.  The quality-of-testing story will shape and be shaped by the other two stories.</p>
<p><strong>Develop skills to tell and frame stories.</strong> People sometimes justify presenting invalid numbers in lieu of stories by saying that numbers are &#8220;efficient&#8221;. I think they mean &#8220;fast&#8221;, since efficiency of communication depends not only on speed, but also on value, relevance, validity, and the level of detail your client needs. In order to frame stories appropriately and hit the right level of detail&#8230;</p>
<p><strong>Don&#8217;t think data feed; think the daily news.</strong> Testing is like investigative journalism, researching and delivering stories to people. The newspaper business knows how to direct attention efficiently to the stories in which we&#8217;re interested, such that we get the level of detail that we seek.  Some of those strategies include:</p>
<p><UL>
<li><strong>Headlines.</strong>  A quick glance over each page tells us immediately what, in the editors&#8217; judgement, are the most salient aspects of any given story.  Headlines come in different sizes, relative to the editors&#8217; assessment of the importance of the story.</li>
<li><strong>Front page.</strong>  The paper comes folded.  The stories that the paper deems most important to its reader are on the front page, above the fold. Other important stories are on the front page below the fold. The page is laid out to direct our attention to what we find most relevant, and to allow us to focus and refocus on items of interest.</li>
<li><strong>Continuation.</strong> When an entire story is too long to fit on the front page, it&#8217;s abbreviated and the story continues elsewhere.  This gives the reader the option of following the story or looking at other items on the front page.</li>
<li><strong>Coverage areas.</strong> The newspaper is organized into sections (hard news, business, sports, life and leisure, arts, real estate, cars, travel, and so forth).  Each section comes with its own front page, which generally includes headlines and continuations of its own.</li>
<li><strong>Structured storytelling.</strong> Newspaper stories tend to be organized in spiralling levels of detail, such that the story is set up to follow the <a href="http://en.wikipedia.org/wiki/Inverted_pyramid" title="Inverted Pyramid" target="_blank">inverted pyramid</a> (the link is well worth reading). The story typically begins with the most newsworthy information, usually immediately addressing the five W questions&mdash;who, what, where, why, and when, plus how&mdash;and the story builds from there.  The key is that the reader can absorb information to the level of detail she seeks, continuing to the end of the story or jumping out when she&#8217;s satisfied.</li>
<li><strong>Identifying who is involved and who is affected.</strong>  Reporters and editors contextualize their stories.  Just as in testing, people are the most important element of the context.  A story is far more compelling when it affects the reader or people that the reader cares about.  A good story often helps to clarify why the reader <em>should</em> care.</li>
<li><strong>Varying approaches to delivering information.</strong> Newspapers often use a picture to help illustrate or emphasize an important aspect of a story.  In the business or sports sections, where quantitative data is often crucial, information may be organized in tables, or trends may be illustrated with charts. Notice that the stories&mdash;first-order reports&mdash;are always given greater prominence than the tables of stock quotes, league standings, and line scores.</li>
<li><strong>Sidebars.</strong> Some stories are illuminated by background information that might break the flow of the main story. That information is presented in parallel; in another thread, as we might say.</li>
<li><strong>Daily (and in the world of the Web, continuous) delivery of information.</strong>  My newspaper arrives at a regular time each day, a sort of daily heartbeat for the news cycle.  The paper&#8217;s Web site is updated on a continuous basis.  Information is available both on a supply and a demand basis; both when I expect it and when I seek it.</li>
<li><strong>Identifiable sources.</strong> Well-researched stories gain credibility by identifying how, where, when, and from whom the information was obtained.  This helps to set up degrees of trust and skepticism in the reader.</li>
<p></UL></p>
<p>One important note:  These approaches apply to more than text. Testers need to extend these patterns not only to written or mechanical forms, but to oral discourse.</p>
<p>I&#8217;ll have more suggestions and additional parallels between test reporting and newspapers in the next post in this series.</p>
]]></content:encoded><description>We were in the middle of a testing exercise at the Amplifying Your Effectiveness conference in 2005. I was assisting James Bach in a workshop that he was leading on testing. He presented the group with a mysterious application written by James Lyndsay&amp;#8212;an early version of one of the Black Box Test Machines. &amp;#8220;How many [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">8</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/02/braiding-the-stories/</feedburner:origLink></item><item><title>Why Pass vs. Fail Rates Are Unethical (Test Reporting Part 1)</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/g1NUmJQEdhw/</link><category>Accountability</category><category>Measurement</category><category>Tester Skill</category><category>Testing Story</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Tue, 06 Mar 2012 06:49:43 PST</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1154</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Calculating a ratio of passing tests to failing tests is a relatively easy task.  If it is used as a means of estimating the state of a development project, though, the ratio is invalid, irrelevant, and misleading.  At best, if everyone ignores it entirely, it&#8217;s simply playing with numbers.  Otherwise, producing a pass/fail ratio is irresponsible, unethical, and unprofessional. </p>
<p><strong>A passing test is no guarantee that the product is working correctly or reliably.</strong>  Instead, a passing test is an observation that the program <em>appeared</em> to work correctly, under some set of conditions that we were conscious of (and many that we weren&#8217;t), using a selection of specific inputs (and not using the rest of an essentially infinite set), at some time (to which we will never return), on some machine (that was in a particular state at that time; we observed and understood only a fraction of that state), based on a handful of things that we were looking at (and a boatload of things that we weren&#8217;t looking at, not that we&#8217;d have any idea where or how to look for everything).  At best, a passing test is <em>a rumour of success</em>. Take any of the parameters above, change <em>one bit</em>, and we could have had a failing test instead.</p>
<p><strong>Meanwhile, a failing test is no guarantee of a failure in the product we&#8217;re testing.</strong> Someone may have misunderstood a requirement, and turned that misunderstanding into an inappropriate test procedure.  Someone may have understood the requirement comprehensively, and erred in establishing the test procedure; someone else may have erred in following it. The platform on which we&#8217;re testing may be misconfigured, or there may be something wrong with something on the system, such that our failing test points to that problem and is not an indicator of a problem in our product.  If the test was being assisted by automation, perhaps there was a bug in the automation.  Our test tools may be misconfigured such that they&#8217;re not doing what we think they&#8217;re doing.  When generating data, we may have misclassified invalid data as valid, or vice versa, and not noticed it.  We may have inadvertently entered the wrong data. The timing of the test may be off, such that the system was not ready for the input we provided. There may be a reason, not yet understood, why the product is providing a result which seems incorrect to us, but which is in fact correct. A failing test is <em>an allegation of failure</em>.</p>
<p>When we do the math based on these assumptions, the unit of measurement in which pass/fail rates are expressed is rumours over allegations.  <em>Is this a credible unit of measurement?</em></p>
<p><strong>Neither rumours nor allegations are things.</strong> Uncertainties are not units with a valid natural scale against which they can be measured. One entity that we call a &#8220;test case&#8221;, whether passing or failing, may consist of a single operation, observation and decision rule.  Another entity called &#8220;test case&#8221; may consist of hundreds or thousands or millions of operations, all invisible, with thousands of opportunities for a tester to observe problems based not only on explicit knowledge, but also on tacit knowledge.  Measuring while failing to account for clear differences between entities demolishes the <a href="http://en.wikipedia.org/wiki/Construct_validity" title="Construct Validity" target="_blank">construct validity</a> of the measurement.  Treating test cases&mdash;whether passing or failing&mdash;as though they were countable objects is a classic case of the <a href="http://en.wikipedia.org/wiki/Reification_%28fallacy%29" title="Reification (fallacy)" target="_blank">reification fallacy</a>.  Aggregating scale-free, reified (non-)entities loses information about each instance, and loses information about any relationships between them.  Some number of rumours doesn&#8217;t tell us anything about the meaning, significance, or value of any given passing test, nor does the aggregate tell us anything about coverage that the passing tests provide, nor does the number tell us about missing coverage.  Some number of allegations of which we&#8217;re aware doesn&#8217;t tell us anything about the seriousness of those allegations, nor does it tell us about undiscovered allegations. Dividing one invalid number by another invalid number doesn&#8217;t mean the invalidity cancels and produces a valid ratio.  </p>
<p>When the student has got an answer wrong, and the student is misinformed, there&#8217;s a problem. What does the <em>number</em> of questions that the teacher asked have to do with it?  When a manager interviews a candidate for a job, and halfway through the interview he suddenly starts shouting obscenities at her, will the <em>number</em> of questions the manager asked have anything to do with her hiring decision?  If the battery on the Tesla Roadster is ever completely drained, the car <a href="http://theunderstatement.com/post/18030062041/its-a-brick-tesla-motors-devastating-design" title="“It’s A Brick” – Tesla Motors’ Devastating Design Problem" target="_blank">turns into a brick</a> with a $40,000 bill attached to it.  Does anyone, anywhere, care about the number of passing tests that were done on the car?</p>
<p>If we are asked to produce pass/fail ratios, I would argue that it&#8217;s our professional responsibility to politely refuse to do it, and to explain why:  we should not be offering our clients the service of self-deception and illusion, nor should our client accept those services.  The ratio of passing test cases to failing test cases is at best irrelevant, and more often a systemic means of self- and organizational deception.  Reducing the product story to a number means reducing its relationship with people to a number. By extension, that means reducing people to numbers too. So to irresponsible, unethical, and unprofessional, we can add unscientific and inhumane. </p>
<p>So what&#8217;s the alternative?  We&#8217;ll get to that <a href="http://www.developsense.com/blog/2012/02/braiding-the-stories/" title="Braiding The Stories (Test Reporting Part 2)">tomorrow</a>.</p>
]]></content:encoded><description>Calculating a ratio of passing tests to failing tests is a relatively easy task. If it is used as a means of estimating the state of a development project, though, the ratio is invalid, irrelevant, and misleading. At best, if everyone ignores it entirely, it&amp;#8217;s simply playing with numbers. Otherwise, producing a pass/fail ratio is [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">18</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/02/why-pass-vs-fail-rates-are-unethical/</feedburner:origLink></item><item><title>Do Not Close This Window (Or Click The Back Button)</title><link>http://feedproxy.google.com/~r/developsense/JDAV/~3/BJ42RkdXod4/</link><category>Uncategorized</category><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Michael</dc:creator><pubDate>Tue, 07 Feb 2012 08:48:47 PST</pubDate><guid isPermaLink="false">http://www.developsense.com/blog/?p=1143</guid><content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Here&#8217;s a classic case of poor design and user experience. Most of us have seen something like it. It happened to my wife yesterday. It will happen to you again soon, probably.</p>
<p><UL><LI>You&#8217;re making an online payment for some product or service.</LI><br />
<LI>You press a button that says something like &#8220;Submit Payment&#8221;.</LI><br />
<LI>A web page appears that says something like &#8220;Your payment is being submitted. Please do not close this window or click the Back button on your browser.&#8221;  And that&#8217;s <em>all</em> the page says.</LI><br />
<LI>The page stays on your screen <em>forever</em>&mdash;or until you wince and close the browser window despite the specific instructions on the screen.</LI></UL></p>
<p>Here are some questions that a tester could ask when presented with this design, or with this experience:</p>
<p><UL><LI><strong>&#8220;Or else what?&#8221;</strong> &#8220;Please do not close this window or click the Back button on your browser.&#8221; Or else what?  What Bad Thing might happen? What Good Thing might fail to happen?  This should lead directly to&#8230;</LI><br />
<LI><strong>&#8220;What if&#8230;?&#8221;</strong> What if the sequence of actions doesn&#8217;t go as planned? What if a conversation between a server and a client is interrupted? (Note: the connections between any two systems are at best <em>somewhat</em> reliable.  If you believe otherwise, a travelling testing consultant has two words for you:  <em>hotel WiFi</em>.)  At what points might interruptions happen? (Quick answer: all of them.) How is the state of the conversation being managed?  Have we considered interruptions in our design? Have we tested for them?  How does the system handle and recover from delayed or interrupted transactions?</LI><br />
<LI><strong>&#8220;What should the customer reasonably expect?&#8221;</strong>  It&#8217;s not hard to imagine a good deal of variance in the performance of a system, especially when its end nodes might be dozens of network hops apart from each other. Still, how long should a customer reasonably expect the transaction to take?  At what point might it make sense for the customer to bail out?</LI><br />
<LI><strong>&#8220;How would the customer know when it&#8217;s time to bail out?&#8221;</strong>  If you can put a message on the screen, and if you know how long it would be reasonable to wait before bailing out, should the customer have to look at her watch? Might a countdown timer be helpful?</LI><br />
<LI><strong>&#8220;Is there another way?&#8221;</strong> Is there another way for the customer to see that the transaction has completed successfully, or has failed?  Do your design and the message you display make that option clear?</LI><br />
<LI><strong>&#8220;What emotions might come up?&#8221;</strong>  How might a customer feel uncertain, confused, frustrated, annoyed, mystified, impatient, surprised, helpless&mdash;or confident, impressed, reassured, or delighted&mdash;by what she sees and experiences?  How might we use those potential feelings to help us guide our search for problems?</LI><br />
<LI><strong>&#8220;Who can help?&#8221;</strong> If the transaction fails, who can help the customer out?  How does the customer get in touch with that person?  Is there a means of contacting customer support on that &#8220;Please wait&#8230;&#8221; screen? </LI><br />
<LI><strong>&#8220;What meta-information is available?&#8221;</strong>  I&#8217;ve worked with companies that have said, &#8220;We can&#8217;t put a customer support telephone number on that screen; customer support would be swamped!&#8221;  What does that statement tell you about the system, about people&#8217;s impressions of its reliability, and about risk?</LI><br />
<LI><strong>&#8220;How do we raise awareness of problems?&#8221;</strong>  When a transaction on our site fails or is subject to an unreasonable delay, how do we find out?  Is someone alerted immediately?  Are failures aggregated?  Or are they buried in a log file somewhere?  Who looks for problems, and how often do they look?  Who hears about problems? How does that information get relayed to the people who design, maintain, and update the system? How might that information&mdash;or parts of it&mdash;<em>not</em> get relayed to those people?</LI></UL></p>
<p>This last question is important. Its answer provides part of the explanation for the fact that, after fifteen years of Web commerce, we&#8217;re still seeing designs like the one that appears at the top of this post.</p>
]]></content:encoded><description>Here&amp;#8217;s a classic case of poor design and user experience. Most of us have seen something like it. It happened to my wife yesterday. It will happen to you again soon, probably. You&amp;#8217;re making an online payment for some product or service. You press a button that says something like &amp;#8220;Submit Payment&amp;#8221;. A web page [...]</description><thr:total xmlns:thr="http://purl.org/syndication/thread/1.0">7</thr:total><feedburner:origLink xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0">http://www.developsense.com/blog/2012/02/do-not-close-this-window/</feedburner:origLink></item></channel></rss>

