<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
		<channel>
		<title>Measuring Usability: Quantitative Usability and Statistics</title>
		<link>http://www.measuringusability.com</link>
		<description>Articles, advice and calculators for measuring the usability of websites and applications.</description>
		<copyright>(c) 2010, Measuring Usability LLC All rights reserved.</copyright>
		<language>en-us</language>
		<webMaster>jeff@measuringusability.com(Jeff Sauro)</webMaster>
		<image>
			<title>Measuring Usability: Quantitative Usability and Statistics</title>
			<url>http://www.measuringusability.com/images/logo6.gif</url>
			<link>http://www.measuringusability.com</link>
		</image>
		

		
				<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/MeasuringUsability" /><feedburner:info uri="measuringusability" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><item>
				<title>Five Techniques for Moderating Usability Tests</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/FGQZpUcr1G4/moderating-tips.php</link>
				<description>&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/therapy.jpg" title="Usability Therapy" border="0" height="241" width="345"&gt;It doesn't matter if it's your first usability test or your hundredth; there are always things you can improve to make the most of the time with your users.&amp;nbsp; &lt;br&gt;&lt;br&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;Avoid using &lt;i&gt;why &lt;/i&gt;in a direct, reflexive manner&lt;/b&gt;: We of course want to know why users do things on websites and in applications. &lt;br&gt;&lt;br&gt;But when we ask &lt;i&gt;why &lt;/i&gt;directly, we risk putting the participant on the defensive.&amp;nbsp; &lt;br&gt;&lt;br&gt;We attempt to soften the probing &lt;i&gt;why &lt;/i&gt;questions by using gentler but more verbose questions:&lt;br&gt;&lt;br&gt;&lt;ul&gt;&lt;li&gt;Instead of "Why did you click on that icon?" ask "Can you tell me what about the icon led you to click on it?"&lt;/li&gt;&lt;li&gt;Instead of "Why did you choose number 6?" ask "Can you briefly explain why you chose a 6 when deciding to recommend or not recommend the product to a friend?"&lt;/li&gt;&lt;/ul&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Minimize the planting of ideas&lt;/b&gt;: Usability tests now seem to involve more than just the classic task-based studies. They often involve getting at concepts, feature acceptance and interest. While we want to know if users would use a product, feature or brand, using the product or company name could plant the idea in their heads. We might not know if it was something the user would have mentioned without our cue. &lt;br&gt;&lt;br&gt;For example, instead of asking: "Is Samsung one of the top three companies you consider when purchasing a laptop?" ask:&amp;nbsp; "What three companies come to mind when you consider purchasing a laptop?" 
Then count the number of times the participants mention Samsung without any seeding from the facilitator.&amp;nbsp; Bonus: Use &lt;a href="http://www.measuringusability.com/blog/ci-10things.php"&gt;confidence intervals&lt;/a&gt; around the percentage of participants who mention a company to estimate the prevalence in the entire user population.&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Minimize Yes and No Questions&lt;/b&gt;:&amp;nbsp; When probing users on their intentions and actions, we ask a lot of questions. Try to phrase questions so they can't easily be answered with short and often uninformative yes and no responses.&amp;nbsp; Yes and No questions of course sneak in all the time, so don't worry if you find yourself asking them. (There's a lot to keep track of in a usability testing session.) When they do sneak in or it's awkward not to ask a Yes or No question, be ready to follow up with another question (which of course won't be "Why?").&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Don't rely too heavily on the "Would you?" questions:&lt;/b&gt;&amp;nbsp; Usually, we really want to know if a feature is something that has to be built, fixed or nixed. The natural question to ask is whether users would use the feature (a yes/no question about the future). There's nothing inherently wrong with asking people if they'd do something; it's just that people are notoriously bad at predicting their future behavior. &lt;br&gt;&lt;br&gt;What's more, when participants are being paid for their opinion, there is a natural resistance to overtly acknowledging that something is useless, broken or not helpful to them.&amp;nbsp; Consequently, you're likely to receive more positively &lt;a href="http://www.measuringusability.com/blog/ut-bias.php"&gt;biased responses&lt;/a&gt;.&amp;nbsp;&amp;nbsp; Consider phrasing questions in a way that gives the user permission to be critical. 
Ask them questions like: "If you could keep only one feature, which would it be?" or "If you had to cut two things, what would they be?"&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Turn user questions around&lt;/b&gt;: Learn the art of gently deflecting questions back on the user. Users will often ask a question about a feature or navigation element. Or, they will ask the ever popular "Did I do that right?" Instead of answering directly with a Yes or No, or ignoring the participant entirely, ask them "Do you think it was the correct selection?" or "What about that makes you unsure you've selected the correct option?" By turning the questions gently back toward the user, you can usually glean a bit more about those elusive intentions, mental models and motivations. &lt;br&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br&gt;You don't need to be a trained psychotherapist to conduct effective usability sessions, but it always helps to refine the art of understanding human behavior and intentions when looking to improve the customer experience. &lt;br&gt;&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/FGQZpUcr1G4" height="1" width="1"/&gt;</description>
				<pubDate>Tue, 14 May 2013 22:30:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/blog/moderating-tips.php</feedburner:origLink></item>
				<item>
				<title>Seven Ways to Test the Effectiveness of Icons</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/AimRyzDZYYM/icon-tests.php</link>
				<description>&lt;a href="http://www.flickr.com/photos/iloveui/8693725467/sizes/n/in/photostream/"&gt;&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/icons.jpg" title="Icons" border="0"&gt;&lt;/a&gt;For as long as &lt;a href="http://psd.tutsplus.com/articles/theory/know-your-icons-part-1-a-brief-history-of-computer-icons/"&gt;user interfaces have had icons&lt;/a&gt;, there have been strong opinions about what makes an effective icon. &lt;br&gt;&lt;br&gt;From the business analyst to the CEO, we all like to tell the designer what's "intuitive" and what's "terrible."&amp;nbsp; &lt;br&gt;&lt;br&gt;Instead of making decisions based on the pay grade of the people in a meeting, consider using some data-driven approaches to make better decisions. &lt;br&gt;&lt;br&gt;While internal acceptance, branding and style are all important considerations, the true arbiter of success is how well an icon conveys its meaning.&lt;br&gt;&lt;br&gt;Sure, labels on icons do an amazing job of complementing an image. But with interfaces used in many countries, icons often need to be designed for multiple languages and fit into cramped interfaces, making labels a less viable option.&lt;br&gt;&lt;br&gt;In our experience, there isn't a single, one-shot test for determining if an icon should stay or go. Instead, a multi-test approach tends to identify strengths and weaknesses to meet most needs.&lt;br&gt;&lt;br&gt;Icons can be easily tested in an &lt;a href="http://www.measuringusability.com/blog/method-comparison.php"&gt;unmoderated or moderated usability test&lt;/a&gt; with both small and large sample sizes. Here are seven ways to help determine if your icons are making the interface more efficient or simply adding clutter.&lt;br&gt;&lt;br&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;Association:&lt;/b&gt; Provide an icon and present participants with the intended function along with three incorrect options displayed in randomized order. 
Count the number of correct selections and use confidence intervals to determine the percentage of all users that would likely make the intended association.&amp;nbsp; &lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Reverse Association:&lt;/b&gt; Provide an icon definition and present participants with four possible images to associate with in randomized order.&amp;nbsp; Use &lt;a href="http://www.measuringusability.com/blog/ci-10things.php"&gt;confidence intervals&lt;/a&gt; again to estimate the effectiveness for the entire user population. For example, if 30 out of 39 users correctly selected the right association, you can be 90% confident that between 65% and 86% of all users will also make the association.&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Recall:&lt;/b&gt; Display an icon for a few moments and then ask participants to write the words or functions they remember about the icon. Summarize the results in word clouds and count up the number of times key words were listed. &lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Free Response&lt;/b&gt;: Show participants an icon and ask them to list words or phrases they associate with the icon. For example, in one test we wanted to know whether users would associate an icon with a parent/child relationship (not what we intended) or with primary/secondary assignments (what we intended).&amp;nbsp; &lt;img src="http://www.measuringusability.com/images/icon-child.jpg"&gt;&lt;br&gt;&lt;br&gt;In counting up the responses from 61 participants, the word "child" or "parent" was used 29 times. The 90% confidence interval shows solid evidence that at least 37% of all users would make the incorrect association. This icon was rejected for an alternative.&amp;nbsp;&amp;nbsp; &lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Recounting&lt;/b&gt;: Have participants describe the icon as if they were explaining it to a friend. 
For years I've explained to my grandmother that I talk with customers to measure the usability of websites and software in a lab or over the phone. She tells her friends that I'm in telemarketing.&amp;nbsp; Hearing how users would distill your icon's meaning into more universal terms helps you understand the message being delivered (or not delivered). &lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;In Context vs. Out of Context&lt;/b&gt;: Icon meanings are often only understood in the context of other icons or in the application that's being tested. In the context of an HR application, an icon with a dollar sign can mean something different than the same dollar sign in an accounting application. In general, we've found that just having a small screenshot with the icon in the context of how it will be seen provides a slightly more reliable result than an icon in isolation. Of course, testing an icon out of context also has value when you want to know how well the correct association will be made on marketing materials or in a suite of applications where the context changes or is unknown.&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Time to Locate&lt;/b&gt;: Ask participants to find a function or complete an action using the icons as presented in the context of an application. This is more like a classic &lt;a href="http://www.measuringusability.com/blog/measure-findability.php"&gt;findability study&lt;/a&gt; or click test. Time how long it takes users to successfully click on the correct icon as a measure of success. 
&lt;/li&gt;&lt;/ol&gt;&lt;br&gt;With enough testing and refinement, perhaps your icons will go on to be one of the &lt;a href="http://www.hanselman.com/blog/TheFloppyDiskMeansSaveAnd14OtherOldPeopleIconsThatDontMakeSenseAnymore.aspx"&gt;famous outdated visualizations&lt;/a&gt; like the disk drive or Rolodex that we can't shake in our interfaces.&lt;br&gt;&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/AimRyzDZYYM" height="1" width="1"/&gt;</description>
				<pubDate>Tue, 07 May 2013 22:45:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/blog/icon-tests.php</feedburner:origLink></item>
				<item>
				<title>Five User Research Mistakes to Avoid</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/Ibx5oM4_t7U/research-mistakes.php</link>
				<description>&lt;a href="http://www.flickr.com/photos/henriquev/5415139520/sizes/n/in/photostream/"&gt;&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/caution-tape.jpg" title="Caution: Avoid these mistakes" border="0"&gt;&lt;/a&gt;There are a lot of mistakes that can be made when conducting any type of research. &lt;br&gt;&lt;br&gt;But almost all research contains some mistakes in methodology, measurement or interpretation. &lt;br&gt;&lt;br&gt;Rarely do the mistakes render the research useless.&amp;nbsp; &lt;br&gt;&lt;br&gt;To help make your next user research endeavor more useful, here are five common mistakes to avoid. &lt;br&gt;&lt;br&gt;&lt;b&gt;1. Usability tests that are actually feature reviews&lt;/b&gt;: If you ask users to spend some time exploring an application or website and give their opinion, they will be happy to oblige. While this type of feedback is better than nothing, it's not a usability test. &lt;br&gt;&lt;br&gt;In the precious time you schedule with users, don't treat it like a design review. Instead, have them use the application like they would if no one was around.&amp;nbsp; While a few early adopters will relish exploring features and screens, most users have just a few things they want to accomplish in the software they download, the app they install or the website they visit. Be sure you have users attempt to accomplish a task, not just poke around on some screens.&lt;br&gt;&lt;br&gt;&lt;b&gt;2. Failing to have task success criteria and collect metrics:&lt;/b&gt;&amp;nbsp; Even if you &lt;a href="http://www.measuringusability.com/blog/actual-users.php"&gt;test with only five&lt;/a&gt; users on an early stage prototype, you should do at least three things. &lt;br&gt;&lt;ol&gt;&lt;li&gt;Have tasks with clearly defined success criteria: Don't just have users explore a feature. 
Have users attempt realistic tasks and define what's an acceptable outcome.&lt;/li&gt;&lt;li&gt;Count the frequency of &lt;a href="http://www.measuringusability.com/blog/usability-problems.php"&gt;usability problems&lt;/a&gt;: You should document the problems in the interface you observe and also record which users &lt;a href="http://www.measuringusability.com/blog/problem-matrix.php"&gt;encountered which problem&lt;/a&gt; instead of simply reporting that the problem was observed.&amp;nbsp; We've consistently been deceived by our memories when reviewing videos of sessions between usability tests. Some problems seemed to happen to all users, yet when we count up all the occurrences, we'll see that only 3 out of 8 users actually had the problem. &lt;br&gt;&lt;/li&gt;&lt;li&gt;Compute a &lt;a href="http://www.measuringusability.com/blog/completion-rates.php"&gt;completion rate&lt;/a&gt;: If 1 out of 5 users completes a task, report the 20% completion rate along with a 90% &lt;a href="http://www.measuringusability.com/blog/ci-10things.php"&gt;confidence interval&lt;/a&gt; of 3% to 59%. The confidence interval helps change the conversation from "your sample size is too small" to knowing that it's very likely that a majority of all users will not be able to complete the task and it's worth fixing now.&lt;br&gt;&lt;/li&gt;&lt;/ol&gt;&lt;b&gt;3. The denominator problem&lt;/b&gt;:&amp;nbsp; It's easy to get obsessed over conversion rates when running &lt;a href="https://www.measuringusability.com/blog/ab-testing.php"&gt;A/B testing&lt;/a&gt; or optimizing a website for increased sales. Sure, conversion rate is a great metric but it's not the only metric that can be used. 
If one design element generates a higher conversion rate but lowers the total number of sales or reduces the average sales price, have you really optimized the site?&amp;nbsp; &lt;br&gt;&lt;br&gt;Additionally, promotions, media attention or seasonality that drive large traffic spikes will often reduce the conversion rate (larger denominator) but increase sales. This is a good thing. Consider combining conversion rates and revenue into one metric, or look at both when determining which treatment is really the best.&lt;br&gt;&lt;br&gt;&lt;b&gt;4. Not having a comparison&lt;/b&gt;: One of the best ways to provide meaning to metrics is answering the question "&lt;a href="http://www.measuringusability.com/blog/compared-what.php"&gt;Compared to what?&lt;/a&gt;" If 46% of users can find a sewing machine on a department store website, it sounds like horrible &lt;a href="http://www.measuringusability.com/blog/measure-findability.php"&gt;findability&lt;/a&gt;. But if only 10% could find the same sewing machine prior to the redesign, this is a findability improvement. If it takes users two minutes to find the nearest rental location on Budget.com, is that too long?&amp;nbsp; Perhaps, but not &lt;a href="http://www.youtube.com/watch?v=WurwNauy8Xs"&gt;when compared to Enterprise.com&lt;/a&gt;, which took 200 seconds (67% longer). Usually the most meaningful comparison is comparing a product to itself over time or perhaps to similar products within the same company. So don't stress too much over not having access to competitive data. &lt;br&gt;&lt;br&gt;&lt;b&gt;5. Obsessing over demographics&lt;/b&gt;: When it comes to usability testing, we've consistently found that the biggest differentiator in usability metrics is not demographic differences, but whether users have &lt;a href="http://www.measuringusability.com/blog/prior-exposure.php"&gt;prior experience&lt;/a&gt; or are more knowledgeable about a domain or industry. 
This is especially the case if it's a specialized domain or a product requiring special skills, such as accounting or ERP software. Gender, age, geography and income often get center stage when discussing recruiting and when presenting the results. It's understandable. People want to know that the research is based on the right people. &lt;br&gt;&lt;br&gt;But when it comes to the type of behavior we see in usability testing, actions tend to cross classes of people. If you are designing snow shoes, you don't want to test surfers, but if you are a researcher in Hawaii, you'll still be able to tell if the shoes won't fit an average person. Be concerned with demographics, but not obsessed.&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/Ibx5oM4_t7U" height="1" width="1"/&gt;</description>
				<pubDate>Wed, 01 May 2013 23:20:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/blog/research-mistakes.php</feedburner:origLink></item>
				<item>
				<title>The 3 R's of Measuring Design Comprehension</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/xvOlnaSgYbA/measuring-comprehension.php</link>
				<description>&lt;a href="http://www.flickr.com/photos/frozenchipmunk/250236754/sizes/m/in/photostream/"&gt;&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/chalkboard.jpg" title="The 3 R's" border="0" height="196" width="282"&gt;&lt;/a&gt;Will users get it?&lt;br&gt;&lt;br&gt;Marketing and design teams often want to know if users will understand a key concept on a website or design. &lt;br&gt;&lt;br&gt;For example, do users understand new terms and conditions, a privacy policy, different product models, prices or the service packages properly?&lt;br&gt;&lt;br&gt;When you want to know if users will understand something in a design, you can quickly see how asking "Did you understand the difference in our service plans?" isn't a good idea. &lt;br&gt;&lt;br&gt;It's unlikely that more than a few participants will acknowledge that they don't understand something. It would be like asking students in a math class if they understand the &lt;a href="http://www.measuringusability.com/blog/quant-tips.php"&gt;concept of logarithms&lt;/a&gt;.&amp;nbsp; It's much better to ask students to find the logarithm of 1000.&lt;br&gt;&amp;nbsp;&lt;br&gt;To that end, to really measure comprehension, you need to use questions that assess users' knowledge of the policy, the product or design. &lt;br&gt;&lt;br&gt;We use three complementary techniques: measuring &lt;b&gt;recognition, recall and recounting&lt;/b&gt;. &lt;br&gt;&lt;h2&gt;Recognition&lt;/h2&gt;Recognition measures the ability of a user to correctly identify an item among a set of alternatives. This is measured using the classic multiple choice question format. When a user can correctly select an item from a list of multiple choice options, it demonstrates at least a superficial level of comprehension. 
&lt;br&gt;&lt;br&gt;For example, if you want to know whether users understand a new cancellation policy for a service, you can ask them to review a product page and then answer some questions that include something like the following:&lt;br&gt;&lt;br&gt;Which of the following options best represents the service cancellation policy?&lt;br&gt;&lt;blockquote&gt;A.&amp;nbsp;&amp;nbsp; &amp;nbsp;All sales are final.&lt;br&gt;B.&amp;nbsp;&amp;nbsp; &amp;nbsp;You can cancel any time after 90 days.&lt;br&gt;C.&amp;nbsp;&amp;nbsp; &amp;nbsp;You can cancel at any time.&lt;br&gt;D.&amp;nbsp;&amp;nbsp; &amp;nbsp;You can cancel any time within the first 30 days. &lt;br&gt;&lt;/blockquote&gt;&lt;br&gt;If a user selects the correct choice "C", this reflects a certain level of comprehension. But it's unclear if the participant would have provided this answer without considering the alternatives and getting a quick mental cue from seeing the correct answer. Guessing also complicates matters. With one correct choice and three incorrect alternatives, there is a 25% chance of randomly selecting an item correctly.&amp;nbsp; &lt;br&gt;&amp;nbsp;&lt;br&gt;Adding additional multiple choice questions about the cancellation policy can certainly help this, and that's why standardized tests aren't merely single questions. The probability of correctly guessing &lt;i&gt;three &lt;/i&gt;questions is .25&lt;sup&gt;3&lt;/sup&gt;, or about 2%. &lt;br&gt;&lt;br&gt;For &lt;a href="http://www.measuringusability.com/blog/unmoderated-things.php"&gt;unmoderated usability testing&lt;/a&gt;, we often need to verify task completion rates and use a multiple choice verification question. It usually asks participants to provide the price or description of a product we asked them to search for. 
That means in many respects, the lower limit of task completion rates would be closer to 25% than 0% (for multiple choice questions with four options).&amp;nbsp; &lt;br&gt;&lt;br&gt;So while adding many multiple choice questions offsets the problems of guessing and poorly worded questions and answers, we can't subject participants to long batteries of SAT-like questions in the world of applied user research. We use complementary approaches instead. &lt;h2&gt;Recall&lt;/h2&gt;Recall is the users' ability to pull the correct answer from their memory without any prompting or cues. This is usually measured by having participants answer open-ended questions.&amp;nbsp; &lt;br&gt;&lt;br&gt;Open-ended questions that require users to correctly recall the cancellation policy terms or the name of a feature provide evidence for a deeper level of comprehension than recognition.&amp;nbsp; Asking a participant to recall the cancellation policy would entail a question such as: &amp;nbsp;&lt;br&gt;What is the cancellation policy for the software service?&lt;br&gt;&lt;br&gt;While we largely eliminate the problem of guessing, open-ended questions have their own issues. They take longer to analyze and introduce an additional layer of subjectivity and differing interpretation.&amp;nbsp;&amp;nbsp; &lt;br&gt;&lt;h2&gt;Recounting&lt;/h2&gt;Sometimes we not only want to understand whether users understand specific aspects of a product or service, but we want to know what features or details are most important and memorable in the mind of the user. &lt;br&gt;&lt;br&gt;Instead of asking a participant to summarize what they understand, we ask them how they would explain what they saw to a friend or colleague. For example, "How would you explain the service and cancellation policy to a friend who was considering this service?" 
Asking a participant to rephrase things for a non-present friend forces them to avoid relying on jargon or half-baked terms.&amp;nbsp; &lt;br&gt;&lt;br&gt;This approach helps not only to assess a deeper level of comprehension but also to assess what features stand out, and in the users' language. The verbatim responses provide a great opportunity to determine what branded terms are being used.&amp;nbsp; &lt;br&gt;&lt;br&gt;Rarely can we assess whether users "get" a concept, feature or detail with a single question or by asking them directly. Using a mix of multiple choice questions (recognition), open-response questions (recall) and recounting questions provides a balanced view of what users understand and what they don't comprehend. We find this approach works well for measuring abstract software concepts, terms and conditions, pricing structures, upgrades, product tiers, service plans, and branded features. &lt;br&gt;&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/xvOlnaSgYbA" height="1" width="1"/&gt;</description>
				<pubDate>Tue, 23 Apr 2013 22:45:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/blog/measuring-comprehension.php</feedburner:origLink></item>
				<item>
				<title>Seven Tips for Writing Usability Task Scenarios</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/mrQr_lvMmZI/task-tips.php</link>
				<description>&lt;a href="http://www.flickr.com/photos/otherjoel/1363797460/sizes/n/in/photostream/"&gt;&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/task-list.jpg" title="The Art of the Task" border="0" height="206" width="270"&gt;&lt;/a&gt;The core idea behind usability testing is having real people trying to accomplish real tasks on software, websites, cell phones or hardware.&amp;nbsp; &lt;br&gt;&lt;br&gt;Identifying what users are trying to do &lt;a href="http://www.measuringusability.com/blog/five-redesign.php"&gt;is a key first step&lt;/a&gt;. Once you know what tasks you want to test, you'll want to create realistic task scenarios for participants to attempt.&lt;br&gt;&lt;br&gt;A task is made up of the steps a user has to perform to accomplish a goal. A task-scenario describes what the test user is trying to achieve by providing some context and the necessary details to accomplish the goal. &lt;br&gt;&lt;br&gt;Crafting task scenarios is a balance between providing just enough   information so users aren't guessing as to what they're supposed to do   and not too much information so you can simulate the discovery and   nonlinearity of real world application usage.&lt;br&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;Be specific&lt;/b&gt;: Give participants a reason or purpose for performing the task. 
Instead of giving generalities like "find a new kitchen appliance" ask them to find a blender for under $75 that has high customer ratings.&lt;br&gt;&lt;br&gt;While users might start searching with general ideas of what they want, they will quickly narrow their selection based on the usual suspects of price, indicators of quality and recommendations.&amp;nbsp; In the artificial world of usability testing, users will often encounter problems if you are too vague, and they will look to a moderator (if there is one) as to what they want them to find.&amp;nbsp; Don't be so vague in your task that users have to guess what you want them to do. For example, "You need to rent a mid-sized car on July 21st at 10am and return it on July 23rd at noon from Boston's Logan Airport."&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Don't tell the user where to click and what to do&lt;/b&gt;: While providing specific details is important, don't walk the users through every step. Leading a user too much will provide biased and less useful results. For example, instead of saying "Click on the small check box at the bottom of the screen to add GPS," just say "Add GPS to your rental car."&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Use the user's language and not the company's language&lt;/b&gt;: It's a common mistake to mirror the internal structure of a company on a website's navigation. It's also bad practice to ask participants to do things based on internal company jargon or terms. If users don't use the terms used in a scenario, it can lead to false positive test results or outright confusion.&amp;nbsp; Do users really use the term "asset" when referring to their kids' college funds?&amp;nbsp; Will a user know what a product "configurator" is or an "item-page" or even the "mega menu?"&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Have a correct solution:&lt;/b&gt; If you ask a user to find a rental car location nearest to a hotel address, there should be a correct choice. 
This makes the task more straightforward for the user and allows you to more easily know if a task was or wasn't successfully completed. The problem with a "Find a product that's right for you" task is that participants are in the state of mind of finding information to solve problems. At the time, there probably isn't a product that's right for them; they're more interested in getting the test done and collecting their honorarium. This can lead to a sense that any product selection is correct and inflate basic metrics like &lt;a href="http://www.measuringusability.com/blog/completion-rates.php"&gt;task completion rates&lt;/a&gt;.&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Don't make the tasks dependent (if possible):&lt;/b&gt; It is important to &lt;a href="http://www.measuringusability.com/random.htm"&gt;alternate the presentation order&lt;/a&gt; of tasks as there is a significant learning effect that happens. If your tasks have dependencies (e.g., create a file in one task then delete the same file in another task) then a user who fails one task will often necessarily fail the other. Do your best to avoid dependencies (e.g., have the user delete another file).&amp;nbsp; This isn't always possible if you're testing an installation process but be cognizant of both &lt;a href="http://www.measuringusability.com/blog/ut-bias.php"&gt;the bias&lt;/a&gt; and complications introduced by adding dependencies. &lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Provide context but keep the scenario short&lt;/b&gt;: You want to provide some context to get the user thinking as if they were actually needing to perform the task. But don't go overboard with the details. For example, "You will be attending a conference in Boston in July and need to rent a car." 
&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;Task scenarios differ for moderated and unmoderated testing&lt;/b&gt;: The art of task-scenario writing has been &lt;a href="http://www.measuringusability.com/blog/usability-history.php"&gt;honed over the years&lt;/a&gt; largely through moderated lab-based testing. However, if you're conducting an unmoderated usability test, it requires an additional level of refinement. You can't rely on a moderator to encourage users through a task and ask them what they'd expect.&lt;br&gt;&lt;br&gt;While you don't want to lead users and give them step-by-step instructions, you do need to be more explicit. You'll need to provide product names, specific price ranges and brands. While some people might be concerned that will lead the user, I rarely see a task-completion rate above 90% in unmoderated benchmark studies.&amp;nbsp; Even with all these details spelled out, users get lost in the navigation or the checkout procedures, or get confused by simple things like terms and overall organization, problems that aren't obvious to developers so close to a design. &lt;br&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br&gt;It takes some practice balancing not leading users on one hand and not making the task too difficult on the other.&amp;nbsp; There are no universally "right" tasks, so don't be afraid to tweak details for different methods (moderated vs. unmoderated) or different goals (findability vs. checkout). 
It's even fine to read task scenarios out loud instead of having them printed or on the screen (we do this a lot with mobile testing).&amp;nbsp; &lt;br&gt;&lt;br&gt;For more information on writing better usability task scenarios, one of the best sources is one of the &lt;a href="http://www.measuringusability.com/blog/usability-books.php"&gt;classics&lt;/a&gt;: &lt;a href="www.amazon.com/gp/product/18415a0208?ie=UTF8&amp;amp;tag=meausallc-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1841500208"&gt;A Practical Guide to Usability Testing from 1993 by Dumas and Redish&lt;/a&gt;, &lt;a href="http://www.amazon.com/gp/product/1453806563/ref=as_li_ss_tl?ie=UTF8&amp;amp;tag=meausallc-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=1453806563"&gt;A Practical Guide to Measuring Usability&lt;/a&gt; and &lt;a href="http://www.amazon.com/gp/product/0123748925?ie=UTF8&amp;amp;tag=meausallc-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0123748925"&gt;Beyond the Usability Lab&lt;/a&gt; for unmoderated studies.&lt;br&gt;&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/mrQr_lvMmZI" height="1" width="1"/&gt;</description>
				<pubDate>Tue, 16 Apr 2013 23:59:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/blog/task-tips.php</feedburner:origLink></item>
				<item>
				<title>How to Measure Learnability</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/ld-ePLGG6Wc/measure-learnability.php</link>
				<description>&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/ski-learnability.jpg" title="Learning to Ski" border="0"&gt;Learnability is often used interchangeably with usability. &lt;br&gt;&lt;br&gt;While they are similar concepts, learnability is actually something a bit different. &lt;br&gt;&lt;br&gt;Part of the confusion is that there are two common uses of the term learnability. &lt;br&gt;&lt;br&gt;The first use of learnability describes the ability of an interface to allow users to accomplish tasks on the first attempt.&amp;nbsp; &lt;br&gt;&lt;br&gt;We often refer to this as usability for first time use.&amp;nbsp; Nielsen &lt;a href="http://www.nngroup.com/articles/usability-101-introduction-to-usability/"&gt;also defines learnability&lt;/a&gt; as easy first time use&amp;nbsp; but lists learnability as a sub component of the construct of usability.&lt;br&gt;&lt;br&gt;Measuring usability under this definition is basically using our &lt;a href="http://www.measuringusability.com/blog/essential-metrics.php"&gt;classic usability metrics&lt;/a&gt; and measuring task performance for users who have never been exposed to a system or at least have very little exposure to the tasks and interface, even if they've used it before.&amp;nbsp; Most usability testing falls into this categorization.&lt;br&gt;&lt;br&gt;A second definition of learnability is usability over time. Basically, task performance, which is also measured using the &lt;a href="http://www.measuringusability.com/blog/essential-metrics.php"&gt;classic usability metrics&lt;/a&gt;, improves after repeated "trials." More practice results in less time needed to complete tasks. 
Typically, the improvement &lt;a href="http://en.wikipedia.org/wiki/Power_law_of_practice"&gt;isn't linear, but logarithmic.&lt;/a&gt; &lt;br&gt;&lt;br&gt;A more learnable system is one in which the time it takes to complete tasks drops faster than in other systems as users spend more time with it. This can be especially important in instances when a certain amount of training is expected or required with an application. For example, enterprise accounting systems require some expectation of training to learn the organizational rules of bookkeeping. &lt;br&gt;&lt;br&gt;One criticism of usability testing is that it can be an unfair assessment of actual usage if users don't have a chance to get acquainted with the interface. This is especially understandable when specialized training is required.&amp;nbsp; In my experience, many applications and websites fall somewhere between the extremes of walk-up-and-use museum kiosks and highly specialized manufacturing order entry systems. Collecting usability metrics over multiple trials helps settle disputes about usability and provides data on first time use and use with practice.&lt;br&gt;&amp;nbsp;&lt;br&gt;&lt;h2&gt;Learnability of Expense Reporting Applications&lt;/h2&gt;For example, a few years ago, we were testing two expense reporting web applications. If you work at a large company or a consultancy that tracks expenses and hours, then you probably have familiarity with expense reporting systems. While the basic process of submitting expenses to get reimbursed is a walk-up-and-use application, many companies have specific rules and idiosyncrasies that require some getting used to. &amp;nbsp;&lt;br&gt;&lt;br&gt;The two expense reporting systems we tested supported the same functionality but had rather different interfaces. 
It was expected that most employees using the system would have some introduction to it as well as a few discussions with a manager about where and how to submit expenses.&amp;nbsp; We wanted to know: Given a brief introduction to the systems, which one would be more learnable? That is, after repeated use (trials), which application would enable users to be more efficient?&lt;br&gt;&lt;br&gt;To test the systems, we had 26 users who submitted expense reports in various applications attempt the same set of five core expense reporting tasks on both systems. We provided the users with a short video introduction on how to submit the expense reports using both systems prior to their initial attempt on each system.&amp;nbsp; They only received this training once. &lt;br&gt;&lt;br&gt;Each user repeated the five tasks three times. The application and task order were both counterbalanced to minimize sequence effects. The tasks included submitting an expense report, updating a report, and verifying that expenses were paid.&amp;nbsp; &lt;br&gt;&lt;br&gt;At the end of the tasks, we also administered the System Usability Scale (SUS) for each application in order to get an overall sense of perceived ease of use. &lt;br&gt;&lt;h2&gt;Results&lt;/h2&gt;The graphs below show the mean time to complete the tasks for three of the five tasks attempted. The mean time is on the vertical axis (y-axis) and each of the three attempts (called a trial) is on the x-axis. 
&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp; &amp;nbsp; &lt;br&gt;&lt;table border="0" cellpadding="2" cellspacing="2" width="300"&gt;  &lt;tbody&gt;&lt;tr&gt;  &lt;td&gt;&amp;nbsp;&lt;img src="http://www.measuringusability.com/images/learn-task1.jpg" height="127" width="210"&gt;&lt;/td&gt;  &lt;td&gt;&amp;nbsp;&lt;img src="http://www.measuringusability.com/images/learn-task2.jpg" height="126" width="188"&gt;&lt;/td&gt;  &lt;td&gt;&amp;nbsp;&lt;img src="http://www.measuringusability.com/images/learn-task3.jpg" height="124" width="206"&gt;&lt;/td&gt;  &lt;/tr&gt;    &lt;/tbody&gt;&lt;/table&gt;  &lt;font size="2"&gt;&lt;b&gt;Figure 1&lt;/b&gt;: Mean time to complete tasks as a function of trial for three of five tasks in two comparable expense reporting systems. Product O had generally faster performance.&lt;/font&gt;&lt;br&gt;&lt;br&gt;You can see in all three tasks the downward slope of the lines. This indicates that users are performing the tasks faster as they get more practice.&amp;nbsp; You can also see where the term "learning curve" comes from.&amp;nbsp; When graphed this way, a steeper learning curve represents faster learnability, contrary to the more common use of the term indicating a harder to learn task.&lt;br&gt;&lt;br&gt;You can also see that users can generally complete tasks faster on Product O (the blue lines).&amp;nbsp; The faster performance was consistent with the perceptions of usability. The SUS score for Product O was 82, and for Product P the score was 53 (See Page 65 in &lt;a href="http://www.amazon.com/gp/product/0123849683/ref=as_li_ss_tl?ie=UTF8&amp;amp;tag=meausallc-20&amp;amp;linkCode=as2&amp;amp;camp=1789&amp;amp;creative=390957&amp;amp;creativeASIN=0123849683"&gt;Quantifying the User Experience&lt;/a&gt; for the raw scores).&amp;nbsp; &lt;br&gt;&lt;br&gt;We also found task-difficulty ratings closely mirroring the task-time. 
There wasn't much difference in the completion rates largely because we provided some clue as to how to complete the tasks (which isn't always the case and the subject for another blog.)&lt;br&gt;&lt;br&gt;One thing we looked for when measuring the repeated trials was whether users of the slower product would ever "catch-up" to users of the faster product.&amp;nbsp; We were looking for converging or crossing learning curves. &lt;br&gt;&lt;br&gt;The closest we came was on the second task. The graph below shows the same task as the middle graph above but with the axis rescaled to emphasize the change. &lt;br&gt;&lt;br&gt;&lt;img src="http://www.measuringusability.com/images/learn-task2-scaled.jpg" height="226" width="371"&gt;&lt;br&gt;&lt;font size="2"&gt;&lt;b&gt;Figure 2:&lt;/b&gt; Mean task time for task 2 by trial with a scaled y-axis. Product P has initial faster task time (usability) but falls behind on subsequent trials. At this sample size the differences in mean times are not statistically significant. &lt;/font&gt;&lt;br&gt;&lt;br&gt;Product P actually had a faster task completion time on the initial trial, although the difference wasn't statistically significant.&amp;nbsp; However, on the two subsequent trials, Product O showed better performance, although this is also not statistically significant. &lt;br&gt;&lt;br&gt;This study illustrates how to measure learnability in a lab based setting. The tasks took between one and three minutes to complete, and so we were limited in how many trials we could perform. The session length was already running between two and two and a half hours so our learning curves were almost as short as they get (two being the absolute minimum number of trials). When testing &lt;a href="http://ir.canterbury.ac.nz/bitstream/10092/662/1/12602883_paper191-cockburn.pdf"&gt;reaction times to menus&lt;/a&gt;&lt;font color="#FF0000"&gt;[pdf]&lt;/font&gt; or other quick decision making tasks the learning curves become more pronounced. 
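&lt;br&gt;&lt;br&gt;One way to summarize a learning curve in a single number is to fit the power law of practice, T(n) = a * n^(-b), to the mean task time on each trial. The sketch below is only an illustration: the function name fit_power_law and the task times are made up for the example (they are not the data from this study), and it assumes a simple least-squares fit on the logarithms of trial number and time. A larger b means times drop faster with practice.
&lt;pre&gt;
```python
import math


def fit_power_law(mean_times):
    """Fit T(n) = a * n**(-b) to mean task times, one value per trial.

    Uses ordinary least squares on log(trial) vs. log(time).
    Needs at least two trials. Returns the pair (a, b).
    """
    xs = [math.log(trial) for trial in range(1, len(mean_times) + 1)]
    ys = [math.log(t) for t in mean_times]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    a = math.exp(y_bar - slope * x_bar)
    return a, -slope


# Hypothetical mean task times in seconds for trials 1 to 3.
# These numbers are invented for illustration only.
product_o = [120.0, 85.0, 70.0]
product_p = [150.0, 120.0, 105.0]

for name, times in (("Product O", product_o), ("Product P", product_p)):
    a, b = fit_power_law(times)
    print(f"{name}: fitted first-trial time {a:.0f}s, learning rate b = {b:.2f}")
```
&lt;/pre&gt;
With only three trials the fit is rough, so treat b as a descriptive summary for comparing products rather than a precise estimate.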
&lt;br&gt;&lt;br&gt;The study did provide us with sufficient data on first time use and repeated use, and it allowed us to see how much improvement in time, errors and perceived difficulty we could expect after a few months of usage. In most cases, the third trial had a statistically faster task-completion time than the first attempt. In a few cases there were dramatic reductions in the task time (often a 50% reduction). &lt;br&gt;&lt;br&gt;This allowed us to discuss the performance of initial use (the common usability test) as well as repeated use, and we were able to identify interface problems that led to persistent interaction problems even after users were familiar with the interface.&amp;nbsp; &lt;br&gt;&lt;br&gt;The next time you find yourself in a discussion &lt;a href="http://www.measuringusability.com/blog/ut-bias.php"&gt;about the biases&lt;/a&gt; of testing initial use as a measure of usability, consider including at least one repeated task to get an estimate of learnability. In the same study, you can collect data to understand how the application supports both initial use and usage over time. Both your novice and experienced users will learn to thank you for it.&lt;br&gt;&lt;br&gt;&amp;nbsp;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/ld-ePLGG6Wc" height="1" width="1"/&gt;</description>
				<pubDate>Tue, 09 Apr 2013 22:30:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/blog/measure-learnability.php</feedburner:origLink></item>
				<item>
				<title>Using Tree-Testing To Test Information Architecture</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/G37VvXFKPuw/tree-testing-ia.php</link>
				<description>&lt;a href="http://www.flickr.com/photos/78428166@N00/8500704185/sizes/m/in/photostream/"&gt;&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/tree-image2.jpg" title="Tree Testing" border="0"&gt;&lt;/a&gt;Tree-testing is a lesser known UX method but can substantially help with improving problems in navigation. &lt;br&gt;&lt;br&gt;There are several software packages to allow you to conduct card sorting quickly and remotely, including solutions form &lt;a href="http://www.userzoom.com/?source=MeasuringUsability"&gt;UserZoom&lt;/a&gt; and &lt;a href="http://www.optimalworkshop.com/?utm_source=Measuring%2BUsability%2BBlog&amp;amp;utm_medium=banner&amp;amp;utm_campaign=products"&gt;OptimalWorkshop&lt;/a&gt;. &lt;br&gt;&lt;br&gt;Like the other popular method for testing IA, &lt;a href="https://www.measuringusability.com/blog/card-sorting-ia.php"&gt;Card Sorting&lt;/a&gt;, we'll cover tree-testing at the &lt;a href="http://www.denverux.com/"&gt;Denver UX boot camp&lt;/a&gt;. Here are several questions to get you thinking about using the method that I covered during a recent webinar.&lt;br&gt;&lt;br&gt;&lt;h2&gt;When would you use a tree test?&lt;/h2&gt;Tree testing is sometimes referred to as reverse card sorting since you are finding items instead of placing them into a navigation structure (often called taxonomy).&amp;nbsp; A tree test is like a usability test on the skeleton of your navigation with the design "skin" removed. It allows you to isolate problems in findability in your taxonomy, groups or labels that are not attributable to issues with design distractions, or helpers. &lt;br&gt;&lt;br&gt;Tree tests also remove search from the equation as a substantial &lt;a href="http://www.measuringusability.com/blog/search-browse.php"&gt;portion of users will use search&lt;/a&gt; while looking for information on a website. 
While a great search engine and search results page are essential for helping findability, so is navigation. You'll want to isolate the causes of navigation problems and improve them so that when users browse, they find what they're looking for.&lt;br&gt;Tree tests are ideally run to:&lt;br&gt;&lt;ol&gt;&lt;li&gt;Set a baseline "findability" measure before changing the navigation. This will reveal what items, groups or labels could use improvement (and possibly a new card sort).&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;Validate a change: Once you've made a change, or if you are considering a change in your IA, run the tree test again with the same (or largely the same) items you used in the baseline study. This helps tell you quantitatively if you've improved findability, kept it the same, or just introduced new problems.&lt;/li&gt;&lt;/ol&gt;&lt;p&gt;Finally, we have found that tree testing, while similar to card sorting, does generate different findings. For example, we found that difficulty sorting an item only explained 16% of difficulty finding the item--an overlap but not redundancy.&lt;br&gt;&lt;/p&gt;&lt;h2&gt;How do you select which items to test in a tree test? &lt;/h2&gt;If you have a manageable navigation with a few dozen to a hundred items, you can include all of them in the study. For large websites with thousands of items, this can get unwieldy fast. You may find that paring it down to a few hundred items is sufficient if you eliminate some less used paths for the testing.&lt;br&gt;&lt;br&gt;When it comes to selecting items for testing in the structure, we like to work with items that either cross departments, come from a &lt;a href="http://www.measuringusability.com/blog/five-redesign.php"&gt;top-task study,&lt;/a&gt; or are items that had problems in an &lt;a href="https://www.measuringusability.com/blog/card-sorting-ia.php"&gt;open card sort. &lt;/a&gt;&lt;br&gt;&lt;h2&gt;How many participants do you suggest for a tree test? 
&lt;/h2&gt;The sample size question initially comes down to the outcome metric. Because a tree test is basically a mini-usability test, we can use the same metrics in a usability test along with the same procedure to identify sample sizes. In general, the key metric will be whether the user successfully located an item, which is a binary measure like task completion ("found/didn't find" coded as 1 and 0 respectively).&amp;nbsp; &lt;br&gt;&lt;br&gt;The table below shows the sample size you will need to achieve 95% confidence around the findability rates. For example, at a sample size of 93, if 50% of the users locate an item, you'll be 95% confident that between 40% and 60% of all users would find the item given the same tree test. You would need to quadruple your sample size (381) to cut your margin of error in half (5%).&lt;br&gt;&lt;br&gt;&lt;table class="articletable" border="0" cellpadding="0" cellspacing="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;&lt;strong&gt;Sample Size&lt;/strong&gt;&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;&lt;strong&gt;Margin of Error (+/-)&lt;/strong&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;10&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;27%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;21&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;20%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;30&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;17%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;39&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;15%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;53&lt;/td&gt;&lt;td 
style="text-align: center;" valign="top" width="108"&gt;13%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;93&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;10%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;115&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;9%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;147&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;8%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;193&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;7%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;263&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;6%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;381&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;5%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;597&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;4%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;1064&lt;/td&gt;&lt;td style="text-align: center;" valign="top" width="108"&gt;3%&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;font size="2"&gt;&lt;b&gt;Table 1&lt;/b&gt;: Sample size for proportions used &lt;font size="2"&gt;to assess findability in &lt;font size="2"&gt;tree testing &lt;/font&gt;&lt;/font&gt;(95% confidence and assumes percentages of 50%).&lt;br&gt;&lt;/font&gt;&lt;br&gt;&lt;h3&gt;How many tree test tasks should each participant be asked to perform?&lt;/h3&gt;Again, this depends on the complexity of the navigation and difficulty of the items. We typically see around 1-2 minutes per item. 
We ran a tree test with 14 items which took a median time of 17 minutes, and in another study with 30 items it took 53 minutes. This also includes the time for users to answer two post item questions (confidence and difficulty).&lt;br&gt;&lt;br&gt;&lt;h2&gt;Do you have any strategies for incorporating follow up survey questions with tree tests? How do these help to supplement the tree test results?&lt;/h2&gt;We ask participants the &lt;a href="http://www.measuringusability.com/blog/seq10.php"&gt;Single Ease Question &lt;/a&gt;(SEQ), which is a standardized measure to assess task difficulty. Because so much of task usability is simply finding the item, we find the percentile ranks offer a good guide as to the usability. An average score fluctuates between a 4.8 and 5.1 across hundreds of tasks. &lt;br&gt;&lt;br&gt;We also ask how confident users were and then associate confidence and completion to generate item "disasters." The graph below shows the four-block for confidence and completion (correctness).&amp;nbsp; You want as many items in the upper-right as possible.&lt;br&gt;&lt;br&gt;&lt;img src="http://www.measuringusability.com/images/tree-confidence.jpg"&gt;&amp;nbsp;&lt;br&gt;&lt;font size="2"&gt;&lt;b&gt;Figure 1:&lt;/b&gt; Crossing confidence with success rate (correct) provides an additional perspective on items users might think they are finding correctly but are not (called disasters).&lt;/font&gt;&lt;br&gt;&lt;h2&gt;How do task success rates on a tree test compare with success rates on a live site?&lt;/h2&gt;This is a great open-ended research question that we are currently exploring.&amp;nbsp; We've examined task-completion rates across dozens of usability studies and found &lt;a href="http://www.measuringusability.com/blog/task-completion.php"&gt;the average completion rate is around 78%&lt;/a&gt;. 
We expect the tree test average to be lower than this for at least two reasons.&lt;br&gt;&lt;ol&gt;&lt;li&gt;There is no search to help users find the items.&lt;/li&gt;&lt;li&gt;There are no design elements to help guide users or emphasize more popular choices and increase the "information scent."&lt;/li&gt;&lt;/ol&gt;When examining a much smaller sample of just 77 tree test tasks from 200 users&amp;nbsp; across three studies, we found the average completion rate was 66%. Consider this result tentative as we continue to collect more data.&lt;br&gt;&lt;br&gt;However, we (along with Nate Colker from UserZoom) have started running some experiments in which we randomly assign users to find an item in a tree or the live website.&amp;nbsp; Preliminary results from two websites and tree tests (Target and Ikea) show an opposite pattern of what we were expecting. Of the 20 tasks, 17 had higher task completion rates on the tree test! This would suggest that the design elements and possible poor search results may actually hurt the findability more than help it.&amp;nbsp; &lt;br&gt;&lt;br&gt;Or, there could be a methodological difference in how we assess a successfully found item. Users are finding the right item, but we might not be giving them credit for the correct URL due to possible variations we haven't accounted for. More data is needed to confirm these findings. In the interim, it's always good practice to use the same method (tree test or live site) when you run the test again after making changes.&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/G37VvXFKPuw" height="1" width="1"/&gt;</description>
				<pubDate>Tue, 26 Mar 2013 22:00:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/blog/tree-testing-ia.php</feedburner:origLink></item>
				<item>
				<title>The Essential Elements of a Successful Website</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/qDRT3lq5gD8/suprq.php</link>
				<description>&lt;a href="http://www.suprq.com/"&gt;&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/website-logos.jpg" title="Usability, Credibility, Appearance and Trust are the essential elements" border="0"&gt;&lt;/a&gt;What makes a successful website?&lt;br&gt;&lt;br&gt;There are some obvious metrics like revenue, traffic and repeat visitors. &lt;br&gt;&lt;br&gt;But these are outcome measures. They don't tell you why revenue or traffic is higher or lower.&amp;nbsp; &lt;br&gt;&lt;br&gt;Key drivers of these outcomes are how the users perceive and interact with your website. Selling a product that has demand or information that is valuable is of course essential. But it's rare to have a monopoly on products or information on the web. &lt;br&gt;&lt;br&gt;What differentiates websites is the customer experience.&amp;nbsp; &lt;br&gt;&lt;br&gt;Most people would agree that the customer experience is important, but what specifically about the experience helps or hinders a website?&amp;nbsp; &lt;br&gt;&lt;br&gt;I examined the customer experience research in the Marketing and Usability literature and found some consistent themes. &lt;span style="font-weight: bold;"&gt;A successful website needs to be usable, credible and visually appealing&lt;/span&gt;. This will generate positive word of mouth, repeat visitors and ultimately a more successful website. &lt;br&gt;&lt;br&gt;The trick is effectively measuring these concepts.&lt;br&gt;&lt;br&gt;&lt;h2&gt;Measuring the Website Customer Experience&lt;/h2&gt;When people think of measuring website effectiveness they often think of analytics like click-through rates, purchases, bounce rates and time on site. These are important, but aren't giving us the whole customer experience picture. An effective way to know if users trust your website, think its usable and visually appealing is to ask them. 
&lt;br&gt;&lt;br&gt;Unfortunately it's not as simple as just asking: "Is this website usable?" or "Do you trust us?"&amp;nbsp; There is a science to asking the right question in a way that generates reliable and valid conclusions. The process is called psychometric validation. It involves identifying different ways of asking users about the constructs of interest then refining statements (called items) to identify the ones that best discriminate good websites from bad. &lt;br&gt;&lt;br&gt;There are many different questionnaires generating hundreds of items with different rating scales that measure different aspects of website usability, credibility, loyalty and appearance. &lt;br&gt;&lt;br&gt;I picked 75 candidate items and asked several hundred users to respond to them regarding their recent usage of several websites. I then narrowed the list of items down to around 20 which tended to have the best internal reliability and ability to discriminate between good and bad websites. Finally the top 13 items were selected based on how well they clustered together in a Factor Analysis around the construct they were intended to measure: usability, credibility, loyalty and appearance. &lt;br&gt;&lt;br&gt;&lt;img style="width: 234px; height: 148px;" src="http://www.suprq.com/images/suprq-logo.png"&gt;&lt;br&gt;&lt;br&gt;The 13 items together create a new standardized questionnaire called the SUPR-Q. It stands for the &lt;a href="http://www.suprq.com"&gt;Standardized Universal Percentile Rank-Questionnaire&lt;/a&gt;. Here are the four essential elements that make for a successful website and how the 13 SUPR-Q items measure them.&lt;br&gt;&lt;h3&gt;Usability&lt;/h3&gt;Once you have a product or service that people want (utility), few things matter more than usability. Maybe the website offers a stellar service and looks really slick. 
If users can't accomplish what they want to do, find the product they're looking for, or complete their purchase--it's like it doesn't even exist. Especially for eCommerce websites, a usable experience means a profitable experience.&lt;br&gt;&lt;br&gt;The construct of website usability is measured by having users state their level of agreement to these four items.&lt;br&gt;&lt;br&gt;1.&amp;nbsp;&amp;nbsp; &amp;nbsp;This website is easy to use.&lt;br&gt;2.&amp;nbsp;&amp;nbsp; &amp;nbsp;I am able to find what I need quickly on this website.&lt;br&gt;3.&amp;nbsp;&amp;nbsp; &amp;nbsp;I enjoy using the website.&lt;br&gt;4.&amp;nbsp;&amp;nbsp; &amp;nbsp;It is easy to navigate within the website.&lt;br&gt;&lt;br&gt;These four items account for 95% of the 10 item &lt;a href="http://www.measuringusability.com/sus.php"&gt;System Usability Scale&lt;/a&gt; (SUS) and provide an excellent measure of concurrent validity.&amp;nbsp; That means they provide a reliable measure of usability, specific to websites, more efficiently than SUS. &lt;br&gt;&lt;br&gt;You'll notice that two of the items specifically reference findability and navigation--salient attributes of a usable website experience.&amp;nbsp; Item 3 taps into a measure of&lt;a href="http://www.nigelbevan.com/papers/Extending_Quality_in_Use.pdf"&gt; hedonic quality&lt;/a&gt;&lt;span style="color: rgb(255, 0, 0);"&gt;[pdf]&lt;/span&gt;. It's grouped with three more traditional usability items because it appears users tend to have similar responses to usability and enjoyment. More technically, it means these items all load on the same factor as uncovered in a Varimax Rotated Factor Analysis.&lt;br&gt;&lt;br&gt;&lt;h3&gt;Credibility (Trust, Value &amp;amp; Comfort)&lt;/h3&gt;Does the website sell products and collect credit card information? Are you gathering email addresses to build a subscriber base? 
If users don't trust your website, which for many companies is synonymous with their company and brand, they won't give up their information and website growth is impeded. &lt;br&gt;&lt;br&gt;These five items measure the construct of credibility--which touches on aspects of trust, value, comfort and confidence.&lt;br&gt;&lt;br&gt;5.&amp;nbsp;&amp;nbsp; &amp;nbsp;I feel comfortable purchasing from this website. &lt;br&gt;6.&amp;nbsp;&amp;nbsp; &amp;nbsp;This website keeps the promises it makes to me. &lt;br&gt;7.&amp;nbsp;&amp;nbsp; &amp;nbsp;I can count on the information I get on this website. &lt;br&gt;8.&amp;nbsp;&amp;nbsp; &amp;nbsp;I feel confident conducting business with this website. &lt;br&gt;9.&amp;nbsp;&amp;nbsp; &amp;nbsp;The information on this website is valuable.&lt;br&gt;&lt;br&gt;&lt;h3&gt;Loyalty&lt;/h3&gt;Are users talking about your website favorably or are they telling their friends to avoid it like the plague? Will they return to the website and purchase more things or at least see what you have to say? These two items touch upon repeat usage from existing customers and net-new usage from new customers.&lt;br&gt;&lt;br&gt;10.&amp;nbsp;&amp;nbsp; &amp;nbsp;How likely are you to recommend this website to a friend or colleague? (This is the same question used in the Net Promoter Score).&lt;br&gt;11.&amp;nbsp;&amp;nbsp; &amp;nbsp;I will likely visit this website in the future.&lt;br&gt;&lt;h3&gt;Appearance&lt;/h3&gt;Is your website looking circa 1998 and is the appearance hindering the experience? Users form impressions of your website based on the appearance in just a few seconds.&lt;br&gt;&lt;br&gt;12.&amp;nbsp;&amp;nbsp; &amp;nbsp;I find the website to be attractive. &lt;br&gt;13.&amp;nbsp;&amp;nbsp; &amp;nbsp;The website has a clean and simple presentation.&lt;br&gt;&lt;br&gt;&lt;h2&gt;Scoring&lt;/h2&gt;All the items (except #10) use a five point scale from strongly disagree to strongly agree. 
Item #10 uses the 11 point format familiar to those who ask it as the Net Promoter question. &lt;br&gt;&lt;img src="http://www.suprq.com/images/responses.png"&gt;&lt;br&gt;&lt;br&gt;&lt;img src="http://www.suprq.com/images/nps-response.png"&gt;&lt;br&gt;&lt;font size="2"&gt;&lt;span style="font-weight: bold;"&gt;Figure 1&lt;/span&gt;: The SUPR-Q's 13 items. All but item #10 are presented using a five point agreement scale. The name of the website can be used instead of "this website."&amp;nbsp; &lt;/font&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;h2&gt;Using the SUPR-Q&lt;/h2&gt;After the initial validation phase I commissioned several studies to gather data on websites. I surveyed current customers of hundreds of websites to compile a database of 4500 responses between summer 2010 and summer 2011. &lt;br&gt;&lt;br&gt;The websites come from 18 industries including: Travel, Airlines, Wireless Carriers, Retail, News/Information, Government and Automotive websites.&amp;nbsp; It contains a spectrum from highly usable and trustworthy to difficult to use and utterly chaotic websites.&amp;nbsp;&amp;nbsp; &lt;a href="http://www.suprq.com/"&gt;See more details on the SUPR-Q&lt;/a&gt; and a list of more of the websites in the database.&lt;br&gt;&lt;br&gt;The SUPR-Q can be administered after a usability test or to current users of a website retrospectively. In total there are over 200 websites in the SUPR-Q database.&amp;nbsp; Each website contains data from between 30 and 400 users. &lt;br&gt;&lt;br&gt;In addition to having a reliable and valid instrument for measuring websites, the database behind the SUPR-Q provides a relative percentile rank (this is what gives the questionnaire its name).&amp;nbsp; Instead of working with a raw mean, the global score and each component score are normalized against the database of websites and presented as a percentage. 
&lt;br&gt;&lt;br&gt;This way, a score of 75% means the website scores higher than 75% of the websites in the database. Because I've commissioned the studies and own the data I can also reveal how each website scores on all the attributes.&amp;nbsp; This makes for an interesting analysis and provides many &lt;a href="http://www.measuringusability.com/blog/compared-what.php"&gt;meaningful comparisons&lt;/a&gt; for any website across many industries. &lt;br&gt;&lt;br&gt;For example, Facebook tends to score highly on loyalty (in the 85th percentile) but very poorly on trust (in the 10th percentile). Zappos, Apple and Amazon have the highest percentile ranks (the highest combination of all factors) while state government websites and restaurants are at the bottom.&lt;br&gt;&lt;br&gt;The database includes many well known brands and some lesser known ones, and it is refreshed annually so changes are reflected. For example, Netflix was measured in Q1 of 2011 and has one of the highest SUPR-Q scores (including very high loyalty ratings--&lt;a href="http://consumerist.com/2011/02/netflix-tops-customer-loyalty-list.html"&gt;corroborating other data&lt;/a&gt;). This was before its &lt;a href="http://info.profilesinternational.com/profiles-employee-assessment-blog/bid/72384/Putting-Customer-Loyalty-to-the-Test-at-Netflix"&gt;infamous price increase&lt;/a&gt; which has no doubt affected its scores on loyalty and credibility. It will be interesting to see if the scores rebound when I survey the customers again in a few months.&lt;br&gt;&lt;br&gt;The process of scoring is easy. Collect the responses using any survey application (like &lt;a href="http://www.userzoom.com"&gt;Userzoom&lt;/a&gt;, &lt;a href="http://www.loop11.com/?ref=mu"&gt;Loop &lt;sup&gt;11&lt;/sup&gt;&lt;/a&gt; or SurveyMonkey) then enter the raw responses into the coded Excel spreadsheet. 
Immediately you get the normalized percentile rank showing where the website scores relative to the 200 others in the database. You can also see up to 100 individual website scores for each of the attributes of usability, credibility, trust and appearance and even filter by industry.&lt;br&gt;&lt;br&gt;The SUPR-Q is an efficient and valuable tool for benchmarking your website. It provides a sensitive and reliable measure in 13 easy to administer items. You can use the items freely on your next website evaluation (with attribution) but the real value comes from converting a raw score to a meaningful rank. More information can be found at &lt;a href="http://www.suprq.com/"&gt;www.SUPRQ.com&lt;/a&gt;. &lt;br&gt;&lt;br&gt;&lt;a href="http://www.measuringusability.com/contact.php"&gt;Contact me&lt;/a&gt; to find out more about licensing the SUPR-Q for your next evaluation.&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/qDRT3lq5gD8" height="1" width="1"/&gt;</description>
				<pubDate>Tue, 04 Oct 2011 23:00:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/suprq.php</feedburner:origLink></item>
				<item>
				<title>Are women paid less than men in UX?</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/x7Vfbnp56Gs/ux-gender.php</link>
				<description>&lt;a href="http://www.flickr.com/photos/queereaster/351113224/sizes/m/in/photostream/"&gt;&lt;img style="float: left; padding: 10px; margin-right: 20px; width: 224px; height: 181px;" src="http://www.measuringusability.com/images/gender-gap.jpg" title="Not that there's a contest" border="0"&gt;&lt;/a&gt;As part of the recent &lt;a href="http://www.measuringusability.com/ux-salary-2011.php"&gt;UPA Salary Survey&lt;/a&gt; I conducted a deep-dive into the nominal differences in salary between men and women. &lt;br&gt;&lt;br&gt;Most of the responses came from the US (70%) and the international currencies were converted into US dollars. &lt;br&gt;&lt;br&gt;&lt;h2&gt;Men make around 4.4% more&lt;/h2&gt;The first thing I looked at was the median salary in aggregate between genders. In total there were responses from 561 women and 552 men.&amp;nbsp; The graph below shows men reported making about $4k more than women (4.4% higher). &lt;br&gt;&lt;br&gt;&lt;img style="width: 427px; height: 279px;" src="http://www.measuringusability.com/images/ux-gender.jpg"&gt;&lt;br&gt;&lt;font style="font-weight: bold;" size="2"&gt;&amp;nbsp;Figure 1: Overall median salary differences between men and women. Yellow error bars are 95% confidence intervals.&lt;/font&gt;&lt;br&gt;&lt;br&gt;As with many socioeconomic variables like gender, ethnicity and age it's important to dig deeper to see what other variables may be confounding the relationship.&amp;nbsp; So please read on.&lt;br&gt;&lt;br&gt;&lt;h2&gt;Experience Matters&lt;/h2&gt;When we look at years of experience by gender an interesting pattern emerges.&amp;nbsp; At the lowest levels of experience, men and women are statistically indistinguishable. In fact, women are reporting slightly higher salary levels--between $4k and $8k more than men. 
&lt;br&gt;&lt;br&gt;&lt;img style="width: 530px; height: 406px;" src="http://www.measuringusability.com/images/salart-yrs.jpg"&gt;&lt;br&gt;&lt;font style="font-weight: bold;" size="2"&gt;&amp;nbsp;Figure 2: Median salary differences between men and women by years of experience. Yellow error bars are 95% confidence intervals.&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;/font&gt;This pattern holds until 8-10 years of experience where women suddenly report statistically lower salaries than men ($88k vs $100k).&amp;nbsp; This gap persists for most of the higher years of experience--women tend to make less than men.&lt;br&gt;&lt;br&gt;&lt;h2&gt;Age 36: The Salary Separator&lt;/h2&gt;Age and years of experience are closely related but not identical. You can be starting a second career at 40 (maybe you got tired of being an accountant) and have only 3 years of experience.&amp;nbsp;&amp;nbsp; When we look at salaries for men and women by age cohorts we notice a more salient pattern.&lt;br&gt;&lt;br&gt;&lt;img src="http://www.measuringusability.com/images/gender-age.jpg"&gt;&lt;br&gt;&lt;font style="font-weight: bold;" size="2"&gt;Figure 3: Median salary differences between men and women by age. Yellow error bars are 95% confidence intervals.&lt;br&gt;&lt;br&gt;&lt;/font&gt;Again at first we see men and women with roughly the same median salary--women are actually slightly out-earning men (although not by a statistically significant margin). Starting at around age 36 the pattern reverses and holds--women's salaries in UX never catch up to men's. At the oldest age cohort (56-65), men make 26% more than women and the difference is statistically significant (notice the non-overlapping confidence intervals).&lt;br&gt;&lt;br&gt;When we look at the years of experience we see that men tend to have more years of experience for every age cohort. Again this difference tends to really manifest itself above age 35. 
&lt;br&gt;&lt;br&gt;&lt;img style="width: 474px; height: 366px;" src="http://www.measuringusability.com/images/exp-age.jpg"&gt;&lt;br&gt;&lt;font style="font-weight: bold;" size="2"&gt;Figure 4: Years of experience by age cohort between men and women. Yellow error bars are 95% confidence intervals.&lt;/font&gt;&lt;br&gt;&lt;br&gt;Starting in the 36-45 age cohort, men report about a year more experience than women. At the highest age cohort, men are reporting almost 4 more years of experience--28% more.&lt;br&gt;&lt;br&gt;&lt;h2&gt;Why the pay gap?&lt;/h2&gt;There are probably many reasons for the difference in pay above age 35. Rarely does an outcome in the social sciences &lt;a href="http://www.measuringusability.com/blog/fishbone.php"&gt;have a single cause&lt;/a&gt;. Here are some of the more likely candidates:&lt;br&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Women are discriminated against &lt;/span&gt;because of their gender.&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;Women bear the bulk of the burden in having and &lt;span style="font-weight: bold;"&gt;raising children&lt;/span&gt; and tend to leave the workforce, reduce hours or not seek promotions.&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Women are much less likely than men to ask for more money&lt;/span&gt; in salary negotiations: See the interesting discussion from Carnegie Mellon researchers Babcock and Laschever&amp;nbsp; in the book "&lt;a href="http://www.amazon.com/gp/product/0553383876/ref=as_li_ss_tl?ie=UTF8&amp;amp;tag=meausallc-20&amp;amp;linkCode=as2&amp;amp;camp=217145&amp;amp;creative=399369&amp;amp;creativeASIN=0553383876"&gt;Women Don't Ask&lt;/a&gt;." Even women MBA grads who took courses in negotiation were much less likely to ask for more than their male classmates.&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Men tend to overstate their salaries&lt;/span&gt; in these surveys more than women: Yes, 
normally we never exaggerate, but maybe sometimes some of us do?&lt;/li&gt;&lt;/ol&gt;&lt;br&gt;I ran a factorial ANOVA with salary as the dependent variable and with gender, age and experience as explanatory variables.&amp;nbsp; Gender alone was not a significant predictor, but it was in combination with years of experience and age.&amp;nbsp; This is another corroborating point for what we see in the graphs above (the ANOVA output can be seen in the &lt;a href="http://www.usabilityprofessionals.org/usability_resources/surveys/SalarySurveys.html"&gt;full UPA report&lt;/a&gt;).&lt;br&gt;&lt;br&gt;Women in UX tend to have less experience than their male peers in the higher age cohorts and this indeed explains the bulk of the salary differences. This suggests factor #2 is playing a major role in explaining salary differences. It is also one of the easier ones to measure and explain, so I suspect it is some combination of all four factors above which explain the difference.&amp;nbsp; &lt;br&gt;&lt;br&gt;In other studies of &lt;a href="http://en.wikipedia.org/wiki/Gender_pay_gap"&gt;gender pay inequities&lt;/a&gt;, even when years of experience are accounted for, there are still other unexplained factors at play (e.g. #1, #3 and #4). We should take some comfort in these findings though. Even the overall gap of 4.4% when not accounting for age is a fraction of the 20%-30% gap seen in some professions. &lt;br&gt;&lt;br&gt;This is the sort of statistical analysis I provide when I design and collect survey data for companies.&amp;nbsp; If you've got a good business question and some data, &lt;a href="http://www.measuringusability.com/contact.php?svc=5"&gt;let's talk&lt;/a&gt;.&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/x7Vfbnp56Gs" height="1" width="1"/&gt;</description>
				<pubDate>Tue, 13 Sep 2011 23:30:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/ux-gender.php</feedburner:origLink></item>
				<item>
				<title>How much are you worth? 2011 Salary Data for UX Professionals</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/BmNDMxo2KfA/ux-salary-2011.php</link>
				<description>&lt;a href="http://www.flickr.com/photos/thegrid-ch/5087679262/sizes/s/in/photostream/"&gt;&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/coins.jpg" border="0"&gt;&lt;/a&gt;In an economy like we've been having it's nice to have a job. &lt;br&gt;&lt;br&gt;However, certain fields like User Experience are in high demand. &lt;br&gt;&lt;br&gt;Combining technology with an understanding of users' needs is a marketable skill.&amp;nbsp; This demand is reflected in the results of the &lt;a href="http://www.usabilityprofessionals.org/usability_resources/surveys/SalarySurveys.html"&gt;2011 salary survey from the Usability Professionals Association&lt;/a&gt;. &lt;br&gt;&lt;br&gt;This is the 3rd biennial survey I've crunched the numbers for and this year was just as interesting and showed similar patterns as years past.&lt;br&gt;&lt;br&gt;&lt;h2&gt;2011 Salary Survey Results&lt;/h2&gt;Of the 1345 responses from 33 countries, most came from the US (70%) with a handful from the UK (7%) and Canada (4%).&amp;nbsp; All responses were converted into US Dollars prior to analysis.&lt;br&gt;&lt;br&gt;The &lt;span style="font-weight: bold;"&gt;median salary this year is $90k, &lt;/span&gt;which is up around $5,500 (7%) over the 2009 results (in constant 2010 dollars). This $90k is of course an average based on many variables, so it alone is a crude estimate of how your salary stacks up. &lt;br&gt;&lt;h3&gt;What factors affect salaries?&lt;/h3&gt;A more accurate way of determining how much your skills are worth is to take into account the factors that affect salaries the most. 
While there are many variables, it turns out only a few have the largest impact.&amp;nbsp; The four things that influence your salary in the UX field the most are:&lt;br&gt;&lt;br&gt;&lt;ol&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Years of Experience&lt;/span&gt;: The amount of related UX experience alone is the biggest predictor of your salary. Knowing years of experience alone predicts around 32% of the variation in salaries. On average, each year of experience adds about $3600 to your salary--up to about 25 years of experience (so no octogenarian estimates).&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;PhD&lt;/span&gt;: Around 10% of UX professionals hold a PhD. If you have a PhD, give yourself a $14k raise. This bump is &lt;a href="http://www.measuringusability.com/usability-phd.php"&gt;about the same as it was two years ago&lt;/a&gt;. Masters degrees are the new Bachelors. Half of respondents hold a Masters yet there is no statistical salary advantage to this advanced degree.&lt;br&gt;&lt;br&gt; &lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Manager&lt;/span&gt;:&amp;nbsp; If you manage direct reports give yourself a $13k raise.&amp;nbsp; There's no data on whether being a manager makes you an easier target for layoffs but having to sit through all those planning and staffing meetings is worth something!&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;US West &amp;amp; Northeast:&amp;nbsp;&lt;/span&gt; The cost of living in California and the Northeastern US is higher than the rest of the US and it's reflected in higher salaries for these regions. In the survey, the city and state weren't collected but regions were. 
&lt;br&gt;&lt;br&gt;Living in Northeastern states &lt;font size="2"&gt;(Connecticut, Delaware, Maine, Maryland, Massachusetts, New Jersey, New Hampshire, New York, Pennsylvania, Rhode Island, Vermont, Washington DC) &lt;/font&gt;nets you an additional $16k per year.&amp;nbsp; Living in the West &lt;font size="2"&gt;(Alaska, Arizona, northern California, Colorado, Hawaii, Idaho, Montana, northern Nevada, Oregon, northern Utah, Washington, Wyoming) &lt;/font&gt;nets you on average an additional $26k more per year.&amp;nbsp; &lt;/li&gt;&lt;/ol&gt;&lt;br&gt;&lt;h2&gt;Salary Calculator&lt;/h2&gt;To estimate your salary based on the 2011 data, enter your information in the calculator below. Predictions will be most accurate for US based workers.&lt;br&gt;&lt;br&gt;&lt;form name="d"&gt;&lt;div class="sampleBoxLarge2"&gt;&lt;table border="0" cellpadding="4"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;table border="0"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;How many years of experience do you have?&lt;/td&gt;&lt;td&gt;&lt;input name="yrs" size="3" value="1" type="text"&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Are you a manager with direct reports?&lt;/td&gt;&lt;td&gt;&lt;select name="mgr"&gt;&lt;option value="0"&gt;No&lt;/option&gt;&lt;option value="1"&gt;Yes&lt;/option&gt;&lt;/select&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Do you have a PhD?&lt;/td&gt;&lt;td&gt; &lt;select name="phd"&gt;&lt;option value="0"&gt;No&lt;/option&gt;&lt;option value="1"&gt;Yes&lt;/option&gt;&lt;/select&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Do you live in the Western or Northeastern US?&lt;/td&gt;&lt;td&gt; &lt;select name="west"&gt;&lt;option value="0"&gt;No&lt;/option&gt;&lt;option value="1"&gt;West&lt;/option&gt;&lt;option value="2"&gt;NorthEast&lt;/option&gt;&lt;/select&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;td align="right"&gt;Your Estimated Salary is &lt;div id="d5" 
class="nobeakdiv"&gt;$51,023&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;input name="button" onclick="calcRegress()" value="Compute" type="button"&gt;&lt;/div&gt; &lt;/form&gt;&lt;br&gt;&lt;ul&gt;&lt;li&gt;For example, a good estimate of a starting salary for an individual contributor with no experience and holding a Masters and living in the Midwest would be about $47k.&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;  If you have a PhD in the San Francisco Bay area with 5 years of experience the predicted average salary is $106k.&lt;br&gt;&lt;br&gt;&lt;/li&gt;&lt;li&gt;  If you have a team of researchers and designers, hold a PhD with 10 years of experience and live in Palo Alto your estimated salary would be $137k.&lt;/li&gt;&lt;/ul&gt;  &lt;br&gt;  Using multiple regression, I found these four variables account for about 40% of the variation in salaries. While 40% may sound low, it has about the same predictive ability as &lt;a href="http://professionals.collegeboard.com/profdownload/pdf/08-1718_RDRR_081017_Web.pdf"&gt;high-school grades and standardized test-scores&lt;/a&gt;&lt;span style="color: rgb(204, 0, 0);"&gt;[pdf]&lt;/span&gt; have on college grades.&amp;nbsp; &lt;br&gt;&lt;br&gt;Predicting salaries is an inexact science like many activities in the behavioral sciences, so other variables and individual differences will have a substantial effect on salaries. However, the predicted salaries are based on what UX professionals reported making. &lt;br&gt;&lt;br&gt;If you find your salary well below the predicted amount, it may be time to find a new job, ask for a raise or start that new consultancy. If your salary is well above the predicted amount, congratulations--you're probably earning every penny!&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/BmNDMxo2KfA" height="1" width="1"/&gt;</description>
				<pubDate>Tue, 23 Aug 2011 23:30:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/ux-salary-2011.php</feedburner:origLink></item>
				<item>
				<title>Usability and Net Promoter Benchmarks for Consumer Software</title>
				<link>http://feedproxy.google.com/~r/MeasuringUsability/~3/NWc61kq1s8I/software-benchmarks.php</link>
				<description>&lt;img style="float: left; padding: 10px; margin-right: 20px;" src="http://www.measuringusability.com/images/sw-benchmark-logos.jpg" border="0"&gt;Many software companies track and use the Net Promoter Score as a gauge of customer loyalty. &lt;br&gt;&lt;br&gt;Positive word of mouth is a critical driver of future growth. If you have a usable product, customers will tell their friends about the positive experience. &lt;br&gt;&lt;br&gt;And alternatively, a poor user experience will lead customers to tell their friends how unusable a product is. But what are good Net Promoter and Usability Scores?&lt;br&gt;&lt;br&gt;&lt;h2&gt;Consumer &amp;amp; Productivity Software Benchmark Survey&lt;/h2&gt;Over the past six months I conducted the largest survey of attitudes about usability and loyalty for the consumer and productivity software industry. I received 1726 responses from current users of 17 of the best known names in software. The products are:&lt;br&gt;&lt;ul&gt;&lt;li&gt;ACT!&lt;/li&gt;&lt;li&gt;AutoCAD&lt;/li&gt;&lt;li&gt;Dreamweaver&lt;/li&gt;&lt;li&gt;Excel&lt;/li&gt;&lt;li&gt;Drop Box &lt;/li&gt;&lt;li&gt;Flash&lt;/li&gt;&lt;li&gt;iTunes&lt;/li&gt;&lt;li&gt;McAfee Anti-Virus &lt;/li&gt;&lt;li&gt;Mint.com&lt;/li&gt;&lt;li&gt;Norton Anti-Virus &lt;/li&gt;&lt;li&gt;Peachtree Accounting &lt;/li&gt;&lt;li&gt;Photoshop&lt;/li&gt;&lt;li&gt;PowerPoint&lt;/li&gt;&lt;li&gt;QuickBooks&lt;/li&gt;&lt;li&gt;Quicken&lt;/li&gt;&lt;li&gt;TurboTax&lt;/li&gt;&lt;li&gt;Word&lt;/li&gt;&lt;/ul&gt;Users came from 75 different countries with the bulk from North America   (73%) , Europe (14%) , Asia (10%), South America (2%) and Africa (1%). A   bit over half were Male (57%) with an average age of 32. 
These users   were asked a number of loyalty and standardized usability questions   including the Likelihood to Recommend question and the 10 item &lt;a href="http://www.measuringusability.com/sus.php"&gt;System   Usability Scale&lt;/a&gt; (SUS).&lt;h2&gt;What's a good Net Promoter Score? &lt;/h2&gt;The Net Promoter Score is calculated using the 11-point Likelihood to recommend question (0 to 10). It is &lt;a href="http://www.measuringusability.com/blog/top-box.php"&gt;computed &lt;/a&gt;by subtracting the percent of Detractors (0-6) from the percent of Promoters (9-10).&amp;nbsp;&amp;nbsp; &lt;br&gt;&lt;br&gt;Across all 17 products &lt;span style="font-weight: bold;"&gt;the average Net Promoter Score is a 21% with a range of -26% to 56%&lt;/span&gt;.&amp;nbsp; TurboTax gets the award for the highest Net Promoter Score. A full list of product Net Promoter Scores can be purchased in the &lt;a href="http://www.measuringusability.com/products/cswReport"&gt;detailed benchmark report.&lt;/a&gt;&lt;br&gt;&lt;h3&gt;The importance of product-level benchmarks&lt;/h3&gt;The only other Net Promoter Benchmarks that exist are at the &lt;a href="http://www.satmetrix.com/net-promoter/benchmark-reports/b2c-industry-benchmarks/consumer-software2011/"&gt;company level&lt;/a&gt;. While this is important and helpful for marketing and branding efforts, it's difficult to isolate how much a product or group of products are contributing to loyalty. &lt;br&gt;&lt;br&gt;Net Promoter Scores can vary substantially within the same company. 
While the products in the Microsoft Office Suite varied by only 7 percentage points, some products from Intuit differed by a substantial 50 percentage points.&amp;nbsp; Isolating your benchmark to the relevant product allows you to home in on what needs improving, especially in cases where the consumer may recognize the product but not the company that makes it.&lt;br&gt;&lt;br&gt;&lt;h3&gt;Did you recommend? (Retro-Recommend Rate)&lt;/h3&gt;In addition to asking customers how likely they are to recommend (in the future) I asked whether they actually recommended in the past 12 months. Humans are notorious for being poor predictors of their future behavior, so it's a nice supplement to the Likelihood to Recommend question.&lt;br&gt;&amp;nbsp; &lt;br&gt;Across all products the &lt;span style="font-weight: bold;"&gt;average retro-recommend rate was 45%&lt;/span&gt;. Meaning for the average product, around half the users said they did refer a friend or a colleague to the product.&amp;nbsp; This did vary across products. Drop Box has 72% of current customers reporting a retro-recommendation whereas Norton Anti-Virus has 29%.&lt;br&gt;&lt;br&gt;&lt;h3&gt;Product Referral Rates&lt;/h3&gt;Another great measure of customer loyalty is to assess what percent of customers were themselves recommended to the product (a referral-rate). Again, this is another way to gauge word of mouth as a supplement to the retro recommend rate and Likelihood to Recommend rate.&amp;nbsp; &lt;br&gt;&lt;br&gt;Across all products the &lt;span style="font-weight: bold;"&gt;average referral rate was a 47%&lt;/span&gt;. 
Meaning for the average product, around half the users said they were referred by a friend or a colleague to the product.&amp;nbsp; Referral rates range from a high of 70% for Peachtree Accounting to a low of 32% for Microsoft Word.&lt;br&gt;&lt;br&gt;&lt;h2&gt;What's a good Usability Score?&lt;/h2&gt;I used the industry standard System Usability Scale (SUS) to compute the perceived ease of use of the 17 products.&amp;nbsp; SUS is a 10 item questionnaire with possible scores ranging from 0 to 100. The average SUS score from over 500 products is a 68. &lt;span style="font-weight: bold;"&gt;The average SUS score from this group is a 73 &lt;/span&gt;with a minimum score of 63 and high score of 84. &lt;br&gt;&lt;br&gt;I converted the raw SUS scores to percentile ranks and found that the average score translates into a 63%--meaning this group of products has higher perceived usability than 63% of all products tested. &lt;br&gt;&lt;br&gt;This makes sense. We'd expect a wide selection of highly used software products to be easier to use than, say, less used and complex business to business software. The lowest and highest scores translate into percentile ranks of 36% and 86% respectively. &lt;br&gt;&lt;br&gt;&lt;h3&gt;Learnability &lt;/h3&gt;Items 4 and 10 from the System Usability Scale provide a measure of learnability. iTunes, TurboTax and Mint.com lead the way for the easiest products to learn. 
Not surprisingly, the products that take time to master (and often require special skills) have the lowest learnability scores: Photoshop, Dreamweaver and AutoCAD.&lt;br&gt;&lt;br&gt;Full SUS scores and learnability scores by product are provided in the &lt;a href="http://www.measuringusability.com/products/cswReport"&gt;benchmark report.&lt;/a&gt;&lt;br&gt;&lt;br&gt;&lt;h3&gt;Usability and Net Promoter&lt;/h3&gt;&lt;a href="http://www.measuringusability.com/usability-loyalty.php"&gt;As I reported last year&lt;/a&gt;, there is a strong association between usability and loyalty. In general, if a product has a higher SUS score, people are more likely to recommend it.&amp;nbsp; The graph below shows the SUS scores for detractors, passives and promoters.&amp;nbsp;&amp;nbsp; &lt;br&gt;&lt;br&gt;&lt;img src="http://www.measuringusability.com/images/sus-nps-segments.jpg"&gt;&lt;br&gt;&lt;br&gt;This is consistent with last year's data and shows that SUS scores above 80 or so go along with positive promotion. SUS scores below 60 represent a less usable experience and lead to more negative word of mouth. &lt;br&gt;&lt;br&gt;&lt;h3&gt;Key Drivers of Customer Loyalty&lt;/h3&gt;In addition to usability, I also asked questions about how customers perceive the value they get for the price and the quality of the product.&amp;nbsp; I then used these questions to help explain the variation in Net Promoter Scores by creating a key-driver analysis. This is essentially multiple regression analysis crossed with average responses. It gives a two dimensional look at both importance and the level of satisfaction with the three attributes of usability, quality and value.&amp;nbsp; For example, in the key-driver chart below I've got TurboTax and Norton Anti-Virus. 
&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;img src="http://www.measuringusability.com/images/sample-key-driver.jpg"&gt;&lt;br&gt;&lt;br&gt;We can see that TurboTax's perceived Quality has a .5 importance rating (this is the coefficient from the regression analysis). This means a 1 point increase in perceived quality (on the same 0-10 scale) would increase the likelihood to recommend by .5 of a point.&amp;nbsp; Put another way, it takes a 2 point increase or decrease in Quality to move the LTR score 1 point.&lt;br&gt;&lt;br&gt;The ease of use rating for TurboTax is right around a .25. We can interpret this to mean that the quality of TurboTax is seen as about twice as important as its ease of use.&amp;nbsp; Increases or decreases in quality would have a more substantial impact on the Net Promoter Score for this product than the value (price paid) and ease of use. In other words, users care most about getting taxes filed accurately (quality),&amp;nbsp; somewhat less about how easy to use it is, and to a lesser extent they care about the price of the product. &lt;br&gt;&lt;br&gt;The story for Norton Anti-virus is a bit different. Here again we see quality is very important but the current rating suggests users are concerned about it.&amp;nbsp; Ease of use doesn't appear to impact attitudes about loyalty. While value has about the same importance as TurboTax's value, users perceive the value of Norton as substantially lower--suggesting they don't think it's a good value for the price. A good place to start improving would be quality where a 1 point increase in the Quality score would result in a .9 point increase in the Likelihood to Recommend score (and therefore increase the Net Promoter Score). &lt;br&gt;&lt;br&gt;A deeper dive on why users are rating the quality lower would identify areas of improvement. 
An alternate and possibly easier strategy may be to lower the price or explore better pricing options to deliver value.&amp;nbsp; However, McAfee Anti-Virus (not shown) has even lower value ratings and higher importance scores. It may be that customers don't like paying much if anything for anti-virus software relative to other software products. In the industry, Norton may be priced competitively.&amp;nbsp; A key driver analysis for all products and attributes is included in the &lt;a href="http://www.measuringusability.com/products/cswReport"&gt;benchmark report.&lt;/a&gt;&lt;br&gt;&lt;br&gt;Third party benchmarks for loyalty and usability are an excellent way to provide meaning to all those numbers on corporate dashboards. It's a lot easier to know how well your product is doing if you know where it stands relative to the competition or the industry average. &lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;img src="http://feeds.feedburner.com/~r/MeasuringUsability/~4/NWc61kq1s8I" height="1" width="1"/&gt;</description>
				<pubDate>Wed, 08 Jun 2011 01:00:00 -0600</pubDate>
				<feedburner:origLink>http://www.measuringusability.com/software-benchmarks.php</feedburner:origLink></item>
		</channel>
		</rss>
