Copied from Akahele.org
Those old enough to remember the Carter and Reagan administrations likely enjoyed the highly popular game show, <a title="Family Feud (funny clips)" href="http://www.youtube.com/watch?v=_oxt9e5B4bE" target="_blank"><em>Family Feud</em></a>, if not for the spectacle of two extended families competing against each other, then for the "play along at home" aspect of matching wits with those families, or (if nothing else) for counting how many times host Richard Dawson would plant a (too often unwelcome) kiss (or two) on the lips of any female contestant.

<strong>A survey we trusted</strong>

The most intellectually viable aspect of <em>Family Feud</em> was the core of the program -- the response data from a survey of 100 people answering questions that tend to cluster common answers: "Name something you buy on every visit to the grocery store" or "Give a slang term for a policeman".
<table style="float: right;" border="0" width="155">
<tbody>
<tr>
<td><img src="http://akahele.org/wp-content/uploads/2009/05/richard-dawson.jpg" alt="Richard Dawson on Family Feud" /></td>
</tr>
<tr>
<td class="photocaption" style="text-align: left;">Richard Dawson on Family Feud</td>
</tr>
</tbody></table>
As a practitioner in the field of marketing research, I know darn well that a sample of 100 respondents (<a title="Family Feud survey panel theory" href="http://sg.answers.yahoo.com/question/index;_ylt=Ana0kDXsKamqprcApIm6RfQh4wt.;_ylv=3?qid=20060606190510AAzZnok" target="_blank">heaven knows</a> how they were selected for participation in the survey) is practically bunk. But the methodology seemed to work out just fine for a family game show. There were never any scandals or disputes centered on the answers to that survey. We knew we were about to come face to face with a reliable-enough "fact" when Dawson would turn to that big board behind him and shout, "Survey says...!"

Today, in the world of overnight web-panel-based consumer data collection, I'm not nearly as comfortable as I was at a young age with trusty Richard Dawson and his big, flashing incandescent board on <em>Family Feud</em>.

<strong>My experience with Internet surveys
</strong>

I'm hardly new to the practice of conducting survey research via the Internet. In fact, e-mail-borne surveys were an important part of my business practice as far back as 1993 -- respondents would "edit" the reply e-mail text with their answers, send it back, and the software would detect the answers within the confines of pre-formatted response spaces within the e-mail text. Crude in retrospect, but these techniques worked fairly well, especially when targeting a highly selective sample (such as the customer list of a business-class laser printer manufacturer).

About four or five years later, true web-based survey platforms were well established, but how to populate these questionnaires with <a title="Probability sampling" href="http://www.socialresearchmethods.net/kb/sampprob.php" target="_blank">representative, diverse respondents</a> was becoming a hot potato. Everyone seemed to acknowledge that web panels attracted non-typical consumers, but the low cost of execution and speed of turn-around were just so damn tempting. Of course, the major web panel vendors did their best to come up with various techniques (and white papers) that demonstrated ways to "balance" web samples, so that they might pass muster with executives on the client side. But at the crux of all survey research -- not just web-based sampling -- remains the question of self-selection bias. People who willingly spend 15 minutes of their time completing a questionnaire are not "normal", in the sense that they sometimes fail to represent the attitudes and behaviors of people who prefer not to spend their time that way. Simply put, this problem appears to be accentuated among Internet populations.

<strong>Losing faith
</strong>

Between about 2001 and the present day, I've gradually been losing faith in the entire premise of reliable Internet-sampled and Internet-fielded marketing research. Last month, a presentation at the <a title="CTAM Research Conference" href="http://www.ctam.com/conferences/Research/index.html" target="_blank">CTAM Research Conference</a> in Washington, DC, practically sealed the deal for me. <a title="Mktg, Inc." href="http://www.mktginc.com/ourteam.asp" target="_blank">Dr. Steven Gittelman</a> conducted a meta audit of 17 different U.S. web panels. His research found that on nine of these panels, well over 15% of the participants were completing more than thirty Internet surveys per month. Furthermore, on most U.S. panels, anywhere from 40% to 55% of members are also enrolled in at least <strong>four other</strong> survey research panels!
<div>

<strong>Things that make you go, "Hmm..."
</strong>

My research team recently fielded a quick online survey with a San Diego vendor I implicitly trust to have one of the best panels in the online research business. The sample was intended to be nationally representative of Internet households that had either cut wire-line telephone service in the past 12 months or strongly intended to do so in the next 12 months. And guess what? It’s rather clear that a lot of respondents weren’t paying attention by the end of the survey: nearly 32% of them said they were Hispanic or Latino. There is no way that's a true statistic, especially considering how Hispanics under-index for Internet penetration and English fluency.

Granted, some of this particular over-reporting was due to the way the question was asked (in a format usually intended for a telephone survey, where I’m sure the live interviewer does a better job of getting the right answer):

<span style="color: #008000;"><em>To ensure proper ethnic representation, please answer; are you of Hispanic or Latino ethnicity or background?</em></span>
<div><span style="color: #008000;"><em>1 Yes (white Hispanic)
2 Yes (non-white Hispanic)
3 No
R Prefer not to say</em></span></div>
My guess is that a significant number of white non-Hispanics and black non-Hispanics selected punch 1 or punch 2, semi-consciously reacting to the words “white” and “non-white” rather than to the question text itself.

In another recent study, we sampled digital cable customers who subscribe to a monthly DVD rental service. The findings from this sample were hyper-inflated:
<div>
<ul>
<li><span style="font-size: small;">More than 85% said they subscribe to high-definition television programming</span></li>
<li><span style="font-size: small;">56% said they have a home theater</span></li>
<li><span style="font-size: small;">Over 71% said they have either a video game device or a DVD player connected to the Internet</span></li>
<li><span style="font-size: small;">Even more (72%) said they use a media center PC to watch video on their TV set</span></li>
</ul>
</div>
<div>

Yeah, right. Maybe if the respondents are time travelers, reporting back to us their household characteristics from the year 2019. Why do we tolerate "findings" like these? In short, because the data can be collected quickly and cost-efficiently, and (thankfully) these behavioral measures were not a key objective of what was essentially an attitudinal survey.</div>
<strong>Setting the trap
</strong>

Over the past year, I have taken to using a simple technique to "trap" respondents who are not paying attention to (or lying about) survey questions. By adding "tripwire" questions to the beginning of a survey, I can flag respondents who are more likely blithely clicking check-boxes ("<a title="Jon Krosnick on satisficing in surveys" href="http://www3.interscience.wiley.com/journal/112415330/abstract?CRETRY=1&amp;SRETRY=0" target="_blank">satisficing</a>" a questionnaire) than actually paying attention. I provide a list of relatively uncommon products or experiences, then terminate from the survey anyone who claims that an <em>extremely</em> unlikely number of these items apply to them -- that is, anyone for whom it is far more likely that they are lazily or deceptively completing the questionnaire than attentively and truthfully responding. Some examples may help illustrate the principle.
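In code, the screening rule amounts to counting affirmations of rare items and cutting anyone past a threshold. The sketch below is hypothetical -- the item keys and the cutoff of four are illustrative, not the actual survey instrument:

```python
# Hypothetical sketch of the "tripwire" screening logic described above.
# Item keys and the cutoff of four are illustrative, not the actual survey.

RARE_ITEMS = [
    "carbon_monoxide_detector",
    "bread_making_machine",
    "home_security_system",
    "locked_gun_cabinet",
    "personal_watercraft",
    "segway",
]
TERMINATION_THRESHOLD = 4  # affirming this many rare items is implausible

def should_terminate(responses):
    """True if the respondent affirms an implausible number of rare items."""
    affirmed = sum(1 for item in RARE_ITEMS if responses.get(item, False))
    return affirmed >= TERMINATION_THRESHOLD
```

A respondent tripping the threshold is dropped before the substantive questionnaire begins, rather than polluting the final data.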

In a recent survey, I asked which of the following items were in the respondent's home, and these were the results:
<table style="border-collapse: collapse; width: 241pt;" border="0" cellspacing="0" cellpadding="0" width="321"><col style="width: 193pt;" width="257" /> <col style="width: 48pt;" width="64" />
<tbody>
<tr style="height: 13.5pt;" height="18">
<td class="xl26" style="height: 13.5pt; width: 193pt;" width="257" height="18"><strong>PRESENT IN HOUSEHOLD</strong></td>
<td class="xl26" style="border-left: medium none; width: 48pt;" width="64"><strong>N=3258</strong></td>
</tr>
<tr style="height: 13.5pt;" height="18">
<td class="xl24" style="height: 13.5pt;" height="18">Carbon monoxide detector</td>
<td class="xl25" style="border-left: medium none;" align="right">37.3%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl22" style="border-top: medium none; height: 12.75pt;" height="17">Bread-making machine</td>
<td class="xl23" style="border-top: medium none; border-left: medium none;" align="right">24.8%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl22" style="border-top: medium none; height: 12.75pt;" height="17">Installed home security system</td>
<td class="xl23" style="border-top: medium none; border-left: medium none;" align="right">22.0%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl22" style="border-top: medium none; height: 12.75pt;" height="17">Locked gun cabinet</td>
<td class="xl23" style="border-top: medium none; border-left: medium none;" align="right">11.8%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl22" style="border-top: medium none; height: 12.75pt;" height="17">Jet Ski / Sea Doo personal watercraft</td>
<td class="xl23" style="border-top: medium none; border-left: medium none;" align="right">2.8%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl22" style="border-top: medium none; height: 12.75pt;" height="17">Segway personal transporter</td>
<td class="xl23" style="border-top: medium none; border-left: medium none;" align="right">1.4%</td>
</tr>
</tbody></table>
We terminated the 160 individuals (5% of all candidates) who said they had four or more of these items in their home. Even so, that leaves roughly one in four homes in our sample claiming a bread-making machine. Is that even plausible?

There are about 114 million households in the United States. If 1.4% of them own a Segway, that means this particular web survey suggests there are about 1.6 million Segway units dispersed across America.
<table style="float: left;" border="0" width="120">
<tbody>
<tr>
<td><img src="http://akahele.org/wp-content/uploads/2009/05/segway.jpg" alt="Segway personal transporter" /></td>
</tr>
<tr>
<td class="photocaption" style="text-align: left;">One of the 1.6 million Segway owners?</td>
</tr>
</tbody></table>
Never mind that as of February 2007, only about <a title="Scientific American on Segway" href="http://www.scientificamerican.com/article.cfm?id=power-walker" target="_blank">24,000 Segway units</a> had ever been sold, and many of them to corporate and law enforcement clients, not residential households. So we may choose between lazy and/or lying survey respondents (1.6 million) and realistic transactional data (24,000) to guide us.
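The arithmetic behind that comparison is simple enough to check. This is a rough back-of-the-envelope sketch using the round numbers quoted above:

```python
# Back-of-the-envelope check of the Segway extrapolation above.
US_HOUSEHOLDS = 114_000_000      # approximate U.S. household count
survey_incidence = 0.014         # 1.4% of respondents claimed a Segway
implied_units = US_HOUSEHOLDS * survey_incidence
print(round(implied_units))      # about 1.6 million implied Segways

units_ever_sold = 24_000         # per Scientific American, Feb. 2007
print(implied_units / units_ever_sold)  # survey implies ~66x more than exist
```

A two-line sanity check like this, run before a report goes out the door, is often enough to expose a web sample that cannot possibly be representative.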

Do you see my frustration with web-based data collection?

Here is another example, where we simply terminated anyone who answered "yes" to four or more of a list of items. In this study, we targeted adult householders in our market footprint (which covers about 40% of the nation), with at least a working television set, and we asked 504 possible respondents about their participation in the past 3 months in any of the following:
<table style="border-collapse: collapse; width: 241pt;" border="0" cellspacing="0" cellpadding="0" width="321"><col style="width: 193pt;" width="257"></col> <col style="width: 48pt;" width="64"></col>
<tbody>
<tr style="height: 13.5pt;" height="18">
<td class="xl28" style="height: 13.5pt; width: 193pt;" width="257" height="18"><strong>PARTICIPATION LAST 3 MONTHS</strong></td>
<td class="xl29" style="border-left: medium none; width: 48pt;" width="64"><strong>N=504</strong></td>
</tr>
<tr style="height: 13.5pt;" height="18">
<td class="xl27" style="height: 13.5pt;" height="18">Collected unemployment check</td>
<td class="xl25" style="border-left: medium none;" align="right">9.7%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl26" style="border-top: medium none; height: 12.75pt;" height="17">Stayed in a Ramada Inn</td>
<td class="xl24" style="border-top: medium none; border-left: medium none;" align="right">3.0%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl26" style="border-top: medium none; height: 12.75pt;" height="17">Coached a youth baseball or soccer game</td>
<td class="xl24" style="border-top: medium none; border-left: medium none;" align="right">2.4%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl26" style="border-top: medium none; height: 12.75pt;" height="17">Participated in bowling league</td>
<td class="xl24" style="border-top: medium none; border-left: medium none;" align="right">2.4%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl26" style="border-top: medium none; height: 12.75pt;" height="17">Played duplicate bridge</td>
<td class="xl24" style="border-top: medium none; border-left: medium none;" align="right">1.0%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl26" style="border-top: medium none; height: 12.75pt;" height="17">Traveled to Africa</td>
<td class="xl24" style="border-top: medium none; border-left: medium none;" align="right">0.6%</td>
</tr>
<tr style="height: 12.75pt;" height="17">
<td class="xl26" style="border-top: medium none; height: 12.75pt;" height="17">Traveled to Australia</td>
<td class="xl24" style="border-top: medium none; border-left: medium none;" align="right">0.2%</td>
</tr>
</tbody></table>
On this panel, we terminated anyone who affirmed at least four of these items -- a near impossibility. What is the likelihood, for example, that a person selected at random is collecting unemployment, stayed in a Ramada Inn, rolls in a bowling league, and coaches a youth baseball or soccer team? Yet we "caught" four such respondents out of 504. This nearly impossible configuration would pro-rate to about 1,785,700 Americans -- that is, 4 divided by 504, times about 225,000,000 adults.

This same data shows that 2.4% of adults participated in a bowling league within the past three months -- about 5.4 million adults, or roughly twice the known count of adults <em>and</em> children (combined) participating annually in a bowling league, <a title="2.3 million league bowlers" href="http://www.bowl.com/usbowler/about.aspx" target="_blank">according to the USBC</a>. From corporate reports, I estimate that Ramada has about 50,000 rooms in the United States. Over three months, that's about 4.5 million possible room-nights. According to the above survey screener, 6.7 million adults stayed in a Ramada room at some point in the past 3 months. Even with 2 adults per room, that's an amazing occupancy rate -- Monday through Sunday, every week of the past three months, if we are to believe this sample. I conclude that we cannot believe the sample. The duplicate bridge stat is interesting -- web panels skew younger, and bridge skews older. According to the ACBL, about 11 million people in the U.S. play <a title="ACBL study (1986)" href="http://homepage.mac.com/bridgeguys/pdf/Newspaper/RecreationSpecialization.pdf" target="_blank">contract bridge</a>. According to our screener, though, it's only 2.25 million -- under-reported by a factor of perhaps five.
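The Ramada occupancy claim can be sanity-checked the same way. A rough sketch, where the 50,000-room estimate and the 91-night quarter are my own round numbers:

```python
# Rough occupancy check using the estimates quoted in the text.
US_ADULTS = 225_000_000
stayed_share = 0.030                       # 3.0% said they stayed in a Ramada
implied_guests = US_ADULTS * stayed_share  # about 6.75 million adults

rooms = 50_000                             # rough U.S. Ramada room estimate
nights = 91                                # roughly three months in a quarter
available_room_nights = rooms * nights     # about 4.55 million

# Even with two adults per room and every stay lasting just one night,
# the survey implies Ramada ran near capacity all quarter:
implied_occupancy = (implied_guests / 2) / available_room_nights
print(f"{implied_occupancy:.0%}")          # roughly 74%
```

Real hotel stays average more than one night, so the true implied occupancy would be even higher -- well past what any chain actually achieves.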

<strong>Can they pass the test?</strong>

When showing respondents a description of a new product or service concept (sometimes even with an informative video clip), we've taken to the habit of giving the respondents a short, three-question "true or false" quiz about the concept they've just read about (and/or watched). These are not very difficult questions for a sentient, attentive person of even less-than-average IQ to answer. Consistently, we are finding that between 20% and 35% of respondents will fail this quiz that immediately follows presentation of the concept. My conclusion: perhaps a third of web survey respondents aren't paying any attention to the communications we're putting before them in surveys.

<em>Akahele</em> presents you with data, both anecdotal and quantitative, each and every week. What conclusions are you drawing about the key theme of <strong>trust </strong>and the<strong> Internet</strong>? We look forward to your personal comments below.

<strong>Image credits:</strong>
<ul>
<li><span style="color: #000000;">Richard Dawson (Mark Goodson-Bill Todman Productions), </span><span style="color: #000000;"><a title="Fair use" href="http://www.copyright.gov/title17/92chap1.html#107" target="_blank"><span class="comment">fair use doctrine</span></a>.</span></li>
<li><span style="color: #000000;">Segway personal transporter, </span><span style="color: #000000;"><a title="Fair use" href="http://www.copyright.gov/title17/92chap1.html#107" target="_blank"><span class="comment">fair use doctrine</span></a>.</span></li>
</ul>
</div>