Directory:Akahele/Survey says...

MyWikiBiz, Author Your Legacy — Sunday December 22, 2024

Survey says...

Those old enough to remember the Carter and Reagan administrations probably enjoyed the highly popular game show Family Feud, if not for the spectacle of two extended families competing against each other, then for the "play along at home" fun of matching wits with those families, or simply for counting how many times host Richard Dawson would plant a (too often unwelcome) kiss or two on the lips of a female contestant.

A survey we trusted

The most intellectually viable aspect of Family Feud was the core of the program -- the response data from a survey of 100 people answering questions that tend to produce clusters of common answers: "Name something you buy on every visit to the grocery store" or "Give a slang term for a policeman".

Richard Dawson on Family Feud

As a practitioner in the field of marketing research, I know darn well that a sample of 100 respondents (heaven knows how they were selected for participation in the survey) is practically bunk. But the methodology seemed to work out just fine for a family game show. There were never any scandals or disputes centered on the answers to that survey. We knew we were about to come face to face with a reliable-enough "fact" when Dawson would turn to that big board behind him and shout, "Survey says...!"
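For a rough sense of why 100 respondents is thin, here is a minimal sketch of the textbook 95% margin of error for a proportion. It assumes simple random sampling, which is the best case: the show's selection method is unknown, and a self-selected sample can be off by far more than this formula suggests. The function name is illustrative.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which assumes a
    simple random sample. It says nothing about selection bias.
    """
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("p must be in [0, 1] and n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# p = 0.5 is the worst case, so these are the widest intervals for each n.
for n in (100, 400, 1000):
    print(f"n={n:>5}: +/- {margin_of_error(0.5, n) * 100:.1f} percentage points")
```

At n = 100 the interval is roughly plus or minus 10 points, and quadrupling the sample only halves it.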

Today, in the world of overnight web-panel-based consumer data collection, I'm not nearly as comfortable as I was at a young age with trusty Richard Dawson and his big, flashing incandescent board on Family Feud.

My experience with Internet surveys

I'm hardly new to the practice of conducting survey research via the Internet. In fact, e-mail-borne surveys were an important part of my business practice as far back as 1993: respondents would "edit" the reply e-mail text with their answers and send it back, and the software would detect the answers inside pre-formatted response spaces in the e-mail text. Crude in retrospect, but these techniques worked fairly well, especially when targeting a highly selective sample (such as the customer list of a business-class laser printer manufacturer).
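The e-mail technique described above can be sketched roughly as follows. The original software and its reply format are not described in detail, so the bracketed "[ ]" response-space convention, the "Q1." question labels, and the function name here are all hypothetical.

```python
import re

# Hypothetical convention: each question line carries a bracketed blank,
# e.g. "Q2. Pages printed per month? [      ]", and the respondent types the
# answer between the brackets before hitting Reply. Quoted reply lines may
# start with ">".
RESPONSE_SPACE = re.compile(
    r"^[ \t]*>?[ \t]*(Q\d+)\b[^\[\n]*\[([^\]\n]*)\]", re.MULTILINE
)


def extract_answers(reply_text: str) -> dict:
    """Return {question_id: answer} for every filled-in response space.

    Blank response spaces are skipped, so unanswered questions can be
    flagged downstream by comparing against the full question list.
    """
    answers = {}
    for question_id, raw in RESPONSE_SPACE.findall(reply_text):
        value = raw.strip()
        if value:
            answers[question_id] = value
    return answers


reply = """\
> Q1. Does your office own a business-class laser printer? [ yes ]
> Q2. Pages printed per month? [ 2000 ]
> Q3. Any comments? [      ]
"""
print(extract_answers(reply))  # {'Q1': 'yes', 'Q2': '2000'}
```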

About four or five years later, true web-based survey platforms were well established, but how to populate these questionnaires with representative, diverse respondents was becoming a hot potato. Everyone seemed to acknowledge that web panels attracted atypical consumers, but the low cost of execution and the speed of turnaround were just so damn tempting. Of course, the major web panel vendors did their best to come up with various techniques (and white papers) for "balancing" web samples so that they might pass muster with executives on the client side. But at the crux of all survey research, not just web-based sampling, remains the question of self-selection bias. People who willingly spend 15 minutes of their time completing a questionnaire are not "normal", in the sense that they sometimes fail to represent the attitudes and behaviors of people who prefer not to spend their time that way. This problem simply appears to be accentuated among Internet populations.
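The vendors' balancing methods aren't spelled out here, but one common form is cell weighting (post-stratification): each respondent is weighted by the ratio of their cell's population share to its sample share. This minimal sketch, with hypothetical cell labels and shares, also illustrates the limitation described above: weights correct the demographic mix, not self-selection within each cell.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per respondent: population share / sample share.

    sample_cells: list of cell labels, one per respondent (e.g. age bands).
    population_shares: dict mapping each cell label to its share of the
    target population (shares should sum to 1).
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    unknown = set(counts) - set(population_shares)
    if unknown:
        raise ValueError(f"cells missing population shares: {sorted(unknown)}")
    empty = set(population_shares) - set(counts)
    if empty:
        # A cell with no respondents cannot be weighted up from nothing.
        raise ValueError(f"no respondents in cells: {sorted(empty)}")
    return [population_shares[c] / (counts[c] / n) for c in sample_cells]


# Hypothetical panel that over-represents younger adults.
sample = ["18-34"] * 6 + ["35+"] * 4
weights = poststratification_weights(sample, {"18-34": 0.3, "35+": 0.7})
print(round(weights[0], 3), round(weights[-1], 3))  # 0.5 1.75
```

A respondent who is unusual within their cell (say, someone who takes thirty surveys a month) simply receives the same weight as everyone else in that cell.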

Losing faith

Since about 2001, I've gradually been losing faith in the entire premise of reliable Internet-sampled and Internet-fielded marketing research. Last month, a presentation at the CTAM Research Conference in Washington, DC, practically sealed the deal for me. Dr. Steven Gittelman conducted a meta-audit of 17 different U.S. web panels. He found that on nine of these panels, well over 15% of the participants were completing more than thirty Internet surveys per month. Furthermore, on most U.S. panels, anywhere from 40% to 55% of members are also enrolled in at least four other survey research panels!

Things that make you go, "Hmm..."

My research team recently fielded a quick online survey with a San Diego vendor I implicitly trust to have one of the best panels in the online research business. The sample was intended to be nationally representative of Internet households that had either cut wire-line telephone service in the past 12 months or strongly intended to do so in the next 12 months, and guess what? It’s rather clear that a lot of respondents weren’t paying attention by the end of the survey: nearly 32% of them said they were Hispanic or Latino. There is no way that's a true statistic, especially considering how Hispanics under-index for Internet penetration and English fluency.

Granted, some of this particular over-reporting was due to the way the question was asked (in a format usually intended for a telephone survey, where I’m sure the live interviewer does a better job of getting the right answer):

To ensure proper ethnic representation, please answer; are you of Hispanic or Latino ethnicity or background?
1 Yes (white Hispanic)
2 Yes (non-white Hispanic)
3 No
R Prefer not to say

My guess is that a significant number of white non-Hispanics and black non-Hispanics selected punch 1 and punch 2, reacting semi-consciously to the words “white” and “non-white” rather than to the question text itself.

In another recent study, we sampled digital cable customers who subscribe to a monthly DVD rental service. This sample produced some hyper-inflated findings:

  • More than 85% said they subscribe to high-definition television programming
  • 56% said they have a home theater
  • Over 71% said they have either a video game device or a DVD player connected to the Internet
  • Even more (72%) said they use a media center PC to watch video on their TV set
Yeah, right. Maybe if the respondents are time travelers, reporting back to us their household characteristics from the year 2019. Why do we tolerate "findings" like these? In a word, because the data can be collected quickly and cost-efficiently, and (thankfully) these behavioral measures were not a key objective of what was essentially an attitudinal survey.

Setting the trap

Over the past year, I have taken to using a simple technique to "trap" respondents who are not paying attention to (or are lying about) survey questions. By adding "tripwire" questions to the beginning of a survey, I can identify respondents who are more likely to be blithely clicking check-boxes ("satisficing" a questionnaire) than actually paying attention. I provide a list of relatively uncommon products or experiences, then terminate anyone who claims an extremely unlikely number of them; for such a respondent, lazy or deceptive completion is far more likely than attentive, truthful answering. Some examples may help illustrate the principle.

In a recent survey, I asked which of the following items were in the respondent's home, and these were the results:

PRESENT IN HOUSEHOLD (N = 3,258)            % YES
Carbon monoxide detector                    37.3%
Bread-making machine                        24.8%
Installed home security system              22.0%
Locked gun cabinet                          11.8%
Jet Ski / Sea Doo personal watercraft        2.8%
Segway personal transporter                  1.4%

We terminated the 160 individuals (about 5% of all candidates) who said they had four or more of these items in their home. Even so, that still leaves at least one in five of the homes in our sample saying they have a bread-making machine. Is that even plausible?
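The tripwire rule used here (terminate anyone claiming four or more of the listed items) can be sketched as a simple hit count. The item keys, function names, and respondent records below are hypothetical; only the four-or-more threshold comes from the study.

```python
# Hypothetical keys for the six household items in the screener above.
TRIPWIRE_ITEMS = (
    "carbon_monoxide_detector",
    "bread_making_machine",
    "home_security_system",
    "locked_gun_cabinet",
    "personal_watercraft",
    "segway",
)

# In the study, claiming four or more items led to termination.
MAX_PLAUSIBLE_HITS = 3


def tripwire_hits(checked: set, items=TRIPWIRE_ITEMS) -> int:
    """Count how many tripwire items the respondent claims to have."""
    return sum(1 for item in items if item in checked)


def screen(respondents: dict, max_hits: int = MAX_PLAUSIBLE_HITS):
    """Split respondent IDs into (kept, terminated) lists by tripwire hits."""
    kept, terminated = [], []
    for respondent_id, checked in respondents.items():
        if tripwire_hits(checked) > max_hits:
            terminated.append(respondent_id)
        else:
            kept.append(respondent_id)
    return kept, terminated


respondents = {
    "r001": {"carbon_monoxide_detector", "home_security_system"},
    "r002": {"bread_making_machine", "locked_gun_cabinet",
             "personal_watercraft", "segway"},
    "r003": set(),
}
kept, terminated = screen(respondents)
print(kept, terminated)  # ['r001', 'r003'] ['r002']
```

A flat count treats a carbon monoxide detector and a Segway as equally suspicious; that keeps the rule easy to explain, at the cost of ignoring how rare each item is.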

There are about 114 million households in the United States. If 1.4% of them own a Segway, this particular web survey implies that about 1.6 million Segway units are dispersed across America.

One of the 1.6 million Segway owners?

Never mind that, as of February 2007, only about 24,000 Segway units had ever been sold, many of them to corporate and law-enforcement clients rather than residential households. So we may choose between believing lazy and/or lying survey respondents (1.6 million) or trusting realistic transactional data (24,000).
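The Segway arithmetic can be reproduced in a few lines. This is just a sketch of the back-of-the-envelope check, using the household count, survey share, and sales figure quoted above and assuming one Segway per claiming household.

```python
US_HOUSEHOLDS = 114_000_000  # approximate U.S. household count used above
SEGWAY_SHARE = 0.014         # 1.4% of respondents claimed a Segway at home
UNITS_EVER_SOLD = 24_000     # approximate units sold through February 2007

# Assumes one Segway per household that claimed to own one.
implied_households = US_HOUSEHOLDS * SEGWAY_SHARE
# Every unit sold, residential or not, is counted in the denominator, so this
# ratio is a floor on how far the survey overstates residential ownership.
overstatement = implied_households / UNITS_EVER_SOLD

print(f"Implied Segway households: {implied_households:,.0f}")
print(f"At least {int(overstatement)}x the number of units ever sold")
```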

Do you see my frustration with web-based data collection?

Here is another example, in which we terminated anyone who answered "yes" to four or more items on a list. In this study, we targeted adult householders with at least one working television set in our market footprint (which covers about 40% of the nation), and we asked 504 prospective respondents whether they had done any of the following in the past 3 months:

PARTICIPATION, LAST 3 MONTHS (N = 504)          % YES
Collected unemployment check                     9.7%
Stayed in a Ramada Inn                           3.0%
Coached a youth baseball or soccer game          2.4%
Participated in bowling league                   2.4%
Played duplicate bridge                          1.0%
Traveled to Africa                               0.6%
Traveled to Australia                            0.2%

On this panel, we terminated anyone who affirmed at least 4 of these items -- a near impossibility. What is the likelihood, for example, that a person selected at random is collecting unemployment, has stayed in a Ramada Inn, bowls in a league, and coaches a youth baseball or soccer team? Yet we "caught" four such respondents out of 504. Pro-rated to the population, this nearly impossible combination would be true for about 1,785,700 Americans: 4 divided by 504, times about 225,000,000 adults.
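The pro-rating above can be checked directly, and a rough baseline helps show just how unlikely four "caught" respondents are. The baseline is an illustrative assumption, not part of the study: it treats the seven items as independent events at the reported rates and computes the chance that anyone claims four or more. Real activities are correlated, but the correlation would have to be extreme to close a gap this large.

```python
ADULTS = 225_000_000
SAMPLE_SIZE = 504
CAUGHT = 4  # respondents who affirmed four or more items

# The article's pro-rating: share caught in the sample times the adult population.
projected_adults = CAUGHT / SAMPLE_SIZE * ADULTS

# Illustrative baseline (assumption): the seven items are independent,
# occurring at the rates reported in the table above.
RATES = [0.097, 0.030, 0.024, 0.024, 0.010, 0.006, 0.002]


def prob_at_least(k, rates):
    """P(at least k of the independent events occur), via dynamic programming."""
    dist = [1.0]  # dist[j] = probability that exactly j events have occurred
    for p in rates:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1.0 - p)
            nxt[j + 1] += q * p
        dist = nxt
    return sum(dist[k:])


expected_caught = prob_at_least(4, RATES) * SAMPLE_SIZE
print(f"Pro-rated to U.S. adults: {projected_adults:,.0f}")
print(f"Expected 4+ claimers among {SAMPLE_SIZE} if independent: {expected_caught:.4f}")
```

Under independence, the expected number of such respondents in a sample of 504 is well under one hundredth of a person, against the four actually observed.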

This same data shows that 2.4% of adults participated in a bowling league within the past three months, or about 5.4 million adults. That is about twice the known count of adults and children (combined) participating annually in a bowling league, according to the USBC.

From corporate reports, I estimate that Ramada has about 50,000 rooms in the United States. Over three months, that's about 4.5 million possible room-nights. According to the survey screener above, 6.7 million adults stayed in a Ramada room at some point in the past 3 months. Even with 2 adults per room and only one night per stay, that's about 3.4 million room-nights, an occupancy rate of at least 75% -- Monday through Sunday, every week of the past three months, if we are to believe this sample. I conclude that we cannot believe the sample.

The duplicate bridge stat is interesting because it runs the other way: web panels skew younger, and bridge skews older. According to the ACBL, about 11 million people in the U.S. play contract bridge. According to our screener, though, it's only 2.25 million -- under-reported by a factor of perhaps five.
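The back-of-the-envelope checks in the paragraph above can be reproduced as follows. This is a sketch that, like the text, pro-rates the footprint sample to all U.S. adults and assumes 2 adults per Ramada room and one night per stay. The screener asked about duplicate bridge, which is only one style of contract bridge, so the bridge factor compares two different definitions.

```python
ADULTS = 225_000_000  # adult population used for pro-rating above


def project(share, population=ADULTS):
    """Pro-rate a sample share to a population count."""
    return share * population


bowlers = project(0.024)          # league bowlers, past 3 months
ramada_guests = project(0.030)    # Ramada guests, past 3 months
bridge_players = project(0.010)   # duplicate bridge players

# Ramada capacity: about 50,000 U.S. rooms over roughly 90 nights.
room_nights_available = 50_000 * 90
# Floor on occupancy: 2 adults share each room and each stay lasts 1 night.
min_occupancy = (ramada_guests / 2) / room_nights_available

# ACBL's ~11 million contract bridge players vs. the screener's figure.
bridge_shortfall = 11_000_000 / bridge_players

print(f"League bowlers: {bowlers / 1e6:.1f} million")
print(f"Minimum Ramada occupancy implied: {min_occupancy:.0%}")
print(f"Bridge under-reporting factor: {bridge_shortfall:.1f}x")
```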

Can they pass the test?

When showing respondents a description of a new product or service concept (sometimes even with an informative video clip), we've made a habit of giving them a short, three-question "true or false" quiz about the concept they've just read about (and/or watched). These are not difficult questions for a sentient, attentive person of even below-average IQ. Yet we consistently find that between 20% and 35% of respondents fail this quiz, which immediately follows presentation of the concept. My conclusion: perhaps a third of web survey respondents aren't paying any attention to the communications we put before them in surveys.
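Scoring such a quiz is simple; the design question is the pass mark. The text doesn't say what counts as failing, so this sketch assumes any wrong answer fails, and the answer key and respondent data are hypothetical. It also computes how often a respondent clicking at random would slip through.

```python
from math import comb

# Hypothetical answer key for a three-item true/false concept quiz.
ANSWER_KEY = (True, False, True)
# Assumption: any wrong answer counts as failing (the pass mark is not
# specified in the text).
MIN_CORRECT = 3


def correct_count(answers, key=ANSWER_KEY):
    """Number of answers that match the key, position by position."""
    return sum(a == k for a, k in zip(answers, key))


def failure_rate(responses, key=ANSWER_KEY, min_correct=MIN_CORRECT):
    """Share of respondents scoring below min_correct."""
    if not responses:
        return 0.0
    failed = sum(1 for answers in responses.values()
                 if correct_count(answers, key) < min_correct)
    return failed / len(responses)


def guess_pass_probability(n_items=len(ANSWER_KEY), min_correct=MIN_CORRECT):
    """Chance that a respondent answering at random still passes."""
    passing = sum(comb(n_items, k) for k in range(min_correct, n_items + 1))
    return passing / 2 ** n_items


responses = {
    "r001": (True, False, True),
    "r002": (True, True, True),
    "r003": (False, False, True),
    "r004": (True, False, True),
}
print(failure_rate(responses))                # 0.5
print(guess_pass_probability())               # 0.125
print(guess_pass_probability(min_correct=2))  # 0.5
```

Under the all-correct rule, a random clicker passes one time in eight; a more lenient two-of-three rule would pass half of them.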

Akahele is presenting you data, both anecdotal and quantitative, each and every week. What conclusions are you drawing about the key theme of trust and the Internet? We look forward to your joining us with personal comments below.


Comments

7 Responses to “Survey says…”

Kato
Interesting piece.
It has become pretty clear lately that internet polling is a sham, yet in the UK at least, vital policy discussions are still being guided by polling sites like YouGov, which are open to all kinds of manipulation.
This is another example, like Wikipedia, where reality does not match the touted claims. Snake oil salesmen are creaming massive profits by extolling the virtues of these flawed ventures.
Dan T.
I’m on some of those Internet survey panels myself; perhaps I even answered some of the surveys you commissioned (some of the questions above sound vaguely familiar). Sometimes the surveys ask weird stuff making me wonder just what the point of a survey is; your commentary gives me more background on that.
They can be pretty annoying with their repetitive questions; I’m sick of constantly getting asked my age, sex, zip code, and education level even though those are already on file in my record, and sometimes the same survey will ask those demographic questions more than once (it’s pretty common for a survey to ask my age at the beginning, then my birthdate at the end).
If a survey is too long (with lots and lots of questions about stuff I don’t give a flip about, like a long series of questions about the differences between brands of salty chips: their taste, their commercials, whether a particular brand gives “an impression of wholesomeness” or is one I “feel good about letting my kids eat” (I don’t actually have any kids)), eventually I get to a point where I just want to get the darn thing over with, so I’m not so careful in reading and answering the questions, perhaps producing some of the phenomena you see. On the other hand, I do often try to answer questions diligently, even if it requires an annoying amount of digging through stuff like receipts that show, to the nearest dollar, how much I spent on my last tank of gas or printer ink cartridge. (Fortunately, I’m enough of a packrat to usually have those receipts even a few weeks later when the survey asks; I imagine most others, who threw away the receipt, just give the survey-takers a guesstimate off the top of their heads.)
Am I breaking their rules where they keep reminding me that one condition of participating in their surveys is to never tell anybody else about what they ask in their surveys? (But then they keep sending me stuff branded with their name as bonus prizes, meaning that if I actually use it, people may notice that I’m a member of that survey panel and ask me about it.)
PJ
What a great discourse on the issue. In the face of how much data (and common sense) point to the likely invalidity of much of online poll research, the extent to which some people don’t really care about the validity of the data is disappointing. But in reality, the low cost and quicker execution are admittedly compelling incentives not to care. Your trap questions are a great way to try to separate the good from the bad and ugly.
RFK
I was about to say that I have been a participant in not just four, but five of the activities mentioned. But then I realized that you said ‘in the last 3 months’. Perhaps some responders were overlooking that requirement as well.
Please be advised that duplicate bridge is just one style of contract bridge. There are many contract bridge players who do not play duplicate bridge.
I participate in online surveys to rate my latest restaurant meal. I dare say that I have not been honest by saying a manager stopped by my table when, in fact, a manager was nowhere in sight.
Gregory Kohs
@Dan: I suppose you are breaking rules about non-disclosure, but (like the GFDL license and Wikipedia) I have to also suppose that very few entities who issue content under such terms actually expect that the terms will be followed to the letter by everyone subject to the terms.
@RFK: What are you, some kind of bridge director or something?
RFK
There is always room for humor – even on AKAHELE. I don’t have many answers but I enjoy browsing and searching. Count me as a regular AKAHELE reader.
Sarge
I am not an active internet survey participant, but had to laugh a little at myself while reading this, because I do have a bread-making machine in my home. It was given to me by my somewhat senile grandmother a few years back as a housewarming gift. I certainly do not see myself as the sort who would fit the demographic of a stereotypical bread-making machine owner (if there is even such a thing), but if I ever did run across that question on a survey, I would have to answer it honestly!
Very well written. I have thoroughly enjoyed all the content on Akahele thus far, and I am glad to have stumbled onto this site; it has been refreshing and thought-provoking.