Archived posting to the Leica Users Group, 1997/09/04

Subject: M6 problem survey
From: "Patrick G. Sobalvarro" <pgs@sobalvarro.org>
Date: Thu, 04 Sep 1997 00:27:04 -0700

Someone requested that a statistician work out the confidence intervals for
Fernando's M6 problem survey.  I'm not a statistician, but I play one on
TV.  Actually, I often do this kind of thing as part of my job, so I guess
I can do a little analysis on Fernando's numbers for the group.

Before starting, let me say as I said before (and several other people have
said) that Fernando's statistical methods are unsound.  They are bad, bad,
bad.  First, he has surveyed a group that may not be representative.  Yes,
it is possible that LUG members are not representative of most M6 owners
with respect to camera problems -- for example, we could conceivably be too
enthusiastic, causing dealers to take advantage of us by selling us the few
broken M6's they have, saving the majority of working ones for people who
are less easily gulled.  Second, he has invited his survey participants to
select themselves, based on how interested they are in his survey.  This
will produce biased results.
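
As a rough illustration of the self-selection problem, here is a small
simulation sketch in Python.  Every number in it is made up for the sake of
the example (the true problem rate, the reply rates, the number of owners);
none of them come from Fernando's survey or from any real data.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# All of these values are hypothetical, chosen only to show the effect.
TRUE_PROBLEM_RATE = 0.05   # assumed share of cameras with a problem
REPLY_IF_PROBLEM = 0.50    # assumed chance an owner with a problem replies
REPLY_IF_FINE = 0.10       # assumed chance a satisfied owner replies
OWNERS = 100_000           # assumed number of owners who see the survey

replies = []
for _ in range(OWNERS):
    has_problem = random.random() < TRUE_PROBLEM_RATE
    reply_chance = REPLY_IF_PROBLEM if has_problem else REPLY_IF_FINE
    # Owners decide for themselves whether to answer the survey.
    if random.random() < reply_chance:
        replies.append(has_problem)

# Problem rate as seen by the survey, among self-selected respondents only.
survey_rate = sum(replies) / len(replies)
print(f"true rate: {TRUE_PROBLEM_RATE:.1%}, survey rate: {survey_rate:.1%}")
```

Under these assumed reply rates the survey reports a problem rate several
times the true one, even though nobody answered dishonestly.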

This is not to say that Fernando is bad.  I like Fernando, and I like his
messages to the LUG.  I generally think he's wonderful.  But statistics...
tricky stuff.  Best left to professionals.

Okay, some error bounds on Fernando's numbers.  The usual method for small
sample sizes makes use of Student's t-distribution, and it assumes that we
have a random sample, because correcting for bias in the sample would be
hard without more information.  Therefore, the confidence intervals will be
wrong, but only because of the bias problem described in the first part of
this message.  If we actually had a random sample, the confidence intervals
would be correct.

First, the estimate of 6/26 (about 23%) for the probability that an M6 has
a problem.  The 90% confidence interval is (0.087, 0.37), so that
we would say with 90% confidence that the probability that a
randomly-selected M6 has a problem is somewhere between about 9% and 37% --
if Fernando's sample is unbiased, which I claim it is not.  People who are
upset about something tend to select themselves.
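
For anyone who wants to check the arithmetic, here is a minimal sketch in
Python.  It assumes each camera is treated as a 0/1 observation and that the
sample standard deviation uses the n-1 denominator; that choice is a guess,
but it reproduces the figures quoted above.  The critical values are copied
from a standard t-table, and the function name is just for illustration.

```python
import math

# One-sided 95% critical values of Student's t (giving a two-sided 90%
# interval), taken from a standard t-table and keyed by degrees of freedom.
T_CRIT_90 = {12: 1.782, 25: 1.708}

def t_interval_90(problems, n):
    """90% t-interval for a proportion, each camera scored 0 or 1."""
    p = problems / n
    # With the n-1 denominator, the variance of 0/1 data is
    # n/(n-1) * p * (1-p), so the standard error of the mean is
    # sqrt(p * (1-p) / (n-1)).
    se = math.sqrt(p * (1 - p) / (n - 1))
    half_width = T_CRIT_90[n - 1] * se
    return p - half_width, p + half_width

low, high = t_interval_90(6, 26)
print(f"{low:.3f} to {high:.3f}")
```

The interval is only as good as the random-sample assumption behind it,
which is exactly the assumption that fails here.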

A similar calculation can be done for only the 95-96 M6's, where the sample
size is much smaller.  Here the interval is (0.205, 0.718) -- we would say
(if we believed the sample to be unbiased) with 90% confidence that the
probability that an M6 made in 95-96 had a problem was between about 21%
and 72%.  But, once again, because of the non-random sample, this interval
is probably wrong.

To demonstrate the absurdity of trying to estimate anything from Fernando's
survey data, I would point out that although the M6 has been in production
for 13 years, half of all M6's in his survey were made in 1995 and later.
If we used the logic I've seen advanced by lots of LUG members in this
discussion, we would say that half of all M6's in the world were made in 95
and 96, which is clearly absurd.  If we were to assume that Fernando's
sample is random, we would conclude with 90% confidence that the probability
that a randomly selected M6 was made in 95-96 is between 33% and 67%, but
our confidence would obviously be misplaced.

So, in conclusion, if you want to do this kind of thing, you need to study
up.  Buy a book on conducting surveys and another book on statistical
methods, and read them and pay attention.  Otherwise you produce misleading
results that could be quoted widely, increasing the amount of
misinformation in the world.

-Patrick