BHStat

August 31, 1999

We started this part of the course asking a question:

How small can randomly chosen samples be so that we are still satisfied that samples of this size provide an acceptable representation of the population?

We looked at samples from a population of students who chose their favorite singers from a list, and then we made an "eyeball comparison" of the percents from the samples with the percents from the population.

Omega came up with this observation:

*When we compare a population percent (such as the percent of the
population who are Dave Matthews fans) with the equivalent sample percent
(such as the percent of a sample drawn from that population who are
Dave Matthews fans), larger samples tend to be more accurate than smaller
samples.*

We gave a very special meaning to the phrase "tend to be more accurate."
We said it means that *the percents calculated from larger samples tend to
vary less, over the long run, from the actual population percent than do the
percents calculated from smaller samples.*

In class today, we devised ways to compare "variation over the long run" for samples of various sizes.
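
One way such a comparison could be carried out is sketched below as a small simulation. It is not necessarily the method used in class. The 30 percent figure for fans, the sample sizes, the 500 repetitions, and the use of the standard deviation as the measure of spread are all assumptions chosen for illustration.

```python
import random
import statistics


def sample_percents(population, sample_size, repetitions, rng):
    """Return the percent of fans in each of many random samples of one size."""
    percents = []
    for _ in range(repetitions):
        # Draw one sample without replacement, as in a real survey.
        sample = rng.sample(population, sample_size)
        percents.append(100 * sum(sample) / sample_size)
    return percents


rng = random.Random(1999)  # fixed seed so that reruns give the same numbers

POPULATION_SIZE = 40_000
FAN_PERCENT = 30  # hypothetical population percent, not taken from the data sheets

# Represent each fan as 1 and each non-fan as 0.
fans = POPULATION_SIZE * FAN_PERCENT // 100
population = [1] * fans + [0] * (POPULATION_SIZE - fans)

spreads = {}
for size in (25, 100, 400, 1600):  # illustrative sample sizes
    percents = sample_percents(population, size, 500, rng)
    # Standard deviation of the sample percents: one measure of long-run variation.
    spreads[size] = statistics.pstdev(percents)
    print(
        f"sample size {size:>4}: percents ranged from {min(percents):.1f} "
        f"to {max(percents):.1f}, spread {spreads[size]:.1f}"
    )
```

Each printed row summarizes how far the sample percents strayed from the population percent for one sample size, so rows can be compared directly.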

Your assignment is to write a short essay that responds to this question:

*We are going to take a sample from a population having approximately
40,000 individuals in it. How small can the sample be so that, over
the long run, samples of this size accurately reflect the population’s
composition?*

Your response is important, but the major part of this assignment is that *you
justify your response.* Your justification must be based on the
data presented in the attached sheets.

In your justification, *you should make clear that there are two competing
motives. One is that we want the sample to be as small as possible, because a
small sample is easier to collect. The other is that we want the sample to be
large enough that the percents from samples of this size vary relatively little
from the actual population percents. So, your justification should make
explicit reference to variation among samples of sizes larger than the one you
picked, and variation among samples of sizes smaller than the one you picked.*