STATS 250

Sampling Surveys and How to Ask Questions

  • The knowledge of how the data were generated is one of the key ingredients for translating data intelligently.

Some definitions:

Descriptive Statistics: Describing data at hand using numerical summaries (such as the mean, IQR, etc.) and graphical summaries (histograms, bar charts, etc.)

Inferential Statistics: Using sample information to make conclusions about a larger group of items/individuals than just those in the sample.

  • Most statistical studies are about using a sample to make an inference about a population.
  • Use sample statistics to estimate population parameters.
  • This only works if the data can be considered representative of the population.
    • Easy way to do this: take a random sample.
    • Hard way: take a census, i.e. survey every member of the population.

Random sample

  • Very accurate. If you use a proper method to sample 1500 people from a population of millions, you can almost certainly gauge the percentage of the entire population that has a trait to within about 3%.
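
The 3% claim can be checked with a small simulation. The sketch below is illustrative only: the true proportion of 0.40, the number of repeated samples, and the function names are assumptions, and independent draws stand in for simple random sampling from a very large population.

```python
import random

def sample_proportion(true_p, n, rng):
    """Proportion with the trait in one simple random sample of size n."""
    hits = sum(1 for _ in range(n) if rng.random() < true_p)
    return hits / n

def fraction_within(true_p, n, tolerance, trials, seed=0):
    """Fraction of repeated samples whose estimate lands within tolerance of true_p."""
    rng = random.Random(seed)
    close = sum(
        1 for _ in range(trials)
        if abs(sample_proportion(true_p, n, rng) - true_p) <= tolerance
    )
    return close / trials

if __name__ == "__main__":
    # Assumed true proportion of 0.40; 1500 people per sample, 200 repeated samples.
    print(fraction_within(true_p=0.40, n=1500, tolerance=0.03, trials=200))
```

Rerunning with a much smaller n (say 25) shows how quickly the estimates spread out when the sample is small.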

Types of Bias

Bias: The method used consistently produces values that are either too high or too low.

  • Selection bias: The method of selecting the sample produces a sample that does not represent the population.
  • Nonparticipation (nonresponse) bias: A representative sample is chosen, but a subset cannot be contacted or does not respond.
  • Response bias: Participants respond differently from how they truly feel.

Margin of Error, Confidence Intervals

Sample surveys are often used to estimate the proportion \(p\) of people in a population who have a certain trait or opinion. The proportion based on the sample is \(\hat{p}\). How close is \(\hat{p}\) to \(p\)? The margin of error answers this:

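  • Conservative margin of error: \(\frac{1}{\sqrt{n}}\), where \(n\) is the sample size. The interval \(\hat{p} \pm \frac{1}{\sqrt{n}}\) is an approximate 95% confidence interval for \(p\).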
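
As a quick worked example, the conservative margin of error can be computed directly. It is called conservative because the usual approximate 95% margin, \(2\sqrt{\hat{p}(1-\hat{p})/n}\), is largest at \(\hat{p} = 0.5\), where it equals \(\frac{1}{\sqrt{n}}\). The function names and the example values (\(\hat{p} = 0.55\), \(n = 400\)) below are hypothetical, chosen only for illustration.

```python
import math

def conservative_margin(n):
    """Conservative margin of error 1/sqrt(n) for a sample of size n."""
    return 1 / math.sqrt(n)

def approx_95_interval(p_hat, n):
    """Approximate 95% confidence interval p_hat +/- 1/sqrt(n), clipped to [0, 1]."""
    margin = conservative_margin(n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

if __name__ == "__main__":
    # Hypothetical survey: 55% of 400 respondents hold the opinion.
    print(conservative_margin(400))
    print(approx_95_interval(0.55, 400))
    # Sample size used in the random-sample example: roughly a 2.6% margin.
    print(conservative_margin(1500))
```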

Response sample: responses should be independent and identically distributed (iid).

Two types of research studies:

  • Observational studies: observe the opinions of the participants, or their behaviors or outcomes.
  • Experiments: The researchers manipulate something and measure the effect of the manipulation on some outcome of interest.

  • Confounding variable: a variable that affects the response variable AND is related to the explanatory variable. It might be measured and accounted for, or it might be an unmeasured lurking variable.
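
A small simulation can make the idea of confounding concrete. This is a sketch under made-up assumptions: a binary confounder z raises the response y and also makes the explanatory variable x more likely, while x itself has no effect on y. All names and numbers are hypothetical.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Return (z, x, y) records in which z confounds the x-y relationship."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                  # confounding variable
        x = rng.random() < (0.8 if z else 0.2)  # explanatory variable, related to z
        y = 2.0 * z + rng.gauss(0, 1)           # response depends on z only, not on x
        rows.append((z, x, y))
    return rows

def mean_difference(rows):
    """Mean response when x is True minus mean response when x is False."""
    with_x = [y for _, x, y in rows if x]
    without_x = [y for _, x, y in rows if not x]
    return mean(with_x) - mean(without_x)

if __name__ == "__main__":
    rows = simulate()
    print("ignoring z:", round(mean_difference(rows), 2))
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        print("within z =", level, ":", round(mean_difference(stratum), 2))
```

Under these assumptions, the comparison that ignores z should show a clear difference in y between the two x groups, even though x does nothing. Comparing within each level of z (accounting for the confounder) should shrink that difference to roughly zero.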