Sampling Surveys and How to Ask Questions
- The knowledge of how the data were generated is one of the key ingredients for translating data intelligently.
Some definitions:
Descriptive Statistics: Describing data at hand using numerical summaries (such as the mean, IQR, etc.) and graphical summaries (histograms, bar charts, etc.)
Inferential Statistics: Using sample information to make conclusions about a larger group of items/individuals than just those in the sample.
- Most statistical studies are about using a sample to make an inference about a population.
- Use sample statistics to estimate population parameters.
- This only works if the data can be considered representative of the population.
- Easy way to get representative data: take a random sample.
- Hard way: take a census, i.e. survey every member of the population.
Random sample
- Very accurate. If you use a proper sampling method to select 1500 people from a population of millions, you can almost certainly gauge the percentage of the entire population that has a trait to within 3 percentage points.
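The claim about 1500 people can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the true proportion of 0.40 is hypothetical, responses are drawn independently (a reasonable stand-in for sampling from a population of millions), and the function names are made up for this example.

```python
import random

def simulate_poll(true_p, n, seed):
    """Draw n independent yes/no responses and return the sample proportion."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < true_p)
    return hits / n

def coverage(true_p=0.40, n=1500, trials=1000, tolerance=0.03):
    """Fraction of simulated polls whose estimate is within `tolerance` of true_p."""
    within = 0
    for seed in range(trials):
        if abs(simulate_poll(true_p, n, seed) - true_p) <= tolerance:
            within += 1
    return within / trials

# true_p = 0.40 is a hypothetical value chosen only for illustration.
print(f"Share of polls within 3 points of the truth: {coverage():.3f}")
```

Changing `n` shows how quickly the accuracy degrades for small samples and how little it improves beyond a few thousand respondents.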
Types of Bias
Bias: The method used consistently produces values that are either too high or too low.
- Selection bias: Selection of sample does not represent population
- Nonparticipation bias: Representative sample chosen, but subset cannot be contacted.
- Response bias: Participants respond differently from how they truly feel.
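To see how selection bias distorts an estimate, one can compare a simple random sample with a sample drawn by a skewed mechanism. This is a minimal sketch with invented numbers: the population, the true proportion of 0.30, and the assumption that people with the trait are three times as likely to be selected are all hypothetical.

```python
import random

rng = random.Random(0)

# Hypothetical population: 1 = has the trait, 0 = does not (true proportion 0.30).
population = [1] * 30_000 + [0] * 70_000
true_p = sum(population) / len(population)

# Simple random sample: every individual is equally likely to be chosen.
srs = rng.sample(population, 1500)
srs_estimate = sum(srs) / len(srs)

# Biased selection (assumed mechanism): individuals with the trait are three
# times as likely to be selected. Drawn with replacement for simplicity.
weights = [3 if x == 1 else 1 for x in population]
biased = rng.choices(population, weights=weights, k=1500)
biased_estimate = sum(biased) / len(biased)

print(f"true proportion:        {true_p:.3f}")
print(f"random-sample estimate: {srs_estimate:.3f}")
print(f"biased-sample estimate: {biased_estimate:.3f}")
```

The biased estimate stays too high no matter how large the sample is, which is what separates bias from ordinary sampling variability.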
Margin of Error, Confidence Intervals
Sample surveys are used to estimate the proportion \(p\) of people in the population who have a certain trait or opinion. The proportion based on the sample is \(\hat{p}\). How close is \(\hat{p}\) to \(p\)? The margin of error answers this:
- Conservative margin of error: \(\frac{1}{\sqrt{n}}\), where \(n\) is the sample size. The interval \(\hat{p} \pm \frac{1}{\sqrt{n}}\) is approximately a 95% confidence interval for \(p\).
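The conservative formula comes from the usual approximate 95% margin \(2\sqrt{p(1-p)/n}\): since \(p(1-p) \le \frac{1}{4}\), the margin is at most \(2\sqrt{\frac{1}{4n}} = \frac{1}{\sqrt{n}}\). A short sketch of the calculation follows; the function names and the poll figures are illustrative assumptions, not part of the notes.

```python
import math

def conservative_margin(n):
    """Conservative margin of error for a sample proportion: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)

def conservative_interval(p_hat, n):
    """Approximate 95% confidence interval for p, clipped to [0, 1]."""
    m = conservative_margin(n)
    return max(0.0, p_hat - m), min(1.0, p_hat + m)

# Hypothetical poll: 52% of 400 respondents hold the opinion.
low, high = conservative_interval(0.52, 400)
print(f"margin: {conservative_margin(400):.3f}, interval: ({low:.2f}, {high:.2f})")
```

For \(n = 1500\) the margin is about 0.026, consistent with the "within 3%" rule of thumb for samples of that size.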
Requirement on the sample: responses should be independent and identically distributed (iid).
Two types of research studies:
- Observational studies: observe the opinions of the participants, or their behaviors or outcomes.
- Experiments: the researchers manipulate something and measure the effect of the manipulation on some outcome of interest.
Confounding variable: a variable that affects the response variable AND is related to the explanatory variable. It might be measured and accounted for, or it might be an unmeasured lurking variable.
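A small simulation can make confounding concrete. In the sketch below everything is hypothetical: `z` is a confounder, `x` is the explanatory variable, and `y` is the response, which is constructed to depend on `z` only. The naive comparison still shows a gap between the `x` groups, and the gap largely disappears once `z` is accounted for by comparing within each `z` group.

```python
import random

rng = random.Random(1)

rows = []
for _ in range(20_000):
    z = rng.random() < 0.5                    # confounder (hypothetical)
    x = rng.random() < (0.8 if z else 0.2)    # explanatory variable, related to z
    y = rng.random() < (0.6 if z else 0.2)    # response depends on z only, not on x
    rows.append((z, x, y))

def rate(rows, x_value, z_value=None):
    """Proportion with y among rows with the given x (and z, if specified)."""
    ys = [y for z, x, y in rows if x == x_value and (z_value is None or z == z_value)]
    return sum(ys) / len(ys)

# Naive comparison ignores z and suggests x matters.
naive_gap = rate(rows, True) - rate(rows, False)

# Comparing within each level of z accounts for the confounder.
gap_z1 = rate(rows, True, True) - rate(rows, False, True)
gap_z0 = rate(rows, True, False) - rate(rows, False, False)

print(f"naive gap: {naive_gap:.3f}")
print(f"gap within z=True: {gap_z1:.3f}, within z=False: {gap_z0:.3f}")
```

If `z` were unmeasured, only the naive gap would be available, which is why lurking variables limit the conclusions that can be drawn from observational studies.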