# 15 Probability and Sampling

Jenna Lehmann

**Probabilities**

A **probability** is a fraction or **proportion** of all the possible outcomes: the number of outcomes classified as X divided by the total number of possible outcomes (N). It’s generally reported as a decimal, but it can also be reported as a fraction or a percentage.
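The X/N definition can be sketched in a few lines of Python. The function name `probability` and the card-deck numbers are illustrative assumptions, not something from the original post; the sketch only shows the same value reported as a fraction, a decimal, and a percentage.

```python
from fractions import Fraction


def probability(favorable: int, total: int) -> Fraction:
    """Return X / N: outcomes classified as X over all possible outcomes N."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    if not 0 <= favorable <= total:
        raise ValueError("favorable must be between 0 and total")
    return Fraction(favorable, total)


# Hypothetical example: 13 hearts in a standard 52-card deck.
p_heart = probability(13, 52)

print(p_heart)                   # as a fraction: 1/4
print(float(p_heart))            # as a decimal: 0.25
print(f"{float(p_heart):.0%}")   # as a percentage: 25%
```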

What is the role of probability in populations, samples, and inferential statistics? As we discussed before, because it’s usually impossible for researchers to draw data from the entirety of a population, they draw samples. The size of the sample affects how closely the sample resembles the general population. Probability is used to predict what kinds of samples are likely to be obtained from a population. Thus, probability establishes a connection between samples and populations: if we know the makeup of the population, we know how likely it is for a specific sample to be drawn. We also use the proportions that exist within samples to infer the probabilities that exist within a population. Inferential statistics rely on this connection when they use sample data as the basis for drawing conclusions about populations.
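To make the link from population to sample concrete, here is a minimal sketch that assumes draws are made with replacement, so the binomial formula applies. The binomial formula is not mentioned in the original text, and the marble numbers and the name `prob_of_sample` are hypothetical. It computes how likely each possible sample composition is when the population proportion is known.

```python
from math import comb


def prob_of_sample(k: int, n: int, p: float) -> float:
    """Probability that exactly k of n independent draws show the trait,
    when a proportion p of the population has the trait."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical population: 40 of 50 marbles are blue, so p = 0.8.
population_proportion = 40 / 50
sample_size = 5

# Likelihood of each possible number of blue marbles in a sample of 5.
sample_probs = [
    prob_of_sample(k, sample_size, population_proportion)
    for k in range(sample_size + 1)
]
for k, prob in enumerate(sample_probs):
    print(f"{k} blue out of {sample_size}: {prob:.4f}")
```

Samples whose proportion of blue marbles is close to the population proportion come out as the most likely ones, while a sample with no blue marbles at all is possible but rare.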

**Random Sampling**

**Random sampling** is a process by which researchers select a sample in such a way that it is likely to be representative of the population as a whole. A sample will never match the population perfectly, since (1) there is always a chance that a sample will be entirely different from the population and (2) samples tend to have less variability than the population. Even so, it’s good practice to follow certain random sampling requirements:

**Independent random sampling**: Probabilities must stay constant from one selection to the next if more than one individual is selected. In other words, selecting one individual shouldn’t affect the probability of another person being selected; their chances are independent of one another.

**Random sampling with replacement**: Each individual in the population has an equal chance of being selected, meaning that to keep the denominator of the probability equation (X/N) the same for each draw, the first draw needs to be returned to the population pool.
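The effect of replacement on the denominator can be illustrated with a small sketch. The helper `chance_per_draw` and the population size of 10 are assumptions made for illustration; the code tracks the chance that any particular individual still in the pool is picked on each successive draw.

```python
from fractions import Fraction


def chance_per_draw(population_size: int, draws: int, replace: bool) -> list[Fraction]:
    """Chance that a particular individual still in the pool is picked on each draw."""
    if not replace and draws > population_size:
        raise ValueError("cannot draw more individuals than the population holds")
    chances = []
    remaining = population_size
    for _ in range(draws):
        chances.append(Fraction(1, remaining))
        if not replace:
            remaining -= 1  # the selected individual leaves the pool
    return chances


# Hypothetical population of 10 individuals, 3 draws.
with_replacement = chance_per_draw(10, 3, replace=True)
without_replacement = chance_per_draw(10, 3, replace=False)

print(with_replacement)     # 1/10 on every draw: probabilities stay constant
print(without_replacement)  # 1/10, then 1/9, then 1/8: probabilities drift
```

In Python's standard library, `random.choices` samples with replacement and `random.sample` samples without replacement, so the choice between them corresponds to the distinction above.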

**Proportions in Frequency Distributions**

Proportions can be represented in frequency distributions, and this was briefly touched on in another blog post about z-scores. A selected section of a frequency distribution, that is, the selected area under the curve, represents a proportion of the population. Because all normal distributions are symmetrical and share the same basic shape, just shifted and stretched differently, we can use z-scores to standardize the scores and use a unit normal table to determine what proportion of the population is on either side of a given score. The area under the curve literally becomes a proportion. We also know that in a normal distribution, more extreme scores are less likely to occur, since most scores pile up near the mean. The proportions for ranges of scores close to the mean are greater than the proportions for equally wide ranges near the tails of the distribution.
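As a rough sketch of what a unit normal table lookup does, the example below uses `statistics.NormalDist` from Python's standard library in place of a printed table. The scores (mean 100, standard deviation 15) are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize a raw score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


unit_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Hypothetical distribution of scores with mean 100 and standard deviation 15.
z = z_score(130, mean=100, sd=15)

below = unit_normal.cdf(z)  # proportion of the population below the score
above = 1 - below           # proportion of the population above the score

# Equally wide ranges: one centered on the mean, one out in the tail.
near_mean = unit_normal.cdf(0.5) - unit_normal.cdf(-0.5)
in_tail = unit_normal.cdf(3.0) - unit_normal.cdf(2.0)

print(f"z = {z:.2f}")
print(f"proportion below: {below:.4f}, proportion above: {above:.4f}")
print(f"z from -0.5 to 0.5: {near_mean:.4f}, z from 2 to 3: {in_tail:.4f}")
```

The last line shows the point made above: a one-unit-wide range around the mean holds a far larger proportion of the population than a range of the same width in the tail.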

*This chapter was originally posted to the Math Support Center blog at the University of Baltimore on June 6, 2019.*