What Variance? 1 easy Simple Guide

Variance is a measure of the spread or dispersion of a set of data points in a dataset. It quantifies how much individual values differ from the mean (average) of the data. A high variance indicates that the data points are spread out widely from the mean. A low variance suggests that they are close to the mean. Mathematically, variance is calculated as the average of the squared differences between each data point and the mean.

Variance is the mean of the squared deviations from the mean.

Instead of finding absolute deviation, we square the difference from the mean. We ignored negative signs obtained by subtracting the mean from values, but now we square the difference from the mean. If we had deviation d from the mean, we get another column for d².

Consider the following set of data:

42, 45, 39, 52, 48, 47, 50, 43, 36, 37

The mean for the data is

42 + 45 + 39 + 52 + 48 + 47 + 50 + 44 +

36 + 37 =440

The table below shows deviation from the mean of data representing some marks in a math test.

from the table above, the sum of d² is 268. That is Σd² = 268.00

the variance, sometimes written as σ² is given as Σd² divided by total frequency N. That is;

$$\sigma^2 = \frac{\sum d^2}{N}$$

In the example above:

$$\sigma^2 = \frac{268}{10} = 26.80$$

Table of Contents

Standard deviation σ

It is also known as the root mean square deviation. It is obtained by getting the square root of the variance. In the example above, The standard deviation, sometimes abbreviated as s.d will be given by:

$$\sigma = \sqrt{26.80} \simeq 5.1769$$

Standard Deviation For a grouped data

standard deviation is useful for a large data. Consider the table below that contains marks distribution for some 40 students.

We first calculate the mean for the data using the methods of getting mean for the grouped data.

Here i use the assumed mean of 5.5. I choose this value because it has little value of midpoint and frequency, hence not likely to loose a lot of accuracy when it is reduced to zero

Recall:

$$ Mean \ \overline x = \frac{\sum fx}{\sum f} = \frac{\sum fd}{\sum f} + A$$ $$\text{ but }$$

$$\frac{\sum fd}{\sum f} = \frac{\sum ft}{\sum f} \times 10$$

From the table above:

$$\frac{\sum ft}{\sum f} = 3.95$$

and

$$\frac{\sum ft}{\sum f} \times 10 = 39.5$$

and hence:

$$\frac{\sum fd}{\sum f} = 39.5$$

Therefore: Mean 39.5 + 5.5 = 45

Now that we have the mean, we will make a table that contain deviations from the mean. The table will also include the squared deviation from the mean, as shown.

The we will proceed to calculate the variance and get it’s square root to get the standard deviation.

$$ variance \ \sigma^2 = \frac{\sum fd^2}{\sum f} = \frac{15190.0}{40} = 379.75$$

$$\text{standard deviation} = \sqrt{variance} $$ $$=\sqrt{379.75} = 19.4872$$

Applications of Variance

Variance is a versatile and valuable tool that finds applications across multiple disciplines. It helps in quantifying uncertainty and guiding decision-making. This includes: assessing financial risk, improving manufacturing quality, or understanding medical treatment variability. some common applications are but not limited to:

Risk Assessment in Finance

Stock Market and Investment Analysis: Variance is widely used in finance to measure the risk or volatility of an investment. By calculating the variance of a stock’s returns, investors can assess the risk associated with an investment. A higher variance indicates that the stock’s returns are more unpredictable, meaning higher risk.
Portfolio Management: Variance helps in constructing a diversified portfolio. By analyzing the variance of individual assets and how they correlate with one another, portfolio managers can minimize overall risk.

2. Quality Control in Manufacturing

Process Improvement: In manufacturing and production, variance is used to monitor and maintain product quality. If the variance of product measurements (e.g., weight, size) is too high, it suggests that the production process is inconsistent, and corrective actions may be required.
Six Sigma: Variance is central to quality management techniques like Six Sigma, which aims to reduce defects by minimizing the variation in processes.

3. Educational Testing and Grading

Standardized Test Scores: Used to analyze test scores and other educational data. A low value in test scores suggests that students performed similarly. On the other hand, a high value indicates significant differences in performance. This can help educators suggests areas where students need more support.
Assessing Consistency: In grading systems, it can help in determining the consistency of an examiner’s scoring. A high variation score in scores might indicate inconsistencies in how assignments or exams are graded.

4. Medical Research and Clinical Trials

Effectiveness of Treatments: In clinical trials, it is used to assess the variability of treatment outcomes across different patients. A low value in treatment effects suggests that the treatment has a consistent impact. Conversely, a high number indicates that the treatment is effective for some patients but not for others.
Risk Factors: It can also be used in epidemiology to understand the distribution of risk factors in a population. It helps to identify groups at higher risk for certain conditions or diseases.

5. Sports Analytics

Player Performance: In sports statistics, It is used to measure the consistency of player performance over time. A high variance in a player’s statistics (e.g., points scored in basketball) suggests that their performance is unpredictable, while low value indicates consistent performance.
Game Outcomes: Variance can also be applied to analyze the unpredictability of game outcomes. It helps teams and analysts understand how much randomness or luck plays a role in the results.

6. Agricultural Studies

Crop Yield Analysis: In agriculture, it evaluates the consistency of crop yields. It does this across different plots or farming conditions. A high variance in crop yields might indicate the influence of external factors like weather or soil conditions.
Genetic Research: Helps in the study of genetic variation among plants and animals. Understanding the variance in traits (like growth rate or resistance to diseases) can guide breeding programs.

7. Behavioral and Social Sciences

Survey Data Analysis: used to examine the spread of responses in surveys or experiments. It helps researchers understand how much agreement or disagreement there is among respondents. This understanding can provide insights into public opinion or social behavior.
Psychological Testing: It is important in psychology for understanding individual differences in traits like intelligence, personality, or behavior. It allows researchers to measure how much variation exists within a population.

8. Machine Learning and Data Science

Feature Selection: can help in feature selection. Features (or variables) with low variance across a dataset not give much information for predictive modeling. These features can be discarded.
Model Evaluation: used in evaluating the performance of models. For example, in cross-validation, we examine the variance of performance scores across different folds. This variance helps in assessing how well the model generalizes to new, unseen data.

some problems involving Variance

workers in a factory commute from their homes to the factory. The table below shows the distances in kilometers covered by the workers.

The mean distance covered was 14.5 km.

(a) Determine the value of t and hence the standard deviation of the distance correct to 2 decimal places. (6 marks)

(b) calculate, correct to 2 decimal places, the interquartile range of the distances (4 marks).