Histograms & Frequency Polygons — concept & worked examples
A histogram groups continuous data into adjacent bins. The vertical axis shows frequency (counts). A frequency polygon connects bin midpoints with straight lines to emphasise shape.
Key ideas (brief)
Bin width controls smoothness: small width → noisy, large width → over-smooth.
Sturges' rule: k ≈ 1 + log2(n) — a quick starting point for number of bins.
Histograms show distribution shape: symmetric, skewed, multimodal, or uniform.
Worked example (step-by-step)
Data: 45,52,47,60,62,58,50,49,71,66,53,57
n = 12 → Sturges k ≈ 1 + log2(12) ≈ 5 (choose 5 bins)
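Min = 45, Max = 71, range = 26, bin width ≈ 26 / 5 = 5.2 → round up to 6 (five bins of width 6 span 30, enough to cover the range)
Bins (integer data, both endpoints included): 45–50, 51–56, 57–62, 63–68, 69–74. As half-open intervals these are [45,51), [51,57), [57,63), [63,69), [69,75).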
Count frequencies: e.g. 45–50 → {45,47,49,50} → 4
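The steps of the worked example can be reproduced in a few lines of Python. This is a minimal sketch, assuming bins of width 6 starting at the minimum value; the variable names are illustrative only. It also computes the bin midpoints that a frequency polygon would connect.

```python
import math

data = [45, 52, 47, 60, 62, 58, 50, 49, 71, 66, 53, 57]

# Sturges' rule gives a starting number of bins: 1 + log2(12) ≈ 4.58 -> 5.
k = round(1 + math.log2(len(data)))

# Raw width is range / k = 26 / 5 = 5.2; the example rounds this up to 6 by hand.
raw_width = (max(data) - min(data)) / k
width = 6

# Each value falls in bin number (x - start) // width, counting from 0.
start = min(data)
counts = [0] * k
for x in data:
    counts[(x - start) // width] += 1

# Midpoints of the half-open bins [45,51), [51,57), ... for a frequency polygon.
midpoints = [start + i * width + width / 2 for i in range(k)]

for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo}-{lo + width - 1}: frequency {c}, midpoint {midpoints[i]}")
```

The frequencies should add up to n = 12, which is a quick check that no value was dropped or counted twice.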
Quick quiz
Which bin will contain value 57 from the example? (A) 45–50 (B) 51–56 (C) 57–62
True/False: A frequency polygon uses bin midpoints.
Regression analysis — scatter, least-squares, r & R²
Linear regression models the relationship between X and Y with a line y = mx + c. The best-fit (least-squares) line minimises the sum of squared vertical distances.
Important formulas (simple)
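Slope: m = Σ(xi−x̄)(yi−ȳ) / Σ(xi−x̄)². Intercept: c = ȳ − m x̄. Correlation: r = cov(x, y) / (sx·sy), where sx and sy are the standard deviations of x and y. For simple (one-predictor) linear regression, R² = r².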
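The formulas above translate directly into code. The sketch below uses a small made-up dataset (the x and y values are hypothetical, chosen so the arithmetic is easy to follow) and a helper named `least_squares`, which is not from any library.

```python
import math

def least_squares(xs, ys):
    """Return slope m, intercept c and correlation r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    m = sxy / sxx
    c = y_bar - m * x_bar
    # cov/(sx*sy) simplifies to this form because the (n-1) factors cancel.
    r = sxy / math.sqrt(sxx * syy)
    return m, c, r

# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, c, r = least_squares(xs, ys)
print(f"m = {m:.2f}, c = {c:.2f}, r = {r:.3f}, R^2 = {r * r:.2f}")
```

For this data x̄ = 3 and ȳ = 4, so m = 6/10 = 0.6 and c = 4 − 0.6·3 = 2.2.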
Quick quiz
What does R² = 0.64 mean? (brief)
True/False: r = 0 implies no linear relationship.
Time Series & Moving Averages
Moving averages smooth short-term fluctuations; a w-period simple moving average at time t is the mean of the w most recent observations.
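As a sketch of that definition, the function below computes a trailing simple moving average. The series is hypothetical and the function name is illustrative. The first w−1 time points have no value because fewer than w observations are available.

```python
def simple_moving_average(series, w):
    """Trailing w-period SMA: one value for each time t >= w - 1."""
    if w < 1 or w > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[t - w + 1 : t + 1]) / w   # mean of the w most recent values
        for t in range(w - 1, len(series))
    ]

# Hypothetical monthly values for illustration only.
sales = [10, 12, 14, 13, 15, 17]
print(simple_moving_average(sales, 3))
```

The first output value is (10 + 12 + 14) / 3 = 12, the next is (12 + 14 + 13) / 3 = 13, and so on as the window slides forward one period at a time.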
Quick quiz
For the example, the first 3-month MA is what? (A) 25 (B) 24.3 (C) 26
True/False: Increasing window size reduces noise but lags more.
Mean, Median & Mode — how to choose and worked examples
Mean is the arithmetic average, median is the middle value, and mode is the most frequent value. Median is robust to outliers; mean uses all data and is useful for further calculations (variance).
Tip: For skewed distributions (income, house prices) use median. For symmetric distributions use mean.
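To see why the median is preferred for skewed data, the sketch below uses Python's standard `statistics` module on a small hypothetical income list (in thousands) with one very large value.

```python
import statistics

# Hypothetical incomes in thousands; the last value is an outlier.
incomes = [28, 30, 30, 32, 250]

mean = statistics.mean(incomes)      # (28 + 30 + 30 + 32 + 250) / 5
median = statistics.median(incomes)  # middle value of the sorted list
mode = statistics.mode(incomes)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```

Here the mean is 74, far above four of the five values, while the median and mode are both 30. The single outlier pulls the mean but leaves the median where most of the data sit.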
Quick quiz
Which is resistant to outliers? (A) Mean (B) Median (C) Mode
True/False: Median is always equal to one of the data points.
Variance & Standard Deviation — intuition and worked example
Variance is the average squared deviation from the mean. Sample variance divides by n−1 rather than n (Bessel's correction). SD is the square root of the variance, so it is in the same units as the data.
Tip: Use population formulas only when you truly have every member; otherwise use sample formulas.
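The difference between the two formulas is only the denominator. The sketch below computes both by hand on a hypothetical dataset and cross-checks against the standard `statistics` module; the `variance` helper is illustrative, not a library function.

```python
import math
import statistics

def variance(values, sample=True):
    """Sum of squared deviations divided by n-1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return squared_devs / (n - 1 if sample else n)

# Hypothetical data: mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = variance(data, sample=False)   # 32 / 8
samp_var = variance(data, sample=True)   # 32 / 7

print(f"population: variance = {pop_var:.3f}, SD = {math.sqrt(pop_var):.3f}")
print(f"sample:     variance = {samp_var:.3f}, SD = {math.sqrt(samp_var):.3f}")

# Cross-check against the standard library.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))
```

The sample variance is always slightly larger than the population variance for the same numbers, and the gap shrinks as n grows.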
Quick quiz
If all values are equal, variance = ?
True/False: SD is measured in squared units.
Probability basics — counting, independence & simulation
Probability measures likelihood on a scale from 0 to 1. For equally likely discrete outcomes: P = favourable / total. Events A and B are independent exactly when P(A∩B) = P(A)P(B).
Worked example: Probability of drawing an ace from a standard deck = 4/52 = 1/13 ≈ 0.0769.
Tip: Simulation helps build intuition — repeat experiments to see the law of large numbers in action.
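One way to try this is to simulate the ace example. The sketch below draws a single card at random many times and reports the fraction of aces. The function name, the trial counts and the fixed seed are all choices made for illustration.

```python
import random

def estimate_ace_probability(trials, seed=0):
    """Estimate P(ace) by drawing one card at random, `trials` times."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
    deck = [rank for rank in ranks for _ in range(4)]  # 52 cards, suits ignored
    aces = sum(1 for _ in range(trials) if rng.choice(deck) == "A")
    return aces / trials

for n in (100, 10_000, 100_000):
    print(n, round(estimate_ace_probability(n), 4))
print("exact:", round(4 / 52, 4))
```

With few trials the estimate can be well off; as the number of trials grows it should settle close to 1/13 ≈ 0.0769.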
Quick quiz
Probability of drawing a heart from a full deck? (A) 1/4 (B) 1/13 (C) 1/52
True/False: Probabilities can be negative.
Answers: 1) A (1/4). 2) False — probabilities range [0,1].
Advanced examples, tips & extra practice
Worked example — Histogram interpretation
We often want to compare shapes (e.g. skewness) and spot outliers. Below is a worked guide to interpret the histogram from the earlier example.
Check symmetry: are bars balanced left and right around the centre?
Identify skew: a long tail on the right = right-skewed (positive), on the left = left-skewed (negative).
Spot outliers: isolated bars far from others — consider investigating or removing for some analyses.
Practice
Using the example dataset (45,52,47,60,62,58,50,49,71,66,53,57), answer:
Is the distribution symmetric or skewed?
Which bin contains the highest frequency?
Regression — interpreting slope & intercept
Beyond fitting a line, interpretation matters: slope = expected change in Y per unit increase in X. Intercept is the predicted Y when X = 0 (not meaningful if X = 0 lies outside the observed range).
Worked tip
If slope = 2.5 and X measures hours studied, then on average an extra hour is associated with 2.5 more marks (assuming linear model is appropriate).
Mini practice
Suppose fitted line is y = 3.2x + 5. What is predicted y when x = 4?
Time series — choosing MA window
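Window size trades smoothing against responsiveness. Use small windows (2–3) to remove minor noise. For seasonal data, match the window to the seasonal period (e.g. 7 for daily data with a weekly pattern, 12 for monthly data with a yearly pattern) so the seasonal swings average out.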
Short window → more responsive to changes but noisier.
Long window → smoother trend but may hide sudden shifts.
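The trade-off is easy to see on a series with a sudden jump. In this sketch the series is hypothetical (a level of 10 that steps up to 20) and the trailing moving-average helper is illustrative.

```python
def simple_moving_average(series, w):
    """Trailing w-period SMA: one value for each time t >= w - 1."""
    return [
        sum(series[t - w + 1 : t + 1]) / w
        for t in range(w - 1, len(series))
    ]

# Hypothetical series: level 10 for five periods, then a jump to 20.
series = [10, 10, 10, 10, 10, 20, 20, 20, 20, 20]

short = simple_moving_average(series, 2)
long_ = simple_moving_average(series, 5)

print("w = 2:", short)
print("w = 5:", long_)
```

With w = 2 the average reaches the new level of 20 one period after the jump. With w = 5 it climbs through 12, 14, 16 and 18 and only reaches 20 four periods after the jump, which is the lag a long window introduces.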
Extra practice problems (with answers hidden)
Compute mean & median for: 12, 15, 11, 14, 100.
Given X: 2,4,6 and Y: 3,5,7 compute slope of least squares line (hint: slope ≈ ?).
Calculate 3-point moving average for series: 8, 9, 10, 12, 11.
Probability: From a standard deck what is P(heart or queen)?
More Probability Topics
Conditional probability, tree diagrams, permutations & combinations — interactive examples and exam-style practice.
Conditional Probability
P(A | B) = P(A and B) / P(B)
Tip: If P(A ∩ B) > P(B) there is an input error — joint cannot exceed marginal.
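The formula and the input check from the tip can be combined in one small function. This is a sketch with a hypothetical helper name; the example uses a standard deck, where A = "card is a king" and B = "card is a face card (J, Q or K)".

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_b):
    """Return P(A | B) = P(A and B) / P(B), rejecting impossible inputs."""
    if not 0 < p_b <= 1:
        raise ValueError("P(B) must be greater than 0 and at most 1")
    if not 0 <= p_a_and_b <= p_b:
        raise ValueError("P(A and B) must lie between 0 and P(B)")
    return p_a_and_b / p_b

# 4 kings and 12 face cards in 52; every king is a face card.
p_king_given_face = conditional_probability(Fraction(4, 52), Fraction(12, 52))
print(p_king_given_face)  # 1/3
```

Using `Fraction` keeps the answer exact. Plain floats work as well, at the cost of small rounding error.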
Interactive Tree Diagram (2-level)
Enter probabilities for level-1 branches, then for each child. The diagram shows joint probabilities.
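The arithmetic behind a two-level tree is a multiplication along each path: joint probability = P(level-1 branch) × P(child | branch). The sketch below is not the page's own diagram code; the function name, the branch labels and the probabilities are all hypothetical.

```python
import math

def joint_probabilities(level1, level2):
    """Multiply along each path of a 2-level tree.

    level1: {branch: probability}
    level2: {branch: {child: probability of child given branch}}
    """
    if not math.isclose(sum(level1.values()), 1.0):
        raise ValueError("level-1 probabilities must sum to 1")
    joints = {}
    for parent, p_parent in level1.items():
        children = level2[parent]
        if not math.isclose(sum(children.values()), 1.0):
            raise ValueError(f"children of {parent!r} must sum to 1")
        for child, p_child in children.items():
            joints[(parent, child)] = p_parent * p_child
    return joints

# Hypothetical tree for illustration only.
level1 = {"Rain": 0.3, "Dry": 0.7}
level2 = {
    "Rain": {"Late": 0.4, "On time": 0.6},
    "Dry": {"Late": 0.1, "On time": 0.9},
}
joints = joint_probabilities(level1, level2)
for path, p in joints.items():
    print(path, round(p, 2))
```

The four joint probabilities should themselves sum to 1, which is a useful check when filling in a tree by hand.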