It's all regression - StatPREP Workshop

We almost always work with variables that are either …

In relating two variables, there are four possible situations:

| Response ↓ / Explanatory → | Quantitative | Categorical |
|---|---|---|
| Quantitative | slope of regression line or correlation coefficient | difference in two means |
| Categorical | ?????? | difference in two proportions |
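A minimal sketch, in plain Python with made-up data, of why the right-hand column is "all regression": if a two-level categorical explanatory variable is coded 0/1, the least-squares slope equals the difference in the two group means (quantitative response) or the difference in the two proportions (categorical response coded 0/1). The helper `ols_slope` and the data are hypothetical, for illustration only.

```python
from statistics import mean

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: seven cases split into groups A and B.
group = ["A", "A", "A", "B", "B", "B", "B"]
x = [1 if g == "B" else 0 for g in group]  # indicator: B = 1, A = 0

# Quantitative response: slope on the indicator = difference in two means.
y = [4.0, 5.0, 6.0, 7.0, 9.0, 8.0, 10.0]
slope_means = ols_slope(x, y)
diff_means = (mean(v for v, g in zip(y, group) if g == "B")
              - mean(v for v, g in zip(y, group) if g == "A"))

# Categorical response coded 0/1 (1 = "yes"): slope = difference in two proportions.
yes = [1, 0, 0, 1, 1, 0, 1]
slope_props = ols_slope(x, yes)
diff_props = (mean(v for v, g in zip(yes, group) if g == "B")
              - mean(v for v, g in zip(yes, group) if g == "A"))

print(slope_means, diff_means)  # both 3.5, up to rounding
print(slope_props, diff_props)  # both about 0.417
```

The same coding trick does not fill the `??????` cell: a quantitative explanatory variable with a categorical response is usually handled with a different model, which the sketch does not attempt.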

Presentation in terms of algebra:

| Response ↓ / Explanatory → | Quantitative | Categorical |
|---|---|---|
| Quantitative | $b_{1} = \frac{n \sum x y - \sum x \cdot \sum y}{n \sum x^{2} - (\sum x)^{2}}$ | $t = \frac{(\bar{x}_{1} - \bar{x}_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\frac{s_{p}^{2}}{n_{1}} + \frac{s_{p}^{2}}{n_{2}}}}$ |
| Categorical | ?????? | $z = \frac{(\hat{p}_{1} - \hat{p}_{2}) - (p_{1} - p_{2})}{\sqrt{\frac{\bar{p} \bar{q}}{n_{1}} + \frac{\bar{p} \bar{q}}{n_{2}}}}$ |
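As a sketch, the three formulas in the table can be computed directly in plain Python. The function names `slope_b1`, `pooled_t` and `two_prop_z` and the example data are hypothetical; the hypothesized differences $\mu_1 - \mu_2$ and $p_1 - p_2$ default to 0, which is an assumption matching the usual null hypothesis.

```python
from math import sqrt
from statistics import mean, variance

def slope_b1(x, y):
    """b1 = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

def pooled_t(y1, y2, mu_diff=0.0):
    """Two-sample t statistic using the pooled variance s_p^2."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return ((mean(y1) - mean(y2)) - mu_diff) / sqrt(sp2 / n1 + sp2 / n2)

def two_prop_z(x1, n1, x2, n2, p_diff=0.0):
    """Two-proportion z statistic with pooled p_bar and q_bar = 1 - p_bar."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar
    return ((p1_hat - p2_hat) - p_diff) / sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)

b1 = slope_b1([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
t = pooled_t([5.0, 6.0, 7.0], [8.0, 9.0, 10.0])
z = two_prop_z(30, 100, 20, 100)
print(b1, t, z)  # about 0.6, -3.674, 1.633
```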

… and associated distributions: t and z

The error bars show the size of the standard deviation.

[Figure: /images/PQQ.png]

Measure the effect size (written as response vs explanatory). Call it $\beta$:
- slope for Quant vs Quant or Cat vs Quant
- difference (in means or proportions) for Quant vs Cat or Cat vs Cat

Take the ratio of the standard deviation of the model values (the fitted values) to the standard deviation of the raw response values. Call this ratio $R$.
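A minimal sketch of these two steps in plain Python, assuming the model is a least-squares line (a 0/1 indicator stands in for a two-level categorical explanatory variable). The helpers `fit_line` and `effect_size_and_R` and the data are hypothetical. For a single explanatory variable, $R^2$ computed this way is the usual R-squared.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def effect_size_and_R(x, y):
    """Return beta (the slope) and R = sd(model values) / sd(raw values)."""
    intercept, beta = fit_line(x, y)
    model_values = [intercept + beta * xi for xi in x]
    return beta, stdev(model_values) / stdev(y)

# Quant vs Quant: beta is the slope.
beta, R = effect_size_and_R([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Quant vs Cat (groups coded 0/1): beta is the difference in means,
# and the model values are just the two group means.
beta_groups, R_groups = effect_size_and_R([0, 0, 0, 1, 1, 1],
                                          [5, 6, 7, 8, 9, 10])

print(beta, R ** 2)                # about 0.6 and 0.6
print(beta_groups, R_groups ** 2)  # about 3.0 and 0.771
```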

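Compute the ratio of “explained” to “unexplained”: $F = (n - 2) \frac{R^{2}}{1 - R^{2}}$, where $n - 2$ is the residual degrees of freedom for a model with an intercept and one explanatory term.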
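A sketch checking this on a two-group comparison, with made-up data. It uses $n - 2$ residual degrees of freedom (intercept plus one term); with that choice, F computed from R matches the square of the pooled two-sample t from the formula table. The helper `f_from_R` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def f_from_R(R, n):
    """F = (n - 2) * R^2 / (1 - R^2), using n - 2 residual degrees of freedom."""
    return (n - 2) * R ** 2 / (1 - R ** 2)

group_a = [5.0, 6.0, 7.0]
group_b = [8.0, 9.0, 10.0]
y = group_a + group_b

# Model values: each observation replaced by its group mean.
model_values = [mean(group_a)] * len(group_a) + [mean(group_b)] * len(group_b)
R = stdev(model_values) / stdev(y)
F = f_from_R(R, len(y))

# Pooled two-sample t, as in the formula table (null difference 0).
n1, n2 = len(group_a), len(group_b)
sp2 = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
t = (mean(group_a) - mean(group_b)) / sqrt(sp2 / n1 + sp2 / n2)

print(F, t ** 2)  # both about 13.5
```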

Eyeballing the four models (which had n = 200)…

Freebies:

Correlation coefficient: $r = \pm R$, where the sign is the sign of the slope.
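Prefer t? It’s $t = \pm\sqrt{F}$, again taking the sign of the effect size. But F is more general than t: it also applies to categorical variables with more than two levels and to models with several explanatory variables.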
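A sketch comparing both freebies with the textbook formulas for a simple regression, in plain Python with made-up data. It assumes $F$ uses $n - 2$ residual degrees of freedom and attaches the sign of the slope with `math.copysign`. The textbook slope t used for comparison is $b_1 / SE(b_1)$.

```python
from math import copysign, sqrt
from statistics import mean, stdev

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# Least-squares fit of y on x.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
model_values = [intercept + slope * xi for xi in x]

# R and F from the slides (n - 2 residual degrees of freedom assumed).
R = stdev(model_values) / stdev(y)
F = (n - 2) * R ** 2 / (1 - R ** 2)

# Freebies: give R and sqrt(F) the sign of the slope.
r_from_R = copysign(R, slope)
t_from_F = copysign(sqrt(F), slope)

# Textbook versions for comparison.
r_textbook = sxy / sqrt(sxx * syy)
sse = sum((yi - mi) ** 2 for yi, mi in zip(y, model_values))
t_textbook = slope / sqrt(sse / (n - 2) / sxx)

print(r_from_R, r_textbook)  # both about 0.775
print(t_from_F, t_textbook)  # both about 2.121
```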