Regression is one of the most widely used statistical methods: it lets you predict one variable from one or more others and quantify the relationships between them. Linear regression predicts a continuous outcome (test scores, salary, depression severity) from one or more predictors. Logistic regression predicts a binary outcome (pass/fail, yes/no, diagnosed/not diagnosed). Both are foundational for graduate-level research, yet many students who understand the concept still struggle to choose between linear and logistic models, check assumptions, interpret output correctly, and report results properly. Our regression analysis help covers both models: when to use each, how to check assumptions, how to interpret the extensive output SPSS produces, and how to report findings in APA format with proper statistical notation. This guide covers regression fundamentals, model choice, interpretation principles, and common mistakes.
Linear regression
When to use
- Predicting a continuous outcome: Test scores (0–100), salary ($0–$200K), depression severity (0–60 scale)
- Understanding relationships: How much does each predictor contribute to the outcome?
- Simple (bivariate): One predictor (e.g., study hours → test score)
- Multiple: Several predictors (e.g., study hours + class attendance + prior GPA → test score)
Key concepts
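- Intercept (constant): Predicted value of the outcome when all predictors = 0. Sometimes meaningful (the expected test score with zero study hours), sometimes not (a predictor such as prior GPA may never equal 0 in the data, so the intercept is an extrapolation)
- Slope (B): How much the predicted outcome changes per one-unit increase in the predictor, holding any other predictors constant. "For each additional study hour, the predicted test score increases by 2.5 points"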
- Standardized slope (Beta): Slope in standard deviation units; allows comparison across predictors with different scales. "Study hours (β = .45) is a stronger predictor than class attendance (β = .20)"
- R-squared (R²): Proportion of variance in the outcome explained by the predictors. R² = .35 means the predictors explain 35% of outcome variability; the remaining 65% is not explained by the model (other factors and random error)
- p-value (significance): Tests whether the slope differs from zero. p < .05 is the conventional threshold for a statistically significant predictor
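The key quantities above can be computed by hand for a one-predictor model. The following is a minimal sketch using only the Python standard library; the function name `simple_regression` and the study-hours data are made up for illustration and are not SPSS output.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)   # variation in the predictor
    syy = sum((yi - mean_y) ** 2 for yi in y)   # variation in the outcome
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx                      # B: change in y per one-unit change in x
    intercept = mean_y - slope * mean_x    # predicted y when x = 0
    beta = slope * math.sqrt(sxx / syy)    # standardized slope (equals r with one predictor)
    r_squared = sxy ** 2 / (sxx * syy)     # proportion of variance explained
    return {"intercept": intercept, "B": slope, "beta": beta, "r_squared": r_squared}

# Hypothetical data: study hours and test scores for six students
hours = [1, 2, 3, 4, 5, 6]
scores = [55, 61, 60, 68, 71, 75]

fit = simple_regression(hours, scores)
for name, value in fit.items():
    print(f"{name}: {value:.3f}")
```

With a single predictor, the standardized slope equals the Pearson correlation, so β² equals R². That identity no longer holds once a second predictor is added.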
Assumptions
- Linearity: Relationship between predictor and outcome is linear (straight line), not curved
- Independence: Observations are independent (one person's score doesn't influence another's)
- Homoscedasticity: Variance of residuals (errors) is constant across levels of the predictors. Check a scatterplot of residuals against predicted values; it should show random scatter, not a fan shape
- Normality of residuals: Errors are normally distributed (Q-Q plot should show points along a line)
- No multicollinearity: Predictors don't correlate too highly with each other. Check VIF (Variance Inflation Factor); VIF > 10 is a common rule of thumb for a problem, and some researchers use a stricter cutoff of 5
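To make the VIF check concrete: each predictor's VIF is 1 / (1 − R²), where R² comes from regressing that predictor on the other predictors. With exactly two predictors, that R² is simply their squared correlation. The sketch below assumes this two-predictor case; the function names and the data are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0:
        return math.inf          # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)

# Hypothetical predictors: weekly study hours and class attendance (%)
study_hours = [2, 4, 6, 8, 10]
attendance = [60, 70, 65, 90, 85]

vif = vif_two_predictors(study_hours, attendance)
print(f"VIF = {vif:.2f}")
```

With three or more predictors, each VIF requires its own auxiliary regression, which statistical software reports directly.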
APA reporting
- Simple regression: "Study hours significantly predicted test scores, β = .45, t(98) = 4.99, p < .001, R² = .20." (With a single predictor, R² equals β², so the reported values should agree with each other.)
- Multiple regression: Report R² for overall model; then for each predictor: "Study hours (β = .45, p < .001) was a significant predictor, but class attendance (β = .12, p = .18) was not."
- Table format: Variables, B (unstandardized slope), SE (standard error), β (standardized slope), t-statistic, p-value
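The APA conventions used in these examples (no leading zero for statistics that cannot exceed 1, and "p < .001" for very small p-values) are easy to get wrong when typing results by hand. Here is a small formatting sketch; the helper names and the input numbers are hypothetical, and exact p-values are printed to three decimals as one reasonable choice.

```python
def apa_stat(value, decimals=2):
    """Format a statistic bounded by ±1 (r, β, R², p) without a leading zero."""
    text = f"{value:.{decimals}f}"
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text

def apa_p(p):
    """Report p as 'p < .001' when very small, otherwise to three decimals."""
    return "p < .001" if p < 0.001 else "p = " + apa_stat(p, 3)

def regression_stats_line(beta, t, df, p, r_squared):
    """Build the statistics portion of a simple-regression sentence."""
    return (f"β = {apa_stat(beta)}, t({df}) = {t:.2f}, "
            f"{apa_p(p)}, R² = {apa_stat(r_squared)}")

# Made-up values for illustration only
line = regression_stats_line(beta=0.32, t=2.34, df=48, p=0.023, r_squared=0.1024)
print(line)
```

The t statistic keeps its leading digit because it can exceed 1; only bounded statistics drop the zero.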
Logistic regression
When to use
- Predicting a binary (yes/no) outcome: Pass/fail, hired/not hired, diagnosed with condition/not diagnosed
- Understanding predictors: Which factors increase odds of the outcome?
- Why not linear regression for binary outcomes? Linear regression can produce predicted values outside the 0–1 range (impossible for probabilities), and a binary outcome violates the normality and homoscedasticity assumptions
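The range problem is easy to see numerically. In the sketch below, a straight-line prediction grows without bound, while the logistic function turns any log-odds value into a probability strictly between 0 and 1. All coefficients are invented for illustration.

```python
import math

def linear_prediction(intercept, slope, x):
    """Straight-line prediction; nothing keeps it between 0 and 1."""
    return intercept + slope * x

def logistic_probability(intercept, slope, x):
    """Logistic model: convert log-odds to a probability in (0, 1)."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients for predicting "pass" from study hours
for hours in (0, 5, 10, 20):
    linear = linear_prediction(0.10, 0.08, hours)
    logistic = logistic_probability(-2.0, 0.4, hours)
    print(f"hours={hours:2d}  linear={linear:.2f}  logistic={logistic:.3f}")
```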
Key concepts
- Odds: Probability of the outcome / probability of not having the outcome. A probability of .75 gives odds of .75 / .25 = 3, meaning the outcome is 3 times as likely to happen as not
- Odds ratio (exp(B)): Multiplicative change in the odds per one-unit increase in the predictor. OR = 2.0 means the odds double per unit increase
- Probability: Predicted likelihood of outcome (0–1). Always report alongside odds ratio for clarity
- Nagelkerke R²: Pseudo-R² (not true variance explained, but indicates model fit). Values 0–1
- Classification accuracy: What percentage of cases did the model predict correctly? Important for evaluating practical utility
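Odds, probabilities, and odds ratios are related by simple arithmetic, and converting between them helps when translating an odds ratio into a predicted probability for readers. The following sketch uses hypothetical function names and starting values.

```python
def odds_from_probability(p):
    """Odds = P(outcome) / P(no outcome)."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Invert the odds back to a probability."""
    return odds / (1.0 + odds)

def apply_odds_ratio(p, odds_ratio, units=1):
    """Probability after increasing the predictor by `units`, given an odds ratio."""
    new_odds = odds_from_probability(p) * odds_ratio ** units
    return probability_from_odds(new_odds)

print(odds_from_probability(0.75))       # a probability of .75 corresponds to odds of 3
print(apply_odds_ratio(0.20, 2.0))       # odds double from .25 to .50
```

Doubling the odds does not double the probability: starting from a probability of .20, an odds ratio of 2.0 moves the probability to about .33, not .40.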
Assumptions
- No perfect separation: The outcome should not be perfectly predicted by a single predictor or a combination of predictors; when it is, the model cannot produce finite coefficient estimates
- Independence: Observations are independent
- Linearity (in the logit): Relationship between log-odds and predictors is linear (check via Box-Tidwell test)
- No multicollinearity: Predictors not highly correlated
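A quick screen for perfect separation on a single numeric predictor is to check whether the predictor values for the two outcome groups overlap at all. The sketch below is a simplified check under that assumption; it does not detect separation caused by a combination of predictors, and the function name and data are hypothetical.

```python
def perfectly_separates(x, y):
    """True if some cutoff on x splits the 0 and 1 outcomes with no overlap."""
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    if not zeros or not ones:
        return False   # only one outcome category present, nothing to separate
    return max(zeros) < min(ones) or max(ones) < min(zeros)

# Hypothetical stress scores and diagnosis (1 = diagnosed)
diagnosed = [0, 0, 0, 1, 1, 1]
separated_stress = [1, 2, 3, 6, 7, 8]     # every diagnosed case scores higher
overlapping_stress = [1, 2, 6, 3, 7, 8]   # groups overlap

print(perfectly_separates(separated_stress, diagnosed))
print(perfectly_separates(overlapping_stress, diagnosed))
```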
APA reporting
- Example: "Stress level significantly predicted depression diagnosis (OR = 2.34, 95% CI [1.45, 3.78], p < .001). The odds of a depression diagnosis were multiplied by 2.34 for each one-unit increase in stress."
- Include: Odds ratio, 95% confidence interval, p-value; optionally predicted probability for meaningful values
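The odds ratio and its confidence interval come from the logistic coefficient B and its standard error: OR = exp(B), and the 95% limits are exp(B ± 1.96 × SE). In the sketch below, B = 0.85 and SE = 0.2444 are assumed values, back-calculated so that they approximately reproduce the example above; they are not taken from real output.

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Return (OR, lower, upper) from a logistic coefficient and its standard error."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * se)   # the CI is symmetric on the log-odds scale
    upper = math.exp(b + z * se)
    return odds_ratio, lower, upper

# Assumed coefficient and standard error (illustrative only)
odds_ratio, lower, upper = odds_ratio_with_ci(b=0.85, se=0.2444)
print(f"OR = {odds_ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the odds ratio itself. An interval that excludes 1 corresponds to a significant predictor at the .05 level.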
Linear vs. logistic: choosing your model
| Characteristic | Linear Regression | Logistic Regression |
| --- | --- | --- |
| Outcome variable | Continuous (0–100, -10 to +10, any range) | Binary (0/1, Yes/No, Pass/Fail) |
| Prediction output | Predicted value in original units (e.g., "test score = 75") | Probability of outcome (0–1); odds; odds ratio |
| Effect size measure | R² (variance explained); slope (units of change) | Odds ratio; probability change |
| Assumptions | Linearity, homoscedasticity, normality of residuals | Linearity in the logit, no perfect separation |
| Interpretation | "For each one-unit increase in X, Y changes by B units" | "For each one-unit increase in X, the odds of the outcome are multiplied by exp(B)" |
Regression analysis checklist
- ☐ Outcome and predictors identified (and outcome is continuous for linear, binary for logistic)
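- ☐ Sample size adequate (rule of thumb: at least 10–20 observations per predictor; for logistic regression, apply this to the less frequent outcome category)
- ☐ Assumptions checked (linear: linearity, homoscedasticity, normality of residuals, no multicollinearity; logistic: linearity in the logit, no perfect separation, no multicollinearity; independence for both)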
- ☐ Correct model selected (linear vs. logistic)
- ☐ Regression run and output examined
- ☐ Model fit evaluated (R² for linear; Nagelkerke R² and classification accuracy for logistic)
- ☐ Statistical significance of predictors noted (p-values)
- ☐ Effect sizes reported (slopes/odds ratios with CIs)
- ☐ Results interpreted in context of research question
- ☐ APA formatting correct
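The sample-size item in the checklist can be turned into a quick arithmetic check. This is only a sketch of the observations-per-predictor rule of thumb; the function name and the example numbers are hypothetical, and a formal power analysis may give a different answer.

```python
def meets_sample_size_rule(n_observations, n_predictors, per_predictor=10):
    """Compare the sample size with the observations-per-predictor rule of thumb."""
    required = per_predictor * n_predictors
    return n_observations >= required, required

# Hypothetical study: 45 participants, 3 predictors
print(meets_sample_size_rule(45, 3))                     # lenient end of the rule (10 each)
print(meets_sample_size_rule(45, 3, per_predictor=20))   # strict end of the rule (20 each)
```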
Get regression analysis help
Linear and logistic regression clarified. From model selection through interpretation and APA reporting, we help you leverage regression's power in your research.
FAQ
Can I use linear regression for a binary outcome?
Generally no. Linear regression can produce predictions outside the 0–1 range and violates its own assumptions when the outcome is binary, so use logistic regression instead. If you must use a linear model (rare cases), justify the choice and acknowledge its limitations.
How do I interpret an odds ratio?
If OR = 2.5, the odds of the outcome are multiplied by 2.5 for each one-unit increase in the predictor: "For each additional year of education, the odds of employment increase by a factor of 2.5." Always report the 95% CI around the OR.
How do I know whether my model is good?
For linear regression, R² > .30 is often treated as good in the social sciences; also check residual plots for assumption violations. For logistic regression, classification accuracy should exceed the baseline (the accuracy of always predicting the more common outcome), and sensitivity and specificity should be examined. But "good" depends on context: sometimes explaining 15% of the variance is meaningful.
Can I use logistic regression with more than two outcome categories?
Yes, but the model is then called multinomial logistic regression and is more complex. Beginners may prefer to stick with binary outcomes or collapse categories. If the categories are ordered, ordinal regression may be appropriate.
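The logistic model-fit check mentioned in this FAQ (classification accuracy compared with baseline, plus sensitivity and specificity) can be sketched as follows. The function name and the actual/predicted values are made up for illustration; here the baseline is the accuracy of always predicting the more common outcome.

```python
def classification_summary(actual, predicted):
    """Accuracy, baseline accuracy, sensitivity, and specificity for 0/1 outcomes."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)   # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)   # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)   # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)   # false negatives
    n = len(pairs)
    return {
        "accuracy": (tp + tn) / n,
        "baseline": max(tp + fn, tn + fp) / n,           # always guess the majority class
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
    }

# Hypothetical outcomes (1 = diagnosed) and model classifications
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

summary = classification_summary(actual, predicted)
print(summary)
```

A model whose accuracy barely exceeds the baseline adds little practical value even when its predictors are statistically significant.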