Regression Analysis Help: Linear & Logistic Regression

Regression is one of the most powerful and widely used statistical methods. It lets you predict one variable from one or more others and quantify the relationships between them. Linear regression predicts a continuous outcome (test scores, salary, depression severity); logistic regression predicts a binary outcome (pass/fail, yes/no, diagnosed/not diagnosed). Both are foundational in graduate-level research. Yet many students who understand the concept still struggle to choose between linear and logistic models, interpret output correctly, check assumptions, and report results properly. This guide covers both models: when to use each, how to check their assumptions, how to interpret the extensive output SPSS produces, how to report findings in APA format with proper statistical notation, and which mistakes to avoid.

Linear regression

When to use

Predicting a continuous outcome: Test scores (0–100), salary ($0–$200K), depression severity (0–60 scale)
Understanding relationships: How much does each predictor contribute to the outcome?
Simple (bivariate): One predictor (e.g., study hours → test score)
Multiple: Several predictors (e.g., study hours + class attendance + prior GPA → test score)

Key concepts

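Intercept (constant): Predicted value of the outcome when all predictors = 0. Meaningful only when 0 is a plausible value for every predictor (zero study hours is interpretable; a prior GPA of 0 usually lies far outside the observed data)
Slope (B): How much the predicted outcome changes per one-unit increase in the predictor (in multiple regression, holding the other predictors constant). "For each additional study hour, predicted test score increases 2.5 points"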
Standardized slope (Beta): Slope in standard deviation units; allows comparison across predictors with different scales. "Study hours (β = .45) is a stronger predictor than class attendance (β = .20)"
R-squared (R²): Proportion of variance in the outcome explained by the predictors. R² = .35 means the predictors explain 35% of outcome variability; the remaining 65% is unexplained (other factors and measurement error)
p-value (significance): Tests whether the slope differs from zero. p < .05 means the predictor is statistically significant at the conventional α = .05 level
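The quantities above can be computed by hand for a simple regression. A minimal sketch using only the Python standard library, with a small hypothetical data set (the study hours and test scores are invented for illustration): B and the intercept come from ordinary least squares, β rescales B by the ratio of standard deviations, and R² is 1 − SS_residual / SS_total. With a single predictor, β equals Pearson's r, so β² equals R².

```python
import math

def simple_regression(x, y):
    """Ordinary least squares with one predictor; returns B, intercept, beta, R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                    # unstandardized slope (B)
    intercept = my - b * mx          # predicted y when x = 0
    beta = b * math.sqrt(sxx / syy)  # standardized slope: B * (SD_x / SD_y)
    predicted = [intercept + b * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    r_squared = 1 - ss_res / syy     # proportion of outcome variance explained
    return b, intercept, beta, r_squared

# Hypothetical data: study hours (x) and test scores (y)
hours = [1, 2, 3, 4, 5]
scores = [52, 64, 70, 64, 70]
b, a, beta, r2 = simple_regression(hours, scores)
print(f"B = {b:.2f}, intercept = {a:.2f}, beta = {beta:.3f}, R^2 = {r2:.2f}")
```

Here each extra study hour adds B points to the predicted score, and β² reproduces R² because there is only one predictor; that equality does not hold for individual predictors in multiple regression.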

Assumptions

Linearity: Relationship between predictor and outcome is linear (straight line), not curved
Independence: Observations are independent (one person's score doesn't influence another's)
Homoscedasticity: Variance of residuals (errors) is constant across levels of the predictors. Check a scatterplot of residuals against predicted values; it should show random scatter, not a fan shape
Normality of residuals: Errors are normally distributed (Q-Q plot should show points along a line)
No multicollinearity: Predictors don't correlate too highly with each other. Check the VIF (Variance Inflation Factor); a VIF above 10 is a common warning sign (some researchers use 5)
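VIF for predictor j is 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the other predictors; equivalently, it is the j-th diagonal entry of the inverse of the predictors' correlation matrix. A minimal standard-library sketch of that second form (the predictor values below are hypothetical, and `vif` is an illustrative helper, not a library function):

```python
import math

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(predictors):
    """VIF per predictor: diagonal of the inverse predictor correlation matrix,
    equal to 1 / (1 - R^2_j) from regressing predictor j on the others."""
    k = len(predictors)
    corr = [[1.0 if i == j else pearson_r(predictors[i], predictors[j]) for j in range(k)]
            for i in range(k)]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(k)]

# Hypothetical predictors: study hours, class attendance (%), prior GPA
hours = [1, 2, 3, 4, 5, 6]
attendance = [80, 85, 83, 90, 88, 95]
gpa = [2.8, 3.1, 2.9, 3.4, 3.0, 3.6]
print([round(v, 2) for v in vif([hours, attendance, gpa])])
```

With only two predictors this reduces to 1 / (1 − r²), so two predictors correlated at r ≈ .77 (r² = .6) each get a VIF of 2.5.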

APA reporting

Simple regression: "Study hours significantly predicted test scores, β = .43, t(98) = 4.67, p < .001, R² = .18."
Multiple regression: Report R² and the overall F test for the model; then for each predictor: "Study hours (β = .45, p < .001) was a significant predictor, but class attendance (β = .12, p = .18) was not."
Table format: Variables, B (unstandardized slope), SE (standard error), β (standardized slope), t-statistic, p-value
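In simple regression (one predictor), standardized β equals Pearson's r, so β, t, and R² in a simple-regression report must agree: F = t², R² = t² / (t² + df), and |β| = √R². A minimal Python consistency check, assuming a single predictor (it does not apply to individual predictors in multiple regression); `simple_regression_check` is a hypothetical helper name:

```python
import math

def simple_regression_check(t, df):
    """One predictor only: F(1, df) = t^2, R^2 = t^2 / (t^2 + df), beta = r = +/- sqrt(R^2)."""
    f = t ** 2
    r_squared = f / (f + df)
    beta = math.copysign(math.sqrt(r_squared), t)  # beta takes the sign of t
    return f, r_squared, beta

f, r2, beta = simple_regression_check(t=4.67, df=98)
print(f"F(1, 98) = {f:.2f}, R^2 = {r2:.2f}, beta = {beta:.2f}")
```

For t(98) = 4.67 this gives R² ≈ .18 and β ≈ .43, so those are the values a matching simple-regression report should show.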

Logistic regression

When to use

Predicting a binary (yes/no) outcome: Pass/fail, hired/not hired, diagnosed with condition/not diagnosed
Understanding predictors: Which factors increase odds of the outcome?
Why not linear regression for binary outcomes? Linear regression can produce predictions outside the 0–1 range (impossible for probabilities), and a 0/1 outcome violates its assumptions of homoscedastic, normally distributed residuals
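To see the first problem concretely, here is a minimal Python sketch that fits an ordinary least-squares line to a hypothetical 0/1 outcome (invented tutoring hours and pass/fail results). The fitted line predicts values below 0 and above 1 at the ends of the data.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical data: hours of tutoring (x) and pass (1) / fail (0)
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]
a, b = ols_line(hours, passed)
for h in (1, 6, 8):
    # a "probability" below 0 or above 1 is impossible
    print(h, round(a + b * h, 3))
```

Logistic regression avoids this by modeling the log-odds, which maps every prediction back into the 0–1 range.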

Key concepts

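Odds: Probability of the outcome / probability of not having the outcome. A probability of .75 gives odds of .75/.25 = 3, meaning the outcome is 3 times as likely to happen as not; odds of .75 correspond to a probability of about .43
Odds ratio (exp(B)): Multiplicative change in the odds per one-unit increase in the predictor. OR = 2.0 means the odds double per unit increase; OR < 1 means the odds decrease
Probability: Predicted likelihood of the outcome (0–1). Reporting predicted probabilities alongside odds ratios often makes results easier to interpret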
Nagelkerke R²: Pseudo-R² (not true variance explained, but indicates model fit). Values 0–1
Classification accuracy: What percentage of cases did the model predict correctly? Important for evaluating practical utility
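The conversions between probability, odds, and the logistic model can be sketched in a few lines of Python. The coefficients B0 = -2.0 and B1 = 0.7 are hypothetical values chosen for illustration; the point is that the odds at x + 1 divided by the odds at x always equals exp(B1), while the change in probability differs from one x value to the next.

```python
import math

def odds_from_probability(p):
    return p / (1 - p)

def probability_from_odds(odds):
    return odds / (1 + odds)

def predicted_probability(b0, b1, x):
    """Logistic model: log-odds = b0 + b1*x, so p = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# Probability .75 -> odds 3; odds .75 -> probability of about .43
print(odds_from_probability(0.75))
print(round(probability_from_odds(0.75), 3))

# Hypothetical coefficients: OR = exp(0.7), roughly 2.01
b0, b1 = -2.0, 0.7
for x in (0, 1, 2):
    p = predicted_probability(b0, b1, x)
    print(x, round(p, 3), round(odds_from_probability(p), 3))
```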

Assumptions

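No perfect separation: The outcome should not be perfectly predicted by a predictor or combination of predictors; otherwise maximum likelihood estimates do not exist and the model cannot be estimated (coefficients and standard errors grow without bound)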
Independence: Observations are independent
Linearity (in the logit): For continuous predictors, the relationship between each predictor and the log-odds of the outcome is linear (check via the Box-Tidwell test)
No multicollinearity: Predictors not highly correlated
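Why perfect separation breaks estimation can be shown with a toy fit. This sketch maximizes the logistic log-likelihood by plain gradient ascent (an illustrative choice; statistical packages typically use faster Newton-type iterations) on two hypothetical data sets. When x > 0 perfectly predicts the outcome, the slope keeps growing the longer the algorithm runs; with overlapping groups it settles at a finite value.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(x, y, steps, lr=0.1):
    """Maximize the log-likelihood of log-odds = b0 + b1*x by plain gradient ascent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(b0 + b1 * xi)
            g0 += residual
            g1 += residual * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

x = [-3, -2, -1, 1, 2, 3]
separated = [0, 0, 0, 1, 1, 1]    # x > 0 perfectly predicts the outcome
overlapping = [0, 1, 0, 1, 0, 1]  # no cut-off on x separates the groups

for label, y in (("separated", separated), ("overlapping", overlapping)):
    slopes = [round(fit_logistic(x, y, steps)[1], 3) for steps in (2_000, 20_000)]
    print(label, slopes)
```

In real software the same problem usually shows up as enormous coefficients and standard errors or a convergence warning rather than a clean error message.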

APA reporting

Example: "Higher stress significantly predicted depression diagnosis (OR = 2.34, 95% CI [1.45, 3.78], p < .001). The odds of depression were multiplied by 2.34 for each one-unit increase in stress."
Include: Odds ratio, 95% confidence interval, p-value; optionally predicted probability for meaningful values
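Because a 95% CI for an odds ratio is usually a Wald interval, exp(B ± 1.96 × SE), you can move between the OR, B, SE, and p to check that reported numbers agree. A minimal Python sketch assuming a Wald interval (which many packages report); the helper names are illustrative, not from any library:

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

def wald_from_or_ci(odds_ratio, lower, upper):
    """Back out B, SE, Wald z, and two-sided p from an OR and its 95% Wald CI."""
    b = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = b / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return b, se, z, p

def or_ci_from_b(b, se):
    """Odds ratio and 95% Wald CI from a logistic coefficient B and its SE."""
    return math.exp(b), math.exp(b - Z_95 * se), math.exp(b + Z_95 * se)

b, se, z, p = wald_from_or_ci(2.34, 1.45, 3.78)
print(f"B = {b:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

For the stress example, the interval is centered (on the log scale) on the reported OR, and z ≈ 3.48 gives p < .001, so the three reported figures are mutually consistent.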

Linear vs. logistic: choosing your model

Characteristic	Linear Regression	Logistic Regression
Outcome variable	Continuous (0–100, -10 to +10, any range)	Binary (0/1, Yes/No, Pass/Fail)
Prediction output	Predicted value in original units (e.g., "test score = 75")	Probability of outcome (0–1); odds; odds ratio
Effect size measure	R² (variance explained); slope (units of change)	Odds ratio; probability change
Assumptions	Linearity, independence, homoscedasticity, normality of residuals, no multicollinearity	Linearity in the logit, independence, no perfect separation, no multicollinearity
Interpretation	"For each unit increase in X, Y changes by B units"	"For each unit increase in X, odds of outcome multiply by exp(B)"

Regression analysis checklist

☐ Outcome and predictors identified (and outcome is continuous for linear, binary for logistic)
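☐ Sample size adequate (10–20 observations per predictor minimum; for logistic, count cases in the less frequent outcome category)
☐ Assumptions checked (linear: linearity, independence, homoscedasticity, normality of residuals, no multicollinearity; logistic: linearity in the logit, independence, no perfect separation, no multicollinearity)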
☐ Correct model selected (linear vs. logistic)
☐ Regression run and output examined
☐ Model fit evaluated (R² for linear; classification accuracy for logistic)
☐ Statistical significance of predictors noted (p-values)
☐ Effect sizes reported (slopes/odds ratios with CIs)
☐ Results interpreted in context of research question
☐ APA formatting correct

Get regression analysis help

Linear and logistic regression clarified. From model selection through interpretation and APA reporting, we help you leverage regression's power in your research.


FAQ

Should I use linear regression for a binary outcome?

No. Linear regression can produce predictions outside 0–1 and violates its own assumptions (residuals of a 0/1 outcome cannot be normal or homoscedastic). Use logistic regression for binary outcomes. If you must use linear regression (rare cases), justify this choice and acknowledge the limitations.

What does the odds ratio mean?

If OR = 2.5, the odds of the outcome are multiplied by 2.5 for each one-unit increase in the predictor: "For each additional year of education, the odds of employment were 2.5 times higher." Note that this describes odds, not probability. Always include the 95% CI around the OR.
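Two consequences of the odds ratio being multiplicative are easy to miss: k units multiply the odds by OR^k (two extra years with OR = 2.5 multiply the odds by 6.25), and the resulting change in probability depends on the starting probability. A small Python sketch with hypothetical baseline probabilities; `apply_odds_ratio` is an illustrative helper:

```python
def apply_odds_ratio(baseline_probability, odds_ratio, units=1):
    """Probability after `units` one-unit increases, given a constant odds ratio."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_ratio ** units  # odds multiply by OR once per unit
    return new_odds / (1 + new_odds)

# Hypothetical baselines: the same OR = 2.5 shifts probabilities by different amounts
for baseline in (0.10, 0.20, 0.50):
    one_unit = apply_odds_ratio(baseline, 2.5)
    two_units = apply_odds_ratio(baseline, 2.5, units=2)
    print(baseline, round(one_unit, 3), round(two_units, 3))
```

With a baseline of .20, one unit raises the probability to about .38; with a baseline of .50, the same OR raises it to about .71.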

How do I know if my regression model is good?

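For linear: there is no universal cutoff, but in the social sciences an R² of about .30 or higher is often considered good; also check residual plots for violations. For logistic: classification accuracy should beat the baseline of always predicting the more common outcome (which can be far above 50%); also examine sensitivity and specificity. But "good" depends on context: sometimes explaining 15% of variance is meaningful.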
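A minimal Python sketch of those logistic checks from a 2×2 classification table, using hypothetical counts (50 diagnosed, 150 not diagnosed). With this split, a model that always predicts "not diagnosed" is already 75% accurate, so 85% accuracy is a modest gain, and sensitivity shows how many actual cases the model catches.

```python
def classification_summary(tp, fn, fp, tn):
    """Accuracy, majority-class baseline, sensitivity, and specificity from a 2x2 table."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    baseline = max(tp + fn, fp + tn) / n  # always predicting the more common outcome
    sensitivity = tp / (tp + fn)          # share of actual positives predicted positive
    specificity = tn / (tn + fp)          # share of actual negatives predicted negative
    return accuracy, baseline, sensitivity, specificity

# Hypothetical counts
acc, base, sens, spec = classification_summary(tp=30, fn=20, fp=10, tn=140)
print(f"accuracy {acc:.1%} vs. baseline {base:.1%}; "
      f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```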

Can I use logistic regression with more than two outcome categories?

Yes. For unordered categories, use "multinomial logistic regression," which is more complex to run and interpret. For beginners, stick with binary outcomes or collapse categories where that makes theoretical sense. If the categories are ordered (e.g., low/medium/high), ordinal logistic regression may be more appropriate.