Dr. QuantMaster MCP Server

03_REGRESSION.md•4.91 KiB

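# Regression Analysis Mastery

## OLS Regression

### Model

```
Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
```

### Gauss-Markov Assumptions (BLUE)

1. **Linearity**: E(Y|X) = Xβ
2. **Random Sampling**: (Yᵢ, Xᵢ) are i.i.d.
3. **No Perfect Multicollinearity**: rank(X) = k+1
4. **Zero Conditional Mean**: E(ε|X) = 0
5. **Homoscedasticity**: Var(ε|X) = σ²
6. **Normality** (for inference): ε ~ N(0, σ²)

Assumptions 1-5 make OLS the Best Linear Unbiased Estimator (BLUE). Normality is not part of Gauss-Markov. It justifies exact t and F tests in small samples; in large samples, inference relies on asymptotic normality instead.

### Diagnostics

#### Multicollinearity

```r
# R: model is a fitted lm object, e.g. model <- lm(y ~ x1 + x2, data = df)
car::vif(model)  # rule of thumb: VIF > 10 signals serious multicollinearity
```

```stata
* Stata: run after regress
regress y x1 x2
estat vif
```

#### Heteroscedasticity

```r
# Breusch-Pagan test (H0: homoscedastic errors)
lmtest::bptest(model)

# Remedy: heteroscedasticity-robust (sandwich) standard errors
# type = "HC1" matches Stata's vce(robust); vcovHC defaults to HC3
lmtest::coeftest(model, vcov = sandwich::vcovHC(model, type = "HC1"))
```

```stata
regress y x1 x2
estat hettest                  // Breusch-Pagan / Cook-Weisberg test
regress y x1 x2, vce(robust)   // robust standard errors
```

#### Normality of Residuals

```r
shapiro.test(residuals(model))  # requires 3 <= n <= 5000
qqnorm(residuals(model))
qqline(residuals(model))        # reference line for the Q-Q plot
```

### Interpretation

| Specification | Coefficient Interpretation |
|---------------|----------------------------|
| Continuous (level-level) | 1-unit ↑ X → β-unit Δ Y |
| Log(X) only | 1% ↑ X → β/100-unit Δ Y |
| Log(Y) only | 1-unit ↑ X → 100×(eᵝ-1)% Δ Y (≈ 100β% for small β) |
| Log-Log | 1% ↑ X → β% Δ Y (elasticity) |
| Binary X | β = difference in E(Y) between category 1 and category 0, holding other regressors fixed |

---

## Panel Data Models

### Fixed Effects (FE)

```
Yᵢₜ = αᵢ + Xᵢₜβ + εᵢₜ

Within transformation:
(Yᵢₜ - Ȳᵢ) = (Xᵢₜ - X̄ᵢ)β + (εᵢₜ - ε̄ᵢ)
```

**Pros**: Controls for time-invariant unobservables

**Cons**: Cannot estimate coefficients on time-invariant variables, because the within transformation sweeps them out

### Random Effects (RE)

```
Yᵢₜ = α + Xᵢₜβ + uᵢ + εᵢₜ
where uᵢ ~ N(0, σ²ᵤ), εᵢₜ ~ N(0, σ²ₑ)
```

**Pros**: More efficient than FE when its assumption holds; can estimate time-invariant variables

**Cons**: Requires uᵢ ⊥ Xᵢₜ in every period (often implausible)

### Hausman Test

```
H₀: RE is consistent (uᵢ ⊥ Xᵢₜ); FE and RE are both consistent, RE is efficient
H₁: RE is inconsistent → Use FE

Test: H = (β̂_FE - β̂_RE)'[Var(β̂_FE) - Var(β̂_RE)]⁻¹(β̂_FE - β̂_RE) ~ χ²(k)
where k = number of (time-varying) coefficients being compared
```

### Implementation

```r
library(plm)
fe <- plm(y ~ x1 + x2, data = panel, model = "within", index = c("id", "time"))
re <- plm(y ~ x1 + x2, data = panel, model = "random", index = c("id", "time"))
phtest(fe, re)  # Hausman test
```

```stata
xtset id time
xtreg y x1 x2, fe
estimates store fe
xtreg y x1 x2, re
hausman fe      // stored FE (consistent) vs. current RE estimates (efficient)
```

### Two-Way Fixed Effects

```r
library(fixest)
# Unit and time fixed effects; standard errors clustered by id
twfe <- feols(y ~ x1 + x2 | id + time, data = panel, vcov = ~id)
```

```stata
* reghdfe is community-contributed: ssc install reghdfe
reghdfe y x1 x2, absorb(id time) vce(cluster id)
```

---

## Limited Dependent Variables

### Logistic Regression

```
P(Y=1|X) = 1 / (1 + e^(-Xβ)) = Λ(Xβ)
Log-odds: log[P/(1-P)] = Xβ
```

### Probit

```
P(Y=1|X) = Φ(Xβ)
where Φ is the standard normal CDF
```

### Interpretation

- **Coefficients**: Change in the log-odds (logit) or in the latent z-index (probit) per unit change in X. In probability terms, only the sign is directly interpretable.
- **Odds Ratio**: exp(β) for logit
- **Marginal Effects**: ∂P/∂Xⱼ = f(Xβ)·βⱼ, where f(Xβ) = Λ(Xβ)[1 - Λ(Xβ)] for logit and φ(Xβ) for probit. Because f depends on X, report average marginal effects (AME) or effects at chosen values of X.

```r
# Logit
logit <- glm(y ~ x1 + x2, data = df, family = binomial(link = "logit"))
exp(coef(logit))  # Odds ratios

# Marginal effects
library(margins)
summary(margins(logit))  # average marginal effects
```

```stata
logit y x1 x2
margins, dydx(*)   // average marginal effects
```

### Ordered Logit/Probit

```r
library(MASS)
# Hess = TRUE keeps the Hessian so summary() can report standard errors
polr(factor(y) ~ x1 + x2, data = df, method = "logistic", Hess = TRUE)
# method = "probit" gives ordered probit
```

```stata
ologit y x1 x2
oprobit y x1 x2
```

### Multinomial Logit

```r
library(nnet)
multinom(y ~ x1 + x2, data = df)  # base category = first level of factor(y)
```

```stata
mlogit y x1 x2, baseoutcome(0)
```

---

## Count Models

### Poisson Regression

```
E(Y|X) = exp(Xβ)
Var(Y|X) = E(Y|X)   [equidispersion]
log E(Y|X) = Xβ
```

### Negative Binomial

```
Var(Y|X) = E(Y|X) + αE(Y|X)²   [overdispersion, NB2, α > 0]
```

As α → 0, the model reduces to Poisson.

### Zero-Inflated Models

- Designed for excess zeros beyond what Poisson/NB predicts
- Mixture model: a logit for membership in an "always-zero" group, plus a count model (Poisson or NB) that can itself produce zeros
- A hurdle model is the true two-part alternative: a binary model for zero vs. positive, then a zero-truncated count model for the positive values

### Implementation

```r
# Poisson
pois <- glm(y ~ x1 + x2, data = df, family = poisson)

# Negative Binomial
library(MASS)
nb <- glm.nb(y ~ x1 + x2, data = df)

# Zero-Inflated Poisson: count part ~ x1 + x2 | inflation (logit) part ~ z1
library(pscl)
zip <- zeroinfl(y ~ x1 + x2 | z1, data = df, dist = "poisson")
```

```stata
poisson y x1 x2
nbreg y x1 x2
zip y x1 x2, inflate(z1)
```

---

## Survival Analysis

### Hazard Function

```
h(t) = lim[Δt→0] P(t ≤ T < t+Δt | T ≥ t) / Δt
```

### Cox Proportional Hazards

```
h(t|X) = h₀(t) · exp(Xβ)
Hazard Ratio: HR = exp(β)
```

The baseline hazard h₀(t) is left unspecified. The model assumes that hazard ratios are constant over time.

### Kaplan-Meier Estimator

Ŝ(t) = ∏ over event times tᵢ ≤ t of (1 - dᵢ/nᵢ), where dᵢ is the number of events at tᵢ and nᵢ is the number still at risk just before tᵢ.

```r
library(survival)
# Kaplan-Meier survival curves by group
km <- survfit(Surv(time, event) ~ group, data = df)
plot(km)

# Cox model
cox <- coxph(Surv(time, event) ~ x1 + x2, data = df)
summary(cox)   # exp(coef) column = hazard ratios
cox.zph(cox)   # test of the proportional hazards assumption
```

```stata
stset time, failure(event)
sts graph, by(group)
stcox x1 x2
```

---

## Model Selection

### Information Criteria

```
AIC = -2·ln(L) + 2k
BIC = -2·ln(L) + k·ln(n)
```

Here L is the maximized likelihood, k is the number of estimated parameters, and n is the sample size.

- Lower is better. Compare only models fit to the same observations and the same dependent variable.
- BIC charges ln(n) per parameter versus AIC's 2, so it penalizes complexity more whenever n ≥ 8

### R-squared Variants

| Measure | Formula | Use |
|---------|---------|-----|
| R² | 1 - SSR/SST | OLS |
| Adjusted R² | 1 - (1-R²)(n-1)/(n-k-1) | Model comparison |
| Pseudo R² (McFadden) | 1 - ln L₁ / ln L₀ | GLM |
| Within R² | 1 - SSR_within/SST_within | Panel FE |

SSR is the residual sum of squares and SST the total sum of squares. k is the number of regressors excluding the intercept. L₁ and L₀ are the maximized likelihoods of the fitted and intercept-only models.

### Cross-Validation

```r
library(caret)
set.seed(123)  # reproducible fold assignment
train_control <- trainControl(method = "cv", number = 10)
cv_model <- train(y ~ ., data = df, method = "lm", trControl = train_control)
cv_model$results  # RMSE, Rsquared, MAE averaged over the 10 folds
```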
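To show what the caret call above does under the hood, here is a minimal, library-free sketch of 10-fold cross-validation in plain Python. It assumes a single predictor fit by ordinary least squares. The names `fit_simple_ols` and `kfold_rmse` and the simulated data are hypothetical. caret's own fold assignment differs in detail, but the logic is the same: hold each fold out in turn, fit on the rest, score on the held-out rows, and average.

```python
import math
import random


def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def kfold_rmse(xs, ys, k=10, seed=123):
    """Mean held-out RMSE across k folds (hypothetical helper; needs len(xs) >= k)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)        # random assignment of rows to folds
    folds = [idx[i::k] for i in range(k)]   # k folds of (nearly) equal size
    fold_rmses = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit only on the training folds ...
        b0, b1 = fit_simple_ols([xs[i] for i in train], [ys[i] for i in train])
        # ... then score on the held-out fold
        sq_errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold]
        fold_rmses.append(math.sqrt(sum(sq_errors) / len(sq_errors)))
    return sum(fold_rmses) / k


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.uniform(0, 10) for _ in range(100)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]  # noise sd = 1
    print(f"10-fold CV RMSE: {kfold_rmse(xs, ys):.3f}")
```

With noise standard deviation 1, the printed RMSE should land near 1. In practice you compute this number for each candidate specification and prefer the one with the lower out-of-sample error.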

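The information criteria and R² variants in Model Selection can be checked by hand. The sketch below is in plain Python and assumes a Gaussian linear model fit by OLS, whose maximized log-likelihood is ln L = -(n/2)[ln(2π) + ln(SSR/n) + 1]. It also assumes that the parameter count for AIC/BIC includes the error variance, which is the convention R's `AIC()` uses for `lm` fits. All function names and example numbers are hypothetical.

```python
import math


def gaussian_loglik(ssr, n):
    """Maximized log-likelihood of a Gaussian linear model (sigma^2 MLE = SSR/n)."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(ssr / n) + 1)


def aic(loglik, n_params):
    return -2 * loglik + 2 * n_params


def bic(loglik, n_params, n):
    return -2 * loglik + n_params * math.log(n)


def r_squared(ssr, sst):
    return 1 - ssr / sst


def adjusted_r_squared(r2, n, n_regressors):
    # n_regressors excludes the intercept (k in the R-squared table)
    return 1 - (1 - r2) * (n - 1) / (n - n_regressors - 1)


def mcfadden_pseudo_r2(loglik_model, loglik_null):
    # Ratio of log-likelihoods, not of likelihoods
    return 1 - loglik_model / loglik_null


if __name__ == "__main__":
    # Hypothetical fit: n = 50, intercept + 2 slopes, SSR = 120, SST = 400
    n, ssr, sst, slopes = 50, 120.0, 400.0, 2
    ll = gaussian_loglik(ssr, n)
    n_params = slopes + 1 + 1  # slopes + intercept + error variance
    r2 = r_squared(ssr, sst)
    print(f"R2 = {r2:.3f}, adj. R2 = {adjusted_r_squared(r2, n, slopes):.3f}")
    print(f"AIC = {aic(ll, n_params):.2f}, BIC = {bic(ll, n_params, n):.2f}")
```

Both criteria use the same ln L, so BIC - AIC = k(ln n - 2). This is why BIC becomes the stricter criterion once n ≥ 8.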