Oregon State University
Tuesday, December 3, 2024
Common Causes of Missing Data
Why It Matters
Types of Missing Data
MICE assumes the data are missing at random (MAR), so identifying the missingness type is crucial to applying the right method.
MICE = Multiple Imputation by Chained Equations
For each variable with missing data X_j:
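X_j^{(\text{missing})} = f_j(X_{-j}) + \varepsilon_j
where X_{-j} denotes all variables other than X_j, f_j is the imputation model for X_j, and \varepsilon_j is a random error term.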
(van Buuren, 2018)
Handling missing data with MICE is like trying to complete a puzzle with missing pieces. Instead of guessing one fixed piece to fill a gap, you evaluate several options that could reasonably fit, based on the surrounding picture. Each plausible “piece” represents a potential imputation, reflecting the uncertainty of the missing data.
MICE models each variable conditional on others, preserving multivariate relationships and producing statistically valid imputations.
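The update X_j = f_j(X_{-j}) + \varepsilon_j can be sketched for a single incomplete variable and a single predictor. This is only an illustration in plain Python, not the mice package: the helper names `fit_line` and `impute_step`, the synthetic data, and the choice of a linear f_j are all assumptions. A full implementation would also draw the regression parameters, cycle over every incomplete variable, and repeat for several iterations.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd

def impute_step(predictor, target, rng):
    """Fill None entries of `target` with f(predictor) + random noise."""
    observed = [(x, y) for x, y in zip(predictor, target) if y is not None]
    a, b, sd = fit_line([x for x, _ in observed], [y for _, y in observed])
    return [y if y is not None else a + b * x + rng.gauss(0.0, sd)
            for x, y in zip(predictor, target)]

# Synthetic example data (assumed, not from the slides).
rng = random.Random(541)
x = [rng.uniform(0, 10) for _ in range(200)]
y_full = [2.0 + 3.0 * xi + rng.gauss(0, 1) for xi in x]
y_miss = [None if rng.random() < 0.3 else yi for yi in y_full]

# Five completed copies of the target; they differ only in the imputed cells,
# which is how multiple imputation carries the uncertainty forward.
completed = [impute_step(x, y_miss, rng) for _ in range(5)]
```

Adding the noise term rather than plugging in the fitted value is what keeps the imputations from understating variability.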
Goal: Assess the impact of missing data patterns on regression analysis and demonstrate how MICE recovers accurate results.
Dataset: Product Sales and Returns
Problem: Missing values simulated in Refunds column.
Steps:
1. Data Prep: Selected Refunds, Purchased Item Count, Total Revenue, Category.
2. Simulating Missingness: Applied patterns (MCAR, MAR, MNAR) at 10–70% levels.
3. MICE Imputation: Used mice package with predictors to fill missing Refunds.
4. Regression Analysis: Evaluated the effect of missing patterns and imputation on results.
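Step 2 can be sketched as follows. The slides do not say how each pattern was generated, so the rules here are assumptions: under MAR, the chance of a missing Refunds value rises with Total Revenue (an observed column), and under MNAR it rises with the Refunds value itself. The function name `simulate_missing`, the rank-based weights, and the synthetic data are all hypothetical.

```python
import random

def ranks(values):
    """Rank of each value, where 1 is the smallest."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for position, i in enumerate(order, start=1):
        out[i] = position
    return out

def simulate_missing(refunds, revenue, mechanism, rate, rng):
    """Return a copy of `refunds` with round(rate * n) entries set to None."""
    n = len(refunds)
    if mechanism == "MCAR":
        weights = [1.0] * n          # every row equally likely to be dropped
    elif mechanism == "MAR":
        weights = ranks(revenue)     # depends only on an observed column
    elif mechanism == "MNAR":
        weights = ranks(refunds)     # depends on the value being dropped
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Weighted sampling without replacement: keep the k largest u**(1/w) keys.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    k = round(rate * n)
    drop = set(sorted(range(n), key=lambda i: keys[i], reverse=True)[:k])
    return [None if i in drop else v for i, v in enumerate(refunds)]

# Synthetic stand-in for the sales data (assumed values).
rng = random.Random(2024)
revenue = [rng.uniform(10, 500) for _ in range(1000)]
refunds = [0.1 * r + rng.gauss(0, 5) for r in revenue]

masked = {m: simulate_missing(refunds, revenue, m, 0.3, rng)
          for m in ("MCAR", "MAR", "MNAR")}
```

Fixing the number of dropped rows, rather than dropping each row independently, makes it easier to compare mechanisms at the same missingness level.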
Takeaway: Understanding the missing data type is crucial for selecting an appropriate imputation method.
(van Buuren, 2018)
Low Bias and High Coverage: Together, these indicate a randomization-valid method.
Efficiency: Shorter confidence intervals (smaller average width, AW) are better, provided the coverage rate (CR) is adequate.
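The criteria above can be computed from repeated simulation runs, each giving an estimate and a confidence interval. The sketch below assumes raw bias is the mean estimate minus the true value, CR is the share of intervals containing the truth, and AW is the mean interval width. The function name `evaluate` and the four runs are made-up illustrative values, not results from the study.

```python
def evaluate(results, truth):
    """Summarise (estimate, ci_low, ci_high) tuples from repeated simulations."""
    n = len(results)
    raw_bias = sum(est for est, _, _ in results) / n - truth
    coverage = sum(lo <= truth <= hi for _, lo, hi in results) / n
    avg_width = sum(hi - lo for _, lo, hi in results) / n
    return {"RB": raw_bias, "CR": coverage, "AW": avg_width}

# Hypothetical runs for a parameter whose true value is 2.0.
runs = [(1.9, 1.5, 2.3), (2.2, 1.8, 2.6), (2.6, 2.1, 3.1), (2.1, 1.6, 2.6)]
metrics = evaluate(runs, truth=2.0)
```

Here three of the four intervals contain 2.0, so the coverage rate is 0.75, well below the nominal 0.95 that a valid method should reach over many runs.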
Multiple Imputation is powerful but not always needed.
Complete-Case Analysis (also known as Listwise Deletion):
When is Complete-Case Analysis Appropriate?
Be Careful:
(van Buuren, 2018)
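For comparison, complete-case analysis simply drops every row that has a missing value before the analysis runs. A minimal sketch using the column names from the case study, where the rows and the helper `complete_cases` are hypothetical:

```python
def complete_cases(rows):
    """Keep only rows in which every field is observed (listwise deletion)."""
    return [row for row in rows if all(v is not None for v in row.values())]

# Made-up rows for illustration only.
rows = [
    {"Refunds": 12.5, "Purchased Item Count": 3, "Total Revenue": 240.0, "Category": "Toys"},
    {"Refunds": None, "Purchased Item Count": 1, "Total Revenue": 55.0, "Category": "Books"},
    {"Refunds": 0.0, "Purchased Item Count": 2, "Total Revenue": 80.0, "Category": "Books"},
    {"Refunds": None, "Purchased Item Count": 5, "Total Revenue": 410.0, "Category": "Toys"},
]

kept = complete_cases(rows)
share_lost = 1 - len(kept) / len(rows)
```

In this toy example half the rows are discarded, which shows how quickly listwise deletion can shrink the sample.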
ST 541: Probability, Simulation & Computation