Matching and weighting
Propensity score
Propensity score
(Rosenbaum and Rubin, 1983)
- in observational studies, conditioning on propensity scores can lead to unbiased estimates of the exposure effect
- given that there are no unmeasured confounders
- every subject has a non-zero probability of receiving exposure
Fit a logistic regression:
- binary outcome: exposure (1,0)
- covariates: all but exposure
Predict the values (probability), they are the propensity scores.
PS can also be estimated using other methods that produce probabilities, not just logistic regression: random forest, lasso logistic regression etc.
Matching
This overview summary is based on the review paper Stuart 2010: Matching methods for causal inference: a review and a look forward
Goal of matching: choosing well-matched samples of the original groups to reduce confounding - acquire treatment and control groups with similar covariate distributions.
Alternatives to matching: adjust for covariates in a regression model, instrumental variables, structural equation modeling etc.
Benefits of matching:
- complementary to regression adjustment, can and should be used together
- have straightforward diagnostics, hence performance can be assessed
Two settings to use matching:
- when outcome values are NOT available, and matching is used to select subjects to follow up
- outcome values are available, matching is to reduce bias in treatment effect estimation
History of matching methods
- they’ve been used in 1940s, but a theoretical basis was not developed until 1970s (Rubin et al)
- with multiple covariates it is difficult to find exact matches on even small number of covariates (Chapin 1947). The introduction of propensity score (1983, Rosenbaum and Rubin) helped.
Comments
- The outcomes are usually not used even when they are available
- PS matching is ONE of the many matching techniques that uses PS as the difference
- Matching would reduce number of observations, hence loss of power
- matching gives estimates for ATT
Steps to implement matching methods
- Define closeness: distance measure used to determine whether an individual is a good match for another
- implement a matching method
- assess the quality of a matching method. Might require iteration of step 1 and 2
- analyse the outcome given the matched data
Vignette: Estimating effects after matching by Noah Greifer
Exact matching:
- perfect covariate balance; \(F(X_i|T_i = 1) = F(X_i|T_i=0)\)
- infeasible when covariate is continuous, and when there are many covariates.
Probability of receiving treatment, \(\pi(X_i) = P(T_i = 1 | X_i)\)
Matching based on distance measures
- Mahalanobis distance
- Estimated propensity score, \(D(X_i, X_j) = |P(T_i = 1|X_i) - P(T_j=1 | X_j)|\)
Check covariate balance
- ideally compare joint distribution of all covariates
- practically check lower-dimensional summaries (standardized mean difference, variance ratio, empirical CDF difference)
Balance test
Weighting
Weighting can be viewed as a generalization of matching. Weighting is commonly used for estimating ATE.
Weights used in weighting:
ATE (population) \(w_{ATE} = \frac{Z_i}{p_i} + \frac{1-Z_i}{1-p_i}\)
ATT \(w_{ATT} = \frac{p_i Z_i}{p_i} + \frac{p_i(1-Z_i)}{1-p_i}\)
ATC \(w_{ATC} = \frac{(1 - p_i) Z_i}{p_i} + \frac{(1 - p_i)(1-Z_i)}{1-p_i}\)