Causal Inference in Data Science
Observational data
Some topics that need to be reviewed
Categorise the problem quickly
- level of aggregation: individual, or aggregated counts/ country level summary statistic
- time component: temporal intervention of not
- control: is control available, or do we need to construct it
| Method | Time series | Control | Statistics or ML | R packages | Python libraries |
|---|---|---|---|---|---|
| Difference-in-Difference | no | statistical | did, plm |
statsmodels, econml, causalinference |
|
| Propensity score matching | no | statistical | MatchIt, twang |
pymatch, causalml, sklearn |
|
| Regression discontinuity design | no | statistical | rdd, rdrobust |
||
| Instrumental variable | no | statistical | ivreg, AER |
econml |
|
| Causal forests | no | ML | grf |
econml, causalml, sklearn |
|
| Bayesian structural time series | yes | statistical + ML | bsts |
causalimpact, orbit, pymc3 |
|
| Synthetic control | yes | statistical | Synth, tidysynth |
synthcontrol, PySynth |
These techniques have different use-cases. Focus on the aggregated time series ones first.
Non-TS usecases in business
IV, RD, PSM