Causal inference
The fundamental problem of causal inference is that only one potential outcome can ever be observed for a given unit, unless one could go back in time and change the treatment, or find truly identical subjects (e.g., in physics or chemistry).
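In the potential-outcomes notation commonly used to state this problem (the symbols below are an assumption introduced here, not taken from these notes), unit $i$ has an outcome under treatment, $Y_i(1)$, and an outcome under control, $Y_i(0)$, with $T_i \in \{0, 1\}$ marking which treatment was actually received:

$$
\tau_i = Y_i(1) - Y_i(0), \qquad Y_i^{\text{obs}} = T_i \, Y_i(1) + (1 - T_i) \, Y_i(0)
$$

The individual effect $\tau_i$ needs both potential outcomes, but $Y_i^{\text{obs}}$ reveals only the one selected by $T_i$.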
A randomized controlled trial is a clean way to statistically overcome this limitation: because people are assigned at random, the control and treatment groups should be statistically equivalent on average. But it is difficult to run in many cases and hard to scale up.
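A minimal simulation sketch of why randomization helps. Everything here is hypothetical: the data-generating process, the constant effect of 2.0, and the function name `simulate_rct` are assumptions made for illustration. Each simulated unit has both potential outcomes, but only the one matching its coin-flip assignment is recorded.

```python
import random
import statistics


def simulate_rct(n=10_000, true_effect=2.0, seed=0):
    """Return the treated-minus-control difference in means for a simulated trial."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Hypothetical unit: a baseline trait drives the untreated outcome.
        baseline = rng.gauss(0.0, 1.0)
        y0 = baseline + rng.gauss(0.0, 1.0)  # potential outcome without treatment
        y1 = y0 + true_effect                # potential outcome with treatment
        # Coin-flip assignment is independent of baseline, y0 and y1.
        if rng.random() < 0.5:
            treated.append(y1)  # only y1 is observed for this unit
        else:
            control.append(y0)  # only y0 is observed for this unit
    return statistics.mean(treated) - statistics.mean(control)


estimate = simulate_rct()
print(f"difference in means: {estimate:.2f} (true effect: 2.00)")
```

No individual effect is ever observed, yet the group difference lands close to the true average effect, because assignment is unrelated to the potential outcomes.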
On the other hand, causal inference with observational data can potentially leverage huge, sometimes population-level datasets. At the same time, it is difficult to establish strong causal relationships because of unobserved confounding factors. See Homophily and influence, particularly Cosma Shalizi’s paper (Shalizi2011homophily).
Big data can be a potential solution. Having rich data may provide more opportunities to measure hidden confounders (e.g., see Keith2020text regarding the usage of text data). However, big data does not automatically address the fundamental challenges (Prosperi2020causal).
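A toy sketch of what confounding does to an observational estimate, and of what measuring the confounder buys. The setup is entirely hypothetical: a single binary confounder `u` raises both the chance of treatment and the outcome, and the true treatment effect is fixed at 1.0. Stratifying on `u` is only one simple adjustment, chosen here for brevity.

```python
import random
import statistics


def simulate_observational(n=20_000, true_effect=1.0, seed=1):
    """Generate (u, t, y) rows where u confounds treatment t and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = 1 if rng.random() < 0.5 else 0       # binary confounder
        p_treat = 0.8 if u == 1 else 0.2         # u makes treatment more likely
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + 3.0 * u + rng.gauss(0.0, 1.0)  # u also raises y
        rows.append((u, t, y))
    return rows


def diff_in_means(rows):
    """Treated-minus-control difference in mean outcome."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return statistics.mean(treated) - statistics.mean(control)


def stratified_effect(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = len(rows)
    effect = 0.0
    for level in (0, 1):
        stratum = [row for row in rows if row[0] == level]
        effect += len(stratum) / total * diff_in_means(stratum)
    return effect


rows = simulate_observational()
naive = diff_in_means(rows)          # ignores u, so it absorbs u's effect on y
adjusted = stratified_effect(rows)   # compares like with like within each u
print(f"naive: {naive:.2f}, adjusted for u: {adjusted:.2f}, true effect: 1.00")
```

The naive comparison overstates the effect because treated units are disproportionately those with `u = 1`. The adjustment works only because `u` was recorded; if it were unobserved, no amount of additional rows would remove the bias.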
Learning materials
Tutorials
- An introduction to causal inference by George Berry
- Regression, Fire, and Dangerous Things
- An Introduction to Causal Inference by Judea Pearl
Talks
- Web Science Meets Network Science Workshop by Sinan Aral
- Causal inference for observational studies by Uri Shalit and David Sontag
Courses
- Gary King’s course is on YouTube
- The lectures accompanying The Effect: https://www.youtube.com/playlist?list=PLcTBLulJV_AK1hKtnO0-kYrU0D09K-kj8
- Mastering Mostly Harmless Econometrics
Methods
- Difference in differences
- Opiates for the Matches: Matching Methods for Causal Inference
- Using Text Embeddings for Causal Inference
- CausalImpact
- Proximal causal learning
- Knox2022testing
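As a concrete illustration of the first method in the list, here is a minimal sketch of the basic two-group, two-period difference-in-differences calculation. The function name `did_estimate` and all numbers are made up for illustration, and the estimate is only meaningful under the usual parallel-trends assumption (the treated group would have changed as the control group did, absent treatment).

```python
import statistics


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = statistics.mean(treated_post) - statistics.mean(treated_pre)
    control_change = statistics.mean(control_post) - statistics.mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes before and after an intervention.
did = did_estimate(
    treated_pre=[10.0, 12.0, 11.0],
    treated_post=[15.0, 17.0, 16.0],
    control_pre=[8.0, 9.0, 10.0],
    control_post=[10.0, 11.0, 12.0],
)
print(did)  # 3.0
```

The treated group rises by 5 and the control group by 2, so 3 of the treated group's change is attributed to the intervention.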