Causal inference

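The fundamental problem of causal inference is that only one potential outcome can ever be observed for a given unit: once a subject has received the treatment, the outcome it would have had without the treatment cannot be observed, and vice versa. The only ways around this would be to go back in time and change the treatment, or to find truly identical subjects (e.g., in physics or chemistry).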
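
In the usual potential-outcomes notation, this can be written compactly. The symbols are an assumption introduced here for illustration: $T_i \in \{0, 1\}$ is the treatment indicator for unit $i$, and $Y_i(1)$, $Y_i(0)$ are its outcomes with and without treatment.

$$
Y_i = T_i \, Y_i(1) + (1 - T_i) \, Y_i(0), \qquad \tau_i = Y_i(1) - Y_i(0)
$$

Only $Y_i$ is observed, so the individual effect $\tau_i$ always has one missing term.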

A randomized controlled trial is a nice way to overcome this limitation statistically: because the treatment is assigned at random, the control and treatment groups should on average be statistically equivalent as $N \rightarrow \infty$. But such trials are difficult to run in many cases and hard to scale up.
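
A small simulation can make the role of randomization concrete. This is a minimal sketch under assumed conditions, not a description of any real trial: the function name `simulate_rct`, the constant effect of 2.0, and the standard-normal baseline outcomes are all hypothetical choices. Each simulated unit has both potential outcomes, but only the one matching its coin-flip assignment is recorded.

```python
import random
import statistics


def simulate_rct(n, true_effect=2.0, seed=0):
    """Difference in means from a simulated randomized trial with n units."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)   # potential outcome without treatment
        y1 = y0 + true_effect      # potential outcome with treatment
        # A fair coin decides the treatment, independently of y0 and y1.
        if rng.random() < 0.5:
            treated.append(y1)     # only y1 is observed for this unit
        else:
            control.append(y0)     # only y0 is observed for this unit
    return statistics.fmean(treated) - statistics.fmean(control)


estimate_small = simulate_rct(50)
estimate_large = simulate_rct(200_000)
print(f"N = 50:      {estimate_small:.3f}")
print(f"N = 200000:  {estimate_large:.3f}")
```

With a small sample the two groups can differ by chance, so the estimate is noisy. With a large sample the difference in means should settle close to the assumed effect of 2.0.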

On the other hand, causal inference with observational data can potentially leverage huge, sometimes population-level datasets. At the same time, it is difficult to establish strong causal relationships because of unobserved confounding factors. See Homophily and influence, particularly Cosma Shalizi’s paper Shalizi2011homophily.
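
The following sketch illustrates why a larger observational dataset does not fix confounding on its own. Everything in it is assumed for illustration: the hidden variable `u`, the treatment probabilities 0.8 and 0.2, the coefficient 3.0, and the true effect of 2.0. Because `u` raises both the chance of treatment and the outcome, the naive comparison of treated and untreated units mixes the treatment effect with the effect of `u`.

```python
import random
import statistics


def naive_estimate(n, true_effect=2.0, seed=0):
    """Naive treated-minus-control difference when a hidden confounder exists."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)             # hidden confounder, never recorded
        p_treat = 0.8 if u > 0 else 0.2     # u makes treatment more likely
        t = rng.random() < p_treat
        # u also raises the outcome directly, on top of the treatment effect.
        y = 3.0 * u + (true_effect if t else 0.0) + rng.gauss(0.0, 1.0)
        (treated if t else control).append(y)
    return statistics.fmean(treated) - statistics.fmean(control)


naive_small = naive_estimate(2_000)
naive_large = naive_estimate(200_000)
print(f"N = 2000:    {naive_small:.3f}")
print(f"N = 200000:  {naive_large:.3f}")
```

Under these assumptions both estimates stay well above the true effect of 2.0. Increasing the sample size shrinks the noise but leaves the bias in place.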

Big data can be a potential solution. Having rich data may provide more opportunities to measure hidden confounders (e.g., see Keith2020text regarding the usage of text data). However, big data does not automatically address the fundamental challenges (Prosperi2020causal).
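
As a rough illustration of what measuring a confounder buys, the sketch below compares a naive estimate with one that is stratified on a recorded covariate. The setup is hypothetical: a single binary confounder `z`, treatment probabilities of 0.8 and 0.2, and a true effect of 2.0. Real confounders, such as those extracted from text, are rarely this clean, and stratification only helps for confounders that are actually measured.

```python
import random
import statistics


def simulate(n, true_effect=2.0, seed=0):
    """Generate (z, t, y) rows where the binary confounder z is recorded."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                  # measured confounder
        t = rng.random() < (0.8 if z else 0.2)  # z makes treatment more likely
        y = 3.0 * z + (true_effect if t else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((z, t, y))
    return rows


def diff_in_means(rows):
    """Mean outcome of treated rows minus mean outcome of untreated rows."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return statistics.fmean(treated) - statistics.fmean(control)


def stratified_estimate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = len(rows)
    estimate = 0.0
    for level in (False, True):
        stratum = [row for row in rows if row[0] == level]
        estimate += len(stratum) / total * diff_in_means(stratum)
    return estimate


rows = simulate(100_000)
naive = diff_in_means(rows)
adjusted = stratified_estimate(rows)
print(f"naive:    {naive:.3f}")
print(f"adjusted: {adjusted:.3f}")
```

Within each level of `z` the treated and untreated units are comparable, so the stratified estimate should land near 2.0 while the naive one remains inflated. If `z` were not in the data, no amount of additional rows would allow this correction.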