Predicting poverty and wealth from mobile phone metadata

A nice example of Machine learning for social scientists. See also Machine learning vs.

It also uses national surveys to construct the composite wealth index. This part is all about the social science approach and is what makes the paper quite strong.

It uses mobile phone records (call detail records), which are quite complex.

Feature engineering

Our goal in engineering features is to transform an individual’s mobile phone transaction logs into a set of quantitative metrics that in turn can be used to infer that same individual’s economic state. In the related literature, the most common approach has been to carefully construct a small number of intuitive indicators from the phone metrics, and compare regional aggregates of those phone metrics to regional socioeconomic indicators. In such work, for instance, there is evidence that the geographic diversity and reciprocal nature of social relationships are both correlated with economic outcomes (8, 35–38).

Our goal is different. We seek to develop measures of poverty and wealth that maximize predictive accuracy, possibly at the expense of the interpretability of the model. Thus, instead of devising a parsimonious set of metrics based on intuition, we take a brute force approach to feature engineering that is designed to capture as much variation as possible from the raw call detail records. Specifically, we develop a method based on a deterministic finite automaton (DFA) (39) to generate a large number of potentially correlated metrics, and then rely on regularization and .
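To make the DFA idea concrete for myself, here is a minimal sketch, not the authors' actual automaton. It assumes that each feature is one accepted path through a tiny automaton: pick a transaction type, then a direction, then an aggregate. The record fields, state names, and symbols (`RECORDS`, `TRANSITIONS`, `any_type`, `any_dir`, and so on) are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical toy call detail records for a single subscriber.
RECORDS = [
    {"type": "call", "direction": "out", "contact": "A", "duration": 60},
    {"type": "call", "direction": "in", "contact": "B", "duration": 30},
    {"type": "sms", "direction": "out", "contact": "A", "duration": 0},
    {"type": "call", "direction": "out", "contact": "C", "duration": 120},
]

# Transition table of the toy automaton: state -> {symbol: next state}.
# Every path from "start" to "accept" spells out one feature definition.
TRANSITIONS = {
    "start": {"call": "typed", "sms": "typed", "any_type": "typed"},
    "typed": {"in": "directed", "out": "directed", "any_dir": "directed"},
    "directed": {
        "count": "accept",
        "unique_contacts": "accept",
        "mean_duration": "accept",
    },
}
ACCEPT = "accept"


def enumerate_paths(state="start", prefix=()):
    """Yield every symbol sequence that the automaton accepts."""
    if state == ACCEPT:
        yield prefix
        return
    for symbol, next_state in TRANSITIONS[state].items():
        yield from enumerate_paths(next_state, prefix + (symbol,))


def evaluate(path, records):
    """Compute the feature described by one accepted path."""
    record_type, direction, aggregate = path
    rows = [
        r for r in records
        if record_type in ("any_type", r["type"])
        and direction in ("any_dir", r["direction"])
    ]
    if aggregate == "count":
        return float(len(rows))
    if aggregate == "unique_contacts":
        return float(len({r["contact"] for r in rows}))
    # mean_duration: defined as 0.0 when the filters match no records
    return float(mean([r["duration"] for r in rows])) if rows else 0.0


def build_features(records):
    """Map every accepted path to a named feature value."""
    return {"_".join(p): evaluate(p, records) for p in enumerate_paths()}


features = build_features(RECORDS)
```

With three choices at each of three steps, this toy version yields 27 features. Adding more filters (time of day, day of week, and so on) multiplies the count, which is one way a feature set can reach into the thousands, many of them strongly correlated with one another.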

-> More than 5,000 features were generated by the DFA.
-> Elastic net regularization: a combination of the Lasso (L1) and ridge (L2) penalties.
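A small sketch of how the elastic net combines the two penalties, written as plain coordinate descent so it has no dependencies. This is my own illustration, not the paper's code. The parameter names `alpha` and `l1_ratio` follow the convention used by scikit-learn's `ElasticNet`, the toy data are made up, and the columns are assumed to be centered so no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the L1 part of the update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.5, l1_ratio=0.5, n_iter=200):
    """Minimize, by cyclic coordinate descent,

        (1 / (2n)) * ||y - Xw||^2
        + alpha * l1_ratio * ||w||_1
        + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2

    l1_ratio = 1 gives the Lasso, l1_ratio = 0 gives ridge regression.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that excludes j.
            rho = sum(v * (r + v * w[j]) for v, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                z + alpha * (1 - l1_ratio)
            )
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(col, residual)]
                w[j] = new_w
    return w


# Made-up data: y = 2 * x1 + 0.1 * x2, with orthogonal, centered columns.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
weights = elastic_net(X, y, alpha=0.5, l1_ratio=0.5)
```

On this toy data the weight on the strong feature is shrunk below its true value of 2, and the weight on the weak feature is set exactly to zero. That is the behaviour that matters with thousands of correlated features: the L1 part discards most of them, while the L2 part keeps the fit stable when the remaining ones are correlated.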