Language models for health records

LLMs, Patient representation learning

Solares2020deep and Xiao2018opportunities review the use of deep learning for electronic health records.

There are two primary approaches: (1) applying language models to the free-text data in medical records, such as clinical notes, and (2) modeling diagnosis or procedure codes as "words".

Med2Vec learns embeddings of medical concepts such as diagnoses, medications, and procedures, taking into account the two-level structure of EHR data: codes are grouped into visits, and visits are ordered over time.

Cai2018medical proposes a way to also incorporate the temporal structure of the visit sequence into such embeddings.
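
As a minimal illustration of the codes-as-words idea (not Med2Vec itself, which additionally models the visit-level structure), the sketch below trains a skip-gram word2vec model while treating each visit's diagnosis codes as one "sentence". The ICD-10 codes and hyperparameters are illustrative placeholders.

```python
# Simplified codes-as-words sketch: each visit is a "sentence" of diagnosis codes.
from gensim.models import Word2Vec

# Each inner list is one visit; codes co-occurring in a visit provide the
# context window, analogous to words co-occurring in a sentence.
visits = [
    ["E11.9", "I10", "Z79.4"],   # diabetes, hypertension, insulin use
    ["I10", "I25.10"],           # hypertension, coronary artery disease
    ["E11.9", "E78.5", "I10"],   # diabetes, hyperlipidemia, hypertension
]

model = Word2Vec(
    sentences=visits,
    vector_size=64,   # embedding dimension
    window=5,         # within-visit context window
    min_count=1,      # keep rare codes in this toy example
    sg=1,             # skip-gram
)

# Nearest neighbours of a code in the learned embedding space
print(model.wv.most_similar("I10", topn=2))
```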

Transformer models

Regarding the modeling of text data, there have been some attempts, e.g., Blinov2020predicting.

Li2020BEHRT uses diagnosis data from the UK to train a BERT model. Rasmy2020Med (Med-BERT) is similar, but uses a larger dataset.
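
A hedged sketch of the general idea behind these models (not the published BEHRT or Med-BERT configurations): diagnosis codes form the vocabulary, a patient's visit history forms the input sequence, and a small BERT is pre-trained with masked-code prediction. The vocabulary, model size, and example sequence below are toy assumptions.

```python
import torch
from transformers import BertConfig, BertForMaskedLM

# Toy vocabulary: special tokens plus a handful of diagnosis codes.
vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "[MASK]": 3,
         "E11.9": 4, "I10": 5, "I25.10": 6, "E78.5": 7}

config = BertConfig(
    vocab_size=len(vocab),
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    max_position_embeddings=64,
)
model = BertForMaskedLM(config)

# One patient: [CLS] E11.9 [MASK] [SEP] I25.10 [SEP], with I10 masked out.
input_ids = torch.tensor([[1, 4, 3, 2, 6, 2]])
labels = torch.full_like(input_ids, -100)  # -100 = position ignored in the loss
labels[0, 2] = vocab["I10"]                # only the masked position is scored

out = model(input_ids=input_ids, labels=labels)
print(out.loss)  # masked-code prediction loss minimised during pre-training
```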

Xu2019neural uses a GPT model, but with a small dataset (less than 1M). This is applied to clinical notes, not to EHR diagnosis codes.
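
For contrast, a generic sketch of causal language modeling on clinical note text with an off-the-shelf GPT-2 from Hugging Face (not the cited authors' setup); the note text is a fabricated placeholder.

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

note = "Patient presents with shortness of breath and a history of hypertension."
enc = tokenizer(note, return_tensors="pt")

# For causal LM, the labels are the input ids; the model shifts them internally
# so each position predicts the next token.
out = model(input_ids=enc["input_ids"], labels=enc["input_ids"])
print(out.loss)  # next-token prediction loss; minimising it fine-tunes the LM
```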