OLMo (Open Language Model)

An open large language model (LLM) from the Allen Institute for AI (AI2).

Data

OLMo's pretraining data comes from Dolma, an open dataset of 3 trillion tokens drawn from a diverse mix of web content, academic publications, code, books, and encyclopedic materials.

The Dolma dataset was released together with the Dolma data toolkit, which shares its name.
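Because the dataset is a mix of sources, a natural first step when inspecting a shard is to measure that mix. The sketch below is a minimal illustration, not part of the Dolma toolkit. It assumes each shard is gzip-compressed JSONL with one document per line and a "source" field per record. The function names `source_mix` and `source_mix_from_file`, as well as the source labels in the toy records, are hypothetical. It counts documents, not tokens.

```python
import gzip
import json
from collections import Counter
from typing import Iterable


def source_mix(lines: Iterable[str]) -> dict[str, float]:
    """Return the fraction of documents per "source" value.

    Assumption: each non-blank line is a JSON object; records without a
    "source" key are counted under "unknown".
    """
    counts: Counter[str] = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        counts[record.get("source", "unknown")] += 1
    total = sum(counts.values())
    # Guard against an empty shard to avoid dividing by zero
    return {src: n / total for src, n in counts.items()} if total else {}


def source_mix_from_file(path: str) -> dict[str, float]:
    # Assumption: shards are stored as gzip-compressed JSONL text
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return source_mix(f)


if __name__ == "__main__":
    # Hypothetical toy records standing in for real shard lines
    sample = [
        '{"id": "1", "text": "a web page", "source": "web"}',
        '{"id": "2", "text": "another page", "source": "web"}',
        '{"id": "3", "text": "def f(): pass", "source": "code"}',
        '{"id": "4", "text": "Chapter 1", "source": "books"}',
    ]
    print(source_mix(sample))  # → {'web': 0.5, 'code': 0.25, 'books': 0.25}
```

Counting documents is cheap but can misrepresent the mix, since source types differ greatly in average document length. A token-weighted count would need the same tokenizer used for training.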