OLMo (Open Language Model)
Data
Dolma is an open dataset of 3 trillion tokens drawn from a diverse mix of web content, academic publications, code, books, and encyclopedic materials.
The Dolma dataset and the Dolma data toolkit were released together.