Datasets for Large Language Models: A Comprehensive Survey