# Datasets

For Open-Sora 1.1, we conduct mixed training with both images and videos. The main datasets we use are listed below. Please refer to the [README](/README.md#data-processing) for data processing.

## Panda-70M

[Panda-70M](https://github.com/snap-research/Panda-70M) is a large-scale dataset with 70M video-caption pairs. We use the [training-10M subset](https://github.com/snap-research/Panda-70M/tree/main/dataset_dataloading) for training, which contains ~10M videos of better quality.

## Pexels

[Pexels](https://www.pexels.com/) is a popular online platform that provides high-quality stock photos, videos, and music for free. Most videos from this website are of high quality, so we use them for both pre-training and HQ fine-tuning. We really appreciate the great platform and the contributors!

## Inter4K

[Inter4K](https://github.com/alexandrosstergiou/Inter4K) is a dataset containing 1K video clips with 4K resolution. The dataset was proposed for super-resolution tasks. We use it for HQ fine-tuning.

## HD-VG-130M

[HD-VG-130M](https://github.com/daooshee/HD-VG-130M?tab=readme-ov-file) comprises 130M text-video pairs. The captions are generated by BLIP-2. We find that the scene and text quality are relatively poor. For Open-Sora 1.0, we only use ~350K samples from this dataset.