Open-Sora/docs/datasets.md
Zheng Zangwei (Alex Zheng) 682a699aec Update image process (#5)
2024-03-29 23:34:10 +08:00

Datasets

HD-VG-130M

HD-VG-130M comprises 130M text-video pairs. The captions are generated by BLIP-2. We find that both the scene cuts and the caption quality are relatively poor. The dataset contains 20 splits. For OpenSora 1.0, we use the first split (~350K). We plan to use the whole dataset and re-process it.

You can download the dataset and prepare it for training by following the instructions in the dataset repository. The Google Drive link includes a README.md file that explains how to download and cut the videos. For this version, we use the dataset directly as provided by the authors.
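Once the clips are downloaded and cut, a training pipeline typically needs an index that pairs each clip with its caption. The following is a minimal sketch of that step, not the dataset's or Open-Sora's own tooling: the function name `build_manifest`, the `path`/`text` column names, the list of video extensions, and the assumption that captions are keyed by file stem are all hypothetical and should be adapted to the layout described in the dataset README.

```python
import csv
from pathlib import Path

# Assumed set of video extensions; adjust to match the downloaded files.
VIDEO_EXTS = {".mp4", ".mkv", ".webm"}


def build_manifest(clip_dir, captions, out_csv):
    """Write a CSV with one (path, text) row per captioned clip.

    clip_dir: directory that holds the cut clips (searched recursively).
    captions: dict mapping a clip's file stem to its caption (hypothetical layout).
    out_csv:  destination path for the CSV file.
    Returns the number of clips written.
    """
    rows = []
    for path in sorted(Path(clip_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in VIDEO_EXTS:
            continue
        text = captions.get(path.stem)
        if text:  # skip clips that have no caption
            rows.append((str(path), text.strip()))

    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "text"])  # hypothetical column names
        writer.writerows(rows)
    return len(rows)
```

For example, `build_manifest("clips/", {"clip_0001": "a dog runs on a beach"}, "manifest.csv")` would write a single data row if `clips/clip_0001.mp4` exists. Clips without a caption are skipped rather than written with an empty text field, so the resulting file only contains usable pairs.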

Inter4k

Inter4k is a dataset of 1k video clips at 4K resolution. It was proposed for super-resolution tasks. We use it for high-quality (HQ) training. The videos are processed as mentioned here.

Pexels.com

Pexels.com is a website that provides free stock photos and videos. We collected 19K video clips from this website for HQ training. The videos are processed as mentioned here.