Dataset Download and Management
Dataset Download
HD-VG-130M
This dataset comprises 130M text-video pairs. You can download the dataset and prepare it for training according to the dataset repository's instructions. There is a README.md file in the Google Drive link that provides instructions on how to download and cut the videos. For this version, we directly use the dataset provided by the authors.
VidProM
python -m tools.datasets.convert_dataset vidprom VIDPROM_FOLDER --info VidProM_semantic_unique.csv
Demo Dataset
You can use ImageNet and UCF101 for a quick demo. After downloading the datasets, use the following commands to prepare a CSV file for each dataset:
# ImageNet
python -m tools.datasets.convert_dataset imagenet IMAGENET_FOLDER --split train
# UCF101
python -m tools.datasets.convert_dataset ucf101 UCF101_FOLDER --split videos
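To make the idea of "preparing a CSV file" concrete, here is a minimal sketch that walks a dataset folder and writes a `path,text` CSV. It is not the implementation of `convert_dataset`: the `build_csv` helper, the `ROOT/class_name/file` folder layout, and the use of the folder name as the caption are all assumptions made for illustration.

```python
import csv
import os
import tempfile


def build_csv(root, output_path, extensions=(".jpg", ".jpeg", ".png", ".avi", ".mp4")):
    """Write a path,text CSV, using each file's parent folder name as its caption.

    Hypothetical helper: the real convert_dataset script may build captions differently.
    """
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                label = os.path.basename(dirpath)
                rows.append((os.path.abspath(os.path.join(dirpath, name)), label))
    rows.sort()  # deterministic order, independent of the filesystem
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "text"])
        writer.writerows(rows)
    return rows


# Tiny throwaway tree standing in for a downloaded dataset.
root = tempfile.mkdtemp()
for label, name in [("Archery", "clip1.avi"), ("Biking", "clip2.avi")]:
    os.makedirs(os.path.join(root, label), exist_ok=True)
    open(os.path.join(root, label, name), "w").close()

rows = build_csv(root, os.path.join(root, "demo.csv"))
print([text for _path, text in rows])
```

The resulting file uses absolute paths in the first column, matching the dataset format described in this README.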
Dataset Format
The dataset should be provided as a CSV file, which is used for both training and data preprocessing. The CSV file should contain only the following columns, some of which may be omitted. Aspect ratio is width divided by height.
path, text, num_frames, fps, width, height, aspect_ratio, aesthetic_score, clip_score
/absolute/path/to/image1.jpg, caption1, num_of_frames
/absolute/path/to/video2.mp4, caption2, num_of_frames
We use pandas to manage the CSV files. You can use the following code to read and write the CSV files:
import pandas as pd

df = pd.read_csv(input_path)
df.to_csv(output_path, index=False)  # returns None when given a path, so do not reassign df
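As a minimal sketch of the format described above, the snippet below builds a small table with a subset of the columns, derives `aspect_ratio` as width divided by height, and round-trips it through pandas. The file name, the sample values, and the choice of `num_frames = 1` for an image are placeholders and assumptions, not real data.

```python
import os
import tempfile

import pandas as pd

# Placeholder rows; paths, captions and sizes are illustrative only.
# num_frames = 1 for the image is an assumption of this sketch.
df = pd.DataFrame(
    {
        "path": ["/absolute/path/to/image1.jpg", "/absolute/path/to/video2.mp4"],
        "text": ["caption1", "caption2"],
        "num_frames": [1, 120],
        "width": [1280, 1920],
        "height": [720, 1080],
    }
)

# Aspect ratio is width divided by height.
df["aspect_ratio"] = df["width"] / df["height"]

# Write without the pandas index so the file holds only the dataset columns.
output_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
df.to_csv(output_path, index=False)

# Read it back; the columns come back in the order they were written.
loaded = pd.read_csv(output_path)
print(list(loaded.columns))
```

Passing `index=False` matters here: without it, pandas writes an extra unnamed index column that is not part of the expected format.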
Manage Datasets
We provide csvutil.py to manage the CSV files. You can use the following commands to process the CSV files:
# csvutil takes multiple CSV files as input and merges them into one CSV file
python -m tools.datasets.csvutil DATA1.csv DATA2.csv
# keep videos with between 128 and 256 frames and drop rows with empty captions
python -m tools.datasets.csvutil DATA.csv --fmin 128 --fmax 256 --remove-empty-caption
# compute the number of frames for each video
python -m tools.datasets.csvutil DATA.csv --video-info
# remove caption prefix
python -m tools.datasets.csvutil DATA.csv --remove-caption-prefix
# generate DATA_root.csv with absolute path
python -m tools.datasets.csvutil DATA.csv --abspath /absolute/path/to/dataset
# examine the first 10 rows of the CSV file
head -n 10 DATA1.csv
# count the rows in the CSV file (approximate: wc -l counts lines, including the header)
wc -l DATA1.csv
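For readers who want to see what merging and filtering amount to at the table level, the following is a rough pandas sketch. It is not the code inside csvutil: the `merge_tables` and `filter_rows` helpers are hypothetical, the sample rows are made up, and treating the frame bounds as inclusive is an assumption.

```python
import pandas as pd


def merge_tables(tables):
    """Stack several tables row-wise into a single table."""
    return pd.concat(tables, ignore_index=True)


def filter_rows(df, fmin=None, fmax=None, remove_empty_caption=False):
    """Keep rows with fmin <= num_frames <= fmax and, optionally, a non-empty text.

    Inclusive bounds are an assumption of this sketch.
    """
    mask = pd.Series(True, index=df.index)
    if fmin is not None:
        mask &= df["num_frames"] >= fmin
    if fmax is not None:
        mask &= df["num_frames"] <= fmax
    if remove_empty_caption:
        # Treat missing values and whitespace-only strings as empty captions.
        text = df["text"].fillna("").astype(str).str.strip()
        mask &= text != ""
    return df[mask].reset_index(drop=True)


# Made-up rows standing in for DATA1.csv and DATA2.csv.
data1 = pd.DataFrame(
    {"path": ["/data/a.mp4", "/data/b.mp4"], "text": ["a cat", None], "num_frames": [200, 150]}
)
data2 = pd.DataFrame(
    {"path": ["/data/c.mp4", "/data/d.mp4"], "text": ["a dog\nrunning", "a bird"], "num_frames": [64, 256]}
)

merged = merge_tables([data1, data2])
kept = filter_rows(merged, fmin=128, fmax=256, remove_empty_caption=True)
print(len(merged), len(kept))
```

When an exact row count is needed, `len(pd.read_csv(path))` is more reliable than a line count, because a quoted caption may itself contain line breaks.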
To speed up processing, you can install pandarallel:
pip install pandarallel
To filter text language, you need to install lingua:
pip install lingua-language-detector
To get video information, you need to install opencv-python:
pip install opencv-python
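Since these three packages are optional, it can help to check which ones are present before running a long job. The sketch below is only a convenience and is not part of the tools here; the `missing_optional_packages` helper is hypothetical. Note that two of the packages are imported under a different name than the one used with pip.

```python
import importlib.util

# pip package name -> module name used at import time.
OPTIONAL_PACKAGES = {
    "pandarallel": "pandarallel",
    "lingua-language-detector": "lingua",
    "opencv-python": "cv2",
}


def missing_optional_packages():
    """Return the pip names of optional packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module_name in OPTIONAL_PACKAGES.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_optional_packages()
if missing:
    print("Optional packages not installed: " + ", ".join(missing))
```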