# Evaluation

## Human Evaluation

To conduct human evaluation, we need to generate a variety of samples. We provide many prompts in `assets/texts`, and define test settings covering different resolutions, durations, and aspect ratios in `eval/sample.sh`. To make use of multiple GPUs, we split the sampling tasks into several parts.
```bash
# image (1)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -1
# video (2a 2b 2c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -2a
# launch 8 jobs at once (you must read the script to understand the details)
bash eval/human_eval/launch.sh /path/to/ckpt num_frames model_name_for_log
```
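The fan-out performed by the launch script can be pictured as assigning one sampling task group per GPU. A minimal sketch of that idea (the task ids, GPU count, and shell variable names here are illustrative assumptions, not the script's actual contents — read `eval/human_eval/launch.sh` for the real behavior):

```python
# Hypothetical sketch: round-robin sampling task groups over 8 GPUs,
# producing one shell command per task group.

def assign_jobs(tasks, num_gpus=8):
    """Map each task group to a GPU and return the shell commands to run."""
    return [
        f"CUDA_VISIBLE_DEVICES={i % num_gpus} "
        f"bash eval/sample.sh $CKPT $NUM_FRAMES $NAME -{task}"
        for i, task in enumerate(tasks)
    ]

for cmd in assign_jobs(["1", "2a", "2b", "2c"]):
    print(cmd)
```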
## Rectified Flow Loss

Evaluate the rectified flow loss with the following commands.
```bash
# image
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/img.csv --ckpt-path /path/to/ckpt
# video
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt
# select resolution
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt --resolution 720p
```
To launch multiple jobs at once, use the following script.

```bash
bash eval/loss/launch.sh /path/to/ckpt model_name
```

To obtain an organized list of scores:

```bash
python eval/loss/tabulate_rl_loss.py --log_dir path/to/log/dir
```
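For context, the rectified flow loss reported here is the mean squared error between the model's predicted velocity and the straight-line target `x1 - x0`. A toy numpy sketch of the objective (shapes and the perfect-prediction example are illustrative, not the evaluation script's implementation):

```python
import numpy as np

def rectified_flow_loss(x0, x1, velocity_pred):
    """MSE between the predicted velocity and the straight-line target x1 - x0."""
    target = x1 - x0
    return float(np.mean((velocity_pred - target) ** 2))

# Toy check: a model that predicts the target exactly gets zero loss.
x0 = np.zeros((2, 4))
x1 = np.ones((2, 4))
print(rectified_flow_loss(x0, x1, x1 - x0))  # 0.0
```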
## VBench

VBench is a benchmark for short text-to-video generation. We provide a script to easily generate the samples required by VBench.

First, generate the relevant videos with the following commands:
```bash
# vbench tasks (4a 4b 4c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -4a
# launch 8 jobs at once (you must read the script to understand the details)
bash eval/vbench/launch.sh /path/to/ckpt num_frames model_name
```
After generation, install the VBench package following the "Evaluation Dependencies" section of our installation guide. Then, run the following command to evaluate the generated samples.

```bash
bash eval/vbench/vbench.sh /path/to/video_folder /path/to/model/ckpt
```

Finally, obtain the scaled scores for the model with:

```bash
python eval/vbench/tabulate_vbench_scores.py --score_dir path/to/score/dir
```
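The tabulation step min-max scales each dimension's raw score into [0, 1] against per-dimension reference ranges before averaging. A simplified sketch of that normalization (the dimension names and ranges below are made up for illustration; the real values live in the tabulation script):

```python
def scale_score(raw, lo, hi):
    """Min-max normalize a raw dimension score into [0, 1]."""
    return (raw - lo) / (hi - lo)

# Illustrative dimensions: name -> (raw score, (reference min, reference max)).
dims = {
    "subject_consistency": (0.96, (0.14, 1.00)),
    "motion_smoothness": (0.98, (0.70, 1.00)),
}
scaled = {name: scale_score(raw, lo, hi) for name, (raw, (lo, hi)) in dims.items()}
average = sum(scaled.values()) / len(scaled)
print(round(average, 4))
```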
## VBench-i2v

VBench-i2v is a benchmark for short image-to-video generation (beta version). As before, install the VBench package following the "Evaluation Dependencies" section of our installation guide. Then, run the following commands to evaluate the generated samples.

```bash
# Step 1: generate the relevant videos
# vbench i2v tasks (5a 5b 5c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -5a
# launch 8 jobs at once
bash eval/vbench_i2v/launch.sh /path/to/ckpt num_frames model_name

# Step 2: run vbench to evaluate the generated samples
python eval/vbench_i2v/vbench_i2v.py

# Step 3: obtain the scaled scores
python eval/vbench_i2v/tabulate_vbench_i2v_scores.py --score_dir path/to/evaluation_results/dir
```

Note: you may need to open `your_conda_env_path/lib/python3.x/site-packages/vbench2_beta_i2v/utils.py` and change the hard-coded variable `image_root` in the `load_i2v_dimension_info` function to point to your image folder.
## VAE

Install the dependency packages following the "Evaluation Dependencies" section of our installation guide. Then, run the following evaluation command:

```bash
# metric can be any one (or a list) of: ssim, psnr, lpips, flolpips
python eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir path/to/original/videos --generated_video_dir path/to/generated/videos --device cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips
```
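Of the metrics above, PSNR is simple enough to sanity-check by hand: it is `10 * log10(data_range^2 / MSE)` between corresponding real and reconstructed frames. A minimal numpy sketch (the frame shapes and data range are assumptions; the evaluation script's implementation may differ in detail):

```python
import numpy as np

def psnr(real, fake, data_range=1.0):
    """Peak signal-to-noise ratio between two frames with values in [0, data_range]."""
    mse = np.mean((real.astype(np.float64) - fake.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10 * np.log10(data_range**2 / mse))

# Toy frames: a uniform error of 0.1 gives MSE 0.01 and PSNR ~20 dB.
real = np.zeros((16, 16))
fake = np.full((16, 16), 0.1)
print(round(psnr(real, fake), 2))  # 20.0
```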