
Evaluation

Human evaluation

To conduct human evaluation, we need to generate a diverse set of samples. We provide many prompts in assets/texts, and define several test settings covering different resolutions, durations, and aspect ratios in eval/sample.sh. To make use of multiple GPUs, we split the sampling tasks into several parts.

# image (1)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -1
# video (2a 2b 2c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -2a
# launch 8 jobs at once (read the script for details)
bash eval/human_eval/launch.sh /path/to/ckpt num_frames model_name_for_log
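The idea behind launching several jobs at once can be sketched as a round-robin assignment of task ids to GPUs. This is a hypothetical illustration, not the actual logic of launch.sh; the task ids and the assignment policy are assumptions:

```python
# Hypothetical sketch: spread sampling task ids (e.g. "-1", "-2a", ...)
# over the available GPUs, one job per GPU at a time.
TASKS = ["-1", "-2a", "-2b", "-2c", "-2d", "-2e", "-2f", "-2g"]  # assumed task ids

def assign(tasks, num_gpus=8):
    """Round-robin tasks onto GPU indices, as a multi-GPU launcher might."""
    return {gpu: [t for i, t in enumerate(tasks) if i % num_gpus == gpu]
            for gpu in range(num_gpus)}

schedule = assign(TASKS)  # e.g. {0: ["-1"], 1: ["-2a"], ...}
```

Each GPU then runs `bash eval/sample.sh ... <task_id>` for its assigned ids; consult the script itself for the real mapping.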

Rectified Flow Loss

Evaluate the rectified flow loss with the following commands.

# image
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/img.csv --ckpt-path /path/to/ckpt

# video
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt

# select resolution
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt --resolution 720p
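Conceptually, the rectified flow loss is the MSE between the model's predicted velocity and the constant velocity of the straight-line path between data and noise. The following numpy sketch uses one common convention (x_t interpolates data toward noise, target is noise minus data); it is an illustration of the objective, not Open-Sora's actual implementation:

```python
import numpy as np

def rectified_flow_loss(model, x0, x1, t):
    """MSE between predicted velocity and the straight-line target.

    x0: data latents, x1: Gaussian noise, t: per-sample times in [0, 1].
    Convention assumed here: x_t = (1 - t) * x0 + t * x1, target = x1 - x0.
    """
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast t over feature dims
    xt = (1.0 - t) * x0 + t * x1               # point on the linear path
    target = x1 - x0                           # constant velocity along the path
    pred = model(xt, t)
    return np.mean((pred - target) ** 2)
```

Evaluating this loss on held-out images/videos at several timesteps gives a training-free signal of model quality.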

To launch multiple jobs at once, use the following script.

bash eval/loss/launch.sh /path/to/ckpt model_name

VBench

VBench is a benchmark for short text-to-video generation. We provide a script that easily generates the samples required by VBench.

First, generate the relevant videos with the following commands:

# vbench tasks (4a 4b 4c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -4a
# launch 8 jobs at once (read the script for details)
bash eval/vbench/launch.sh /path/to/ckpt num_frames model_name

After generation, install the VBench package following the "Evaluation Dependencies" section of our installation guide. Then, run the following command to evaluate the generated samples.

bash eval/vbench/vbench.sh /path/to/video_folder

Finally, obtain the scaled scores for the model with:

python eval/vbench/tabulate_vbench_scores.py --score_dir path/to/evaluation_results/dir
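Score scaling typically normalizes each VBench dimension's raw score into [0, 1] before averaging, so dimensions on different scales are comparable. A hypothetical sketch of this idea follows; the reference ranges here are made-up placeholders, not VBench's actual normalization constants:

```python
# Hypothetical per-dimension min-max scaling of raw benchmark scores.
# The (min, max) reference values below are assumed placeholders only.
RANGES = {
    "subject_consistency": (0.0, 1.0),
    "imaging_quality": (0.0, 1.0),
}

def scale_scores(raw):
    """Map each raw dimension score into [0, 1] via its reference range."""
    scaled = {}
    for dim, score in raw.items():
        lo, hi = RANGES[dim]
        scaled[dim] = (score - lo) / (hi - lo)
    return scaled
```

The real tabulation script applies the benchmark's own reference ranges and dimension weights; see tabulate_vbench_scores.py for the authoritative values.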

VBench-i2v

VBench-i2v is a benchmark for short image-to-video generation (beta version). As with VBench, install the VBench package following the "Evaluation Dependencies" section of our installation guide, then follow the steps below.

# Step 1: generate the relevant videos
# vbench i2v tasks (5a 5b 5c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -5a
# launch 8 jobs at once
bash eval/vbench_i2v/launch.sh /path/to/ckpt num_frames model_name

# Step 2: run vbench to evaluate the generated samples 
python eval/vbench_i2v/vbench_i2v.py
python eval/vbench_i2v/vbench_video_quality.py 

# Step 3: obtain the scaled scores
python eval/vbench_i2v/tabulate_vbench_i2v_scores.py --score_dir path/to/evaluation_results/dir

VAE

Install the dependencies following the "Evaluation Dependencies" section of our installation guide. Then, run the following evaluation command:

# --metric can be any one of, or a list from: ssim, psnr, lpips, flolpips
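Of these metrics, PSNR is the simplest: it compares a reconstructed frame to the original via mean squared error on a log scale. A minimal numpy sketch, assuming frames normalized to [0, 1] (an illustration, not the repository's implementation):

```python
import numpy as np

def psnr(real, fake, max_val=1.0):
    """Peak signal-to-noise ratio between two frames with values in [0, max_val].

    Higher is better; identical frames yield infinity.
    """
    mse = np.mean((real.astype(np.float64) - fake.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM additionally accounts for local structure, while LPIPS and FloLPIPS are learned perceptual metrics (the latter flow-weighted for video), so they require pretrained networks rather than a closed-form expression.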
python eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir path/to/original/videos --generated_video_dir path/to/generated/videos --device cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips