Evaluation

Human evaluation

To conduct human evaluation, we need to generate a variety of samples. We provide many prompts in assets/texts and define several test settings covering different resolutions, durations, and aspect ratios in eval/sample.sh. To make use of multiple GPUs, we split the sampling tasks into several parts.

# image (1)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -1
# video (2a 2b 2c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -2a
# launch 8 jobs at once (you must read the script to understand the details)
bash eval/human_eval/launch.sh /path/to/ckpt num_frames model_name_for_log
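
The split into parts can be pictured as one sampling command per task flag, each pinned to its own GPU. The following is a minimal dry-run sketch of that idea, not the contents of eval/human_eval/launch.sh. The helper name `build_sample_jobs`, the frame count `51`, the flags `-2b` and `-2c` (extrapolated from the "2a 2b 2c ..." comment above), and the use of `CUDA_VISIBLE_DEVICES` to pin a job to a GPU are all assumptions made for illustration.

```python
import shlex


def build_sample_jobs(ckpt, num_frames, model_name, task_flags, num_gpus=8):
    """Return one shell command string per task flag, assigning GPUs round-robin."""
    jobs = []
    for index, flag in enumerate(task_flags):
        gpu = index % num_gpus  # wrap around if there are more tasks than GPUs
        args = ["bash", "eval/sample.sh", ckpt, str(num_frames), model_name, flag]
        # Assumption: the sampling script respects CUDA_VISIBLE_DEVICES.
        jobs.append(f"CUDA_VISIBLE_DEVICES={gpu} " + shlex.join(args))
    return jobs


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # The checkpoint path, frame count, and flags are placeholders.
    for command in build_sample_jobs("/path/to/ckpt", 51, "my_model", ["-2a", "-2b", "-2c"]):
        print(command)
```

Printing the commands first makes it easy to check the GPU assignment before anything is launched; the provided launch script remains the reference for the actual task list.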

Rectified Flow Loss

Evaluate the rectified flow loss with the following commands.

# image
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/img.csv --ckpt-path /path/to/ckpt

# video
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt

# select resolution
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt --resolution 720p

To launch multiple jobs at once, use the following script.

bash eval/loss/launch.sh /path/to/ckpt model_name

To obtain an organized list of scores:

python eval/loss/tabulate_rl_loss.py --log_dir path/to/log/dir
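
For orientation, here is a minimal sketch of what a rectified flow loss measures, under the common formulation in which a sample is linearly interpolated between noise and data and the model predicts the constant velocity along that straight path. It is an illustration only and not the code in eval/loss/eval_loss.py. The function name `rectified_flow_loss`, the `predict_velocity` callable, and the time convention (t = 0 is pure noise, t = 1 is pure data) are assumptions made for the example.

```python
def rectified_flow_loss(data, noise, t, predict_velocity):
    """Mean squared error between predicted and target velocity at time t in [0, 1]."""
    # Point on the straight line from noise (t = 0) to data (t = 1).
    x_t = [t * d + (1.0 - t) * n for d, n in zip(data, noise)]
    # Along a straight path the target velocity is constant: data - noise.
    target = [d - n for d, n in zip(data, noise)]
    prediction = predict_velocity(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(prediction, target)) / len(target)


if __name__ == "__main__":
    data = [1.0, 2.0]
    noise = [0.0, 0.0]
    # A stand-in "model" that always predicts zero velocity.
    zero_model = lambda x_t, t: [0.0 for _ in x_t]
    print(rectified_flow_loss(data, noise, 0.5, zero_model))
```

A lower value means the predicted velocity is closer to the straight-line target, which is why the loss can be compared across checkpoints on a fixed evaluation set.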

VBench

VBench is a benchmark for short text-to-video generation. We provide a script for easily generating the samples required by VBench.

First, generate the relevant videos with the following commands:

# vbench tasks (4a 4b 4c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -4a
# launch 8 jobs at once (you must read the script to understand the details)
bash eval/vbench/launch.sh /path/to/ckpt num_frames model_name

After generation, install the VBench package following the "Evaluation Dependencies" section of our installation instructions. Then run the following command to evaluate the generated samples.

bash eval/vbench/vbench.sh /path/to/video_folder /path/to/model/ckpt

Finally, we obtain the scaled scores for the model by:

python eval/vbench/tabulate_vbench_scores.py --score_dir path/to/score/dir
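
As a rough picture of what "scaled" means here, the sketch below min-max normalizes each raw dimension score and then takes a weighted average. It assumes the tabulation follows this general scheme; the actual bounds, weights, and dimension names used by eval/vbench/tabulate_vbench_scores.py are not shown in this document, so every number and name in the example is a placeholder.

```python
def min_max_scale(raw, lower, upper):
    """Map a raw score onto [0, 1] given the lower and upper bounds of its dimension."""
    return (raw - lower) / (upper - lower)


def weighted_mean(scores, weights):
    """Weighted average over the dimensions present in `scores`."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not real VBench bounds or results.
    raw = {"dimension_a": 0.75, "dimension_b": 0.90}
    bounds = {"dimension_a": (0.5, 1.0), "dimension_b": (0.0, 1.0)}
    weights = {"dimension_a": 1.0, "dimension_b": 2.0}
    scaled = {name: min_max_scale(value, *bounds[name]) for name, value in raw.items()}
    print(scaled, weighted_mean(scaled, weights))
```

Scaling before averaging keeps a dimension with a narrow raw range from being drowned out by one with a wide range.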

VBench-i2v

VBench-i2v is a benchmark for short image-to-video generation (beta version). As with VBench, install the VBench package following the "Evaluation Dependencies" section of our installation instructions. Then run the following commands to generate and evaluate the samples.

# Step 1: generate the relevant videos
# vbench i2v tasks (5a 5b 5c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -5a
# launch 8 jobs at once
bash eval/vbench_i2v/launch.sh /path/to/ckpt num_frames model_name

# Step 2: run vbench to evaluate the generated samples
python eval/vbench_i2v/vbench_i2v.py

# Step 3: obtain the scaled scores
# Note: you may need to open `your_conda_env_path/lib/python3.x/site-packages/vbench2_beta_i2v/utils.py` and change the hard-coded variable `image_root` in the function `load_i2v_dimension_info(json_dir, dimension, lang, resolution)` so that it points to your image folder.
python eval/vbench_i2v/tabulate_vbench_i2v_scores.py --score_dir path/to/evaluation_results/dir
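
To locate the installed `utils.py` mentioned in the note above without guessing the conda path, one option is to ask Python where the package lives. This is a minimal sketch, assuming `vbench2_beta_i2v` is importable in the active environment; the helper name `find_package_file` is hypothetical.

```python
import importlib.util
from pathlib import Path


def find_package_file(package, filename="utils.py"):
    """Return the path of `filename` inside an installed package, or None if not found."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Prints None when the package is not installed in this environment.
    print(find_package_file("vbench2_beta_i2v"))
```

Run it with the same interpreter that will run the evaluation, so the reported path matches the copy of the package that is actually imported.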

VAE

Install the dependency packages following the "Evaluation Dependencies" section of our installation instructions. Then run the following evaluation command:

# metric can be any one of, or a list of: ssim, psnr, lpips, flolpips
python eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir path/to/original/videos --generated_video_dir path/to/generated/videos --device cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips
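
Of the four metrics, PSNR has the simplest closed form, so it serves as a quick sanity check on what the command reports. The sketch below uses the standard definition, 10 * log10(MAX^2 / MSE), on flat lists of pixel values. It is not the implementation in eval/vae/eval_common_metric.py; the function name `psnr` and the assumption that pixel values lie in [0, 1] are choices made for the example.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    squared_errors = [(r - g) ** 2 for r, g in zip(reference, reconstruction)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical inputs: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [0.0, 0.0, 0.0, 0.0]
    decoded = [0.1, 0.1, 0.1, 0.1]
    print(psnr(original, decoded))
```

Higher PSNR means the reconstructed video is closer to the original; SSIM, LPIPS, and FloLPIPS capture structural and perceptual differences that PSNR does not.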