# Evaluation

## Human evaluation

To conduct human evaluation, we need to generate various samples. We provide many prompts in `assets/texts` and define test settings covering different resolutions, durations, and aspect ratios in `eval/sample.sh`. To make use of multiple GPUs, the sampling tasks are split into several parts.

```bash
# image (1)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -1
# video (2a 2b 2c ...)
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -2a
# launch 8 jobs at once (you must read the script to understand the details)
bash eval/human_eval/launch.sh /path/to/ckpt num_frames model_name_for_log
```
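For a sense of how separate sampling tasks could be spread over GPUs by hand, here is a minimal sketch. It is not the content of `eval/human_eval/launch.sh`; the helper name `launch_sample_jobs`, the `DRY_RUN` switch, and the one-task-per-GPU mapping via `CUDA_VISIBLE_DEVICES` are assumptions made for illustration. By default it only prints the commands it would run.

```bash
# Hypothetical helper (not part of the repository): start one eval/sample.sh
# task per GPU. With DRY_RUN=1 (the default) the commands are only printed.
launch_sample_jobs() {
  local ckpt="$1" num_frames="$2" model_name="$3"
  shift 3
  local gpu=0 task
  for task in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "CUDA_VISIBLE_DEVICES=${gpu} bash eval/sample.sh ${ckpt} ${num_frames} ${model_name} ${task}"
    else
      # Assumption: pinning each task to a single GPU is sufficient.
      CUDA_VISIBLE_DEVICES="${gpu}" bash eval/sample.sh "${ckpt}" "${num_frames}" "${model_name}" "${task}" &
    fi
    gpu=$((gpu + 1))
  done
  # Wait for the background jobs only when they were actually started.
  if [ "${DRY_RUN:-1}" != "1" ]; then
    wait
  fi
}

# Example: print the commands for the image task and the first video task.
launch_sample_jobs /path/to/ckpt num_frames model_name_for_log -1 -2a
```

Set `DRY_RUN=0` to actually start the jobs once the printed commands look right.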

## Rectified Flow Loss

Evaluate the rectified flow loss with the following commands.

```bash
# image
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/img.csv --ckpt-path /path/to/ckpt

# video
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt

# select resolution
torchrun --standalone --nproc_per_node 1 eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py --data-path /path/to/vid.csv --ckpt-path /path/to/ckpt --resolution 720p
```
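To compare several resolutions in one go, a loop along the following lines may help. This is only a sketch: the helper names `run_or_print` and `eval_loss_sweep` and the `DRY_RUN` switch are hypothetical, and the resolution names you pass must be ones the config actually accepts (only `720p` appears above).

```bash
# Hypothetical helpers (not part of the repository).
# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Repeat the loss evaluation command once per resolution given as argument.
eval_loss_sweep() {
  local ckpt="$1" data="$2"
  shift 2
  local res
  for res in "$@"; do
    run_or_print torchrun --standalone --nproc_per_node 1 \
      eval/loss/eval_loss.py configs/opensora-v1-2/misc/eval_loss.py \
      --data-path "$data" --ckpt-path "$ckpt" --resolution "$res"
  done
}

# Example: print the command for a single resolution.
eval_loss_sweep /path/to/ckpt /path/to/vid.csv 720p
```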

To launch multiple jobs at once, use the following script.

```bash
bash eval/loss/launch.sh /path/to/ckpt model_name
```

To obtain an organized list of scores:
```bash
python eval/loss/tabulate_rl_loss.py --log_dir path/to/log/dir
```

## VBench

[VBench](https://github.com/Vchitect/VBench) is a benchmark for short text-to-video generation. We provide a script that generates the samples VBench requires.

First, generate the relevant videos with the following commands:

```bash
# VBench task; to evaluate all prompts, set start_index to 0 and end_index to 2000
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -4 start_index end_index
# Alternatively, launch 8 jobs at once (you must read the script to understand the details)
bash eval/vbench/launch.sh /path/to/ckpt num_frames model_name
```
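The `start_index` and `end_index` arguments make it possible to divide the prompt range among several workers. The sketch below shows one way to compute the sub-ranges. The helper name `split_range` is hypothetical, and the sketch assumes that `end_index` is exclusive; check `eval/sample.sh` before relying on that.

```bash
# Hypothetical helper (not part of the repository): split [0, total) into
# at most `jobs` contiguous chunks and print one "start end" pair per line.
# Assumption: end_index is treated as exclusive by eval/sample.sh.
split_range() {
  local total="$1" jobs="$2"
  local chunk=$(( (total + jobs - 1) / jobs ))  # ceiling division
  local start=0 end
  while [ "$start" -lt "$total" ]; do
    end=$(( start + chunk ))
    if [ "$end" -gt "$total" ]; then
      end="$total"
    fi
    echo "$start $end"
    start="$end"
  done
}

# Example: print one sampling command per chunk for 2000 prompts and 8 workers.
split_range 2000 8 | while read -r start end; do
  echo "bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -4 $start $end"
done
```

Each printed command can then be run on its own GPU or machine.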

After generation, install the VBench package following the "Evaluation Dependencies" section of our [installation guide](../docs/installation.md). Then, run the following command to evaluate the generated samples.

<!-- ```bash
bash eval/vbench/vbench.sh /path/to/video_folder /path/to/model/ckpt
``` -->

```bash
python eval/vbench/calc_vbench.py /path/to/video_folder /path/to/model/ckpt
```

Finally, obtain the scaled scores for the model with:
```bash
python eval/vbench/tabulate_vbench_scores.py --score_dir path/to/score/dir
```

## VBench-i2v

[VBench-i2v](https://github.com/Vchitect/VBench/tree/master/vbench2_beta_i2v) is a benchmark for short image-to-video generation (beta version).
As with VBench, install the VBench package following the "Evaluation Dependencies" section of our [installation guide](../docs/installation.md).

```bash
# Step 1: generate the relevant videos
# VBench-i2v tasks; to evaluate all prompts, set start_index to 0 and end_index to 2000
bash eval/sample.sh /path/to/ckpt num_frames model_name_for_log -5 start_index end_index
# Alternatively, launch 8 jobs at once
bash eval/vbench_i2v/launch.sh /path/to/ckpt num_frames model_name

# Step 2: run vbench to evaluate the generated samples
python eval/vbench_i2v/vbench_i2v.py /path/to/video_folder /path/to/model/ckpt
# Note: you may need to edit `VBench/vbench2_beta_i2v/utils.py` and change the hard-coded variable `image_root` in the `load_i2v_dimension_info` function to point to your image folder.

# Step 3: obtain the scaled scores
python eval/vbench_i2v/tabulate_vbench_i2v_scores.py path/to/videos/folder path/to/your/model/ckpt
# This stores the results under `eval/vbench_i2v` inside path/to/your/model/ckpt

```

## VAE

Install the required dependencies following the "Evaluation Dependencies" section of our [installation guide](../docs/installation.md). Then, run the following evaluation command:

```bash
# --metric can be any one of, or a list of: ssim, psnr, lpips, flolpips
python eval/vae/eval_common_metric.py --batch_size 2 --real_video_dir path/to/original/videos --generated_video_dir path/to/generated/videos --device cuda --sample_fps 24 --crop_size 256 --resolution 256 --num_frames 17 --sample_rate 1 --metric ssim psnr lpips flolpips
```