add vbench performance

Shen-Chenhui 2024-06-17 10:25:55 +00:00
parent f0c98dd186
commit 5cbfe53086
8 changed files with 16 additions and 12 deletions

View file

@@ -211,7 +211,7 @@ docker run -ti --gpus all -v {MOUNT_DIR}:/data opensora
| Model | Model Size | Data | #iterations | Batch Size | URL |
| --------- | ---------- | ---- | ----------- | ---------- | ------------------------------------------------------------- |
| Diffusion | 1.1B | 30M | 70k | Dynamic | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v3) |
-| VAE | 384M | | | | [:link:](https://huggingface.co/hpcai-tech/OpenSora-VAE-v1.2) |
+| VAE | 384M | 3M | 1.18M | 8 | [:link:](https://huggingface.co/hpcai-tech/OpenSora-VAE-v1.2) |
See our **[report 1.2](docs/report_03.md)** for more information.
@@ -237,7 +237,7 @@ See our **[report 1.1](docs/report_02.md)** for more information.
<summary>View more</summary>
| Resolution | Model Size | Data | #iterations | Batch Size | GPU days (H800) | URL |
-| ---------- | ---------- | ------ | ----------- | ---------- | --------------- |
+| ---------- | ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- |
| 16×512×512 | 700M | 20K HQ | 20k | 2×64 | 35 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) |
| 16×256×256 | 700M | 20K HQ | 24k | 8×64 | 45 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) |
| 16×256×256 | 700M | 366K | 80k | 8×64 | 117 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth) |
@@ -408,6 +408,7 @@ Before you run the following commands, follow our [Installation Documentation](d
Once you prepare the data in a `csv` file, run the following commands to train the VAE.
Note that you need to adjust the number of training epochs (`epochs`) in the config file according to your own csv data size.
```bash
# stage 1 training, 380k steps, 8 GPUs
torchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage1.py --data-path YOUR_CSV_PATH

View file

@@ -21,7 +21,7 @@ num_workers = 4
# Define model
model = dict(
type="OpenSoraVAE_V1_2",
-    from_pretrained="pretrained_models/vae-pipeline",
+    from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
micro_frame_size=None,
micro_batch_size=4,
cal_loss=True,
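The `model = dict(type=..., ...)` entries in these configs follow a common registry pattern: the `type` key selects a class, and the remaining keys become constructor arguments. A minimal sketch of that pattern (the `DummyVAE` class, `REGISTRY` dict, and `build_module` helper here are illustrative stand-ins, not OpenSora's actual builder):

```python
# Sketch of the dict-based config pattern: "type" picks a class from a
# registry; the remaining keys are passed through as constructor kwargs.

class DummyVAE:  # illustrative stand-in for the real VAE class
    def __init__(self, from_pretrained=None, micro_frame_size=None,
                 micro_batch_size=1, cal_loss=False):
        self.from_pretrained = from_pretrained
        self.micro_frame_size = micro_frame_size
        self.micro_batch_size = micro_batch_size
        self.cal_loss = cal_loss

REGISTRY = {"OpenSoraVAE_V1_2": DummyVAE}  # hypothetical registry

def build_module(cfg):
    cfg = dict(cfg)                  # copy so the config dict is not mutated
    cls = REGISTRY[cfg.pop("type")]  # "type" selects the class
    return cls(**cfg)                # everything else becomes kwargs

model = build_module(dict(
    type="OpenSoraVAE_V1_2",
    from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
    micro_frame_size=None,
    micro_batch_size=4,
    cal_loss=True,
))
```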

View file

@@ -21,7 +21,7 @@ num_workers = 4
# Define model
model = dict(
type="OpenSoraVAE_V1_2",
-    from_pretrained="pretrained_models/vae-pipeline",
+    from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
micro_frame_size=None,
micro_batch_size=4,
cal_loss=True,
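The `micro_batch_size` field above (together with `micro_frame_size`) bounds peak memory by encoding the input in fixed-size chunks rather than all at once. A minimal sketch of the chunking idea, using plain lists in place of tensors (an illustration of the general technique, not OpenSora's actual implementation):

```python
def encode_in_micro_batches(encode, batch, micro_batch_size):
    # Apply `encode` to slices of at most `micro_batch_size` items and
    # stitch the results back together. For any per-sample encoder this
    # is equivalent to encode(batch), but peak memory scales with the
    # micro-batch size instead of the full batch size.
    out = []
    for i in range(0, len(batch), micro_batch_size):
        out.extend(encode(batch[i:i + micro_batch_size]))
    return out

# Toy check with a per-sample "encoder":
double = lambda xs: [2 * x for x in xs]
assert encode_in_micro_batches(double, list(range(6)), 4) == double(list(range(6)))
```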

View file

@@ -46,7 +46,7 @@ use_image_identity_loss = True
# Others
seed = 42
-outputs = "outputs"
+outputs = "outputs/vae_stage1"
wandb = False
epochs = 100 # NOTE: adjust accordingly w.r.t dataset size

View file

@@ -20,7 +20,7 @@ plugin = "zero2"
model = dict(
type="VideoAutoencoderPipeline",
freeze_vae_2d=False,
-    from_pretrained=None,
+    from_pretrained="outputs/vae_stage1",
cal_loss=True,
vae_2d=dict(
type="VideoAutoencoderKL",
@@ -46,7 +46,7 @@ use_image_identity_loss = False
# Others
seed = 42
-outputs = "outputs"
+outputs = "outputs/vae_stage2"
wandb = False
epochs = 100 # NOTE: adjust accordingly w.r.t dataset size

View file

@@ -20,7 +20,7 @@ plugin = "zero2"
model = dict(
type="VideoAutoencoderPipeline",
freeze_vae_2d=False,
-    from_pretrained=None,
+    from_pretrained="outputs/vae_stage2",
cal_loss=True,
vae_2d=dict(
type="VideoAutoencoderKL",
@@ -45,7 +45,7 @@ use_image_identity_loss = False
# Others
seed = 42
-outputs = "outputs"
+outputs = "outputs/vae_stage3"
wandb = False
epochs = 100 # NOTE: adjust accordingly w.r.t dataset size
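Taken together, the three stage configs chain checkpoints: stage 1 trains from scratch and writes to `outputs/vae_stage1`, stage 2 initializes from that directory, and stage 3 from `outputs/vae_stage2`. A sketch of the full pipeline, reusing the stage-1 command shown earlier (the `stage2.py` and `stage3.py` config paths are assumed to mirror the `configs/vae/train/stage1.py` path; only `stage1.py` appears in this commit):

```shell
# Three-stage VAE training; each stage resumes from the previous stage's
# output directory via `from_pretrained` in its config.
torchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py \
    configs/vae/train/stage1.py --data-path YOUR_CSV_PATH   # writes outputs/vae_stage1
torchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py \
    configs/vae/train/stage2.py --data-path YOUR_CSV_PATH   # writes outputs/vae_stage2
torchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py \
    configs/vae/train/stage3.py --data-path YOUR_CSV_PATH   # writes outputs/vae_stage3
```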

View file

@@ -147,7 +147,12 @@ In addition, we also keep track of [VBench](https://vchitect.github.io/VBench-pr
All the evaluation code is released in the `eval` folder. Check the [README](/eval/README.md) for more details.
-[Final performance TBD]
+| Model | Total Score | Quality Score | Semantic Score |
+| -------------- | ----------- | ------------- | -------------- |
+| Open-Sora V1.0 | 75.91% | 78.81% | 64.28% |
+| Open-Sora V1.2 | 79.23% | 80.71% | 73.30% |
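The Total Scores above are consistent with VBench's 80/20 weighting of the Quality and Semantic sub-scores; a quick arithmetic check of both rows:

```python
# Verify Total Score == 0.8 * Quality + 0.2 * Semantic, to rounding,
# for the VBench numbers reported above.
rows = {
    "Open-Sora V1.0": (75.91, 78.81, 64.28),  # total, quality, semantic (%)
    "Open-Sora V1.2": (79.23, 80.71, 73.30),
}
for name, (total, quality, semantic) in rows.items():
    recomputed = 0.8 * quality + 0.2 * semantic
    assert abs(recomputed - total) < 0.01, (name, recomputed)
```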
## Sequence parallelism

View file

@@ -59,5 +59,3 @@ We are grateful for the following work:
* [Taming Transformers](https://github.com/CompVis/taming-transformers): Taming Transformers for High-Resolution Image Synthesis
* [3D blur pooling](https://github.com/adobe/antialiased-cnns/pull/39/commits/3d6f02b6943c58b68c19c07bc26fad57492ff3bc)
* [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan)
Special thanks go to the authors of [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan) for their valuable advice and help.