add vbench performance

Shen-Chenhui 2024-06-17 10:25:55 +00:00
parent f0c98dd186
commit 5cbfe53086
8 changed files with 16 additions and 12 deletions

View file

@@ -211,7 +211,7 @@ docker run -ti --gpus all -v {MOUNT_DIR}:/data opensora
| Model | Model Size | Data | #iterations | Batch Size | URL |
| --------- | ---------- | ---- | ----------- | ---------- | ------------------------------------------------------------- |
| Diffusion | 1.1B | 30M | 70k | Dynamic | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v3) |
-| VAE | 384M | | | | [:link:](https://huggingface.co/hpcai-tech/OpenSora-VAE-v1.2) |
+| VAE | 384M | 3M | 1.18M | 8 | [:link:](https://huggingface.co/hpcai-tech/OpenSora-VAE-v1.2) |
See our **[report 1.2](docs/report_03.md)** for more information.
@@ -237,7 +237,7 @@ See our **[report 1.1](docs/report_02.md)** for more information.
<summary>View more</summary>
| Resolution | Model Size | Data | #iterations | Batch Size | GPU days (H800) | URL |
-| ---------- | ---------- | ------ | ----------- | ---------- | --------------- |
+| ---------- | ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- |
| 16×512×512 | 700M | 20K HQ | 20k | 2×64 | 35 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) |
| 16×256×256 | 700M | 20K HQ | 24k | 8×64 | 45 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) |
| 16×256×256 | 700M | 366K | 80k | 8×64 | 117 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth) |
@@ -408,6 +408,7 @@ Before you run the following commands, follow our [Installation Documentation](d
Once you prepare the data in a `csv` file, run the following commands to train the VAE.
Note that you need to adjust the number of training epochs (`epochs`) in the config file according to your own csv data size.
```bash
# stage 1 training, 380k steps, 8 GPUs
torchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage1.py --data-path YOUR_CSV_PATH

View file

@@ -21,7 +21,7 @@ num_workers = 4
# Define model
model = dict(
type="OpenSoraVAE_V1_2",
-    from_pretrained="pretrained_models/vae-pipeline",
+    from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
micro_frame_size=None,
micro_batch_size=4,
cal_loss=True,
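The `model = dict(type=..., ...)` entries in these configs follow a common registry pattern: the `type` key selects a class, and the remaining keys become constructor arguments. A minimal sketch of that pattern (the `DummyVAE` class, `REGISTRY` dict, and `build_module` helper here are illustrative stand-ins, not OpenSora's actual builder):

```python
# Sketch of the dict-based config pattern: "type" picks a class from a
# registry; the remaining keys are passed through as constructor kwargs.

class DummyVAE:  # illustrative stand-in for the real VAE class
    def __init__(self, from_pretrained=None, micro_frame_size=None,
                 micro_batch_size=1, cal_loss=False):
        self.from_pretrained = from_pretrained
        self.micro_frame_size = micro_frame_size
        self.micro_batch_size = micro_batch_size
        self.cal_loss = cal_loss

REGISTRY = {"OpenSoraVAE_V1_2": DummyVAE}  # hypothetical registry

def build_module(cfg):
    cfg = dict(cfg)                  # copy so the config dict is not mutated
    cls = REGISTRY[cfg.pop("type")]  # "type" selects the class
    return cls(**cfg)                # everything else becomes kwargs

model = build_module(dict(
    type="OpenSoraVAE_V1_2",
    from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
    micro_frame_size=None,
    micro_batch_size=4,
    cal_loss=True,
))
```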

View file

@@ -21,7 +21,7 @@ num_workers = 4
# Define model
model = dict(
type="OpenSoraVAE_V1_2",
-    from_pretrained="pretrained_models/vae-pipeline",
+    from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
micro_frame_size=None,
micro_batch_size=4,
cal_loss=True,
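The `micro_batch_size` field above (together with `micro_frame_size`) bounds peak memory by encoding the input in fixed-size chunks rather than all at once. A minimal sketch of the chunking idea, using plain lists in place of tensors (an illustration of the general technique, not OpenSora's actual implementation):

```python
def encode_in_micro_batches(encode, batch, micro_batch_size):
    # Apply `encode` to slices of at most `micro_batch_size` items and
    # stitch the results back together. For any per-sample encoder this
    # is equivalent to encode(batch), but peak memory scales with the
    # micro-batch size instead of the full batch size.
    out = []
    for i in range(0, len(batch), micro_batch_size):
        out.extend(encode(batch[i:i + micro_batch_size]))
    return out

# Toy check with a per-sample "encoder":
double = lambda xs: [2 * x for x in xs]
assert encode_in_micro_batches(double, list(range(6)), 4) == double(list(range(6)))
```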

View file

@@ -46,7 +46,7 @@ use_image_identity_loss = True
# Others
seed = 42
-outputs = "outputs"
+outputs = "outputs/vae_stage1"
wandb = False
epochs = 100 # NOTE: adjust accordingly w.r.t dataset size

View file

@@ -20,7 +20,7 @@ plugin = "zero2"
model = dict(
type="VideoAutoencoderPipeline",
freeze_vae_2d=False,
-    from_pretrained=None,
+    from_pretrained="outputs/vae_stage1",
cal_loss=True,
vae_2d=dict(
type="VideoAutoencoderKL",
@@ -46,7 +46,7 @@ use_image_identity_loss = False
# Others
seed = 42
-outputs = "outputs"
+outputs = "outputs/vae_stage2"
wandb = False
epochs = 100 # NOTE: adjust accordingly w.r.t dataset size

View file

@@ -20,7 +20,7 @@ plugin = "zero2"
model = dict(
type="VideoAutoencoderPipeline",
freeze_vae_2d=False,
-    from_pretrained=None,
+    from_pretrained="outputs/vae_stage2",
cal_loss=True,
vae_2d=dict(
type="VideoAutoencoderKL",
@@ -45,7 +45,7 @@ use_image_identity_loss = False
# Others
seed = 42
-outputs = "outputs"
+outputs = "outputs/vae_stage3"
wandb = False
epochs = 100 # NOTE: adjust accordingly w.r.t dataset size
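Taken together, the three stage configs chain checkpoints: stage 1 trains from scratch and writes to `outputs/vae_stage1`, stage 2 initializes from that directory, and stage 3 from `outputs/vae_stage2`. A sketch of the full pipeline, reusing the stage-1 command shown earlier (the `stage2.py` and `stage3.py` config paths are assumed to mirror the `configs/vae/train/stage1.py` path; only `stage1.py` appears in this commit):

```shell
# Three-stage VAE training; each stage resumes from the previous stage's
# output directory via `from_pretrained` in its config.
torchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py \
    configs/vae/train/stage1.py --data-path YOUR_CSV_PATH   # writes outputs/vae_stage1
torchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py \
    configs/vae/train/stage2.py --data-path YOUR_CSV_PATH   # writes outputs/vae_stage2
torchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py \
    configs/vae/train/stage3.py --data-path YOUR_CSV_PATH   # writes outputs/vae_stage3
```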

View file

@@ -147,7 +147,12 @@ In addition, we also keep track of [VBench](https://vchitect.github.io/VBench-pr
All the evaluation code is released in the `eval` folder. Check the [README](/eval/README.md) for more details.
-[Final performance TBD]
+| Model | Total Score | Quality Score | Semantic Score |
+| -------------- | ----------- | ------------- | -------------- |
+| Open-Sora V1.0 | 75.91% | 78.81% | 64.28% |
+| Open-Sora V1.2 | 79.23% | 80.71% | 73.30% |
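The Total Scores above are consistent with VBench's 80/20 weighting of the Quality and Semantic sub-scores; a quick arithmetic check of both rows:

```python
# Verify Total Score == 0.8 * Quality + 0.2 * Semantic, to rounding,
# for the VBench numbers reported above.
rows = {
    "Open-Sora V1.0": (75.91, 78.81, 64.28),  # total, quality, semantic (%)
    "Open-Sora V1.2": (79.23, 80.71, 73.30),
}
for name, (total, quality, semantic) in rows.items():
    recomputed = 0.8 * quality + 0.2 * semantic
    assert abs(recomputed - total) < 0.01, (name, recomputed)
```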
## Sequence parallelism

View file

@@ -59,5 +59,3 @@ We are grateful for the following work:
* [Taming Transformers](https://github.com/CompVis/taming-transformers): Taming Transformers for High-Resolution Image Synthesis
* [3D blur pooling](https://github.com/adobe/antialiased-cnns/pull/39/commits/3d6f02b6943c58b68c19c07bc26fad57492ff3bc)
* [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan)
Special thanks go to the authors of [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan) for their valuable advice and help.