mirror of https://github.com/hpcaitech/Open-Sora.git
synced 2026-04-10 21:01:26 +02:00

add vbench performance

This commit is contained in:
parent f0c98dd186
commit 5cbfe53086
@@ -211,7 +211,7 @@ docker run -ti --gpus all -v {MOUNT_DIR}:/data opensora
 | Model     | Model Size | Data | #iterations | Batch Size | URL                                                           |
 | --------- | ---------- | ---- | ----------- | ---------- | ------------------------------------------------------------- |
 | Diffusion | 1.1B       | 30M  | 70k         | Dynamic    | [:link:](https://huggingface.co/hpcai-tech/OpenSora-STDiT-v3) |
-| VAE       | 384M       |      |             |            | [:link:](https://huggingface.co/hpcai-tech/OpenSora-VAE-v1.2) |
+| VAE       | 384M       | 3M   | 1.18M       | 8          | [:link:](https://huggingface.co/hpcai-tech/OpenSora-VAE-v1.2) |

 See our **[report 1.2](docs/report_03.md)** for more information.
@@ -237,7 +237,7 @@ See our **[report 1.1](docs/report_02.md)** for more information.
 <summary>View more</summary>

 | Resolution | Model Size | Data   | #iterations | Batch Size | GPU days (H800) | URL |
-| ---------- | ---------- | ------ | ----------- | ---------- | --------------- |
+| ---------- | ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- |
 | 16×512×512 | 700M       | 20K HQ | 20k         | 2×64       | 35              | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) |
 | 16×256×256 | 700M       | 20K HQ | 24k         | 8×64       | 45              | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) |
 | 16×256×256 | 700M       | 366K   | 80k         | 8×64       | 117             | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth)    |
@@ -408,6 +408,7 @@ Before you run the following commands, follow our [Installation Documentation](d
 Once you prepare the data in a `csv` file, run the following commands to train the VAE.
+Note that you need to adjust the number of training epochs (`epochs`) in the config file according to your own csv data size.

 ```bash
 # stage 1 training, 380k steps, 8 GPUs
 torchrun --nnodes=1 --nproc_per_node=8 scripts/train_vae.py configs/vae/train/stage1.py --data-path YOUR_CSV_PATH
 ```
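The note added above ties `epochs` to dataset size. A minimal sketch of that bookkeeping (a hypothetical helper, not from the repo; the 380k-step figure comes from the comment in the command above, and the global batch size of 64 is an illustrative assumption):

```python
import csv

def epochs_for_target_steps(csv_path, target_steps=380_000, global_batch_size=64):
    """Pick an `epochs` value so training reaches roughly `target_steps`."""
    with open(csv_path, newline="") as f:
        num_samples = sum(1 for _ in csv.reader(f)) - 1  # minus header row
    steps_per_epoch = max(1, num_samples // global_batch_size)
    return -(-target_steps // steps_per_epoch)  # ceiling division
```

For example, 640 data rows at a global batch size of 64 give 10 steps per epoch, so 38,000 epochs would be needed to reach 380k steps.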
@@ -21,7 +21,7 @@ num_workers = 4
 # Define model
 model = dict(
     type="OpenSoraVAE_V1_2",
-    from_pretrained="pretrained_models/vae-pipeline",
+    from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
     micro_frame_size=None,
     micro_batch_size=4,
     cal_loss=True,
@@ -21,7 +21,7 @@ num_workers = 4
 # Define model
 model = dict(
     type="OpenSoraVAE_V1_2",
-    from_pretrained="pretrained_models/vae-pipeline",
+    from_pretrained="hpcai-tech/OpenSora-VAE-v1.2",
     micro_frame_size=None,
     micro_batch_size=4,
     cal_loss=True,
@@ -46,7 +46,7 @@ use_image_identity_loss = True

 # Others
 seed = 42
-outputs = "outputs"
+outputs = "outputs/vae_stage1"
 wandb = False

+epochs = 100  # NOTE: adjust accordingly w.r.t dataset size
@@ -20,7 +20,7 @@ plugin = "zero2"
 model = dict(
     type="VideoAutoencoderPipeline",
     freeze_vae_2d=False,
-    from_pretrained=None,
+    from_pretrained="outputs/vae_stage1",
     cal_loss=True,
     vae_2d=dict(
         type="VideoAutoencoderKL",
@@ -46,7 +46,7 @@ use_image_identity_loss = False

 # Others
 seed = 42
-outputs = "outputs"
+outputs = "outputs/vae_stage2"
 wandb = False

+epochs = 100  # NOTE: adjust accordingly w.r.t dataset size
@@ -20,7 +20,7 @@ plugin = "zero2"
 model = dict(
     type="VideoAutoencoderPipeline",
     freeze_vae_2d=False,
-    from_pretrained=None,
+    from_pretrained="outputs/vae_stage2",
     cal_loss=True,
     vae_2d=dict(
         type="VideoAutoencoderKL",
@@ -45,7 +45,7 @@ use_image_identity_loss = False

 # Others
 seed = 42
-outputs = "outputs"
+outputs = "outputs/vae_stage3"
 wandb = False

+epochs = 100  # NOTE: adjust accordingly w.r.t dataset size
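Taken together, the config hunks above chain the three VAE training stages: each stage writes checkpoints to its own `outputs` directory, and the next stage's `from_pretrained` points at that directory. A small sketch that encodes and checks this invariant (values copied directly from the hunks above):

```python
# `outputs` directory set per stage, and the checkpoint each later stage
# loads (its `from_pretrained`), as changed in the config hunks above.
outputs = {
    "stage1": "outputs/vae_stage1",
    "stage2": "outputs/vae_stage2",
    "stage3": "outputs/vae_stage3",
}
from_pretrained = {
    "stage2": "outputs/vae_stage1",  # stage 2 resumes from stage 1's checkpoint
    "stage3": "outputs/vae_stage2",  # stage 3 resumes from stage 2's checkpoint
}
for stage, ckpt in from_pretrained.items():
    prev = f"stage{int(stage[-1]) - 1}"
    assert ckpt == outputs[prev], f"{stage} should load {outputs[prev]}"
print("stage chaining consistent")
```

This is why the `outputs` rename from the shared `"outputs"` directory to per-stage paths matters: with all stages writing to the same directory, a later stage could silently pick up the wrong checkpoint.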
@@ -147,7 +147,12 @@ In addition, we also keep track of [VBench](https://vchitect.github.io/VBench-pr
 All the evaluation code is released in the `eval` folder. Check the [README](/eval/README.md) for more details.

-[Final performance TBD]
+| Model          | Total Score | Quality Score | Semantic Score |
+| -------------- | ----------- | ------------- | -------------- |
+| Open-Sora V1.0 | 75.91%      | 78.81%        | 64.28%         |
+| Open-Sora V1.2 | 79.23%      | 80.71%        | 73.30%         |
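For reference, the per-metric gains of V1.2 over V1.0 in the table above can be computed directly; the semantic score shows by far the largest improvement:

```python
# VBench scores from the table above (percent).
scores = {
    "Open-Sora V1.0": {"total": 75.91, "quality": 78.81, "semantic": 64.28},
    "Open-Sora V1.2": {"total": 79.23, "quality": 80.71, "semantic": 73.30},
}
deltas = {
    metric: round(scores["Open-Sora V1.2"][metric] - scores["Open-Sora V1.0"][metric], 2)
    for metric in ("total", "quality", "semantic")
}
print(deltas)  # {'total': 3.32, 'quality': 1.9, 'semantic': 9.02}
```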

 ## Sequence parallelism
@@ -59,5 +59,3 @@ We are grateful for the following work:
 * [Taming Transformers](https://github.com/CompVis/taming-transformers): Taming Transformers for High-Resolution Image Synthesis
 * [3D blur pooling](https://github.com/adobe/antialiased-cnns/pull/39/commits/3d6f02b6943c58b68c19c07bc26fad57492ff3bc)
-* [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan)
-
 Special thanks go to the authors of [Open-Sora-Plan](https://github.com/PKU-YuanGroup/Open-Sora-Plan) for their valuable advice and help.