mirror of https://github.com/hpcaitech/Open-Sora.git synced 2026-02-22 21:43:19 +01:00

2025-03-14 18:41:20 +08:00

10× inference speedup with high-compression autoencoder

The high computational cost of training video generation models arises from the large number of tokens and the dominance of attention computation. To further reduce training expenses, we explore training video generation models with high-compression autoencoders (Video DC-AEs). In our comparison, switching to a Video DC-AE with a much higher downsample ratio (4 × 32 × 32) lets us reduce the patch size to 1 and still achieve a 5.2× speedup in training throughput and a 10× speedup during inference.
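The token savings can be illustrated with a rough count. The sketch below is back-of-the-envelope arithmetic, not code from the repository. The `num_tokens` helper, the 128-frame 768 × 768 clip, and the baseline configuration (a 4 × 8 × 8 autoencoder with patch size 2) are assumptions made for illustration. Only the 4 × 32 × 32 ratio and patch size 1 come from the text above. It also assumes plain ceiling division along every axis, which may differ from how the actual autoencoder pads or treats the first frame.

```python
import math


def num_tokens(frames, height, width, downsample=(4, 32, 32), patch_size=1):
    """Rough transformer sequence length for one video clip.

    downsample is the (time, height, width) compression ratio of the
    autoencoder; patch_size is the spatial patch applied to the latent.
    """
    t, h, w = downsample
    lat_t = math.ceil(frames / t)
    lat_h = math.ceil(height / h)
    lat_w = math.ceil(width / w)
    return lat_t * math.ceil(lat_h / patch_size) * math.ceil(lat_w / patch_size)


# Hypothetical clip size, chosen only so that every division is exact.
frames, height, width = 128, 768, 768

# Assumed baseline: 4 x 8 x 8 autoencoder with patch size 2.
baseline = num_tokens(frames, height, width, downsample=(4, 8, 8), patch_size=2)
# Configuration described in the text: 4 x 32 x 32 with patch size 1.
dcae = num_tokens(frames, height, width, downsample=(4, 32, 32), patch_size=1)

print(f"baseline tokens: {baseline}")
print(f"DC-AE tokens:    {dcae}")
print(f"reduction:       {baseline / dcae:.1f}x")
```

Under these assumptions the high-compression configuration produces 4× fewer tokens. Because attention cost grows roughly quadratically with sequence length, a reduction of this size is consistent with a large end-to-end speedup, although the measured 5.2× and 10× figures depend on the whole model and not on token count alone.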

Nevertheless, despite the advantage of drastically lower computation costs, other challenges remain. For instance, a larger number of latent channels slows down convergence. Our generation model adapted to a 128-channel Video DC-AE for 25K iterations reaches a loss level of 0.5, compared with 0.1 for the initialization model. While the fast video generation model underperforms the original, it still captures spatial-temporal relationships. We release this model to the research community for further exploration.

Check out more details in our report.

Model Download

Download from 🤗 Hugging Face:

pip install "huggingface_hub[cli]"
huggingface-cli download hpcai-tech/Open-Sora-v2-Video-DC-AE --local-dir ./ckpts
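After the download finishes, it can be useful to confirm that files actually landed in `./ckpts`. The helper below is a minimal sketch using only the Python standard library. `list_checkpoint_files` is a hypothetical name, and no particular checkpoint file names are assumed, so it simply lists whatever is present (including any cache metadata the CLI may write).

```python
from pathlib import Path


def list_checkpoint_files(ckpt_dir="./ckpts"):
    """Return sorted (relative_path, size_in_bytes) pairs for files under ckpt_dir.

    Returns an empty list if the directory does not exist.
    """
    root = Path(ckpt_dir)
    if not root.is_dir():
        return []
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


# Print a simple size listing; prints nothing if ./ckpts is missing or empty.
for name, size in list_checkpoint_files():
    print(f"{size / 2**20:10.1f} MiB  {name}")
```

An empty listing suggests the download did not complete or was written to a different directory than the one passed to `--local-dir`.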

Inference

To run inference with our fast video generation model:

torchrun --nproc_per_node 1 --standalone scripts/diffusion/inference.py configs/diffusion/inference/high_compression.py --prompt "The story of a robot's life in a cyberpunk setting."
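To generate videos for several prompts, the command above can be assembled programmatically. This is a hedged sketch rather than part of the repository: `build_inference_command` is a hypothetical helper, the second prompt is made up, and only the flags already shown above (`--nproc_per_node`, `--standalone`, `--prompt`) are used. The script only prints the commands; nothing is executed.

```python
import shlex

# Paths copied from the inference command above.
SCRIPT = "scripts/diffusion/inference.py"
CONFIG = "configs/diffusion/inference/high_compression.py"


def build_inference_command(prompt, nproc_per_node=1):
    """Build the torchrun argument list for a single prompt."""
    return [
        "torchrun",
        "--nproc_per_node", str(nproc_per_node),
        "--standalone",
        SCRIPT,
        CONFIG,
        "--prompt", prompt,
    ]


prompts = [
    "The story of a robot's life in a cyberpunk setting.",
    "A sailboat crossing a calm bay at sunrise.",  # hypothetical extra prompt
]

# Print shell-safe command lines; quoting of apostrophes is handled by shlex.
for p in prompts:
    print(shlex.join(build_inference_command(p)))
```

To launch the runs directly instead of printing them, each argument list could be passed to `subprocess.run(cmd, check=True)` from the repository root, assuming `torchrun` is on the `PATH`.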

Training

Follow this guide to prepare the dataset for training. Then, you may train your own fast generation model with the following command:

torchrun --nproc_per_node 8 scripts/diffusion/train.py configs/diffusion/train/high_compression.py --dataset.data-path datasets/pexels_45k_necessary.csv
