From 9117c9fcc793dbc31c209a285ada830fa28b42d0 Mon Sep 17 00:00:00 2001
From: zhengzangw
Date: Fri, 14 Jun 2024 03:34:06 +0000
Subject: [PATCH] [docs] update

---
 docs/report_03.md           | 4 ++++
 opensora/datasets/aspect.py | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/docs/report_03.md b/docs/report_03.md
index 458e077..e6b0f0e 100644
--- a/docs/report_03.md
+++ b/docs/report_03.md
@@ -75,6 +75,8 @@ Open-Sora 1.2 starts from the [PixArt-Σ 2K](https://github.com/PixArt-alpha/Pix
 
 After the above adaptation, we are ready to train the model on videos. The adaptation above maintains the original model's ability to generate high-quality images.
 
+With rectified flow, we can reduce the number of video sampling steps from 100 to 30, which greatly reduces inference time.
+
 ## More data and better multi-stage training
 
 Due to a limited computational budget, we carefully arrange the training data from low to high quality and split our training into three stages. Our training involves 12x8 GPUs, and the total training time is about 2 weeks.
@@ -130,3 +132,5 @@ We sampled 1k videos from pixabay as validation dataset. We calculate the evalua
 In addition, we also keep track of [VBench](https://vchitect.github.io/VBench-project/) scores during training. VBench is an automatic video evaluation benchmark for short video generation. We calcuate the vbench score with 240p 2s videos. The two metrics verify that our model continues to improve during training.
 
 ![VBench](/assets/readme/report_vbench_score.png)
+
+All the evaluation code is released in the `eval` folder. Check the [README](/eval/README.md) for more details.
diff --git a/opensora/datasets/aspect.py b/opensora/datasets/aspect.py
index d733c23..44c7839 100644
--- a/opensora/datasets/aspect.py
+++ b/opensora/datasets/aspect.py
@@ -476,8 +476,10 @@ NUM_FRAMES_MAP = {
     "2x": 102,
     "4x": 204,
     "8x": 408,
+    "16x": 816,
     "2s": 51,
     "4s": 102,
     "8s": 204,
     "16s": 408,
+    "32s": 816,
 }
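For context, the new `NUM_FRAMES_MAP` entries in this patch extend the series already present in `aspect.py`: every `"Nx"` key maps to `51 * N` frames and every `"Ns"` key to `51 * N / 2` frames (i.e. 51 frames per 2 seconds), so both `"16x"` and `"32s"` land on 816. A minimal sketch checking this pattern — the `num_frames` helper is hypothetical, not part of the repository:

```python
# Sketch (not from the patch): derive NUM_FRAMES_MAP values from their keys,
# assuming the observed pattern of 51 frames per 2-second unit.
BASE_FRAMES = 51  # frames corresponding to a 2-second clip


def num_frames(key: str) -> int:
    """Hypothetical helper: frame count for an "Nx" or "Ns" key."""
    n = int(key[:-1])
    if key.endswith("x"):  # multiplier keys: "Nx" -> 51 * N
        return BASE_FRAMES * n
    if key.endswith("s"):  # duration keys: "Ns" -> 51 * N / 2
        return BASE_FRAMES * n // 2
    raise ValueError(f"unrecognized key: {key}")


# Reproduces the values in the patched map, including the new entries:
derived = {
    k: num_frames(k)
    for k in ["2x", "4x", "8x", "16x", "2s", "4s", "8s", "16s", "32s"]
}
```

Under this pattern, the new `"16x"` and `"32s"` entries both yield 816 frames, matching the values added by the patch.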