# Repo Structure

```plaintext
Open-Sora
├── README.md
├── assets
│   ├── images                     -> images used for image-conditioned generation
│   ├── demo                       -> images used for demo
│   ├── texts                      -> prompts used for text-conditioned generation
│   └── readme                     -> images used in README
├── configs                        -> Configs for training & inference
├── docker                         -> dockerfile for Open-Sora
├── docs
│   ├── acceleration.md            -> Report on acceleration & speed benchmark
│   ├── commands.md                -> Commands for training & inference
│   ├── datasets.md                -> Datasets used in this project
│   ├── data_processing.md         -> Data pipeline documents
│   ├── installation.md            -> Installation instructions
│   ├── structure.md               -> This file
│   ├── config.md                  -> Configs for training and inference
│   ├── report_01.md               -> Report for Open-Sora 1.0
│   ├── report_02.md               -> Report for Open-Sora 1.1
│   ├── report_03.md               -> Report for Open-Sora 1.2
│   ├── report_04.md               -> Report for Open-Sora 1.3
│   ├── vae.md                     -> our VAE report
│   └── zh_CN                      -> Chinese version of the above
├── eval                           -> Evaluation scripts
│   ├── README.md                  -> Evaluation documentation
│   ├── human_eval                 -> for human eval
│   ├── I2V                        -> for image to video human eval
│   ├── loss                       -> eval loss
│   ├── sample.sh                  -> script for quickly launching inference on predefined prompts
│   ├── vae                        -> for vae eval
│   ├── vbench                     -> for VBench evaluation
│   └── vbench_i2v                 -> for VBench i2v evaluation
├── gradio                         -> Gradio demo related code
├── scripts
│   ├── train.py                   -> diffusion training script
│   ├── train_opensoravae_v1_3.py  -> vae v1.3 training script
│   ├── train_vae.py               -> vae training script
│   ├── inference.py               -> diffusion inference script
│   ├── inference_opensoravae_v1_3.py -> vae v1.3 inference script
│   ├── inference_vae.py           -> vae inference script
│   ├── inference_i2v.py           -> image to video inference script
│   └── misc                       -> misc scripts, including batch size search
├── opensora
│   ├── __init__.py
│   ├── registry.py                -> Registry helper
│   ├── acceleration               -> Acceleration related code
│   ├── datasets                   -> Dataset related code
│   ├── models
│   │   ├── dit                    -> DiT
│   │   ├── layers                 -> Common layers
│   │   ├── vae                    -> VAE as image encoder
│   │   ├── vae_v1_3               -> VAE V1.3 as image encoder
│   │   ├── text_encoder           -> Text encoder
│   │   │   ├── classes.py         -> Class id encoder (inference only)
│   │   │   ├── clip.py            -> CLIP encoder
│   │   │   └── t5.py              -> T5 encoder
│   │   ├── latte
│   │   ├── pixart
│   │   └── stdit                  -> Our STDiT related code
│   ├── schedulers                 -> Diffusion schedulers
│   │   ├── iddpm                  -> IDDPM for training and inference
│   │   └── dpms                   -> DPM-Solver for fast inference
│   └── utils
├── tests                          -> Tests for the project
└── tools                          -> Tools for data processing and more
```

## Configs

Our config files follow [MMEngine](https://github.com/open-mmlab/mmengine). MMEngine reads the config file (a `.py` file) and parses it into a dictionary-like object.
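
To make the idea concrete, here is a minimal standard-library sketch of what "parsing a `.py` config into a dictionary-like object" means. It is not MMEngine's implementation: `load_py_config` is a hypothetical helper, and the config fields in `EXAMPLE_CONFIG` are illustrative rather than copied from the repo. With MMEngine itself, the usual entry point is `mmengine.config.Config.fromfile`.

```python
import runpy
import tempfile
from pathlib import Path


def load_py_config(path):
    """Execute a Python config file and return its public top-level names.

    Hypothetical helper for illustration only; MMEngine does considerably
    more (base-config inheritance, attribute access, and so on).
    """
    namespace = runpy.run_path(str(path))
    # Drop dunder entries such as __name__ and __file__ added by runpy.
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


# Illustrative config contents; the field names are assumptions.
EXAMPLE_CONFIG = '''
num_frames = 16
image_size = (256, 256)
model = dict(type="STDiT-XL/2", enable_flash_attn=True)
'''

with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "16x256x256.py"
    cfg_path.write_text(EXAMPLE_CONFIG)
    cfg = load_py_config(cfg_path)

print(cfg["num_frames"])
print(cfg["model"]["type"])
```

Because the config is ordinary Python, values can be computed, reused across fields, or built with `dict(...)` calls, which is the main reason for preferring `.py` files over static formats here.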

```plaintext
Open-Sora
└── configs                        -> Configs for training & inference
    ├── opensora-v1-1              -> STDiT2 related configs
    │   ├── inference
    │   │   ├── sample.py          -> Sample videos and images
    │   │   └── sample-ref.py      -> Sample videos with image/video condition
    │   └── train
    │       ├── stage1.py          -> Stage 1 training config
    │       ├── stage2.py          -> Stage 2 training config
    │       ├── stage3.py          -> Stage 3 training config
    │       ├── image.py           -> Illustration of image training config
    │       ├── video.py           -> Illustration of video training config
    │       └── benchmark.py       -> For batch size searching
    ├── opensora                   -> STDiT related configs
    │   ├── inference
    │   │   ├── 16x256x256.py      -> Sample videos 16 frames 256x256
    │   │   ├── 16x512x512.py      -> Sample videos 16 frames 512x512
    │   │   └── 64x512x512.py      -> Sample videos 64 frames 512x512
    │   └── train
    │       ├── 16x256x256.py      -> Train on videos 16 frames 256x256
    │       ├── 16x512x512.py      -> Train on videos 16 frames 512x512
    │       └── 64x512x512.py      -> Train on videos 64 frames 512x512
    ├── dit                        -> DiT related configs
    │   ├── inference
    │   │   ├── 1x256x256-class.py -> Sample images with ckpts from DiT
    │   │   ├── 1x256x256.py       -> Sample images with clip condition
    │   │   └── 16x256x256.py      -> Sample videos
    │   └── train
    │       ├── 1x256x256.py       -> Train on images with clip condition
    │       └── 16x256x256.py      -> Train on videos
    ├── latte                      -> Latte related configs
    └── pixart                     -> PixArt related configs
```
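
The STDiT and DiT config names above encode the sample shape: the listing describes `16x256x256.py` as 16 frames at 256x256. The sketch below parses such names; `parse_config_name` and `ConfigShape` are hypothetical helpers, not part of the repo, and treating the second number as height and the third as width is an assumption, since the listing only shows square resolutions.

```python
import re
from typing import NamedTuple, Optional


class ConfigShape(NamedTuple):
    num_frames: int
    height: int  # assumption: the listing does not say which side comes first
    width: int
    tag: Optional[str]  # optional suffix, e.g. "class" in 1x256x256-class.py


# <frames>x<height>x<width>, an optional "-<tag>", then ".py"
_NAME = re.compile(r"^(\d+)x(\d+)x(\d+)(?:-(.+))?\.py$")


def parse_config_name(filename: str) -> Optional[ConfigShape]:
    """Return the shape encoded in a config filename, or None if it has none."""
    match = _NAME.match(filename)
    if match is None:
        return None
    frames, height, width, tag = match.groups()
    return ConfigShape(int(frames), int(height), int(width), tag)


print(parse_config_name("16x256x256.py"))
print(parse_config_name("1x256x256-class.py"))
print(parse_config_name("stage1.py"))  # names without a shape yield None
```

Configs under `opensora-v1-1` (for example `stage1.py` or `sample.py`) do not follow this pattern, so the helper returns `None` for them.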

## Tools

```plaintext
Open-Sora
└── tools
    ├── datasets                   -> dataset management related code
    ├── scene_cut                  -> scene cut related code
    ├── caption                    -> caption related code
    ├── scoring                    -> scoring related code
    │   ├── aesthetic              -> aesthetic scoring related code
    │   ├── matching               -> matching scoring related code
    │   ├── ocr                    -> ocr scoring related code
    │   └── optical_flow           -> optical flow scoring related code
    └── frame_interpolation        -> frame interpolation related code
```