# Repo Structure

```plaintext
Open-Sora
├── README.md
├── assets
│   ├── images                     -> images used for image-conditioned generation
│   ├── demo                       -> images used for demo
│   ├── texts                      -> prompts used for text-conditioned generation
│   └── readme                     -> images used in README
├── configs                        -> Configs for training & inference
├── docker                         -> dockerfile for Open-Sora
├── docs
│   ├── acceleration.md            -> Report on acceleration & speed benchmark
│   ├── commands.md                -> Commands for training & inference
│   ├── datasets.md                -> Datasets used in this project
│   ├── data_processing.md         -> Data pipeline documents
│   ├── installation.md            -> Installation instructions
│   ├── structure.md               -> This file
│   ├── config.md                  -> Configs for training and inference
│   ├── report_01.md               -> Report for Open-Sora 1.0
│   ├── report_02.md               -> Report for Open-Sora 1.1
│   ├── report_03.md               -> Report for Open-Sora 1.2
│   ├── report_04.md               -> Report for Open-Sora 1.3
│   ├── vae.md                     -> our VAE report
│   └── zh_CN                      -> Chinese version of the above
├── eval                           -> Evaluation scripts
│   ├── README.md                  -> Evaluation documentation
│   ├── human_eval                 -> for human eval
│   ├── I2V                        -> for image to video human eval
│   ├── loss                       -> eval loss
│   ├── sample.sh                  -> script for quickly launching inference on predefined prompts
│   ├── vae                        -> for vae eval
│   ├── vbench                     -> for VBench evaluation
│   └── vbench_i2v                 -> for VBench i2v evaluation
├── gradio                         -> Gradio demo related code
├── scripts
│   ├── train.py                   -> diffusion training script
│   ├── train_opensoravae_v1_3.py  -> vae v1.3 training script
│   ├── train_vae.py               -> vae training script
│   ├── inference.py               -> diffusion inference script
│   ├── inference_opensoravae_v1_3.py -> vae v1.3 inference script
│   ├── inference_vae.py           -> vae inference script
│   ├── inference_i2v.py           -> image to video inference script
│   └── misc                       -> misc scripts, including batch size search
├── opensora
│   ├── __init__.py
│   ├── registry.py                -> Registry helper
│   ├── acceleration               -> Acceleration related code
│   ├── datasets                   -> Dataset related code
│   ├── models
│   │   ├── dit                    -> DiT
│   │   ├── layers                 -> Common layers
│   │   ├── vae                    -> VAE as image encoder
│   │   ├── vae_v1_3               -> VAE V1.3 as image encoder
│   │   ├── text_encoder           -> Text encoder
│   │   │   ├── classes.py         -> Class id encoder (inference only)
│   │   │   ├── clip.py            -> CLIP encoder
│   │   │   └── t5.py              -> T5 encoder
│   │   ├── latte
│   │   ├── pixart
│   │   └── stdit                  -> Our STDiT related code
│   ├── schedulers                 -> Diffusion schedulers
│   │   ├── iddpm                  -> IDDPM for training and inference
│   │   └── dpms                   -> DPM-Solver for fast inference
│   └── utils
├── tests                          -> Tests for the project
└── tools                          -> Tools for data processing and more
```

## Configs

Our config files follow [MMEngine](https://github.com/open-mmlab/mmengine). MMEngine reads the config file (a `.py` file) and parses it into a dictionary-like object.
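To make the idea concrete, here is a minimal sketch of what "parse a `.py` file into a dictionary-like object" can look like, using only the Python standard library. It is not MMEngine's implementation; the `AttrDict` class, the `load_py_config` helper, and the field names in the sample config are all hypothetical and chosen only for illustration. With MMEngine installed, the real entry point is `Config.fromfile(path)` from `mmengine.config`.

```python
import runpy
import tempfile
from pathlib import Path


class AttrDict(dict):
    """Dictionary that also allows attribute-style access (cfg.key)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


def load_py_config(path):
    """Execute a .py config file and collect its top-level variables."""
    namespace = runpy.run_path(str(path))
    # Keep only names that do not start with an underscore (drops __name__ etc.).
    return AttrDict({k: v for k, v in namespace.items() if not k.startswith("_")})


# Hypothetical config content; these field names are assumptions, not Open-Sora's.
config_source = (
    "num_frames = 16\n"
    "image_size = (256, 256)\n"
    "model = dict(type='ExampleModel')\n"
)

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "16x256x256.py"
    config_path.write_text(config_source)
    cfg = load_py_config(config_path)

# Both attribute access and key access work on the result.
print(cfg.num_frames, cfg["image_size"], cfg.model["type"])
```

Because a config is ordinary Python, values can be computed or composed inside the file before they are collected.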

```plaintext
Open-Sora
└── configs                        -> Configs for training & inference
    ├── opensora-v1-1              -> STDiT2 related configs
    │   ├── inference
    │   │   ├── sample.py          -> Sample videos and images
    │   │   └── sample-ref.py      -> Sample videos with image/video condition
    │   └── train
    │       ├── stage1.py          -> Stage 1 training config
    │       ├── stage2.py          -> Stage 2 training config
    │       ├── stage3.py          -> Stage 3 training config
    │       ├── image.py           -> Illustration of image training config
    │       ├── video.py           -> Illustration of video training config
    │       └── benchmark.py       -> For batch size searching
    ├── opensora                   -> STDiT related configs
    │   ├── inference
    │   │   ├── 16x256x256.py      -> Sample videos 16 frames 256x256
    │   │   ├── 16x512x512.py      -> Sample videos 16 frames 512x512
    │   │   └── 64x512x512.py      -> Sample videos 64 frames 512x512
    │   └── train
    │       ├── 16x256x256.py      -> Train on videos 16 frames 256x256
    │       ├── 16x512x512.py      -> Train on videos 16 frames 512x512
    │       └── 64x512x512.py      -> Train on videos 64 frames 512x512
    ├── dit                        -> DiT related configs
    │   ├── inference
    │   │   ├── 1x256x256-class.py -> Sample images with ckpts from DiT
    │   │   ├── 1x256x256.py       -> Sample images with clip condition
    │   │   └── 16x256x256.py      -> Sample videos
    │   └── train
    │       ├── 1x256x256.py       -> Train on images with clip condition
    │       └── 16x256x256.py      -> Train on videos
    ├── latte                      -> Latte related configs
    └── pixart                     -> PixArt related configs
```
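Many config names in the tree above encode the sample shape, for example `16x256x256.py` for 16 frames at 256x256. The sketch below shows one way to read that convention programmatically. The `parse_config_name` helper and the `Resolution` type are hypothetical, and treating the second and third numbers as height and width (in that order) is an assumption, since the listed sizes are all square.

```python
import re
from typing import NamedTuple


class Resolution(NamedTuple):
    num_frames: int
    height: int  # assumed order: frames x height x width
    width: int


# Matches a leading "<frames>x<height>x<width>" and ignores any suffix.
_NAME_PATTERN = re.compile(r"^(\d+)x(\d+)x(\d+)")


def parse_config_name(filename: str) -> Resolution:
    """Extract the frame count and spatial size from a config file name."""
    match = _NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the FxHxW naming pattern")
    frames, height, width = (int(group) for group in match.groups())
    return Resolution(frames, height, width)


print(parse_config_name("16x256x256.py"))
print(parse_config_name("1x256x256-class.py"))
```

Names such as `sample.py` or `stage1.py` do not follow this pattern, so the helper raises `ValueError` for them rather than guessing.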

## Tools

```plaintext
Open-Sora
└── tools
    ├── datasets                   -> dataset management related code
    ├── scene_cut                  -> scene cut related code
    ├── caption                    -> caption related code
    ├── scoring                    -> scoring related code
    │   ├── aesthetic              -> aesthetic scoring related code
    │   ├── matching               -> matching scoring related code
    │   ├── ocr                    -> ocr scoring related code
    │   └── optical_flow           -> optical flow scoring related code
    └── frame_interpolation        -> frame interpolation related code
```