Open-Sora/docs/structure.md
Zheng Zangwei (Alex Zheng) f1c6b8b88e open-sora v1.3 code upload (#786)
Co-authored-by: gxyes <gxynoz@gmail.com>
2025-02-20 16:50:24 +08:00

# Repo Structure
```plaintext
Open-Sora
├── README.md
├── assets
│   ├── images -> images used for image-conditioned generation
│   ├── demo -> images used for demo
│   ├── texts -> prompts used for text-conditioned generation
│   └── readme -> images used in README
├── configs -> Configs for training & inference
├── docker -> dockerfile for Open-Sora
├── docs
│   ├── acceleration.md -> Report on acceleration & speed benchmark
│   ├── commands.md -> Commands for training & inference
│   ├── datasets.md -> Datasets used in this project
│   ├── data_processing.md -> Data pipeline documents
│   ├── installation.md -> Installation documents
│   ├── structure.md -> This file
│   ├── config.md -> Configs for training and inference
│   ├── report_01.md -> Report for Open-Sora 1.0
│   ├── report_02.md -> Report for Open-Sora 1.1
│   ├── report_03.md -> Report for Open-Sora 1.2
│   ├── report_04.md -> Report for Open-Sora 1.3
│   ├── vae.md -> our VAE report
│   └── zh_CN -> Chinese version of the above
├── eval -> Evaluation scripts
│   ├── README.md -> Evaluation documentation
│   ├── human_eval -> for human eval
│   ├── I2V -> for image to video human eval
│   ├── loss -> eval loss
│   ├── sample.sh -> script for quickly launching inference on predefined prompts
│   ├── vae -> for vae eval
│   ├── vbench -> for VBench evaluation
│   └── vbench_i2v -> for VBench i2v evaluation
├── gradio -> Gradio demo related code
├── scripts
│   ├── train.py -> diffusion training script
│   ├── train_opensoravae_v1_3.py -> vae v1.3 training script
│   ├── train_vae.py -> vae training script
│   ├── inference.py -> diffusion inference script
│   ├── inference_opensoravae_v1_3.py -> vae v1.3 inference script
│   ├── inference_vae.py -> vae inference script
│   ├── inference_i2v.py -> image to video inference script
│   └── misc -> misc scripts, including batch size search
├── opensora
│   ├── __init__.py
│   ├── registry.py -> Registry helper
│   ├── acceleration -> Acceleration related code
│   ├── datasets -> Dataset related code
│   ├── models
│   │   ├── dit -> DiT
│   │   ├── layers -> Common layers
│   │   ├── vae -> VAE as image encoder
│   │   ├── vae_v1_3 -> VAE V1.3 as image encoder
│   │   ├── text_encoder -> Text encoder
│   │   │   ├── classes.py -> Class id encoder (inference only)
│   │   │   ├── clip.py -> CLIP encoder
│   │   │   └── t5.py -> T5 encoder
│   │   ├── latte
│   │   ├── pixart
│   │   └── stdit -> Our STDiT related code
│   ├── schedulers -> Diffusion schedulers
│   │   ├── iddpm -> IDDPM for training and inference
│   │   └── dpms -> DPM-Solver for fast inference
│   └── utils
├── tests -> Tests for the project
└── tools -> Tools for data processing and more
```
## Configs
Our config files follow [MMEngine](https://github.com/open-mmlab/mmengine). MMEngine reads the config file (a `.py` file) and parses it into a dictionary-like object.
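To make "a `.py` file parsed into a dictionary-like object" concrete, here is a minimal sketch that uses only the Python standard library. It is not MMEngine's implementation: `AttrDict`, `load_py_config`, and the config keys (`num_frames`, `image_size`, `model`) are hypothetical names chosen for illustration.

```python
import runpy
import tempfile
from pathlib import Path


class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes (cfg.key)."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc
        # Wrap nested dicts so that chained access like cfg.model.type works.
        return AttrDict(value) if isinstance(value, dict) else value


def load_py_config(path):
    """Execute a .py config file and collect its public top-level variables."""
    namespace = runpy.run_path(str(path))
    # Drop names such as __name__ and __file__ that Python adds itself.
    return AttrDict(
        {k: v for k, v in namespace.items() if not k.startswith("_")}
    )


# Hypothetical config content, written to a temporary file for the demo.
EXAMPLE_CONFIG = '''
num_frames = 16
image_size = (256, 256)
model = dict(type="ExampleModel", depth=4)
'''

with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "example_config.py"
    cfg_path.write_text(EXAMPLE_CONFIG)
    cfg = load_py_config(cfg_path)

print(cfg.num_frames)     # attribute-style access
print(cfg.model.type)     # nested access
print(cfg["image_size"])  # plain dictionary access still works
```

With MMEngine installed, the equivalent entry point is `Config.fromfile(...)` from `mmengine.config`, which additionally handles features this sketch leaves out, such as config inheritance.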
```plaintext
Open-Sora
└── configs -> Configs for training & inference
    ├── opensora-v1-1 -> STDiT2 related configs
    │   ├── inference
    │   │   ├── sample.py -> Sample videos and images
    │   │   └── sample-ref.py -> Sample videos with image/video condition
    │   └── train
    │       ├── stage1.py -> Stage 1 training config
    │       ├── stage2.py -> Stage 2 training config
    │       ├── stage3.py -> Stage 3 training config
    │       ├── image.py -> Illustration of image training config
    │       ├── video.py -> Illustration of video training config
    │       └── benchmark.py -> For batch size searching
    ├── opensora -> STDiT related configs
    │   ├── inference
    │   │   ├── 16x256x256.py -> Sample videos 16 frames 256x256
    │   │   ├── 16x512x512.py -> Sample videos 16 frames 512x512
    │   │   └── 64x512x512.py -> Sample videos 64 frames 512x512
    │   └── train
    │       ├── 16x256x256.py -> Train on videos 16 frames 256x256
    │       ├── 16x512x512.py -> Train on videos 16 frames 512x512
    │       └── 64x512x512.py -> Train on videos 64 frames 512x512
    ├── dit -> DiT related configs
    │   ├── inference
    │   │   ├── 1x256x256-class.py -> Sample images with ckpts from DiT
    │   │   ├── 1x256x256.py -> Sample images with clip condition
    │   │   └── 16x256x256.py -> Sample videos
    │   └── train
    │       ├── 1x256x256.py -> Train on images with clip condition
    │       └── 16x256x256.py -> Train on videos
    ├── latte -> Latte related configs
    └── pixart -> PixArt related configs
```
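Several config file names above encode the sample shape, e.g. `16x256x256.py` for 16 frames at 256x256. The sketch below shows one way to read that convention programmatically. `parse_config_name` and `ConfigShape` are hypothetical helpers, not part of Open-Sora, and the height-before-width order is an assumption (the listed configs are all square, so the names do not disambiguate it).

```python
import re
from typing import NamedTuple, Optional


class ConfigShape(NamedTuple):
    num_frames: int
    height: int  # assumed order: frames x height x width
    width: int
    variant: Optional[str]  # optional suffix such as "class"


# Matches names like "16x256x256.py" or "1x256x256-class.py".
_NAME_RE = re.compile(r"^(\d+)x(\d+)x(\d+)(?:-([\w-]+))?\.py$")


def parse_config_name(filename: str) -> Optional[ConfigShape]:
    """Return the shape encoded in a config file name, or None if absent."""
    match = _NAME_RE.match(filename)
    if match is None:
        return None
    frames, height, width, variant = match.groups()
    return ConfigShape(int(frames), int(height), int(width), variant)


print(parse_config_name("64x512x512.py"))
print(parse_config_name("1x256x256-class.py"))
print(parse_config_name("stage1.py"))  # no shape in the name
```

Names that do not follow the pattern, such as `stage1.py` or `sample.py`, return `None`, so the helper can be applied to a whole config directory without special-casing.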
## Tools
```plaintext
Open-Sora
└── tools
    ├── datasets -> dataset management related code
    ├── scene_cut -> scene cut related code
    ├── caption -> caption related code
    ├── scoring -> scoring related code
    │   ├── aesthetic -> aesthetic scoring related code
    │   ├── matching -> matching scoring related code
    │   ├── ocr -> ocr scoring related code
    │   └── optical_flow -> optical flow scoring related code
    └── frame_interpolation -> frame interpolation related code
```