# Repo Structure
```plaintext
Open-Sora
├── README.md
├── assets
│ ├── images -> images used for image-conditioned generation
│ ├── demo -> images used for demo
│ ├── texts -> prompts used for text-conditioned generation
│ └── readme -> images used in README
├── configs -> Configs for training & inference
├── docker -> Dockerfile for Open-Sora
├── docs
│ ├── acceleration.md -> Report on acceleration & speed benchmark
│ ├── commands.md -> Commands for training & inference
│ ├── datasets.md -> Datasets used in this project
│ ├── data_processing.md -> Data pipeline documents
│ ├── installation.md -> Installation instructions
│ ├── structure.md -> This file
│ ├── config.md -> Configs for training and inference
│ ├── report_01.md -> Report for Open-Sora 1.0
│ ├── report_02.md -> Report for Open-Sora 1.1
│ ├── report_03.md -> Report for Open-Sora 1.2
│ ├── report_04.md -> Report for Open-Sora 1.3
│ ├── vae.md -> Our VAE report
│ └── zh_CN -> Chinese version of the above
├── eval -> Evaluation scripts
│ ├── README.md -> Evaluation documentation
│ ├── human_eval -> for human eval
│ ├── I2V -> for image-to-video human eval
│ ├── loss -> eval loss
│ ├── sample.sh -> script for quickly launching inference on predefined prompts
│ ├── vae -> for vae eval
│ ├── vbench -> for VBench evaluation
│ └── vbench_i2v -> for VBench i2v evaluation
├── gradio -> Gradio demo related code
├── scripts
│ ├── train.py -> diffusion training script
│ ├── train_opensoravae_v1_3.py -> vae v1.3 training script
│ ├── train_vae.py -> vae training script
│ ├── inference.py -> diffusion inference script
│ ├── inference_opensoravae_v1_3.py -> vae v1.3 inference script
│ ├── inference_vae.py -> vae inference script
│ ├── inference_i2v.py -> image to video inference script
│ └── misc -> misc scripts, including batch size search
├── opensora
│ ├── __init__.py
│ ├── registry.py -> Registry helper
│ ├── acceleration -> Acceleration related code
│ ├── datasets -> Dataset related code
│ ├── models
│ │ ├── dit -> DiT
│ │ ├── layers -> Common layers
│ │ ├── vae -> VAE as image encoder
│ │ ├── vae_v1_3 -> VAE V1.3 as image encoder
│ │ ├── text_encoder -> Text encoder
│ │ │ ├── classes.py -> Class id encoder (inference only)
│ │ │ ├── clip.py -> CLIP encoder
│ │ │ └── t5.py -> T5 encoder
│ │ ├── latte
│ │ ├── pixart
│ │ └── stdit -> Our STDiT related code
│ ├── schedulers -> Diffusion schedulers
│ │ ├── iddpm -> IDDPM for training and inference
│ │ └── dpms -> DPM-Solver for fast inference
│ └── utils
├── tests -> Tests for the project
└── tools -> Tools for data processing and more
```
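The tree lists `opensora/registry.py` only as a "Registry helper". As a rough illustration of that general pattern (a name-to-class lookup that lets a config refer to a model by a string), here is a minimal pure-Python sketch. The `Registry` class, its `register`/`build` methods, the MMEngine-style `type` key handling, and the `ToyDiT` class are all hypothetical; this is not Open-Sora's actual registry code, which may instead build on MMEngine's own registry.

```python
from typing import Any, Callable, Dict, Type


class Registry:
    """Minimal name -> class mapping (hypothetical sketch, not Open-Sora's registry.py)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, Type[Any]] = {}

    def register(self, key: str) -> Callable[[Type[Any]], Type[Any]]:
        """Return a class decorator that stores the decorated class under `key`."""

        def decorator(cls: Type[Any]) -> Type[Any]:
            if key in self._modules:
                raise KeyError(f"{key!r} is already registered in {self.name!r}")
            self._modules[key] = cls
            return cls

        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        """Instantiate the class named by cfg["type"], passing the other keys as kwargs."""
        kwargs = dict(cfg)  # copy so the caller's config dict is left unchanged
        key = kwargs.pop("type")
        if key not in self._modules:
            raise KeyError(f"{key!r} is not registered in {self.name!r}")
        return self._modules[key](**kwargs)


# Hypothetical registry and model, for illustration only.
MODELS = Registry("model")


@MODELS.register("ToyDiT")
class ToyDiT:
    def __init__(self, depth: int = 4, hidden_size: int = 64) -> None:
        self.depth = depth
        self.hidden_size = hidden_size


model = MODELS.build({"type": "ToyDiT", "depth": 2})
print(type(model).__name__, model.depth, model.hidden_size)  # ToyDiT 2 64
```

The benefit of this design is that a config entry such as `dict(type="ToyDiT", depth=2)` can be turned into an object without the training script importing the class by name.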
## Configs
Our config files follow [MMEngine](https://github.com/open-mmlab/mmengine). MMEngine reads the config file (a `.py` file) and parses it into a dictionary-like object.
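Because a config is an ordinary Python file, the core idea can be approximated with the standard library: execute the file and keep its public top-level variables as a dictionary. The sketch below does this with `runpy.run_path`. It is a simplified stand-in, not MMEngine's implementation, which additionally supports features such as `_base_` inheritance; with MMEngine itself the entry point is `mmengine.config.Config.fromfile(path)`. The helper name `load_py_config` and the config contents (`num_frames`, `image_size`, `model`) are hypothetical.

```python
import runpy
import tempfile
import types
from pathlib import Path
from typing import Any, Dict


def load_py_config(path: str) -> Dict[str, Any]:
    """Execute a .py config file and return its public top-level variables (hypothetical helper)."""
    namespace = runpy.run_path(path)
    return {
        key: value
        for key, value in namespace.items()
        # drop dunder entries such as __name__/__file__ and any imported modules
        if not key.startswith("_") and not isinstance(value, types.ModuleType)
    }


# Hypothetical config contents, for illustration only.
EXAMPLE_CONFIG = '''
num_frames = 16
image_size = (256, 256)
model = dict(type="STDiT-XL/2", from_pretrained=None)
'''

with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "16x256x256.py"
    cfg_path.write_text(EXAMPLE_CONFIG)
    cfg = load_py_config(str(cfg_path))

print(cfg["num_frames"], cfg["image_size"], cfg["model"]["type"])  # 16 (256, 256) STDiT-XL/2
```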
```plaintext
Open-Sora
└── configs -> Configs for training & inference
    ├── opensora-v1-1 -> STDiT2 related configs
    │   ├── inference
    │   │   ├── sample.py -> Sample videos and images
    │   │   └── sample-ref.py -> Sample videos with image/video condition
    │   └── train
    │       ├── stage1.py -> Stage 1 training config
    │       ├── stage2.py -> Stage 2 training config
    │       ├── stage3.py -> Stage 3 training config
    │       ├── image.py -> Illustration of image training config
    │       ├── video.py -> Illustration of video training config
    │       └── benchmark.py -> For batch size searching
    ├── opensora -> STDiT related configs
    │   ├── inference
    │   │   ├── 16x256x256.py -> Sample videos 16 frames 256x256
    │   │   ├── 16x512x512.py -> Sample videos 16 frames 512x512
    │   │   └── 64x512x512.py -> Sample videos 64 frames 512x512
    │   └── train
    │       ├── 16x256x256.py -> Train on videos 16 frames 256x256
    │       └── 64x512x512.py -> Train on videos 64 frames 512x512
    ├── dit -> DiT related configs
    │   ├── inference
    │   │   ├── 1x256x256-class.py -> Sample images with ckpts from DiT
    │   │   ├── 1x256x256.py -> Sample images with clip condition
    │   │   └── 16x256x256.py -> Sample videos
    │   └── train
    │       ├── 1x256x256.py -> Train on images with clip condition
    │       └── 16x256x256.py -> Train on videos
    ├── latte -> Latte related configs
    └── pixart -> PixArt related configs
```
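Judging from the descriptions in the listing (for example, `64x512x512.py` samples 64 frames at 512x512), the file names under `opensora` and `dit` appear to encode the sample shape as frames x height x width, optionally followed by a suffix such as `-class`. This naming rule is inferred from the tree, not documented, and since every listed size is square the height/width order is an assumption. The helper below parses that pattern; `parse_config_name` and `ShapeSpec` are hypothetical names.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional


class ShapeSpec(NamedTuple):
    frames: int
    height: int  # assumed order; all listed configs are square
    width: int
    suffix: Optional[str]  # e.g. "class" for "1x256x256-class.py"


# {frames}x{height}x{width} with an optional "-suffix"
_PATTERN = re.compile(r"^(\d+)x(\d+)x(\d+)(?:-(\w+))?$")


def parse_config_name(path: str) -> Optional[ShapeSpec]:
    """Parse names like '16x256x256.py'; return None for other names such as 'stage1.py'."""
    match = _PATTERN.match(Path(path).stem)
    if match is None:
        return None
    frames, height, width, suffix = match.groups()
    return ShapeSpec(int(frames), int(height), int(width), suffix)


print(parse_config_name("configs/opensora/inference/64x512x512.py"))
# ShapeSpec(frames=64, height=512, width=512, suffix=None)
print(parse_config_name("configs/dit/inference/1x256x256-class.py"))
# ShapeSpec(frames=1, height=256, width=256, suffix='class')
print(parse_config_name("configs/opensora-v1-1/train/stage1.py"))
# None
```

The `opensora-v1-1` configs (`sample.py`, `stage1.py`, ...) do not follow this naming, so the helper returns `None` for them rather than guessing.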
## Tools
```plaintext
Open-Sora
└── tools
    ├── datasets -> dataset management related code
    ├── scene_cut -> scene cut related code
    ├── caption -> caption related code
    ├── scoring -> scoring related code
    │   ├── aesthetic -> aesthetic scoring related code
    │   ├── matching -> matching scoring related code
    │   ├── ocr -> ocr scoring related code
    │   └── optical_flow -> optical flow scoring related code
    └── frame_interpolation -> frame interpolation related code