# Repo Structure
```plaintext
Open-Sora
├── README.md
├── assets
│   ├── images -> images used for image-conditioned generation
│   ├── demo -> images used for demo
│   ├── texts -> prompts used for text-conditioned generation
│   └── readme -> images used in README
├── configs -> Configs for training & inference
├── docker -> Dockerfile for Open-Sora
├── docs
│   ├── acceleration.md -> Report on acceleration & speed benchmark
│   ├── commands.md -> Commands for training & inference
│   ├── datasets.md -> Datasets used in this project
│   ├── data_processing.md -> Data pipeline documents
│   ├── installation.md -> Installation instructions
│   ├── structure.md -> This file
│   ├── config.md -> Configs for training and inference
│   ├── report_01.md -> Report for Open-Sora 1.0
│   ├── report_02.md -> Report for Open-Sora 1.1
│   ├── report_03.md -> Report for Open-Sora 1.2
│   ├── report_04.md -> Report for Open-Sora 1.3
│   ├── vae.md -> Our VAE report
│   └── zh_CN -> Chinese version of the above
├── eval -> Evaluation scripts
│   ├── README.md -> Evaluation documentation
│   ├── human_eval -> for human eval
│   ├── I2V -> for image to video human eval
│   ├── loss -> eval loss
│   ├── sample.sh -> script for quickly launching inference on predefined prompts
│   ├── vae -> for vae eval
│   ├── vbench -> for VBench evaluation
│   └── vbench_i2v -> for VBench i2v evaluation
├── gradio -> Gradio demo related code
├── scripts
│   ├── train.py -> diffusion training script
│   ├── train_opensoravae_v1_3.py -> vae v1.3 training script
│   ├── train_vae.py -> vae training script
│   ├── inference.py -> diffusion inference script
│   ├── inference_opensoravae_v1_3.py -> vae v1.3 inference script
│   ├── inference_vae.py -> vae inference script
│   ├── inference_i2v.py -> image to video inference script
│   └── misc -> misc scripts, including batch size search
├── opensora
│   ├── __init__.py
│   ├── registry.py -> Registry helper
│   ├── acceleration -> Acceleration related code
│   ├── datasets -> Dataset related code
│   ├── models
│   │   ├── layers -> Common layers
│   │   ├── vae -> VAE as image encoder
│   │   ├── vae_v1_3 -> VAE V1.3 as image encoder
│   │   ├── text_encoder -> Text encoder
│   │   │   ├── classes.py -> Class id encoder (inference only)
│   │   │   ├── clip.py -> CLIP encoder
│   │   │   └── t5.py -> T5 encoder
│   │   ├── dit -> DiT
│   │   ├── latte
│   │   ├── pixart
│   │   └── stdit -> Our STDiT related code
│   ├── schedulers -> Diffusion schedulers
│   │   ├── iddpm -> IDDPM for training and inference
│   │   └── dpms -> DPM-Solver for fast inference
│   └── utils
├── tests -> Tests for the project
└── tools -> Tools for data processing and more
```
## Configs
Our config files follow [MMEngine](https://github.com/open-mmlab/mmengine). MMEngine reads the config file (a `.py` file) and parses it into a dictionary-like object.
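To make the idea of "a `.py` file parsed into a dictionary-like object" concrete, here is a minimal standard-library sketch. It is not MMEngine's implementation: the helper `load_py_config` and the config fields are hypothetical and chosen only for illustration. With MMEngine installed, the equivalent step would typically go through `mmengine.config.Config.fromfile`.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical config content; these field names are illustrative only
# and are not copied from any config file in the repo.
CONFIG_SOURCE = '''
num_frames = 16
image_size = (256, 256)
model = dict(type="ExampleModel", depth=4)
'''


def load_py_config(path):
    """Execute a .py config and return its public top-level variables as a dict.

    This is an assumption-level stand-in for what a config loader does,
    not MMEngine's actual parsing logic.
    """
    namespace = runpy.run_path(str(path))
    # Drop interpreter-provided names such as __name__ and __builtins__.
    return {k: v for k, v in namespace.items() if not k.startswith("_")}


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "sample_config.py"
    cfg_path.write_text(CONFIG_SOURCE)
    cfg = load_py_config(cfg_path)

print(cfg["num_frames"])
print(cfg["model"]["type"])
```

Because the config is ordinary Python, values can be computed, and nested `dict(...)` entries stay accessible by key after loading.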
```plaintext
Open-Sora
└── configs -> Configs for training & inference
    ├── opensora-v1-1 -> STDiT2 related configs
    │   ├── inference
    │   │   ├── sample.py -> Sample videos and images
    │   │   └── sample-ref.py -> Sample videos with image/video condition
    │   └── train
    │       ├── stage1.py -> Stage 1 training config
    │       ├── stage2.py -> Stage 2 training config
    │       ├── stage3.py -> Stage 3 training config
    │       ├── image.py -> Illustration of image training config
    │       ├── video.py -> Illustration of video training config
    │       └── benchmark.py -> For batch size searching
    ├── opensora -> STDiT related configs
    │   ├── inference
    │   │   ├── 16x256x256.py -> Sample videos 16 frames 256x256
    │   │   ├── 16x512x512.py -> Sample videos 16 frames 512x512
    │   │   └── 64x512x512.py -> Sample videos 64 frames 512x512
    │   └── train
    │       ├── 16x256x256.py -> Train on videos 16 frames 256x256
    │       ├── 16x512x512.py -> Train on videos 16 frames 512x512
    │       └── 64x512x512.py -> Train on videos 64 frames 512x512
    ├── dit -> DiT related configs
    │   ├── inference
    │   │   ├── 1x256x256-class.py -> Sample images with ckpts from DiT
    │   │   ├── 1x256x256.py -> Sample images with CLIP condition
    │   │   └── 16x256x256.py -> Sample videos
    │   └── train
    │       ├── 1x256x256.py -> Train on images with CLIP condition
    │       └── 16x256x256.py -> Train on videos
    ├── latte -> Latte related configs
    └── pixart -> PixArt related configs
```
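Many config names in the tree above follow a `<frames>x<size>x<size>` pattern, for example `16x256x256.py` for 16 frames at 256x256. As a rough illustration of that convention, the sketch below parses such a name. The helper `parse_config_name` and the `ClipShape` type are hypothetical and not part of the repo, and reading the two spatial numbers as height then width is an assumption (the listed configs are all square, so the order is not visible from the names).

```python
import re
from typing import NamedTuple


class ClipShape(NamedTuple):
    num_frames: int
    height: int  # assumed order: height first, then width
    width: int


# Matches a leading "<int>x<int>x<int>", ignoring suffixes such as "-class.py".
_SHAPE_PATTERN = re.compile(r"^(\d+)x(\d+)x(\d+)")


def parse_config_name(filename: str) -> ClipShape:
    """Extract (frames, height, width) from a config file name (hypothetical helper)."""
    match = _SHAPE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the <frames>x<size>x<size> pattern")
    frames, height, width = (int(group) for group in match.groups())
    return ClipShape(frames, height, width)


print(parse_config_name("16x256x256.py"))
print(parse_config_name("1x256x256-class.py"))
```

Names that do not encode a shape, such as `stage1.py` or `sample.py`, raise `ValueError` in this sketch.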
## Tools
```plaintext
Open-Sora
└── tools
    ├── datasets -> dataset management related code
    ├── scene_cut -> scene cut related code
    ├── caption -> caption related code
    ├── scoring -> scoring related code
    │   ├── aesthetic -> aesthetic scoring related code
    │   ├── matching -> matching scoring related code
    │   ├── ocr -> ocr scoring related code
    │   └── optical_flow -> optical flow scoring related code
    └── frame_interpolation -> frame interpolation related code