# Commands

## Train VAE with 8 machines

```bash
colossalai run --hostfile hostfile --nproc_per_node 8 scripts/train_opensoravae_v1_3.py configs/vae_v1_3/train/video_16z.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT --wandb True > logs/train_opensoravae_v1_3.log 2>&1 &
```
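
The launch command assumes that `logs/` exists and that `hostfile` lists the nodes. A minimal preparation sketch, assuming the ColossalAI hostfile is plain text with one reachable hostname per line; the names `node1` to `node8` are placeholders, not real hosts:

```bash
# Create the log directory; the shell redirect fails if it is missing.
mkdir -p logs

# Write a hostfile with one hostname per line (placeholder names, replace with your own).
for i in $(seq 1 8); do
    echo "node${i}"
done > hostfile

# Sanity check: count the entries, one per machine.
HOST_COUNT=$(wc -l < hostfile | tr -d ' ')
echo "hosts listed: ${HOST_COUNT}"
```

Because the training job runs in the background, its progress can be followed with `tail -f logs/train_opensoravae_v1_3.log`.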

## Evaluate VAE performance

* If `VID_PATH` is not specified, the default vid100 set used in `eval/loss` is evaluated;
* Image evaluation uses the image1k set used in `eval/loss`;
* Eval stats are saved to the `${CKPT_PATH}/eval/` folder.

```bash
VID_PATH=/home/shenchenhui/data/eval_loss/forest_vid_100/pixabay_forest_vid_100.csv CUDA_VISIBLE_DEVICES=0 bash eval/vae/launch.sh pretrained_models/OpenSoraVAE_V1_3/model.pt
```
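
The `VID_PATH` fallback described above matches the usual shell default-value idiom. A sketch of how a launcher could resolve it; `DEFAULT_VID_PATH` and its value are hypothetical placeholders, not the actual path used by `eval/vae/launch.sh`:

```bash
# Hypothetical default; the real vid100 CSV location is defined by the repo.
DEFAULT_VID_PATH="path/to/vid100.csv"

# Use the caller's VID_PATH when it is set and non-empty, otherwise the default.
RESOLVED_VID_PATH="${VID_PATH:-$DEFAULT_VID_PATH}"
echo "evaluating videos from: ${RESOLVED_VID_PATH}"
```

Setting `VID_PATH=...` in front of the launch command, as in the example above, overrides the default for that one invocation only.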

## Inference

We can set optimization options in the VAE config:

```python
vae = dict(
    type="OpenSoraVAE_V1_3",
    from_pretrained=None,
    z_channels=16,
    shift=[...],
    scale=[...],
    micro_batch_size=1, # DON'T set during training of vae
    micro_batch_size_2d=4, # DON'T set during training of vae
    micro_frame_size=17, # DON'T set during training of vae
    use_tiled_conv3d=True,
    tile_size=4,
)
```
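
As a rough illustration of what `micro_frame_size` implies, the arithmetic below counts how many temporal chunks a clip would be split into. It assumes the VAE processes frames in consecutive, non-overlapping chunks of `micro_frame_size`, which this document does not confirm; the variable names are illustrative only:

```bash
NUM_FRAMES=51          # same value as --num-frames in the DiT inference command
MICRO_FRAME_SIZE=17    # from the config above

# Ceiling division: number of chunks under the non-overlapping assumption.
NUM_CHUNKS=$(( (NUM_FRAMES + MICRO_FRAME_SIZE - 1) / MICRO_FRAME_SIZE ))
echo "chunks: ${NUM_CHUNKS}"   # prints "chunks: 3"
```

Under that assumption, a smaller `micro_frame_size` lowers peak memory per chunk at the cost of more sequential passes.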

### Inference with VAE
Set `--force-huggingface` to `True` when loading the original pretrained Hugging Face model `pretrained_models/OpenSoraVAE_V1_3`.

```bash
# video
CUDA_VISIBLE_DEVICES=0 torchrun --standalone --nproc_per_node=1 scripts/inference_opensoravae_v1_3.py configs/vae_v1_3/inference/video_16z_512x512.py  --data-path YOUR_CSV_PATH --save-dir ./samples/vae_16z/videos --ckpt-path YOUR_PRETRAINED_CKPT --force-huggingface False


# image
CUDA_VISIBLE_DEVICES=0 torchrun --standalone --nproc_per_node=1 scripts/inference_opensoravae_v1_3.py configs/vae_v1_3/inference/image_16z.py  --data-path YOUR_CSV_PATH --save-dir ./samples/vae_16z/images/ --ckpt-path YOUR_PRETRAINED_CKPT --force-huggingface False
```

### Train DiT with frozen VAE

```bash
CUDA_VISIBLE_DEVICES=0 torchrun --standalone --nproc_per_node 1 scripts/train.py configs/opensora-v1-3/train/stage1.py --data-path /mnt/ddn/sora/meta/pro_1_0_ddn/internvid_first_quarter_ext.csv
```

### DiT inference with VAE

```bash
CUDA_VISIBLE_DEVICES=7 python scripts/inference.py configs/opensora-v1-3/train/stage1.py --ckpt-path /mnt/ddn/sora/checkpoints/outputs/0245-STDiT3-XL-2/epoch0-global_step13000 --prompt-path assets/texts/t2v_samples.txt --save-dir samples/debug --num-frames 51 --resolution 360p --aspect-ratio 9:16 --sample-name sample_2s_360p --batch-size 1
```