mirror of
https://github.com/hpcaitech/Open-Sora.git
synced 2026-04-11 21:42:26 +02:00
[docs]add docs/commands_zh.md,fix some doc's typo (#100)
Signed-off-by: zeekzen <yangzitao1995@qq.com>
This commit is contained in:
parent
08d574d29f
commit
13f8bcfdf0
@@ -142,7 +142,7 @@ torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/i
torchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/64x512x512.py --ckpt-path ./path/to/your/ckpt.pth
```

We conducted speed tests on H800 GPUs. For inference with other models, see [here](docs/commands.md) for more instructions.

We conducted speed tests on H800 GPUs. For inference with other models, see [here](/docs/commands_zh.md) for more instructions.
## Data Processing
@@ -169,11 +169,11 @@ torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/6
colossalai run --nproc_per_node 8 --hostfile hostfile scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --ckpt-path YOUR_PRETRAINED_CKPT
```

For training other models and advanced usage, see [here](docs/commands.md) for more instructions.

For training other models and advanced usage, see [here](/docs/commands_zh.md) for more instructions.
## Contribution
If you wish to contribute to this project, you can refer to the [Contribution Guide](./CONTRIBUTING.md).

If you wish to contribute to this project, you can refer to the [Contribution Guide](/CONTRIBUTING.md).
## Disclaimer
@@ -37,7 +37,7 @@ torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inf
### Inference with checkpoints saved during training
During training, an experiment logging folder is created in `outputs` directory. Under each checpoint folder, e.g. `epoch12-global_step2000`, there is a `ema.pt` and the shared `model` folder. Run the following command to perform inference.
During training, an experiment logging folder is created in `outputs` directory. Under each checkpoint folder, e.g. `epoch12-global_step2000`, there is a `ema.pt` and the shared `model` folder. Run the following command to perform inference.
```bash
# inference with ema model
@@ -62,13 +62,14 @@ type="dmp-solver"
num_sampling_steps=20
```
1. You can use [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)'s finetuned VAE decoder on videos for inference (consumes more memory). However, we do not see significant improvement in the video result. To use it, download [the pretrained weights](https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models/vae_temporal_decoder) into `./pretrained_models/vae_temporal_decoder` and modify the config file as follows.
2. You can use [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)'s finetuned VAE decoder on videos for inference (consumes more memory). However, we do not see significant improvement in the video result. To use it, download [the pretrained weights](https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models/vae_temporal_decoder) into `./pretrained_models/vae_temporal_decoder` and modify the config file as follows.
```python
vae = dict(
    type="VideoAutoencoderKLTemporalDecoder",
    from_pretrained="pretrained_models/vae_temporal_decoder",
)
```

## Training
92 docs/commands_zh.md Normal file
@@ -0,0 +1,92 @@
# Commands

## Inference

You can modify the corresponding config file to change the inference settings. See [here](/docs/structure.md#inference-config-demos) for more details.
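
Config files in this repo are plain Python modules. As a minimal sketch only: the `vae` dict and `num_sampling_steps`/`type="dmp-solver"` values mirror snippets shown elsewhere on this page, while the remaining field names (`num_frames`, `image_size`, `scheduler`) are illustrative assumptions, not the actual file contents.

```python
# Hypothetical inference-config sketch in the repo's plain-Python config style.
# Only the scheduler values and the vae dict mirror snippets on this page;
# the other names are assumptions for illustration.
num_frames = 16
image_size = (256, 256)

scheduler = dict(
    type="dmp-solver",
    num_sampling_steps=20,
)

vae = dict(
    type="VideoAutoencoderKLTemporalDecoder",
    from_pretrained="pretrained_models/vae_temporal_decoder",
)
```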

### Inference with DiT pretrained on ImageNet

The following command automatically downloads the pretrained weights on ImageNet and runs inference.
```bash
python scripts/inference.py configs/dit/inference/1x256x256-class.py --ckpt-path DiT-XL-2-256x256.pt
```

### Inference with Latte pretrained on UCF101

The following command automatically downloads the pretrained weights on UCF101 and runs inference.
```bash
python scripts/inference.py configs/latte/inference/16x256x256-class.py --ckpt-path Latte-XL-2-256x256-ucf101.pt
```

### Inference with PixArt-α pretrained weights

Download T5 into `./pretrained_models` and run the following command.
```bash
# 256x256
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x256x256.py --ckpt-path PixArt-XL-2-256x256.pth
# 512x512
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x512x512.py --ckpt-path PixArt-XL-2-512x512.pth
# 1024 multi-scale
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/pixart/inference/1x1024MS.py --ckpt-path PixArt-XL-2-1024MS.pth
```

### Inference with checkpoints saved during training

During training, an experiment logging folder is created in the `outputs` directory. Under each checkpoint folder (e.g. `epoch12-global_step2000`), there is an `ema.pt` file and the shared `model` folder. Run the following command to perform inference.
```bash
# inference with the ema model
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000/ema.pt

# inference with the model
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000

# inference with sequence parallelism
# sequence parallelism is enabled automatically when nproc_per_node is larger than 1
torchrun --standalone --nproc_per_node 2 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path outputs/001-STDiT-XL-2/epoch12-global_step2000
```

The second command will automatically generate a `model_ckpt.pt` file in the checkpoint folder.

### Inference Hyperparameters

1. The DPM solver is good at fast inference for images, but its results for video inference are not satisfactory. You may use it for quick demos.
```python
type="dmp-solver"
num_sampling_steps=20
```

2. You can use [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt)'s VAE decoder finetuned on videos for inference (this consumes more memory), but we do not see a significant improvement in the video results. To use it, download [the pretrained weights](https://huggingface.co/maxin-cn/Latte/tree/main/t2v_required_models/vae_temporal_decoder) into `./pretrained_models/vae_temporal_decoder` and modify the config file as follows.
```python
vae = dict(
    type="VideoAutoencoderKLTemporalDecoder",
    from_pretrained="pretrained_models/vae_temporal_decoder",
)
```

## Training

To resume training, run the following command. The difference between `--load` and `--ckpt-path` is that `--load` also restores the optimizer and dataloader states.
```bash
torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --load YOUR_PRETRAINED_CKPT
```
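
Conceptually, the difference between the two flags can be sketched as follows. The checkpoint keys and values here are hypothetical examples, not the repo's actual checkpoint format.

```python
# Illustrative sketch: weights-only init (--ckpt-path) vs full resume (--load).
# All key names below are hypothetical, not Open-Sora's checkpoint layout.
checkpoint = {
    "model": {"blocks.0.weight": [0.1, 0.2]},             # model parameters
    "optimizer": {"step": 2000, "exp_avg": [0.01, 0.0]},  # optimizer state
    "dataloader": {"epoch": 12, "sample_index": 64000},   # resume position
}

def load_with_ckpt_path(ckpt):
    # --ckpt-path: take the weights only; optimizer and dataloader start fresh
    return {"model": ckpt["model"], "optimizer": None, "dataloader": None}

def load_with_load(ckpt):
    # --load: restore weights plus optimizer and dataloader states,
    # so training continues exactly where it stopped
    return {k: ckpt[k] for k in ("model", "optimizer", "dataloader")}
```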

To enable wandb logging, add the `--wandb` argument to the command.
```bash
WANDB_API_KEY=YOUR_WANDB_API_KEY torchrun --nnodes=1 --nproc_per_node=8 scripts/train.py configs/opensora/train/64x512x512.py --data-path YOUR_CSV_PATH --wandb True
```

You can modify the corresponding config file to change the training settings. See [here](/docs/structure.md#training-config-demos) for more details.

### Training Hyperparameters
1. `dtype` is the data type used for training. Only `fp16` and `bf16` are supported. ColossalAI automatically enables mixed-precision training for `fp16` and `bf16`. We find `bf16` more stable during training.
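
   The stability difference comes mainly from dynamic range: `bf16` keeps `fp32`'s 8 exponent bits, so large loss or gradient values do not overflow, while `fp16` trades range for extra mantissa precision. A small stdlib-only sketch, simulating `bf16` by truncating the low 16 bits of an `fp32` encoding:

```python
import struct

def to_bf16(x: float) -> float:
    """Simulate bfloat16: keep only the top 16 bits of the fp32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def to_fp16(x: float) -> float:
    """Round-trip through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Range: 1e5 exceeds fp16's max finite value (65504) but fits easily in bf16.
print(to_bf16(1e5))  # close to 1e5
try:
    to_fp16(1e5)
except OverflowError:
    print("fp16 overflow")

# Precision: fp16 keeps more mantissa bits than bf16 for values near 1.
print(to_fp16(1.001), to_bf16(1.001))
```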