From 9e8d64a99115e3d58fad45875d6ac033b6123f4c Mon Sep 17 00:00:00 2001
From: "Zheng Zangwei (Alex Zheng)"
Date: Mon, 18 Mar 2024 01:35:24 +0800
Subject: [PATCH] Docs/readme (#87)

* update docs
* update docs
* update docs
* update acceleration docs and fix typos
* update docs commands
* update zh readme
* update badges
---
 README.md         | 17 +++++++++--------
 docs/README_zh.md | 17 +++++++++--------
 2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 5904201..ffa0a31 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,13 @@

- -

- +

+
+ +
@@ -31,7 +32,7 @@ inference, and more. Our provided [checkpoints](#model-weights) can produce 2~5s
 | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
 | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/de1963d3-b43b-4e68-a670-bb821ebb6f80) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/13f8338f-3d42-4b71-8142-d234fbd746cc) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/fa6a65a6-e32a-4d64-9a9e-eabb0ebb8c16) |
 | A serene night scene in a forested area. [...] The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. | A soaring drone footage captures the majestic beauty of a coastal cliff, [...] The water gently laps at the rock base and the greenery that clings to the top of the cliff. | The majestic beauty of a waterfall cascading down a cliff into a serene lake. [...] The camera angle provides a bird's eye view of the waterfall. |
-| [](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65) |
+| [](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9) | [](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65) |
 | A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...] | The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...] | A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...] |

 Videos are downsampled to `.gif` for display. Click for original videos. Prompts are trimmed for display, see [here](/assets/texts/t2v_samples.txt) for full prompts. See more samples at our [gallery](https://hpcaitech.github.io/Open-Sora/).
@@ -114,12 +115,12 @@ After installation, we suggest reading [structure.md](docs/structure.md) to lear

 ## Model Weights

-| Resoluion | Data | #iterations | Batch Size | GPU days (H800) | URL |
-| ---------- | ------ | ----------- | ---------- | --------------- | ---------- |
-| 16×256×256 | 366K | 80k | 8×64 | 117 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth) |
+| Resoluion | Data | #iterations | Batch Size | GPU days (H800) | URL |
+| ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- |
+| 16×256×256 | 366K | 80k | 8×64 | 117 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth) |
 | 16×256×256 | 20K HQ | 24k | 8×64 | 45 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) |
 | 16×512×512 | 20K HQ | 20k | 2×64 | 35 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) |
-| 64×512×512 | 50K HQ | | | | TBD |
+| 64×512×512 | 50K HQ | | | | TBD |

 Our model's weight is partially initialized from [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha). The number of parameters is 724M. More information about training can be found in our **[report](/docs/report_v1.md)**. More about dataset can be found in [dataset.md](/docs/dataset.md). HQ means high quality.

diff --git a/docs/README_zh.md b/docs/README_zh.md
index fde0787..4855da2 100644
--- a/docs/README_zh.md
+++ b/docs/README_zh.md
@@ -1,11 +1,13 @@

- +

-

- + + + +
@@ -13,12 +15,12 @@
 **Open-Sora**项目是一项致力于**高效**制作高质量视频,并使所有人都能使用其模型、工具和内容的计划。 通过采用**开源**原则,Open-Sora 不仅实现了先进视频生成技术的低成本普及,还提供了一个精简且用户友好的方案,简化了视频制作的复杂性。 通过 Open-Sora,我们希望更多开发者一起探索内容创作领域的创新、创造和包容。

- [[English]](https://github.com/hpcaitech/Open-Sora/blob/main/README.md)
+ [[English]](/README.md)

 ## 📰 资讯

 * **[2024.03.18]** 🔥 我们发布了**Open-Sora 1.0**,这是一个完全开源的视频生成项目。
-* Open-Sora 1.0 支持视频数据预处理、 加速训练、推理等全套流程。
+* Open-Sora 1.0 支持视频数据预处理、 加速训练、推理等全套流程。
 * 我们提供的[模型权重](#model-weights)只需 3 天的训练就能生成 2~5 秒的 512x512 视频。
 * **[2024.03.04]** Open-Sora:开源Sora复现方案,成本降低46%,序列扩充至近百万

@@ -42,7 +44,7 @@
 * ✅ 我们发现来自[VideoGPT](https://wilson1yan.github.io/videogpt/index.html)的 VQ-VAE 质量较低,因此采用了来自[Stability-AI](https://huggingface.co/stabilityai/sd-vae-ft-mse-original) 的更好的 VAE。我们还发现在时间维度上进行修补会降低质量。更多讨论,请参阅我们的 **[报告](docs/report_v1.md)**。
 * ✅ 我们研究了不同的架构,包括 DiT、Latte 和我们提出的 **STDiT**。我们的STDiT在质量和速度之间实现了更好的权衡。更多讨论,请参阅我们的 **[报告](docs/report_v1.md)**。
 * ✅ 支持剪辑和 T5 文本调节。
-* ✅ 通过将图像视为单帧视频,我们的项目支持在图像和视频(如 ImageNet 和 UCF101)上训练 DiT。更多说明请参见 [评论](docs/command.md)
+* ✅ 通过将图像视为单帧视频,我们的项目支持在图像和视频(如 ImageNet 和 UCF101)上训练 DiT。更多说明请参见 [指令解析](docs/command.md)。
 * ✅ 利用[DiT](https://github.com/facebookresearch/DiT)、[Latte](https://github.com/Vchitect/Latte) 和 [PixArt](https://pixart-alpha.github.io/) 的官方权重支持推理。
@@ -54,8 +56,7 @@
 ### 下一步计划【按优先级排序】

-* [ ] 完成数据处理管道(包括密集光流、美学评分、文本图像相似性、重复数据删除等)。更多信息请参见[数据集](/docs/datasets.md)。**[WIP]**
-* [ ] Complete the data processing pipeline (including dense optical flow, aesthetics scores, text-image similarity, deduplication, etc.). See [datasets.md](/docs/datasets.md) for more information. **[项目进行中]**
+* [ ] 完成数据处理管道(包括密集光流、美学评分、文本图像相似性、重复数据删除等)。更多信息请参见[数据集](/docs/datasets.md)。**[项目进行中]**
 * [ ] 训练视频-VAE。 **[项目进行中]**