diff --git a/README.md b/README.md index 5904201..ffa0a31 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,13 @@
-
-
-
+
](https://github.com/hpcaitech/Open-Sora/assets/99191637/de1963d3-b43b-4e68-a670-bb821ebb6f80) | [
](https://github.com/hpcaitech/Open-Sora/assets/99191637/13f8338f-3d42-4b71-8142-d234fbd746cc) | [
](https://github.com/hpcaitech/Open-Sora/assets/99191637/fa6a65a6-e32a-4d64-9a9e-eabb0ebb8c16) |
| A serene night scene in a forested area. [...] The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. | A soaring drone footage captures the majestic beauty of a coastal cliff, [...] The water gently laps at the rock base and the greenery that clings to the top of the cliff. | The majestic beauty of a waterfall cascading down a cliff into a serene lake. [...] The camera angle provides a bird's eye view of the waterfall. |
-| [
](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94) | [
](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9) | [
](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65) |
+| [
](https://github.com/hpcaitech/Open-Sora/assets/99191637/64232f84-1b36-4750-a6c0-3e610fa9aa94) | [
](https://github.com/hpcaitech/Open-Sora/assets/99191637/983a1965-a374-41a7-a76b-c07941a6c1e9) | [
](https://github.com/hpcaitech/Open-Sora/assets/99191637/ec10c879-9767-4c31-865f-2e8d6cf11e65) |
| A bustling city street at night, filled with the glow of car headlights and the ambient light of streetlights. [...] | The vibrant beauty of a sunflower field. The sunflowers are arranged in neat rows, creating a sense of order and symmetry. [...] | A serene underwater scene featuring a sea turtle swimming through a coral reef. The turtle, with its greenish-brown shell [...] |
Videos are downsampled to `.gif` for display. Click for original videos. Prompts are trimmed for display, see [here](/assets/texts/t2v_samples.txt) for full prompts. See more samples at our [gallery](https://hpcaitech.github.io/Open-Sora/).
@@ -114,12 +115,12 @@ After installation, we suggest reading [structure.md](docs/structure.md) to lear
## Model Weights
-| Resoluion | Data | #iterations | Batch Size | GPU days (H800) | URL |
-| ---------- | ------ | ----------- | ---------- | --------------- | ---------- |
-| 16×256×256 | 366K | 80k | 8×64 | 117 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth) |
+| Resoluion | Data | #iterations | Batch Size | GPU days (H800) | URL |
+| ---------- | ------ | ----------- | ---------- | --------------- | --------------------------------------------------------------------------------------------- |
+| 16×256×256 | 366K | 80k | 8×64 | 117 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-16x256x256.pth) |
| 16×256×256 | 20K HQ | 24k | 8×64 | 45 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x256x256.pth) |
| 16×512×512 | 20K HQ | 20k | 2×64 | 35 | [:link:](https://huggingface.co/hpcai-tech/Open-Sora/blob/main/OpenSora-v1-HQ-16x512x512.pth) |
-| 64×512×512 | 50K HQ | | | | TBD |
+| 64×512×512 | 50K HQ | | | | TBD |
Our model's weight is partially initialized from [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha). The number of parameters is 724M. More information about training can be found in our **[report](/docs/report_v1.md)**. More about dataset can be found in [dataset.md](/docs/dataset.md). HQ means high quality.
diff --git a/docs/README_zh.md b/docs/README_zh.md
index fde0787..4855da2 100644
--- a/docs/README_zh.md
+++ b/docs/README_zh.md
@@ -1,11 +1,13 @@
-
+
-
@@ -13,12 +15,12 @@ **Open-Sora**项目是一项致力于**高效**制作高质量视频,并使所有人都能使用其模型、工具和内容的计划。 通过采用**开源**原则,Open-Sora 不仅实现了先进视频生成技术的低成本普及,还提供了一个精简且用户友好的方案,简化了视频制作的复杂性。 通过 Open-Sora,我们希望更多开发者一起探索内容创作领域的创新、创造和包容。 - [[English]](https://github.com/hpcaitech/Open-Sora/blob/main/README.md) + [[English]](/README.md) ## 📰 资讯 * **[2024.03.18]** 🔥 我们发布了**Open-Sora 1.0**,这是一个完全开源的视频生成项目。 -* Open-Sora 1.0 支持视频数据预处理、
加速训练、推理等全套流程。
+* Open-Sora 1.0 支持视频数据预处理、
加速训练、推理等全套流程。
* 我们提供的[模型权重](#model-weights)只需 3 天的训练就能生成 2~5 秒的 512x512 视频。
* **[2024.03.04]** Open-Sora:开源Sora复现方案,成本降低46%,序列扩充至近百万
@@ -42,7 +44,7 @@
* ✅ 我们发现来自[VideoGPT](https://wilson1yan.github.io/videogpt/index.html)的 VQ-VAE 质量较低,因此采用了来自[Stability-AI](https://huggingface.co/stabilityai/sd-vae-ft-mse-original) 的更好的 VAE。我们还发现在时间维度上进行修补会降低质量。更多讨论,请参阅我们的 **[报告](docs/report_v1.md)**。
* ✅ 我们研究了不同的架构,包括 DiT、Latte 和我们提出的 **STDiT**。我们的STDiT在质量和速度之间实现了更好的权衡。更多讨论,请参阅我们的 **[报告](docs/report_v1.md)**。
* ✅ 支持剪辑和 T5 文本调节。
-* ✅ 通过将图像视为单帧视频,我们的项目支持在图像和视频(如 ImageNet 和 UCF101)上训练 DiT。更多说明请参见 [评论](docs/command.md)
+* ✅ 通过将图像视为单帧视频,我们的项目支持在图像和视频(如 ImageNet 和 UCF101)上训练 DiT。更多说明请参见 [指令解析](docs/command.md)。
* ✅ 利用[DiT](https://github.com/facebookresearch/DiT)、[Latte](https://github.com/Vchitect/Latte) 和 [PixArt](https://pixart-alpha.github.io/) 的官方权重支持推理。