Commit graph

464 commits

Author SHA1 Message Date
hxwang
5730060f41 [ckpt] mitigate gpu mem peak when loading ckpt 2025-03-26 18:04:16 +08:00
hxwang
bc4aa4f217 [ckpt] fix shape error when gathering weights under sp + dp parallelism 2025-03-26 15:43:00 +08:00
Alex Gherghina
8202ca13df Import Tuple from typing instead of torch 2025-03-25 10:38:55 +02:00
Zheng Zangwei (Alex Zheng)
febf3ad4b2 Update Open-Sora 2.0 (#807)
* upload v2.0

* update docs

* [hotfix] fit latest fa3 (#802)

* update readme

* update readme

* update readme

* update train readme

* update readme

* update readme: motion score

* cleaning video dc ae WIP

* update config

* add dependency functions

* undo cleaning

* use latest dcae

* complete high compression training

* update hcae config

* cleaned up vae

* update ae.md

* further cleanup

* update vae & ae paths

* align naming of ae

* [hotfix] fix ring attn bwd for fa3 (#803)

* train ae default without wandb

* update config

* update evaluation results

* added hcae report

* update readme

* update readme demo

* update readme demo

* update readme gif

* display demo directly in readme

* update paper

* delete files

---------

Co-authored-by: Hongxin Liu <lhx0217@gmail.com>
Co-authored-by: Shen-Chenhui <shen_chenhui@u.nus.edu>
Co-authored-by: wuxiwen <wuxiwen.simon@gmail.com>
2025-03-12 13:14:22 +08:00
Zheng Zangwei (Alex Zheng)
f1c6b8b88e open-sora v1.3 code upload (#786)
Co-authored-by: gxyes <gxynoz@gmail.com>
2025-02-20 16:50:24 +08:00
Gao, Ruiyuan
df5668cdf1 fix bug at mha, MaskGenerator; improve ckpt_utils.py (#609)
* fix bug at mha in blocks.py

* fix bug in MaskGenerator

* align logging style in ckpt_utils.py
2025-02-20 16:40:47 +08:00
Hongxin Liu
70ca63f30b [feature] support async ckpt & pin memory cache (#760)
* [feature] support async ckpt

* [feature] support pin memory cache

* [doc] update readme
2024-12-20 10:30:49 +08:00
ZXMMD
a29424c237 fix ckpt_utils.py (#580) 2024-07-12 14:52:21 +08:00
Jirka Borovec
a7b6aacc99 lint: unify setting in pyproject.toml (#583)
* lint: unify setting in `pyproject.toml`

* apply pre-commit
2024-07-12 14:50:13 +08:00
Tom Young
194e2204c1 Merge pull request #169 from hpcaitech/hotfix/cut
update default shorter_size
2024-07-04 11:25:07 +08:00
pxy
2ac4900c81 update default shorter_size 2024-07-04 03:14:08 +00:00
zhengzangw
7e325e4e7b Merge branch 'main' of https://github.com/hpcaitech/Open-Sora into main 2024-06-27 14:02:04 +00:00
zhengzangw
eb0ba30484 Merge branch 'main' of github.com:hpcaitech/Open-Sora-dev into main 2024-06-27 07:11:11 +00:00
Hongxin Liu
332d9fc9c9 [feature] make timer optional and make reduce bucket size configurable (#549)
* [feature] make reduce bucket size configurable

* [feature] make timer optional
2024-06-27 13:37:54 +08:00
Zheng Zangwei (Alex Zheng)
45df92849c Merge pull request #156 from hpcaitech/feature/causal_atten
Added causal mask in Attention forward pass
2024-06-26 23:36:37 +08:00
zhengzangw
4b2b47b34d [fix] pixart sampling 2024-06-26 07:00:24 +00:00
FrankLeeeee
3552145f84 [sp] updated precision test 2024-06-25 06:17:36 +00:00
FrankLeeeee
6bb2c599b6 Merge remote-tracking branch 'upstream/main' into hotfix/fix-sp 2024-06-24 09:08:21 +00:00
Jiacheng Yang
00fef1d1af fix SeqParallelMultiHeadCrossAttention for consistent results in distributed mode (#510) 2024-06-24 17:07:49 +08:00
Zheng Zangwei (Alex Zheng)
455f9e7674 Merge pull request #161 from hpcaitech/hotfix/dataset
[data] added error handling to dataset
2024-06-24 16:54:40 +08:00
FrankLeeeee
6a72b8910b [data] added error handling to dataset 2024-06-24 08:53:10 +00:00
zhengzangw
f40ea2270c Merge branch 'main' of github.com:hpcaitech/Open-Sora-dev into main 2024-06-24 07:04:17 +00:00
zhengzangw
491403218d update for pixart 2024-06-24 07:04:08 +00:00
Frank Lee
ee1c79a898 [sp] added padding (#160) 2024-06-24 13:59:29 +08:00
zhengzangw
9a9a6c2f3e [fix] better support local ckpt 2024-06-22 15:54:27 +00:00
zhengzangw
7115864314 [fix] HF loading 2024-06-22 15:41:32 +00:00
zhengzangw
cd12584034 handle av error 2024-06-22 13:26:55 +00:00
zhengzangw
a6bdabe286 minor fix 2024-06-21 19:22:02 +00:00
zhengzangw
7aa940f20d Merge branch 'main' of https://github.com/hpcaitech/Open-Sora into dev/v1.2 2024-06-21 19:17:30 +00:00
zhengzangw
20d1584a1c [fix] support stdit1 training 2024-06-21 19:03:30 +00:00
zhengzangw
dec17bd990 [feat] reduce memory leakage in dataloader and pyav 2024-06-21 18:23:30 +00:00
Zheng Zangwei (Alex Zheng)
9b668e1c4e Merge pull request #523 from BurkeHulk/hotfix/fp16_nan_output
Force fp16 input to fp32 to avoid nan output in timestep_transform
2024-06-21 18:01:17 +08:00
HangXu
04d2ee0182 Force fp16 input to fp32 to avoid nan output in timestep_transform 2024-06-21 11:15:39 +03:00
zhengzangw
f32c1173b7 config for local load 2024-06-20 10:23:38 +00:00
rangoliu
81524e675e fix ar keys (#500) 2024-06-20 17:51:24 +08:00
Shen Chenhui
416837a86b Hotfix/vae (#502)
* fix assert

* fix vae config; update path

---------

Co-authored-by: Shen-Chenhui <shen_chenhui@u.nus.edu>
2024-06-20 17:49:16 +08:00
HangXu
8f239c87bf Added causal mask in Attention forward pass 2024-06-20 11:48:42 +03:00
Zheng Zangwei (Alex Zheng)
4cbf3c33b8 Hotfix/t5 load (#487)
* hotfix

* hotfix for stdit

* hotfix for vae
2024-06-19 23:15:29 +08:00
Zheng Zangwei (Alex Zheng)
396307c050 Hotfix/t5 load (#486)
* hotfix

* hotfix for stdit
2024-06-19 23:03:15 +08:00
Zheng Zangwei (Alex Zheng)
85f20274a0 hotfix (#484) 2024-06-19 22:47:40 +08:00
Zheng Zangwei (Alex Zheng)
ccb85fc3c3 hotfix (#482) 2024-06-19 22:05:47 +08:00
Shen Chenhui
49536bd923 Merge pull request #154 from hpcaitech/hotfix/t5_assert
fix assertion
2024-06-19 21:22:51 +08:00
Shen-Chenhui
1602768d5b fix assertion 2024-06-19 13:22:04 +00:00
Zheng Zangwei (Alex Zheng)
403772eee1 Docs/fix zangwei (#474)
* [docs] fix training data num

* [docs] update sp

* add support for issue #470
2024-06-19 16:53:53 +08:00
Jianshu Guo
17cce908b2 fix:multi-fps bug. For multi-fps training, when extracting frames according to a certain frame_interval, the fps of the extracted frames actually changes. (#444)
Co-authored-by: Jianshu Guo <fguojianshu@gmail.com>
2024-06-18 16:20:11 +08:00
jackmappotion
2229f01b35 ♻️ Directory creation to use os.makedirs with exist_ok (#435)
Simplified code and improved readability / Ensured functionality remains the same by allowing directory to exist without error
2024-06-18 07:41:02 +08:00
FrankLeeeee
7ba29d3439 [doc] resolved conflict in readme 2024-06-17 23:18:40 +00:00
zhengzangw
525e29abbe reformat and update docs 2024-06-17 15:37:23 +00:00
zhengzangw
30e276166c update 2024-06-17 13:42:27 +00:00
Shen Chenhui
f0c98dd186 Merge pull request #148 from hpcaitech/feature/docs_v1.2
Feature/docs v1.2
2024-06-17 17:18:27 +08:00