mirror of https://github.com/hpcaitech/Open-Sora.git synced 2026-05-21 03:33:55 +02:00

Aesthetic Scoring

To evaluate the aesthetic quality of videos, we use the pretrained model from the CLIP+MLP Aesthetic Score Predictor. This model was trained on 176K SAC (Simulacra Aesthetic Captions) pairs, 15K LAION-Logos (Logos) pairs, and 250K AVA (Aesthetic Visual Analysis) image-text pairs.

The score ranges from 1 to 10. A score of 5.5 can be considered the threshold for fair aesthetics, and 6.5 the threshold for good aesthetics. Good text-to-image models can achieve a score of 7.0 or higher.
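The thresholds above can be sketched as a small helper. This is only an illustration: the function name `aesthetic_label` and the label strings are hypothetical and are not part of the script in this directory.

```python
# Thresholds taken from the text above.
FAIR_THRESHOLD = 5.5
GOOD_THRESHOLD = 6.5


def aesthetic_label(score: float) -> str:
    """Map an aesthetic score to a coarse label (hypothetical helper)."""
    if score >= GOOD_THRESHOLD:
        return "good"
    if score >= FAIR_THRESHOLD:
        return "fair"
    return "below_fair"


for s in (7.0, 6.0, 4.2):
    print(s, aesthetic_label(s))
```

Treating each threshold as inclusive (`>=`) is a choice made for this sketch; the text does not say which side of the boundary a score of exactly 5.5 or 6.5 falls on.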

For videos, we extract the first, middle, and last frames for evaluation. The script also supports images. It processes about 1k videos/s on one GPU, and it supports multiple GPUs to further accelerate the process.
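As a rough sketch of the frame selection described above, the indices of the first, middle, and last frames could be computed as follows. The function name `sample_frame_indices`, the choice of `num_frames // 2` as the middle frame, and the handling of very short videos are assumptions made for illustration; `inference.py` may do this differently.

```python
def sample_frame_indices(num_frames: int) -> list[int]:
    """Return indices of the first, middle, and last frames (hypothetical helper)."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    candidates = [0, num_frames // 2, num_frames - 1]
    # dict.fromkeys keeps the original order while dropping duplicates,
    # which only occur for videos with fewer than three frames.
    return list(dict.fromkeys(candidates))


print(sample_frame_indices(100))
print(sample_frame_indices(2))
```

Dropping duplicate indices means a one-frame input (for example, an image treated as a single-frame video) yields a single index rather than three copies of it.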

Requirement

# install clip
pip install git+https://github.com/openai/CLIP.git

# get pretrained model (create the target directory first so wget can write to it)
mkdir -p pretrained_models
wget https://github.com/christophschuhmann/improved-aesthetic-predictor/raw/main/sac+logos+ava1-l14-linearMSE.pth -O pretrained_models/aesthetic.pth

Usage

With DATA.csv containing the paths to the videos, run the following command:

# output: DATA_aes.csv
python -m tools.aesthetic.inference DATA.csv
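
Once the scores are written, the output can be filtered by a threshold. The sketch below is a minimal example under stated assumptions: the column names `path` and `aes` are guesses at the layout of DATA_aes.csv, not confirmed by this README, and the rows are made-up sample data held in memory so the example runs without any files.

```python
import csv
import io


def filter_by_aesthetic(rows, threshold=5.5, score_column="aes"):
    """Keep rows whose score is at least `threshold` (hypothetical helper).

    `score_column` is an assumed column name; check the header of the real file.
    """
    return [row for row in rows if float(row[score_column]) >= threshold]


# Made-up sample data standing in for DATA_aes.csv.
sample_csv = "path,aes\nclips/a.mp4,6.8\nclips/b.mp4,4.9\nclips/c.mp4,5.5\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))

kept = filter_by_aesthetic(rows)
print([row["path"] for row in kept])
```

To use this on a real file, replace the `io.StringIO(sample_csv)` argument with an opened file handle and adjust `score_column` to match the actual header.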