Data Scoring and Filtering

Aesthetic Scoring

To evaluate the aesthetic quality of videos, we use the pretrained CLIP+MLP Aesthetic Score Predictor. The model is trained on 176K SAC (Simulacra Aesthetic Captions) pairs, 15K LAION-Logos pairs, and 250K AVA (Aesthetic Visual Analysis) image-text pairs.

Scores range from 1 to 10. A score of 5.5 can be taken as the threshold for fair aesthetics and 6.5 as the threshold for good aesthetics. Good text-to-image models can achieve a score of 7.0 or higher.

For videos, we extract the first, middle, and last frames for evaluation. The script also supports images. It processes about 1k videos/s on one GPU and supports multiple GPUs to further accelerate the process.
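The frame selection described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `sample_frame_indices`, the choice of `num_frames // 2` as the middle frame, and the deduplication for very short clips are all assumptions.

```python
def sample_frame_indices(num_frames: int) -> list[int]:
    """Return the indices of the first, middle, and last frames of a video.

    Hypothetical helper: the middle frame is taken as num_frames // 2.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    indices = [0, num_frames // 2, num_frames - 1]
    # Drop repeated indices (order preserved) so that clips with one or
    # two frames are not scored on the same frame more than once.
    return list(dict.fromkeys(indices))


print(sample_frame_indices(100))  # [0, 50, 99]
print(sample_frame_indices(2))    # [0, 1]
```

The returned indices can then be passed to whichever video reader is in use to fetch only those frames instead of decoding the whole clip.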

Requirements

# install clip
pip install git+https://github.com/openai/CLIP.git
pip install decord

# get pretrained model (wget -O does not create the target directory, so create it first)
mkdir -p pretrained_models
wget https://github.com/christophschuhmann/improved-aesthetic-predictor/raw/main/sac+logos+ava1-l14-linearMSE.pth -O pretrained_models/aesthetic.pth

Usage

With meta.csv containing the paths to the videos, run the following command:

# output: meta_aes.csv
python -m tools.scoring.aesthetic.inference meta.csv
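Once meta_aes.csv exists, the thresholds mentioned earlier (5.5 for fair, 6.5 for good) can be applied to filter the dataset. The sketch below uses only the standard library. The score column name `aes`, the output file name, and both helper functions are assumptions for illustration; check the header of your own meta_aes.csv for the actual column name.

```python
import csv


def filter_by_score(rows, column, threshold):
    """Keep the rows whose numeric value in `column` is at least `threshold`."""
    return [row for row in rows if float(row[column]) >= threshold]


def filter_csv(in_path, out_path, column, threshold):
    """Write the rows of `in_path` that pass the threshold to `out_path`.

    Returns the number of rows kept.
    """
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)
    kept = filter_by_score(rows, column, threshold)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


# Example call (assumes the score column is named "aes"):
# filter_csv("meta_aes.csv", "meta_aes_filtered.csv", "aes", 5.5)
```

Keeping the filter as a separate step lets you try several thresholds without rerunning inference.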

Optical Flow Score

First, download the pretrained UniMatch model:

wget https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-regrefine6-mixdata-train320x576-4e7b215d.pth -P pretrained_models/unimatch

With meta.csv containing the paths to the videos, run the following command:

python -m tools.scoring.optical_flow.inference /path/to/meta.csv
# or run in parallel
torchrun --standalone --nproc_per_node 8 -m tools.scoring.optical_flow.inference_parallel /path/to/meta.csv

The output is written to /path/to/meta_flow.csv, which contains a flow column. Higher optical flow scores indicate larger movement.
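To give some intuition for why a higher score means more movement, the sketch below reduces a dense flow field to a single number by averaging the length of its per-pixel displacement vectors. This is an illustration only: how the inference script actually aggregates the flow is not described here, and the function name and list-of-tuples flow layout are assumptions.

```python
import math


def mean_flow_magnitude(flow):
    """Average length of the (dx, dy) displacement vectors in a flow field.

    `flow` is a list of rows, each row a list of (dx, dy) tuples in pixels.
    """
    magnitudes = [math.hypot(dx, dy) for row in flow for dx, dy in row]
    if not magnitudes:
        raise ValueError("flow field is empty")
    return sum(magnitudes) / len(magnitudes)


# A static scene: no pixel moves between the two frames.
static = [[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
# A scene with motion: displacements of length 5, 5, 0, and 10 pixels.
moving = [[(3.0, 4.0), (3.0, 4.0)], [(0.0, 0.0), (6.0, 8.0)]]

print(mean_flow_magnitude(static))  # 0.0
print(mean_flow_magnitude(moving))  # 5.0
```

Under this kind of aggregation, nearly static clips score close to zero, while clips with fast or large motion score high.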

Matching Score

Requires a text column in the meta file, containing the caption of each sample.

TODO.