Data Scoring and Filtering
Aesthetic Scoring
To evaluate the aesthetic quality of videos, we use the pretrained CLIP+MLP Aesthetic Score Predictor. This model is trained on 176K SAC (Simulacra Aesthetic Captions) pairs, 15K LAION-Logos (Logos) pairs, and 250K AVA (Aesthetic Visual Analysis) image-text pairs.
The score is between 1 and 10, where 5.5 can be considered as the threshold for fair aesthetics, and 6.5 for good aesthetics. Good text-to-image models can achieve a score of 7.0 or higher.
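The thresholds above can be turned into a simple labeling rule. This is a minimal sketch that assumes the cut-offs are inclusive lower bounds; the function name `aesthetic_bucket` and the label strings are illustrative and not part of the scoring tool.

```python
def aesthetic_bucket(score: float) -> str:
    """Map an aesthetic score to a coarse label using the thresholds above."""
    # Assumption: 6.5 and 5.5 are treated as inclusive lower bounds.
    if score >= 6.5:
        return "good"
    if score >= 5.5:
        return "fair"
    return "below_fair"
```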
For videos, we extract the first, middle, and last frames for evaluation. The script also supports images. It processes about 1k videos/s on one GPU and can use multiple GPUs to speed up scoring further.
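As an illustration of the frame selection, the sketch below picks the first, middle, and last frame indices for a video with a known frame count. The helper name `sample_frame_indices` and the choice of `num_frames // 2` as the middle frame are assumptions; the actual script may round differently.

```python
def sample_frame_indices(num_frames: int) -> list[int]:
    """Return indices of the first, middle, and last frames."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Assumption: the middle frame is taken with integer division.
    return [0, num_frames // 2, num_frames - 1]
```

For very short clips the indices can coincide (a single-frame clip yields the same index three times), so a caller may want to deduplicate before decoding.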
Requirements
# install clip
pip install git+https://github.com/openai/CLIP.git
pip install decord
# get pretrained model (create the target directory first; wget -O does not create it)
mkdir -p pretrained_models
wget https://github.com/christophschuhmann/improved-aesthetic-predictor/raw/main/sac+logos+ava1-l14-linearMSE.pth -O pretrained_models/aesthetic.pth
Usage
With meta.csv containing the paths to the videos, run the following command:
# output: meta_aes.csv
python -m tools.scoring.aesthetic.inference meta.csv
Optical Flow Score
First, download the pretrained model.
wget https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-regrefine6-mixdata-train320x576-4e7b215d.pth -P pretrained_models/unimatch
With meta.csv containing the paths to the videos, run the following command:
python -m tools.scoring.optical_flow.inference /path/to/meta.csv
# or run in parallel
torchrun --standalone --nproc_per_node 8 -m tools.scoring.optical_flow.inference_parallel /path/to/meta.csv
The output should be /path/to/meta_flow.csv, with an added flow column. Higher optical flow scores indicate larger motion.
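Once the flow column exists, near-static clips can be dropped with a short filter. The sketch below uses only the standard library; the function name `filter_by_flow` and the `min_flow` cut-off are hypothetical, and no recommended threshold is implied here.

```python
import csv


def filter_by_flow(in_path: str, out_path: str, min_flow: float) -> int:
    """Copy rows whose `flow` value is at least `min_flow`; return rows kept."""
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # Preserve the original column order in the filtered file.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if float(row["flow"]) >= min_flow:
                writer.writerow(row)
                kept += 1
    return kept
```

A call such as `filter_by_flow("/path/to/meta_flow.csv", "/path/to/meta_flow_filtered.csv", min_flow=1.0)` would write the filtered file; the value `1.0` is a placeholder to be tuned on your own data.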
Matching Score
Requires a text column in the meta file, containing the caption of each sample.
TODO.