# Data Scoring and Filtering

- [Data Scoring and Filtering](#data-scoring-and-filtering)
  - [Aesthetic Scoring](#aesthetic-scoring)
    - [Requirement](#requirement)
    - [Usage](#usage)
  - [Optical Flow Score](#optical-flow-score)
  - [Matching Score](#matching-score)

## Aesthetic Scoring

To evaluate the aesthetic quality of videos, we use a pretrained model from [CLIP+MLP Aesthetic Score Predictor](https://github.com/christophschuhmann/improved-aesthetic-predictor). This model is trained on 176K SAC (Simulacra Aesthetic Captions) pairs, 15K LAION-Logos (Logos) pairs, and 250K AVA (The Aesthetic Visual Analysis) image-text pairs.

The score is between 1 and 10, where 5.5 can be considered as the threshold for fair aesthetics, and 6.5 for good aesthetics. Good text-to-image models can achieve a score of 7.0 or higher.

For videos, we extract the first, last, and the middle frames for evaluation. The script also supports images. Our script enables 1k videos/s with one GPU. It also supports multiple GPUs to further accelerate the process.

### Requirement

```bash
# install clip
pip install git+https://github.com/openai/CLIP.git
pip install decord

# get pretrained model
wget https://github.com/christophschuhmann/improved-aesthetic-predictor/raw/main/sac+logos+ava1-l14-linearMSE.pth -O pretrained_models/aesthetic.pth
```

### Usage

With `meta.csv` containing the paths to the videos, run the following command:

```bash
# output: meta_aes.csv
torchrun --nproc_per_node 8  -m tools.scoring.aesthetic.inference meta.csv --bs 1024 --num_workers 16
```

This will generate multiple part files, you can use `python -m tools.datasets.csvutil DATA1.csv DATA2.csv` to merge these part files.

## Optical Flow Score

Optical flow scores are used to assess the motion of a video. Higher optical flow scores indicate larger movement.
TODO: acknowledge UniMatch.

First get the pretrained model.

```bash
wget https://s3.eu-central-1.amazonaws.com/avg-projects/unimatch/pretrained/gmflow-scale2-regrefine6-mixdata-train320x576-4e7b215d.pth -P pretrained_models/unimatch
```

Then run:

```bash
torchrun --standalone --nproc_per_node 8 tools/scoring/optical_flow/inference.py /path/to/meta.csv
```

The output should be `/path/to/meta_flow.csv` with column `flow`.

## Matching Score

Matching scores are calculated to evaluate the alignment between an image/video and its caption.
For videos, we compute the matching score of the middle frame and the caption.

**Make sure** meta files contain the column `text`, which is the caption of the sample. Then run:

```bash
torchrun --standalone --nproc_per_node 8 tools/scoring/matching/inference.py /path/to/meta.csv
```

The output should be `/path/to/meta_match.csv` with column `match`. Higher matching scores indicate better image-text/video-text alignment.