Merge branch 'dev/v1.1' of github.com:hpcaitech/Open-Sora-dev into dev/v1.1

This commit is contained in:
zhengzangw 2024-04-23 07:41:57 +00:00
commit e763669eab
8 changed files with 120 additions and 389 deletions


@@ -248,7 +248,7 @@ is [here](/docs/datasets.md). We provide tools to process video data. Our data p
the following steps:
1. Manage datasets. [[docs](/tools/datasets/README.md)]
2. Scene detection and video splitting. [[docs](/tools/splitting/README.md)]
2. Scene detection and video splitting. [[docs](/tools/scene_cut/README.md)]
3. Score and filter videos. [[docs](/tools/scoring/README.md)]
4. Generate video captions. [[docs](/tools/caption/README.md)]


@@ -1,15 +1,24 @@
# Datasets
For Open-Sora 1.1, we conduct mixed training with both images and videos. The main datasets we use are listed below.
Please refer to [README](/README.md#data-processing) for data processing.
## Panda-70M
[Panda-70M](https://github.com/snap-research/Panda-70M) is a large-scale dataset with 70M video-caption pairs.
We use the [training-10M subset](https://github.com/snap-research/Panda-70M/tree/main/dataset_dataloading) for training,
which contains ~10M of the higher-quality videos.
## Pexels
[Pexels](https://www.pexels.com/) is a popular online platform that provides free stock photos, videos, and music.
Most videos from this website are of high quality, so we use them for both pre-training and HQ fine-tuning.
We really appreciate the great platform and its contributors!
## Inter4K
[Inter4K](https://github.com/alexandrosstergiou/Inter4K) is a dataset of 1K video clips at 4K resolution,
originally proposed for super-resolution tasks. We use it for HQ fine-tuning.
## HD-VG-130M
[HD-VG-130M](https://github.com/daooshee/HD-VG-130M?tab=readme-ov-file) comprises 130M text-video pairs. The caption is generated by BLIP-2. We find the cut and the text quality are relatively poor. It contains 20 splits. For OpenSora 1.0, we use the first split (~350K). We plan to use the whole dataset and re-process it.
You can download the dataset and prepare it for training according to [the dataset repository's instructions](https://github.com/daooshee/HD-VG-130M). There is a README.md file in the Google Drive link that provides instructions on how to download and cut the videos. For this version, we directly use the dataset provided by the authors.
## Inter4k
[Inter4k](https://github.com/alexandrosstergiou/Inter4K) is a dataset containing 1k video clips with 4K resolution. The dataset is proposed for super-resolution tasks. We use the dataset for HQ training. The videos are processed as mentioned [here](/README.md#data-processing).
## Pexels.com
[Pexels.com](https://www.pexels.com/) is a website that provides free stock photos and videos. We collect 19K video clips from this website for HQ training. The videos are processed as mentioned [here](/README.md#data-processing).
[HD-VG-130M](https://github.com/daooshee/HD-VG-130M?tab=readme-ov-file) comprises 130M text-video pairs.
Captions are generated by BLIP-2.
We find the scene cuts and the caption quality to be relatively poor. For Open-Sora 1.0, we used only ~350K samples from this dataset.


@@ -2,7 +2,7 @@
In many cases, raw videos contain several scenes and are too long for training. Thus, it is essential to split them into shorter
clips based on scenes. Here, we provide code for scene detection and video splitting.
## Formatting
## Prepare a meta file
At this step, you should have a raw video dataset prepared. We need a meta file for the dataset. To create a meta file from a folder, run:
```bash
@@ -15,7 +15,7 @@ If you already have a meta file for the videos and want to keep the information.
The following command will add a new column `path` to the meta file.
```bash
python tools/scene_cut/process_meta.py --task append_path --meta_path /path/to/meta.csv --folder_path /path/to/video/folder
python tools/scene_cut/convert_id_to_path.py /path/to/meta.csv --folder_path /path/to/video/folder
```
This should output
- `{prefix}_path-filtered.csv` with column `path` (broken videos filtered)
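Under the hood, this step maps each video id to an expected file path and drops rows whose video is missing or broken. A minimal stdlib sketch of that logic (the helper name and column names are ours, and the real tool also verifies the video is intact, not merely present):

```python
import os

def append_paths(rows, folder_path):
    """Attach a `path` cell per row and keep only rows whose file exists on disk."""
    kept = []
    for row in rows:
        # hypothetical schema: each row has an `id` column mapping to `{id}.mp4`
        row["path"] = os.path.join(folder_path, f"{row['id']}.mp4")
        if os.path.exists(row["path"]):  # missing/broken videos are filtered out
            kept.append(row)
    return kept

rows = [{"id": "vid_001"}, {"id": "vid_002"}]
print(append_paths(rows, "/nonexistent/folder"))  # -> [] (nothing on disk)
```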
@@ -28,8 +28,7 @@ We use [`PySceneDetect`](https://github.com/Breakthrough/PySceneDetect) for this
**Make sure** the input meta file has column `path`, which is the path of a video.
```bash
python tools/scene_cut/scene_detect.py --meta_path /path/to/meta.csv
python tools/scene_cut/scene_detect.py --meta_path /mnt/hdd/data/pexels_new/raw/meta/popular_6_format.csv
python tools/scene_cut/scene_detect.py /path/to/meta.csv
```
The output is `{prefix}_timestamp.csv` with column `timestamp`. Each cell in column `timestamp` is a list of tuples,
with each tuple indicating the start and end timestamps of a scene.
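Each `timestamp` cell is stored as the string representation of a Python list, so it must be parsed when read back from the CSV. A small sketch (the example values are made up):

```python
import ast

# One cell of the `timestamp` column, as it appears in the CSV.
cell = "[('00:00:00.000', '00:04:20.000'), ('00:04:20.000', '00:06:10.500')]"

# ast.literal_eval is a safer alternative to eval() for untrusted CSV content.
scenes = ast.literal_eval(cell)
for start, end in scenes:
    print(f"scene: {start} -> {end}")
```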
@@ -39,18 +38,13 @@ with each tuple indicating the start and end timestamp of a scene
After obtaining timestamps for scenes, we conduct video splitting (cutting).
**Make sure** the meta file contains column `timestamp`.
TODO: output video size, min_duration, max_duration
```bash
python tools/scene_cut/main_cut_pandarallel.py \
--meta_path /path/to/meta.csv \
--out_dir /path/to/output/dir
python tools/scene_cut/main_cut_pandarallel.py \
--meta_path /mnt/hdd/data/pexels_new/raw/meta/popular_6_format_timestamp.csv \
--out_dir /mnt/hdd/data/pexels_new/scene_cut/data/popular_6
python tools/scene_cut/cut.py /path/to/meta.csv --save_dir /path/to/output/dir
```
This yields video clips saved in `/path/to/output/dir`. The video clips are named as `{video_id}_scene-{scene_id}.mp4`
This will save video clips to `/path/to/output/dir`. The video clips are named as `{video_id}_scene-{scene_id}.mp4`
TODO: meta for video clips
To create a new meta file for the generated clips, run:
```bash
python -m tools.datasets.convert video /path/to/video/folder --output /path/to/save/meta.csv
```
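Since clips follow the `{video_id}_scene-{scene_id}.mp4` pattern, both ids can be recovered from a clip path later, e.g. when joining clip-level scores back to source videos. A sketch (the helper is illustrative, not part of the repo):

```python
import os
import re

def parse_clip_name(clip_path):
    """Split a `{video_id}_scene-{scene_id}.mp4` clip name back into its two ids."""
    stem = os.path.splitext(os.path.basename(clip_path))[0]
    m = re.fullmatch(r"(?P<video_id>.+)_scene-(?P<scene_id>\d+)", stem)
    if m is None:
        return None
    return m.group("video_id"), int(m.group("scene_id"))

print(parse_clip_name("/clips/pexels_12345_scene-3.mp4"))  # ('pexels_12345', 3)
```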


@@ -1,15 +1,5 @@
"""
1. format_raw_meta()
- only keep intact videos
- add 'path' column (abs path)
2. create_meta_for_folder()
"""
import os
# os.chdir('../..')
print(f"Current working directory: {os.getcwd()}")
import argparse
import json
from functools import partial
@@ -18,7 +8,42 @@ import numpy as np
import pandas as pd
from pandarallel import pandarallel
from tqdm import tqdm
from utils_video import is_intact_video
import cv2
from mmengine.logging import print_log
from moviepy.editor import VideoFileClip


def is_intact_video(video_path, mode="moviepy", verbose=False, logger=None):
    if not os.path.exists(video_path):
        if verbose:
            print_log(f"Could not find '{video_path}'", logger=logger)
        return False
    if mode == "moviepy":
        try:
            VideoFileClip(video_path)
            if verbose:
                print_log(f"The video file '{video_path}' is intact.", logger=logger)
            return True
        except Exception as e:
            if verbose:
                print_log(f"Error: {e}", logger=logger)
                print_log(f"The video file '{video_path}' is not intact.", logger=logger)
            return False
    elif mode == "cv2":
        try:
            cap = cv2.VideoCapture(video_path)
            if cap.isOpened():
                cap.release()
                if verbose:
                    print_log(f"The video file '{video_path}' is intact.", logger=logger)
                return True
            cap.release()
        except Exception as e:
            if verbose:
                print_log(f"Error: {e}", logger=logger)
                print_log(f"The video file '{video_path}' is not intact.", logger=logger)
        # fall through: cv2 could not open the file, so treat it as broken
        return False
    else:
        raise ValueError(f"Unknown mode: {mode}")


def has_downloaded_success(json_path):
@@ -36,12 +61,28 @@ def has_downloaded_success(json_path):
    return True
def append_format_pandarallel(meta_path, folder_path, mode=".json"):
def is_intact(row, mode=".json"):
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("meta_path", type=str)
    parser.add_argument("--folder_path", type=str, required=True)
    parser.add_argument("--mode", type=str, default=None)
    args = parser.parse_args()
    return args
def main():
    args = parse_args()
    meta_path = args.meta_path
    folder_path = args.folder_path
    mode = args.mode

    def is_intact(row, mode=None):
        video_id = row["id"]
        # video_path = os.path.join(root_raw, f"data/{split}/{video_id}.mp4")
        video_path = os.path.join(folder_path, f"{video_id}.mp4")
        row["path"] = video_path
        if mode == ".mp4":
            if is_intact_video(video_path):
                return True, video_path
@@ -74,7 +115,6 @@ def append_format_pandarallel(meta_path, folder_path, mode=".json"):
    meta.to_csv(out_path, index=False)
    print(f"New meta (shape={meta.shape}) with intact info saved to '{out_path}'")

    # meta_format = meta[meta['intact']]
    meta_format = meta[np.array(intact)]
    meta_format.drop("intact", axis=1, inplace=True)
    out_path = os.path.join(meta_dirpath, f"{wo_ext}_path-filtered.csv")
@@ -82,40 +122,5 @@ def append_format_pandarallel(meta_path, folder_path, mode=".json"):
    print(f"New meta (shape={meta_format.shape}) with format info saved to '{out_path}'")
def create_subset(meta_path):
    meta = pd.read_csv(meta_path)
    meta_subset = meta.iloc[:100]

    wo_ext, ext = os.path.splitext(meta_path)
    out_path = f"{wo_ext}_head-100{ext}"
    meta_subset.to_csv(out_path, index=False)
    print(f"New meta (shape={meta_subset.shape}) saved to '{out_path}'")


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", default="append_path", required=True)
    parser.add_argument("--meta_path", type=str, required=True)
    parser.add_argument("--folder_path", type=str, required=True)
    parser.add_argument("--mode", type=str, default=None)
    parser.add_argument("--num_workers", default=5, type=int)
    args = parser.parse_args()
    return args


def main():
    args = parse_args()
    meta_path = args.meta_path
    task = args.task

    if task == "append_path":
        append_format_pandarallel(meta_path=meta_path, folder_path=args.folder_path, mode=args.mode)
    elif task == "create_subset":
        create_subset(meta_path=meta_path)
    else:
        raise ValueError


if __name__ == "__main__":
    main()


@@ -11,7 +11,7 @@ from pandarallel import pandarallel
from scenedetect import FrameTimecode
def process_single_row(row, save_dir, log_name=None):
def process_single_row(row, args, log_name=None):
    video_path = row["path"]

    logger = None
@@ -28,7 +28,14 @@ def process_single_row(row, save_dir, log_name=None):
    scene_list = eval(timestamp)
    scene_list = [(FrameTimecode(s, fps=1), FrameTimecode(t, fps=1)) for s, t in scene_list]

    split_video(
        video_path, scene_list, save_dir=save_dir, min_seconds=2, max_seconds=15, shorter_size=720, logger=logger
        video_path,
        scene_list,
        save_dir=args.save_dir,
        min_seconds=args.min_seconds,
        max_seconds=args.max_seconds,
        target_fps=args.target_fps,
        shorter_size=args.shorter_size,
        logger=logger,
    )
@@ -36,10 +43,10 @@ def split_video(
    video_path,
    scene_list,
    save_dir,
    min_seconds=None,
    max_seconds=None,
    min_seconds=2.0,
    max_seconds=15.0,
    target_fps=30,
    shorter_size=512,
    shorter_size=720,
    verbose=False,
    logger=None,
):
@@ -121,9 +128,14 @@ def split_video(
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--meta_path", default="./data/pexels_new/raw/meta/popular_5_format_timestamp.csv")
    parser.add_argument("--out_dir", default="./data/pexels_new/scene_cut/data/popular_5")
    parser.add_argument("--num_workers", default=5, type=int)
    parser.add_argument("meta_path", type=str)
    parser.add_argument("--save_dir", type=str)
    parser.add_argument("--min_seconds", type=float, default=None,
                        help="if set, clips shorter than min_seconds are ignored")
    parser.add_argument("--max_seconds", type=float, default=None,
                        help="if set, clips longer than max_seconds are truncated")
    parser.add_argument("--target_fps", type=int, default=30, help="target fps of the output clips")
    parser.add_argument("--shorter_size", type=int, default=720, help="resize the shorter side, keeping the aspect ratio")
    args = parser.parse_args()
    return args
@@ -131,25 +143,24 @@ def parse_args():
def main():
    args = parse_args()
    meta_path = args.meta_path
    out_dir = args.out_dir
    assert os.path.basename(os.path.dirname(out_dir)) == "data"
    os.makedirs(out_dir, exist_ok=True)
    meta = pd.read_csv(meta_path)
    save_dir = args.save_dir
    os.makedirs(save_dir, exist_ok=True)

    # create logger
    log_dir = os.path.dirname(out_dir)
    log_name = os.path.basename(out_dir)
    log_dir = os.path.dirname(save_dir)
    log_name = os.path.basename(save_dir)
    timestamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(time.time()))
    log_path = os.path.join(log_dir, f"{log_name}_{timestamp}.log")
    logger = MMLogger.get_instance(log_name, log_file=log_path)
    # logger = None

    # initialize pandarallel
    pandarallel.initialize(progress_bar=True)
    process_single_row_partial = partial(process_single_row, save_dir=out_dir, log_name=log_name)
    process_single_row_partial = partial(process_single_row, args=args, log_name=log_name)

    # process
    meta = pd.read_csv(args.meta_path)
    meta.parallel_apply(process_single_row_partial, axis=1)
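The core of the splitting step is one ffmpeg invocation per scene: seek to the scene start, cut the clamped duration, resample to a target fps, and resize the shorter side while keeping the aspect ratio. A sketch of assembling such a command (an illustrative helper mirroring the flags in these tools, not the repo's exact code):

```python
def build_cut_cmd(ffmpeg, video_path, start_s, duration_s, save_path,
                  target_fps=30, shorter_size=720):
    """Assemble an ffmpeg command list that cuts one scene from a video."""
    cmd = [ffmpeg, "-nostdin", "-y",
           "-ss", str(start_s), "-i", video_path, "-t", str(duration_s)]
    if target_fps is not None:
        cmd += ["-r", str(target_fps)]  # resample to the target fps
    if shorter_size is not None:
        # resize the shorter side; -2 keeps the other side even and preserves aspect ratio
        cmd += ["-vf", f"scale='if(gt(iw,ih),-2,{shorter_size})':'if(gt(iw,ih),{shorter_size},-2)'"]
    cmd += ["-map", "0", save_path]
    return cmd

print(build_cut_cmd("ffmpeg", "video.mp4", 4.0, 11.0, "video_scene-0.mp4"))
```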


@@ -1,162 +0,0 @@
import argparse
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

import pandas as pd
from imageio_ffmpeg import get_ffmpeg_exe
from mmengine.logging import print_log
from scenedetect import FrameTimecode
from tqdm import tqdm


def single_process(row, save_dir, logger=None):
    # video_id = row['videoID']
    # video_path = os.path.join(root_src, f'{video_id}.mp4')
    video_path = row["path"]

    # check mp4 integrity
    # if not is_intact_video(video_path, logger=logger):
    #     return False

    timestamp = row["timestamp"]
    if not (timestamp.startswith("[") and timestamp.endswith("]")):
        return False
    scene_list = eval(timestamp)
    scene_list = [(FrameTimecode(s, fps=1), FrameTimecode(t, fps=1)) for s, t in scene_list]

    split_video(video_path, scene_list, save_dir=save_dir, logger=logger)
    return True


def split_video(
    video_path,
    scene_list,
    save_dir,
    min_seconds=None,
    max_seconds=None,
    target_fps=30,
    shorter_size=512,
    verbose=False,
    logger=None,
):
    """
    scenes shorter than min_seconds will be ignored;
    scenes longer than max_seconds will be cut to save the beginning max_seconds.
    Currently, the saved file name pattern is f'{fname}_scene-{idx}'.mp4

    Args:
        scene_list (List[Tuple[FrameTimecode, FrameTimecode]]): each element is (s, t): start and end of a scene.
        min_seconds (float | None)
        max_seconds (float | None)
        target_fps (int | None)
        shorter_size (int | None)
    """
    FFMPEG_PATH = get_ffmpeg_exe()

    save_path_list = []
    for idx, scene in enumerate(scene_list):
        s, t = scene  # FrameTimecode
        if min_seconds is not None:
            if (t - s).get_seconds() < min_seconds:
                continue

        duration = t - s
        if max_seconds is not None:
            fps = s.framerate
            max_duration = FrameTimecode(timecode="00:00:00", fps=fps)
            max_duration.frame_num = round(fps * max_seconds)
            duration = min(max_duration, duration)

        # save path
        fname = os.path.basename(video_path)
        fname_wo_ext = os.path.splitext(fname)[0]
        # TODO: fname pattern
        save_path = os.path.join(save_dir, f"{fname_wo_ext}_scene-{idx}.mp4")

        # ffmpeg cmd
        cmd = [FFMPEG_PATH]

        # Only show ffmpeg output for the first call, which will display any
        # errors if it fails, and then break the loop. We only show error messages
        # for the remaining calls.
        # cmd += ['-v', 'error']

        # input path
        # cmd += ["-i", video_path]

        # clip to cut
        cmd += ["-nostdin", "-y", "-ss", str(s.get_seconds()), "-i", video_path, "-t", str(duration.get_seconds())]
        # cmd += ["-nostdin", "-y", "-ss", str(s.get_seconds()), "-t", str(duration.get_seconds())]

        # target fps
        # cmd += ['-vf', 'select=mod(n\,2)']
        if target_fps is not None:
            cmd += ["-r", f"{target_fps}"]

        # aspect ratio
        if shorter_size is not None:
            cmd += ["-vf", f"scale='if(gt(iw,ih),-2,{shorter_size})':'if(gt(iw,ih),{shorter_size},-2)'"]
            # cmd += ['-vf', f"scale='if(gt(iw,ih),{shorter_size},trunc(ow/a/2)*2)':-2"]

        cmd += ["-map", "0", save_path]

        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        stdout, stderr = proc.communicate()
        if verbose:
            stdout = stdout.decode("utf-8")
            print_log(stdout, logger=logger)

        save_path_list.append(video_path)
        print_log(f"Video clip saved to '{save_path}'", logger=logger)

    return save_path_list


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--root", default="F:/Panda-70M/")
    parser.add_argument("--split", default="test")
    parser.add_argument("--num_workers", default=5, type=int)
    args = parser.parse_args()
    return args


def main():
    # args = parse_args()
    # root = args.root
    # split = args.split
    root = "F:/Panda-70M/"
    root, split = "F:/pexels_new/", "popular_2"
    meta_path = os.path.join(root, f"raw/meta/{split}_format_timestamp.csv")
    root_dst = os.path.join(root, f"scene_cut/data/{split}")

    folder_dst = root_dst
    # folder_src = os.path.join(root_src, f'data/{split}')
    # folder_dst = os.path.join(root_dst, os.path.relpath(folder_src, root_src))
    os.makedirs(folder_dst, exist_ok=True)

    meta = pd.read_csv(meta_path)

    # create logger
    # folder_path_log = os.path.dirname(root_dst)
    # log_name = os.path.basename(root_dst)
    # timestamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(time.time()))
    # log_path = os.path.join(folder_path_log, f"{log_name}_{timestamp}.log")
    # logger = MMLogger.get_instance(log_name, log_file=log_path)
    logger = None

    tasks = []
    pool = ThreadPoolExecutor(max_workers=1)
    for idx, row in meta.iterrows():
        task = pool.submit(single_process, row, folder_dst, logger)
        tasks.append(task)

    for task in tqdm(as_completed(tasks), total=len(meta)):
        task.result()
    pool.shutdown()


if __name__ == "__main__":
    main()


@@ -29,53 +29,24 @@ def process_single_row(row):
return False, ""
def main():
    meta_path = "F:/pexels_new/raw/meta/popular_1_format.csv"
    meta = pd.read_csv(meta_path)

    timestamp_list = []
    for idx, row in tqdm(meta.iterrows()):
        video_path = row["path"]
        detector = AdaptiveDetector(
            adaptive_threshold=1.5,
            luma_only=True,
        )
        # detector = ContentDetector()
        scene_list = detect(video_path, detector, start_in_scene=True)
        timestamp = [(s.get_timecode(), t.get_timecode()) for s, t in scene_list]
        timestamp_list.append(timestamp)
    meta["timestamp"] = timestamp_list

    wo_ext, ext = os.path.splitext(meta_path)
    out_path = f"{wo_ext}_timestamp{ext}"
    meta.to_csv(out_path, index=False)
    print(f"New meta with timestamp saved to '{out_path}'.")
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--meta_path", default="F:/pexels_new/raw/meta/popular_1_format.csv")
    parser.add_argument("--num_workers", default=5, type=int)
    parser.add_argument("meta_path", type=str)
    args = parser.parse_args()
    return args
def main_pandarallel():
def main():
    args = parse_args()
    meta_path = args.meta_path
    # meta_path = 'F:/pexels_new/raw/meta/popular_1_format.csv'
    meta = pd.read_csv(meta_path)

    pandarallel.initialize(progress_bar=True)
    meta = pd.read_csv(meta_path)
    ret = meta.parallel_apply(process_single_row, axis=1)
    succ, timestamps = list(zip(*ret))
    meta["timestamp"] = timestamps
    meta = meta[np.array(succ)]
@@ -86,4 +57,4 @@ def main_pandarallel():
if __name__ == "__main__":
    main_pandarallel()
    main()


@@ -1,97 +0,0 @@
import os

import cv2
from mmengine.logging import print_log
from moviepy.editor import VideoFileClip


def iterate_files(folder_path):
    for root, dirs, files in os.walk(folder_path):
        # root contains the current directory path
        # dirs contains the list of subdirectories in the current directory
        # files contains the list of files in the current directory

        # Process files in the current directory
        for file in files:
            file_path = os.path.join(root, file)
            # print("File:", file_path)
            yield file_path

        # Process subdirectories and recursively call the function
        for subdir in dirs:
            subdir_path = os.path.join(root, subdir)
            # print("Subdirectory:", subdir_path)
            iterate_files(subdir_path)


def iterate_folders(folder_path):
    for root, dirs, files in os.walk(folder_path):
        for subdir in dirs:
            subdir_path = os.path.join(root, subdir)
            yield subdir_path
            # print("Subdirectory:", subdir_path)
            iterate_folders(subdir_path)


def clone_folder_structure(root_src, root_dst, verbose=False):
    src_path_list = iterate_folders(root_src)
    src_relpath_list = [os.path.relpath(x, root_src) for x in src_path_list]

    os.makedirs(root_dst, exist_ok=True)
    dst_path_list = [os.path.join(root_dst, x) for x in src_relpath_list]
    for folder_path in dst_path_list:
        os.makedirs(folder_path, exist_ok=True)
        if verbose:
            print(f"Create folder: '{folder_path}'")


def is_intact_video(video_path, mode="moviepy", verbose=False, logger=None):
    if not os.path.exists(video_path):
        if verbose:
            print_log(f"Could not find '{video_path}'", logger=logger)
        return False
    if mode == "moviepy":
        try:
            VideoFileClip(video_path)
            if verbose:
                print_log(f"The video file '{video_path}' is intact.", logger=logger)
            return True
        except Exception as e:
            if verbose:
                print_log(f"Error: {e}", logger=logger)
                print_log(f"The video file '{video_path}' is not intact.", logger=logger)
            return False
    elif mode == "cv2":
        try:
            cap = cv2.VideoCapture(video_path)
            if cap.isOpened():
                if verbose:
                    print_log(f"The video file '{video_path}' is intact.", logger=logger)
                return True
        except Exception as e:
            if verbose:
                print_log(f"Error: {e}", logger=logger)
                print_log(f"The video file '{video_path}' is not intact.", logger=logger)
            return False
    else:
        raise ValueError


def count_frames(video_path, logger=None):
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        print_log(f"Error: Could not open video file '{video_path}'", logger=logger)
        return
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    print_log(f"Total frames in the video '{video_path}': {total_frames}", logger=logger)
    cap.release()


def count_files(root, suffix=".mp4"):
    files_list = iterate_files(root)
    cnt = len([x for x in files_list if x.endswith(suffix)])
    return cnt