Detects kissing scenes and highlight segments in video. https://git.yoqi.me/lyq/PredictWonderfulTV

Amir Ziai efcbf4099a add experiment runner		7 years ago
.gitignore	aa643772e8 Initial commit	7 years ago
LICENSE	aa643772e8 Initial commit	7 years ago
README.md	e354f07496 readme basics	7 years ago
conv.py	e354f07496 readme basics	7 years ago
data.py	e354f07496 readme basics	7 years ago
dev.ipynb	c70e64d6cd audio + img iterator	7 years ago
dev2.ipynb	c70e64d6cd audio + img iterator	7 years ago
dev3.ipynb	e354f07496 readme basics	7 years ago
experiments.py	efcbf4099a add experiment runner	7 years ago
kissing_detector.py	efcbf4099a add experiment runner	7 years ago
mel_features.py	0059ff6654 vggish and resnet combined, figuring out input	7 years ago
params.py	efcbf4099a add experiment runner	7 years ago
pipeline.py	f0c0c2bfa6 runs, added f1	7 years ago
requirements.txt	efcbf4099a add experiment runner	7 years ago
segmentor.py	0059ff6654 vggish and resnet combined, figuring out input	7 years ago
spatial_transforms.py	c70e64d6cd audio + img iterator	7 years ago
temporal_transforms.py	c70e64d6cd audio + img iterator	7 years ago
train.py	efcbf4099a add experiment runner	7 years ago
utils.py	efcbf4099a add experiment runner	7 years ago
vggish.py	c70e64d6cd audio + img iterator	7 years ago
vggish_input.py	0059ff6654 vggish and resnet combined, figuring out input	7 years ago
vggish_params.py	0059ff6654 vggish and resnet combined, figuring out input	7 years ago

Kissing Detector

Detect kissing scenes in a movie using both audio and video features.

Project for Stanford CS231N

Build dataset

from pipeline import BuildDataset

videos_and_labels = [
    # (file name in base_path, label) where label is 1 for kissing and 0 for not kissing
    ('movies_casino_royale_2006_kissing_1.mp4', 1),
    ('movies_casino_royale_2006_kissing_2.mp4', 1),
    ('movies_casino_royale_2006_kissing_3.mp4', 1),
    ('movies_casino_royale_2006_not_1.mp4', 0),
    ('movies_casino_royale_2006_not_2.mp4', 0),
    ('movies_casino_royale_2006_not_3.mp4', 0),
    
    ('movies_goldeneye_1995_kissing_1.mp4', 1),
    ('movies_goldeneye_1995_kissing_2.mp4', 1),
    ('movies_goldeneye_1995_kissing_3.mp4', 1),
    ('movies_goldeneye_1995_not_1.mp4', 0),
    ('movies_goldeneye_1995_not_2.mp4', 0),
    ('movies_goldeneye_1995_not_3.mp4', 0),
]

builder = BuildDataset(
    base_path='path/to/movies',
    videos_and_labels=videos_and_labels,
    output_path='/path/to/output',
    test_size=1 / 3,  # set aside 1 / 3 of data for validation
)
builder.build_dataset()
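
The file names above follow a pattern in which the token before the trailing index is either `kissing` or `not`. Assuming that convention holds for the whole collection, the `videos_and_labels` list could be derived from file names instead of typed by hand. The helpers `label_from_filename` and `build_videos_and_labels` below are hypothetical and not part of this repository; this is only a minimal sketch.

```python
from pathlib import Path


def label_from_filename(file_name):
    """Return 1 for '..._kissing_<n>.mp4' and 0 for '..._not_<n>.mp4'.

    Assumes the naming convention used in the example list above;
    this helper is illustrative and not part of the repository.
    """
    stem = Path(file_name).stem          # e.g. 'movies_goldeneye_1995_kissing_1'
    parts = stem.split('_')
    tag = parts[-2] if len(parts) >= 2 else ''  # token before the trailing index
    if tag == 'kissing':
        return 1
    if tag == 'not':
        return 0
    raise ValueError(f'cannot infer label from file name: {file_name!r}')


def build_videos_and_labels(file_names):
    """Build (file name, label) pairs in sorted order for reproducibility."""
    return [(name, label_from_filename(name)) for name in sorted(file_names)]
```

The resulting list has the same shape as `videos_and_labels` above and could be passed to `BuildDataset` unchanged. Files that do not match the convention raise a `ValueError` rather than being labeled silently.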

Data loader

Explorations:

ConvNet, VGGish, or both
ConvNet architectures: ResNet, VGG, AlexNet, SqueezeNet, DenseNet
With and without pre-training
(3DC)
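
One way to enumerate the explorations listed above is a small configuration grid. The sketch below is an assumption about how such a grid might look and does not reflect the actual contents of `experiments.py`; the key names (`features`, `conv_arch`, `pretrained`) are hypothetical, and it assumes the pre-training switch applies to every feature set.

```python
from itertools import product

# Options taken from the list above; the string identifiers are illustrative.
FEATURE_SETS = ('conv', 'vggish', 'both')
CONV_ARCHS = ('resnet', 'vgg', 'alexnet', 'squeezenet', 'densenet')
PRETRAINED = (True, False)


def experiment_grid():
    """Return one config dict per combination worth running.

    The ConvNet architecture only matters when image features are used,
    so audio-only ('vggish') runs get a single entry per pre-training setting.
    """
    configs = []
    for features, pretrained in product(FEATURE_SETS, PRETRAINED):
        archs = (None,) if features == 'vggish' else CONV_ARCHS
        for arch in archs:
            configs.append({
                'features': features,
                'conv_arch': arch,
                'pretrained': pretrained,
            })
    return configs
```

Under these assumptions the grid holds 22 configurations: 10 for ConvNet only, 10 for ConvNet plus VGGish, and 2 for VGGish only.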

Diagnostics

Saliency maps
Class viz
Confusion matrices
Detected segments
Failure examples
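
Two of the diagnostics above, confusion matrices and detected segments, can be sketched in plain Python. The functions below are illustrative and are not taken from this repository. They assume binary labels (1 for kissing, 0 for not) and one prediction per fixed-length time step, which is an assumption about how the detector's output is laid out.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {'tp': 0, 'fp': 0, 'fn': 0, 'tn': 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts['tp'] += 1
        elif truth == 0 and pred == 1:
            counts['fp'] += 1
        elif truth == 1 and pred == 0:
            counts['fn'] += 1
        else:
            counts['tn'] += 1
    return counts


def f1_score(counts):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives."""
    denom = 2 * counts['tp'] + counts['fp'] + counts['fn']
    return 2 * counts['tp'] / denom if denom else 0.0


def positive_segments(y_pred):
    """Merge consecutive positive predictions into (start, end) index pairs.

    `end` is exclusive, so (1, 3) covers time steps 1 and 2.
    """
    y_pred = list(y_pred)
    segments = []
    start = None
    for i, pred in enumerate(y_pred):
        if pred == 1 and start is None:
            start = i                      # a new positive run begins
        elif pred != 1 and start is not None:
            segments.append((start, i))    # the run ended at the previous step
            start = None
    if start is not None:                  # run extends to the end of the sequence
        segments.append((start, len(y_pred)))
    return segments
```

Multiplying the segment indices by the step duration would give timestamps in the source video; the step duration depends on how the features are windowed, which is not specified here.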

TODO

Define experiments
...
