Kissing-clip detection in video; highlight-clip detection. https://git.yoqi.me/lyq/PredictWonderfulTV

Amir Ziai efcbf4099a add experiment runner 5 years ago
.gitignore aa643772e8 Initial commit 5 years ago
LICENSE aa643772e8 Initial commit 5 years ago
README.md e354f07496 readme basics 5 years ago
conv.py e354f07496 readme basics 5 years ago
data.py e354f07496 readme basics 5 years ago
dev.ipynb c70e64d6cd audio + img iterator 5 years ago
dev2.ipynb c70e64d6cd audio + img iterator 5 years ago
dev3.ipynb e354f07496 readme basics 5 years ago
experiments.py efcbf4099a add experiment runner 5 years ago
kissing_detector.py efcbf4099a add experiment runner 5 years ago
mel_features.py 0059ff6654 vggish and resnet combined, figuring out input 5 years ago
params.py efcbf4099a add experiment runner 5 years ago
pipeline.py f0c0c2bfa6 runs, added f1 5 years ago
requirements.txt efcbf4099a add experiment runner 5 years ago
segmentor.py 0059ff6654 vggish and resnet combined, figuring out input 5 years ago
spatial_transforms.py c70e64d6cd audio + img iterator 5 years ago
temporal_transforms.py c70e64d6cd audio + img iterator 5 years ago
train.py efcbf4099a add experiment runner 5 years ago
utils.py efcbf4099a add experiment runner 5 years ago
vggish.py c70e64d6cd audio + img iterator 5 years ago
vggish_input.py 0059ff6654 vggish and resnet combined, figuring out input 5 years ago
vggish_params.py 0059ff6654 vggish and resnet combined, figuring out input 5 years ago

README.md

Kissing Detector

Detect kissing scenes in a movie using both audio and video features.

Project for Stanford CS231N

Build dataset

from pipeline import BuildDataset

videos_and_labels = [
    # (file name in base_path, label) where label is 1 for kissing and 0 for not kissing
    ('movies_casino_royale_2006_kissing_1.mp4', 1),
    ('movies_casino_royale_2006_kissing_2.mp4', 1),
    ('movies_casino_royale_2006_kissing_3.mp4', 1),
    ('movies_casino_royale_2006_not_1.mp4', 0),
    ('movies_casino_royale_2006_not_2.mp4', 0),
    ('movies_casino_royale_2006_not_3.mp4', 0),
    
    ('movies_goldeneye_1995_kissing_1.mp4', 1),
    ('movies_goldeneye_1995_kissing_2.mp4', 1),
    ('movies_goldeneye_1995_kissing_3.mp4', 1),
    ('movies_goldeneye_1995_not_1.mp4', 0),
    ('movies_goldeneye_1995_not_2.mp4', 0),
    ('movies_goldeneye_1995_not_3.mp4', 0),
]

builder = BuildDataset(base_path='path/to/movies',
                       videos_and_labels=videos_and_labels,
                       output_path='/path/to/output',
                       test_size=1 / 3)  # set aside 1 / 3 of data for validation
builder.build_dataset()
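
The example file names above appear to encode the label in the name (`_kissing_<n>` versus `_not_<n>`). Assuming that convention holds for every clip, the `videos_and_labels` list could be derived from a directory listing instead of being typed by hand. This is a minimal sketch: `label_from_name` and `list_videos_and_labels` are hypothetical helpers and are not part of `pipeline`.

```python
import re
from pathlib import Path

# Assumed naming convention, inferred from the example file names above:
#   <anything>_kissing_<n>.mp4 -> label 1
#   <anything>_not_<n>.mp4     -> label 0
_LABEL_PATTERN = re.compile(r"_(kissing|not)_\d+\.mp4$")


def label_from_name(file_name):
    """Return 1 for a kissing clip, 0 for a non-kissing clip, None if the name does not match."""
    match = _LABEL_PATTERN.search(file_name)
    if match is None:
        return None
    return 1 if match.group(1) == "kissing" else 0


def list_videos_and_labels(base_path):
    """Build (file name, label) pairs for every matching .mp4 directly inside base_path."""
    pairs = []
    for path in sorted(Path(base_path).glob("*.mp4")):
        label = label_from_name(path.name)
        if label is not None:  # skip files that do not follow the convention
            pairs.append((path.name, label))
    return pairs
```

The result of `list_videos_and_labels('path/to/movies')` has the same shape as the hand-written list and could be passed as `videos_and_labels` to `BuildDataset`.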

Data loader

Explorations:

  • ConvNet, VGGish, or both
  • ConvNet architectures: ResNet, VGG, AlexNet, SqueezeNet, DenseNet
  • With and without pre-training
  • (3DC)
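
The first three explorations above can be enumerated as a grid of configurations. The sketch below is an assumption about how such a grid might look, not the contents of `experiments.py`: the dictionary keys (`features`, `conv_arch`, `pretrained`) and the lowercase architecture names are made up for illustration, and it assumes the pre-training switch applies to audio-only runs as well.

```python
from itertools import product

# Names taken from the list above; the lowercase spellings are illustrative only.
ARCHITECTURES = ["resnet", "vgg", "alexnet", "squeezenet", "densenet"]
FEATURE_SOURCES = ["conv", "vggish", "both"]  # image ConvNet, audio VGGish, or both


def experiment_grid():
    """Enumerate every combination of feature source, ConvNet architecture and pre-training."""
    grid = []
    for source, pretrained in product(FEATURE_SOURCES, [True, False]):
        # Audio-only runs have no image ConvNet, so the architecture is left as None.
        archs = [None] if source == "vggish" else ARCHITECTURES
        for arch in archs:
            grid.append({
                "features": source,
                "conv_arch": arch,
                "pretrained": pretrained,
            })
    return grid


if __name__ == "__main__":
    for config in experiment_grid():
        print(config)
```

Keeping the grid as plain dictionaries makes it easy to log each configuration next to its validation score.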

Diagnostics

  • Saliency maps
  • Class visualization
  • Confusion matrices
  • Detected segments
  • Failure examples
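
Two of the diagnostics above, confusion matrices and detected segments, can be sketched in plain Python. The sketch assumes the model emits one binary prediction per fixed-length window of the video (1 for kissing, 0 otherwise); the function names are hypothetical, and the repository's own `segmentor.py` may work differently.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means kissing."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def detected_segments(predictions, min_length=1):
    """Group runs of consecutive 1s into (start, end) window indices, end exclusive.

    Runs shorter than min_length windows are dropped.
    """
    segments = []
    start = None
    for index, pred in enumerate(predictions):
        if pred == 1 and start is None:
            start = index  # a new run begins
        elif pred != 1 and start is not None:
            if index - start >= min_length:
                segments.append((start, index))
            start = None
    # Close a run that extends to the end of the video.
    if start is not None and len(predictions) - start >= min_length:
        segments.append((start, len(predictions)))
    return segments
```

Raising `min_length` is one simple way to suppress isolated false positives, at the cost of missing very short scenes.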

TODO

  • Define experiments
  • ...