Kissing-scene detection in video; highlight-clip detection. https://git.yoqi.me/lyq/PredictWonderfulTV

Amir Ziai b2fcb55f40 added segmentor, qual stuff next 5 years ago
.gitignore aa643772e8 Initial commit 5 years ago
LICENSE aa643772e8 Initial commit 5 years ago
README.md b2fcb55f40 added segmentor, qual stuff next 5 years ago
conv.py e700e7517d all the params 5 years ago
data.py e354f07496 readme basics 5 years ago
dev.ipynb c70e64d6cd audio + img iterator 5 years ago
dev2.ipynb c70e64d6cd audio + img iterator 5 years ago
dev3.ipynb e354f07496 readme basics 5 years ago
dev4.ipynb b2fcb55f40 added segmentor, qual stuff next 5 years ago
experiments.py b2fcb55f40 added segmentor, qual stuff next 5 years ago
kissing_detector.py efcbf4099a add experiment runner 5 years ago
mel_features.py 0059ff6654 vggish and resnet combined, figuring out input 5 years ago
params.py b2fcb55f40 added segmentor, qual stuff next 5 years ago
pipeline.py b2fcb55f40 added segmentor, qual stuff next 5 years ago
requirements.txt b2fcb55f40 added segmentor, qual stuff next 5 years ago
segmentor.py b2fcb55f40 added segmentor, qual stuff next 5 years ago
spatial_transforms.py c70e64d6cd audio + img iterator 5 years ago
temporal_transforms.py c70e64d6cd audio + img iterator 5 years ago
test_segmentor.py b2fcb55f40 added segmentor, qual stuff next 5 years ago
train.py ec1ba63cbc latest params 5 years ago
utils.py efcbf4099a add experiment runner 5 years ago
vggish.py ac9c8b8ff7 vggish path moved 5 years ago
vggish_input.py 0059ff6654 vggish and resnet combined, figuring out input 5 years ago
vggish_params.py 0059ff6654 vggish and resnet combined, figuring out input 5 years ago

README.md

Kissing Detector

Detect kissing scenes in a movie using both audio and video features.

Project for Stanford CS231N

Build dataset

from pipeline import BuildDataset

videos_and_labels = [
    # (file name in base_path, label) where label is 1 for kissing and 0 for not kissing
    ('movies_casino_royale_2006_kissing_1.mp4', 1),
    ('movies_casino_royale_2006_kissing_2.mp4', 1),
    ('movies_casino_royale_2006_kissing_3.mp4', 1),
    ('movies_casino_royale_2006_not_1.mp4', 0),
    ('movies_casino_royale_2006_not_2.mp4', 0),
    ('movies_casino_royale_2006_not_3.mp4', 0),
    
    ('movies_goldeneye_1995_kissing_1.mp4', 1),
    ('movies_goldeneye_1995_kissing_2.mp4', 1),
    ('movies_goldeneye_1995_kissing_3.mp4', 1),
    ('movies_goldeneye_1995_not_1.mp4', 0),
    ('movies_goldeneye_1995_not_2.mp4', 0),
    ('movies_goldeneye_1995_not_3.mp4', 0),
]

builder = BuildDataset(base_path='path/to/movies',
                       videos_and_labels=videos_and_labels,
                       output_path='/path/to/output',
                       test_size=1 / 3)  # set aside 1 / 3 of data for validation
builder.build_dataset()
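
The `test_size=1 / 3` argument above holds out a third of the data for validation. The README does not say how `BuildDataset` performs that split, so the following is only a minimal sketch of one common approach (a per-label, shuffled split). The function name `stratified_split`, the `seed` parameter, and the example file names are hypothetical and are not part of `pipeline.py`.

```python
import random
from collections import defaultdict


def stratified_split(videos_and_labels, test_size, seed=0):
    """Split (file_name, label) pairs into train and validation lists.

    Hypothetical helper: keeps roughly the same label balance in both
    lists by splitting each label group separately.
    """
    rng = random.Random(seed)

    # Group the clips by label so each class is split on its own.
    by_label = defaultdict(list)
    for file_name, label in videos_and_labels:
        by_label[label].append((file_name, label))

    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_val = round(len(items) * test_size)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Made-up file names, six clips per class.
example = ([(f'kissing_{i}.mp4', 1) for i in range(6)]
           + [(f'not_{i}.mp4', 0) for i in range(6)])
train, val = stratified_split(example, test_size=1 / 3)
```

With six clips per class and `test_size=1 / 3`, this sketch places two clips of each class in the validation list and four of each in the training list.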

Data loader

Explorations:

  • ConvNet, VGGish, or both
  • ConvNet architectures: ResNet, VGG, AlexNet, SqueezeNet, DenseNet
  • With and without pre-training
  • (3D convolutions, 3DC)
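
The explorations listed above form a small grid of configurations. As a rough sketch, the grid could be enumerated as below. The dictionary keys, the constant names, and the choice to ignore the ConvNet architecture for audio-only runs are assumptions made for illustration; they are not taken from `experiments.py` or `params.py`.

```python
from itertools import product

# Values taken from the list above; the identifiers are hypothetical.
ARCHITECTURES = ['resnet', 'vgg', 'alexnet', 'squeezenet', 'densenet']
FEATURE_MODES = ['conv', 'vggish', 'both']
PRETRAINED = [True, False]


def experiment_grid():
    """Enumerate one config dict per combination worth running."""
    configs = []
    for mode, pretrained in product(FEATURE_MODES, PRETRAINED):
        # Assumption: an audio-only (VGGish) run has no ConvNet to vary.
        archs = [None] if mode == 'vggish' else ARCHITECTURES
        for arch in archs:
            configs.append({
                'features': mode,
                'conv_arch': arch,
                'pretrained': pretrained,
            })
    return configs


grid = experiment_grid()
```

Under these assumptions the grid has 22 entries: 10 image-only, 10 combined, and 2 audio-only.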

Diagnostics

  • Saliency maps
  • Class visualization
  • Confusion matrices
  • Detected segments
  • Failure examples
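
Two of the diagnostics above, confusion matrices and failure examples, need nothing more than the true and predicted labels. The sketch below shows one way to compute them for the binary labels used in this project (1 for kissing, 0 for not kissing). The function names are hypothetical and do not come from `utils.py` or any other file in this repository.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {'tp': 0, 'fp': 0, 'fn': 0, 'tn': 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts['tp'] += 1
        elif truth == 0 and pred == 1:
            counts['fp'] += 1
        elif truth == 1 and pred == 0:
            counts['fn'] += 1
        else:
            counts['tn'] += 1
    return counts


def failure_indices(y_true, y_pred):
    """Return positions where the prediction disagrees with the label."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]


# Made-up labels for illustration only.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
counts = confusion_counts(y_true, y_pred)
failures = failure_indices(y_true, y_pred)
```

The indices returned by `failure_indices` can then be mapped back to the corresponding clips to inspect what the model got wrong.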

TODO

  • Segmentor
  • Qualitative analysis
    • Saliency map
    • Class visualization
    • Error examples
    • Audio?
  • 3DC
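
For the segmentor item above, one simple way to turn per-window predictions into detected segments is to group consecutive positive predictions and drop runs that are too short. This is only an illustrative sketch and is not the logic in `segmentor.py`; the function name `find_segments` and the `min_length` parameter are assumptions.

```python
def find_segments(predictions, min_length=1):
    """Group consecutive 1s into (start, end) index pairs, end exclusive.

    Runs shorter than min_length are discarded.
    """
    segments = []
    start = None
    for i, pred in enumerate(predictions):
        if pred == 1 and start is None:
            # A new run of positive predictions begins here.
            start = i
        elif pred != 1 and start is not None:
            # The current run ended just before index i.
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(predictions) - start >= min_length:
        segments.append((start, len(predictions)))
    return segments


# Made-up predictions, one per fixed-length window of a film.
preds = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
segments = find_segments(preds, min_length=2)
```

Raising `min_length` suppresses isolated positive windows, trading some recall for fewer spurious segments.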