Detection of kissing segments and highlight segments in video. https://git.yoqi.me/lyq/PredictWonderfulTV

Amir Ziai cd87f34290 3d conv seems to be running 5 years ago
dev 37c2e30e73 cleanup 5 years ago
.gitignore aa643772e8 Initial commit 6 years ago
LICENSE aa643772e8 Initial commit 6 years ago
README.md 37c2e30e73 cleanup 5 years ago
conv.py e700e7517d all the params 5 years ago
conv3d.py 205f3615aa pycharm 5 years ago
data.py cd87f34290 3d conv seems to be running 5 years ago
dev4.ipynb b2fcb55f40 added segmentor, qual stuff next 5 years ago
dev5.ipynb 85c7991b66 segmentor works with youtube 5 years ago
dev6.ipynb fe6d68924e saliency maps 5 years ago
experiments.py b2fcb55f40 added segmentor, qual stuff next 5 years ago
kissing_detector.py cd87f34290 3d conv seems to be running 5 years ago
mel_features.py 0059ff6654 vggish and resnet combined, figuring out input 6 years ago
params.py cd87f34290 3d conv seems to be running 5 years ago
pipeline.py fe6d68924e saliency maps 5 years ago
qualitative.py cda3d270f2 added conv3d 5 years ago
requirements.txt 85c7991b66 segmentor works with youtube 5 years ago
segmentor.py 85c7991b66 segmentor works with youtube 5 years ago
spatial_transforms.py c70e64d6cd audio + img iterator 5 years ago
temporal_transforms.py c70e64d6cd audio + img iterator 5 years ago
test_segmentor.py b2fcb55f40 added segmentor, qual stuff next 5 years ago
train.py cd87f34290 3d conv seems to be running 5 years ago
utils.py efcbf4099a add experiment runner 5 years ago
vggish.py ac9c8b8ff7 vggish path moved 5 years ago
vggish_input.py 0059ff6654 vggish and resnet combined, figuring out input 6 years ago
vggish_params.py 0059ff6654 vggish and resnet combined, figuring out input 6 years ago

README.md

Kissing Detector

Detect kissing scenes in a movie using both audio and video features.

Project for Stanford CS231N

Build dataset

from pipeline import BuildDataset

videos_and_labels = [
    # (file name in base_path, label) where label is 1 for kissing and 0 for not kissing
    ('movies_casino_royale_2006_kissing_1.mp4', 1),
    ('movies_casino_royale_2006_kissing_2.mp4', 1),
    ('movies_casino_royale_2006_kissing_3.mp4', 1),
    ('movies_casino_royale_2006_not_1.mp4', 0),
    ('movies_casino_royale_2006_not_2.mp4', 0),
    ('movies_casino_royale_2006_not_3.mp4', 0),
    
    ('movies_goldeneye_1995_kissing_1.mp4', 1),
    ('movies_goldeneye_1995_kissing_2.mp4', 1),
    ('movies_goldeneye_1995_kissing_3.mp4', 1),
    ('movies_goldeneye_1995_not_1.mp4', 0),
    ('movies_goldeneye_1995_not_2.mp4', 0),
    ('movies_goldeneye_1995_not_3.mp4', 0),
]

builder = BuildDataset(base_path='path/to/movies',
                       videos_and_labels=videos_and_labels,
                       output_path='/path/to/output',
                       test_size=1 / 3)  # set aside 1 / 3 of data for validation
builder.build_dataset()
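
To make the effect of `test_size=1 / 3` concrete, here is a minimal, standard-library sketch of a label-stratified split over `(file name, label)` pairs. It is not the `BuildDataset` implementation: the function name `stratified_split`, the `seed` parameter, the example file names, and the choice to stratify by label are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(videos_and_labels, test_size=1 / 3, seed=0):
    """Split (file_name, label) pairs into train and validation lists,
    holding out roughly `test_size` of the items for each label.

    Hypothetical helper; the real pipeline may split differently.
    """
    by_label = defaultdict(list)
    for file_name, label in videos_and_labels:
        by_label[label].append((file_name, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_val = round(len(items) * test_size)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Made-up file names: six kissing clips and six non-kissing clips.
example = [(f"clip_kissing_{i}.mp4", 1) for i in range(6)] + \
          [(f"clip_not_{i}.mp4", 0) for i in range(6)]
train, val = stratified_split(example)
print(len(train), len(val))
```

Stratifying keeps the ratio of kissing to non-kissing clips the same in both subsets, which matters when the list of labelled clips is as small as the one above.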

Data loader

Explorations:

  • ConvNet, VGGish, or both
  • ConvNet architectures: ResNet, VGG, AlexNet, SqueezeNet, DenseNet
  • With and without pre-training
  • 3D convolutions (3DC)
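
The explorations above form a small configuration grid. The sketch below enumerates one possible version of it with the standard library only. The dictionary keys, the mode labels, and the assumption that the pre-training toggle applies only to the image ConvNet are illustrative choices, not taken from `experiments.py` or `params.py`.

```python
from itertools import product

# Hypothetical labels for the options listed above.
FEATURE_MODES = ("conv", "vggish", "both")
ARCHITECTURES = ("resnet", "vgg", "alexnet", "squeezenet", "densenet")
PRETRAINED = (True, False)


def experiment_grid():
    """Enumerate configurations, skipping image-model options that have
    no effect when only the audio (VGGish) branch is used."""
    configs = []
    for mode in FEATURE_MODES:
        if mode == "vggish":
            # Audio only: no ConvNet architecture to choose.
            configs.append({"features": mode, "conv_arch": None, "pretrained": None})
            continue
        for arch, pretrained in product(ARCHITECTURES, PRETRAINED):
            configs.append({"features": mode, "conv_arch": arch, "pretrained": pretrained})
    return configs


grid = experiment_grid()
print(len(grid))  # 5 architectures x 2 pre-training settings x 2 modes, plus 1 audio-only run
```

Each dictionary could then be passed to a training function, so every run is described by a single record that can be logged alongside its metrics.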

Diagnostics

  • Saliency maps
  • Class visualization
  • Confusion matrices
  • Detected segments
  • Failure examples
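
Two of the diagnostics above, confusion matrices and detected segments, can be sketched in plain Python. This is an illustration, not the code in `segmentor.py` or `qualitative.py`: the function names, the `min_length` parameter, and the assumption that the model emits one binary prediction per fixed-length window are all hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (1 = kissing)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # row = true label, column = predicted label
    return matrix


def detect_segments(predictions, min_length=2):
    """Merge consecutive positive per-window predictions into
    (start, end) index ranges, end exclusive.

    Runs shorter than `min_length` windows are dropped as likely noise.
    """
    segments, start = [], None
    # A trailing 0 acts as a sentinel that closes a run at the end.
    for i, p in enumerate(list(predictions) + [0]):
        if p == 1 and start is None:
            start = i
        elif p != 1 and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    return segments


# Made-up labels and predictions for ten consecutive windows.
y_true = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 0, 0, 1, 1, 0, 1, 1]
print(confusion_matrix(y_true, y_pred))  # [[3, 1], [1, 5]]
print(detect_segments(y_pred))           # [(1, 3), (5, 7), (8, 10)]
```

Windows where `y_true` and `y_pred` disagree (index 3 and index 5 here) are natural candidates for the failure examples listed above.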

TODO

  • Qualitative analysis
    • Saliency maps
    • Class visualization
    • Error examples
    • Audio?
  • 3D convolutions (3DC)