Notes for TI2716-C Multimedia Analysis

These are my review notes for Multimedia Analysis. The topics in these notes (roughly) correspond to those covered in lectures, (roughly) in the same order.

The list below links to each “chapter” of the notes, specifies the lectures covered (3.1T = lecture from the Tuesday of academic week 3.1), and briefly describes the contents of the chapter.

  1. Introduction (3.1T, 3.1F, 3.4F): Defines multimedia, multimedia analysis, the semantic gap, and machine learning, and introduces information filtering and retrieval and the various types of tasks in MMA and ML.
  2. Background Knowledge (3.1T, 3.1F): Reviews mean, variance, Gaussian functions, Bayes’ rule, image processing concepts such as SIFT, and several similarity and distance measures.
  3. Evaluating Multimedia Systems (3.2F, 3.3F): Covers evaluating supervised MMA tasks using hold-out evaluation and cross-validation, evaluating information retrieval systems, and several evaluation metrics such as accuracy, precision, and recall.
  4. Recommender Systems (3.1T, 3.1F): Introduces recommender systems, their typical tasks and challenges, and their implementation using the user-item matrix, baseline predictions, and collaborative filtering.
  5. Information Retrieval (IR) (3.3T): Introduces information retrieval.
    1. IR Using the Vector Space Model (3.3T): Discusses the term-document matrix and the vector space model and its optimizations, such as log frequencies and tf-idf weighting.
    2. IR Using the Unigram Language Model (3.3F): Discusses language models and the unigram language model.
    3. Multimedia Information Retrieval (3.3T): Explores multimedia information retrieval, specifically texture recognition and bag of features.
    4. Music Information Retrieval (3.2T - guest lecture): Examines IR in music, through audio music processing (spectrograms and music fundamentals) and some case studies (fingerprinting, cover song retrieval and genre identification).
  6. Classification (3.2T, 3.3F, 3.4F): Introduces classification and examines music genre classification using bag of frames and Gaussian mixture models, and a naive Bayesian classifier for text.
  7. Automatic Speech Recognition (3.4F): Covers the noisy channel model, automatic speech recognition architecture, Gaussian classifiers as the acoustic model, and the n-gram model.
  8. Harris Corner Detector (3.4T): Covers quadratic approximation, interpreting eigenvalues, and properties of the Harris corner detector.

Disclaimer

These notes are based on the TU Delft TI2716-C Multimedia Analysis course taught in Q3 of the 2016-2017 academic year. They are provided without guarantee of correctness or completeness.