Music Information Retrieval

Relevant Lectures: 3.2T (guest lecture by Cynthia Liem)

This chapter introduces information retrieval through the lens of its application to music. We first look at audio music processing, then at some case studies, and finally note that music can also be a multimedia experience.

Audio Music Processing

This chapter focuses on content-based music processing based on audio signals, and does not consider advanced music theory or symbolic representations of music.

Spectrograms

“A spectrogram is a visual representation of the spectrum of frequencies in a sound or other signal as they vary with time” (Wikipedia). It looks like this:

[Figure: spectrogram]

(Image source: Wikimedia Commons)
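
To make the definition concrete, here is a minimal sketch of how a spectrogram can be computed: cut the signal into short overlapping frames, window each frame, and take the magnitude of its Fourier transform. The frame size, hop size, sample rate, and test tone below are illustrative assumptions, not values from the lecture, and the naive DFT is only there to keep the example dependency-free.

```python
import cmath
import math

def spectrogram(signal, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_size//2."""
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, O(N^2) per frame; a practical implementation would use an FFT.
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Hypothetical test signal: a pure 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

spec = spectrogram(tone, frame_size=FRAME_SIZE, hop=32)

# The strongest bin of the first frame, converted back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

Each inner list is one column of the picture (one moment in time), and each entry in it is the energy at one frequency; plotting the values as colour intensities gives the familiar image.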

Music Fundamentals

Acoustic Concepts

Temporal Concepts

Higher-Level Concepts

Mid-Level Feature Representations

Structure Analysis

An onset is the “attack” of a sound, the moment at which a note or other sound event begins. The times between successive onsets are the inter-onset intervals (IOIs). Autocorrelation over these can be used to measure how self-similar a piece of audio is, for example how regularly its rhythm repeats.
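
As a rough illustration, the sketch below computes IOIs from a list of onset times and then autocorrelates the onset pattern to find the lag at which it best matches itself. The onset times, the 0.1 s grid, and the function names are assumptions made up for this example; detecting the onsets in real audio is a separate step that is not shown.

```python
def inter_onset_intervals(onsets):
    """Differences between consecutive onset times (in seconds)."""
    return [b - a for a, b in zip(onsets, onsets[1:])]

def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation of x for lags 0..max_lag."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]

# Hypothetical onset times in seconds: a steady pulse every 0.5 s.
onsets = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
iois = inter_onset_intervals(onsets)

# Turn the onsets into an impulse train on a 0.1 s grid (the grid size is an assumption).
STEP = 0.1
train = [0] * (round(onsets[-1] / STEP) + 1)
for t in onsets:
    train[round(t / STEP)] = 1

acf = autocorrelation(train, max_lag=20)

# Strongest non-zero lag: the shift at which the pattern best matches itself.
best_lag = max(range(1, len(acf)), key=lambda lag: acf[lag])
period_seconds = best_lag * STEP
```

For this perfectly regular pulse the strongest lag corresponds to the 0.5 s spacing between onsets; real music gives a noisier curve with peaks at the beat period and its multiples.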

Case Studies

Many tasks in music information retrieval involve finding music that is, to some degree, similar to the input piece. Note that all of the following are clearly information retrieval tasks, since they take a query as input against the (relatively) static collection of all music in the world, and produce a (ranked) list of matching pieces of music.

| Retrieval Target | Shared Aspect | Matching Specificity | Case Study |
| --- | --- | --- | --- |
| (Compressed/enhanced) digital copies | Performance instance | Exact | Fingerprinting |
| Semantic gap | Semantic gap | Semantic gap | Semantic gap |
| Cover songs | Underlying musical work | Approximate | Cover Song Retrieval |
| Similar songs | Performer/composer | Global characteristics | |
| Similar songs | Genre | Global characteristics | Genre Classification |

The semantic gap indicates that, at this point, the retrieval task becomes more subjective, and therefore also more difficult.

Fingerprinting

Audio fingerprinting involves finding exact matches of songs based on short samples (commercially: think Shazam and SoundHound). The matching has to be fast and robust to noise.

The basic concept of fingerprinting is to create a database of small, easily searchable representations of data points (in this application, songs). When a new song fragment comes in, the system computes its fingerprint and compares it against the database to see if there’s a match.

At the core of this is a robust fingerprinting technique. One such technique is the Philips fingerprinting algorithm.
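
The sketch below illustrates the general database-and-lookup idea with a toy fingerprint. It is not the Philips algorithm itself, only a simplification loosely in its spirit: each bit records the sign of an energy difference between neighbouring frequency bands and neighbouring frames, and fingerprints are compared by bit error rate. The random "band energies", the song names, and all function names are hypothetical stand-ins.

```python
import random

def fingerprint(energies):
    """One tuple of bits per frame transition: the sign of the change over
    time of the energy difference between adjacent bands."""
    bits = []
    for prev, cur in zip(energies, energies[1:]):
        row = tuple(
            1 if (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1]) > 0 else 0
            for m in range(len(cur) - 1)
        )
        bits.append(row)
    return bits

def bit_error_rate(a, b):
    """Fraction of differing bits between two equally long fingerprints."""
    total = sum(len(row) for row in a)
    errors = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return errors / total

def best_match(query, database):
    """Slide the query over every stored fingerprint; return (name, offset, ber)."""
    best = (None, None, 1.0)
    for name, stored in database.items():
        for offset in range(len(stored) - len(query) + 1):
            ber = bit_error_rate(query, stored[offset:offset + len(query)])
            if ber < best[2]:
                best = (name, offset, ber)
    return best

# Hypothetical data: random band energies stand in for two songs.
rng = random.Random(0)

def fake_song(frames, bands=9):
    return [[rng.random() for _ in range(bands)] for _ in range(frames)]

songs = {"song_a": fake_song(200), "song_b": fake_song(200)}
database = {name: fingerprint(e) for name, e in songs.items()}

# Query: a 40-frame excerpt of song_b with a little noise added.
excerpt = [[e + rng.gauss(0, 0.01) for e in frame]
           for frame in songs["song_b"][60:100]]
name, offset, ber = best_match(fingerprint(excerpt), database)
```

Because only the signs of differences are stored, small amounts of noise flip few bits, so the correct position has a bit error rate far below the roughly 0.5 expected for unrelated audio. The exhaustive sliding search here is for clarity only; a system that needs to be fast would look up candidate positions through an index rather than scanning every offset.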

Cover Song Retrieval

Cover songs may differ from their originals in tempo and timbre. There are several techniques to counter this, such as looking at minimum edit distances using dynamic programming.
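
A minimal sketch of the edit-distance idea follows. It assumes each song has already been reduced to a sequence of discrete labels; the chord names below are hypothetical and stand in for whatever per-beat features a real system would extract from the audio. The function is the standard Levenshtein distance, computed row by row with dynamic programming.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] is the distance between the items of a seen so far and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, free if the items are equal
            ))
        prev = cur
    return prev[-1]

# Hypothetical label sequences for an original, a cover, and an unrelated song.
original = ["C", "G", "Am", "F", "C", "G", "F", "C"]
cover = ["C", "G", "Am", "Am", "F", "C", "G", "C"]
unrelated = ["Dm", "A", "Dm", "Gm", "A", "Dm", "Bb", "A"]

d_cover = edit_distance(original, cover)
d_unrelated = edit_distance(original, unrelated)
```

The cover, which repeats one chord and drops another, ends up much closer to the original than the unrelated sequence does. Because insertions and deletions are allowed, the comparison tolerates local stretching and shortening, which is what makes this family of techniques useful when tempo differs.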

Genre Classification

Genre classification uses a bag-of-frames approach similar to the bag-of-words approach in image processing. See classification.
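
One way to picture a bag-of-frames representation is sketched below: every audio frame is assigned to its nearest entry in a codebook, the song becomes a histogram of codebook usage (frame order is thrown away), and a new song gets the genre of the closest training histogram. This is only one possible variant, and the 2-D features, the hand-picked codebook, and the genre labels are made-up assumptions for illustration.

```python
def nearest(vec, codebook):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])))

def bag_of_frames(frames, codebook):
    """Normalised histogram of codeword usage; frame order is discarded."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest(frame, codebook)] += 1
    return [c / len(frames) for c in counts]

def classify(hist, labelled):
    """Label of the training histogram closest to hist (1-nearest neighbour)."""
    return min(labelled,
               key=lambda item: sum((a - b) ** 2 for a, b in zip(hist, item[1])))[0]

# Hypothetical 2-D frame features and a hand-picked 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
training = [
    ("rock", bag_of_frames([(0.9, 0.1), (1.1, 0.0), (0.1, 0.0), (1.0, 0.2)], codebook)),
    ("jazz", bag_of_frames([(0.0, 0.9), (0.1, 1.1), (0.0, 0.1), (0.2, 1.0)], codebook)),
]

query = bag_of_frames([(0.1, 0.8), (0.0, 1.2), (0.9, 0.0), (0.0, 1.0)], codebook)
genre = classify(query, training)
```

In practice the frame features would be something like spectral descriptors computed per frame, and the codebook would be learned from data rather than written by hand.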

Music & Multimedia

Music can also be a multimedia experience. Sadly, the (very cool) examples shown in the lecture do not translate well to a markdown file.

