Lab Exercise: Fourier Transform in Shazam's Fingerprinting

Lab Overview

Learning Objectives

After completing this lab, students will be able to:

Apply the Fourier transform to real audio signals for frequency analysis
Implement a simplified version of Shazam's audio fingerprinting algorithm
Understand how spectrograms are used in audio recognition systems
Generate and compare audio fingerprints using peak extraction
Evaluate the robustness of fingerprinting techniques to noise

Background

Shazam's audio recognition technology relies on the Fast Fourier Transform (FFT) to convert time-domain audio signals into frequency-domain representations. By identifying unique patterns in the frequency spectrum (audio fingerprints), Shazam can match short audio samples against a database of millions of songs.

In this lab, you will implement the core components of this system, focusing on the signal processing aspects relevant to electrical engineering.

Pre-Lab Preparation

Complete these tasks before the lab session:

Review Fourier Transform theory and properties
Understand the difference between the DFT (the transform itself) and the FFT (a fast algorithm for computing it)
Install Python with NumPy, SciPy, and Matplotlib libraries
Download the sample audio files provided for the lab

Pre-Lab Questions

1. Explain why frequency-domain analysis (using FFT) is more effective than time-domain analysis for audio fingerprinting.

2. What is the purpose of applying a window function (like Hann or Hamming) before performing FFT on audio signals?

3. Calculate the frequency resolution of an FFT with N=4096 points for audio sampled at 44.1 kHz.

Frequency Resolution Δf = Fs / N

Where Fs = Sampling Frequency, N = FFT size
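
The relation above can be checked numerically. The sketch below uses a hypothetical helper, `frequency_resolution`, and deliberately different values (Fs = 8 kHz, N = 1024) from the pre-lab question, so the calculation itself is still left to you. It also compares the result with the bin spacing NumPy reports for the same FFT size.

```python
import numpy as np

def frequency_resolution(fs, n_fft):
    """Bin spacing in Hz of an n_fft-point FFT at sampling rate fs (Δf = Fs / N)."""
    return fs / n_fft

fs, n_fft = 8000, 1024            # illustrative values, not the pre-lab ones
delta_f = frequency_resolution(fs, n_fft)

# Cross-check against the spacing between adjacent bins reported by NumPy
bin_spacing = np.fft.rfftfreq(n_fft, d=1 / fs)[1]
print(delta_f, bin_spacing)
```

Both numbers should agree, which confirms that Δf is simply the spacing between adjacent FFT bins.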

Lab Procedure

Part 1: Audio Signal Generation

First, we'll generate synthetic audio signals to understand the FFT process. Create a Python script with the following functions:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile

    # Generate a test audio signal with multiple frequencies
    def generate_test_signal(duration=2, fs=44100):
        t = np.linspace(0, duration, int(fs * duration), endpoint=False)
        # Create signal with three frequency components
        freqs = [440, 880, 1320]  # A4, A5, and approximately E6 (3rd harmonic of A4)
        signal = np.zeros_like(t)
        for f in freqs:
            signal += 0.5 * np.sin(2 * np.pi * f * t)
        # Note: the summed peak amplitude exceeds 1.0; normalize before writing a WAV file
        return t, signal, fs

    # Add white Gaussian noise at a target SNR (in dB) to simulate real recording conditions
    def add_noise(signal, snr_db=20):
        signal_power = np.mean(signal**2)
        noise_power = signal_power / (10**(snr_db/10))
        noise = np.random.normal(0, np.sqrt(noise_power), len(signal))
        return signal + noise
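
Before relying on a noisy test signal, it helps to confirm that the noise level matches the requested SNR. The following is a minimal sketch, not part of the original lab code: `measure_snr_db` is a hypothetical helper, and a seeded `np.random.default_rng` is assumed here only to make the result reproducible. The noise scaling follows the same rule as `add_noise`.

```python
import numpy as np

def measure_snr_db(clean, noisy):
    """Estimate SNR in dB from a clean reference and its noisy version."""
    noise = noisy - clean
    return 10 * np.log10(np.mean(clean**2) / np.mean(noise**2))

fs = 44100
t = np.arange(fs) / fs                        # 1 second of samples
clean = 0.5 * np.sin(2 * np.pi * 440 * t)     # single 440 Hz tone

# Same scaling rule as add_noise, but with a seeded generator (assumption)
target_snr_db = 20
rng = np.random.default_rng(0)
noise_power = np.mean(clean**2) / 10 ** (target_snr_db / 10)
noisy = clean + rng.normal(0, np.sqrt(noise_power), clean.size)

measured = measure_snr_db(clean, noisy)
print(f"target {target_snr_db} dB, measured {measured:.2f} dB")
```

The measured value should land close to the target. Small deviations are expected because the noise power is estimated from a finite number of samples.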

Part 2: FFT Implementation & Analysis

Implement FFT calculation and analyze the frequency components of the audio signal.

    # Compute FFT and generate frequency axis
    def compute_fft(signal, fs, apply_window=True):
        n = len(signal)
        # Apply Hann window to reduce spectral leakage
        if apply_window:
            window = np.hanning(n)
            signal = signal * window
        # Compute FFT and keep only the non-negative frequencies (the input is real)
        fft_result = np.fft.fft(signal)
        fft_magnitude = np.abs(fft_result[:n//2])
        fft_freq = np.fft.fftfreq(n, 1/fs)[:n//2]
        return fft_freq, fft_magnitude

    # Identify frequency peaks (simplified Shazam approach)
    def find_peaks(frequencies, magnitude, threshold=0.1, min_distance=5):
        peaks = []
        max_mag = np.max(magnitude)
        last_idx = -min_distance  # bin index of the most recently kept peak
        for i in range(1, len(magnitude)-1):
            if (magnitude[i] > magnitude[i-1] and
                magnitude[i] > magnitude[i+1] and
                magnitude[i] > threshold * max_mag):
                if i - last_idx >= min_distance:
                    peaks.append((frequencies[i], magnitude[i]))
                    last_idx = i
                elif magnitude[i] > peaks[-1][1]:
                    # Closer than min_distance bins to the previous peak: keep the stronger one
                    peaks[-1] = (frequencies[i], magnitude[i])
                    last_idx = i
        return peaks
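
As a sanity check on the peak-picking idea, the sketch below applies the same local-maximum rule in vectorised form to a three-tone signal. It is self-contained and makes two assumptions that differ from the lab code: a sampling rate of 8 kHz with a 1-second signal (so each FFT bin is 1 Hz wide), and `np.fft.rfft` in place of slicing the full FFT.

```python
import numpy as np

fs = 8000                                   # assumed rate; 1 s of data gives 1 Hz bins
t = np.arange(fs) / fs
tones = [440, 880, 1320]
x = sum(0.5 * np.sin(2 * np.pi * f * t) for f in tones)

# Hann-windowed magnitude spectrum of the real signal
mag = np.abs(np.fft.rfft(x * np.hanning(x.size)))
freqs = np.fft.rfftfreq(x.size, d=1 / fs)

# Local maxima above 10% of the strongest bin (vectorised form of the loop)
inner = mag[1:-1]
is_peak = (inner > mag[:-2]) & (inner > mag[2:]) & (inner > 0.1 * mag.max())
peak_freqs = freqs[1:-1][is_peak]
print(peak_freqs)
```

The detected peaks should coincide with the three tone frequencies. With real recordings the peaks rarely fall exactly on a bin, which is one reason the windowing step matters.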

Part 3: Spectrogram Generation

Create a spectrogram, a time-frequency representation essential for audio fingerprinting.

    # Generate spectrogram using Short-Time Fourier Transform (STFT)
    def generate_spectrogram(signal, fs, window_size=1024, hop_size=512):
        if len(signal) < window_size:
            raise ValueError("signal must contain at least window_size samples")
        n_windows = (len(signal) - window_size) // hop_size + 1
        spectrogram = np.zeros((window_size//2, n_windows))
        window = np.hanning(window_size)  # the same window is reused for every frame
        for i in range(n_windows):
            start = i * hop_size
            end = start + window_size
            segment = signal[start:end] * window
            # Compute FFT for this segment, keeping the non-negative frequencies
            fft_result = np.fft.fft(segment)[:window_size//2]
            spectrogram[:, i] = np.abs(fft_result)
        # Each time value marks the start of its frame
        time_axis = np.arange(n_windows) * hop_size / fs
        freq_axis = np.fft.fftfreq(window_size, 1/fs)[:window_size//2]
        return time_axis, freq_axis, spectrogram

Note: The spectrogram is a 2D representation with time on the x-axis and frequency on the y-axis. Color intensity represents magnitude at each time-frequency point.
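
Linear magnitudes span several orders of magnitude, so spectrograms are often displayed on a decibel scale. The sketch below shows one way to do the conversion. `magnitude_to_db` and its `floor_db` parameter are hypothetical names introduced for illustration, and the function assumes the input contains at least one non-zero value.

```python
import numpy as np

def magnitude_to_db(spectrogram, floor_db=-80.0):
    """Convert linear magnitudes to dB relative to the largest value.

    Values below floor_db are clipped so silent bins do not produce -inf.
    """
    ref = np.max(spectrogram)
    eps = np.finfo(float).tiny                    # avoids log10(0)
    db = 20 * np.log10(np.maximum(spectrogram, eps) / ref)
    return np.maximum(db, floor_db)

# Toy 2x3 "spectrogram" (rows: frequency bins, columns: time frames)
toy = np.array([[1.0, 0.1, 0.0],
                [0.5, 0.01, 1e-6]])
toy_db = magnitude_to_db(toy)
print(toy_db)
```

The resulting array can be passed to a plotting call such as Matplotlib's `pcolormesh` together with the time and frequency axes.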

Part 4: Audio Fingerprint Generation

Implement a simplified version of the core Shazam fingerprinting algorithm by identifying peak constellations in the spectrogram.

    # Find peaks in spectrogram (Shazam's approach)
    def find_spectrogram_peaks(spectrogram, time_axis, freq_axis, threshold=0.3):
        peaks = []
        max_val = np.max(spectrogram)
        rows, cols = spectrogram.shape
        # Outer loop over time, so peaks come out ordered by time, then frequency
        for t in range(1, cols-1):
            for f in range(1, rows-1):
                val = spectrogram[f, t]
                # Check if it's a local maximum among its four direct neighbors
                if (val > threshold * max_val and
                    val > spectrogram[f-1, t] and
                    val > spectrogram[f+1, t] and
                    val > spectrogram[f, t-1] and
                    val > spectrogram[f, t+1]):
                    peaks.append((time_axis[t], freq_axis[f], val))
        return peaks

    # Create fingerprint hashes from peak pairs (simplified)
    def create_fingerprint_hashes(peaks, max_time_diff=1.0, max_freq_diff=500):
        hashes = []
        n = len(peaks)
        for i in range(n):
            t1, f1, m1 = peaks[i]
            for j in range(i+1, min(i+5, n)):  # Limit pairs for efficiency
                t2, f2, m2 = peaks[j]
                time_diff = t2 - t1
                freq_diff = f2 - f1
                # Skip pairs outside the allowed time/frequency neighborhood
                if time_diff > max_time_diff or abs(freq_diff) > max_freq_diff:
                    continue
                # Create hash from the pair (frequencies in Hz, time difference in ms)
                hash_val = hash((int(f1), int(f2), int(time_diff*1000)))
                hashes.append(hash_val)
        return hashes
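
Python's built-in `hash()` works for in-memory experiments, but a fingerprint stored in a database is easier to inspect when it is an explicit fixed-width integer. The sketch below packs two frequency-bin indices and a frame offset into 32 bits. The field widths (10 + 10 + 12 bits) and the names `pack_hash` and `unpack_hash` are assumptions made for illustration, not Shazam's actual layout.

```python
F_BITS, DT_BITS = 10, 12            # assumed field widths (10 + 10 + 12 = 32 bits)

def pack_hash(f1_bin, f2_bin, dt_frames):
    """Pack two frequency-bin indices and a frame offset into one integer."""
    if not (0 <= f1_bin < 2**F_BITS and 0 <= f2_bin < 2**F_BITS):
        raise ValueError("frequency bin out of range")
    if not 0 <= dt_frames < 2**DT_BITS:
        raise ValueError("time offset out of range")
    return (f1_bin << (F_BITS + DT_BITS)) | (f2_bin << DT_BITS) | dt_frames

def unpack_hash(h):
    """Recover (f1_bin, f2_bin, dt_frames) from a packed hash."""
    dt_frames = h & (2**DT_BITS - 1)
    f2_bin = (h >> DT_BITS) & (2**F_BITS - 1)
    f1_bin = h >> (F_BITS + DT_BITS)
    return f1_bin, f2_bin, dt_frames

h = pack_hash(10, 20, 3)
print(h, unpack_hash(h))
```

With `window_size=1024` the spectrogram has 512 frequency bins, so bin indices fit in the assumed 10-bit fields.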

Key Concept: Shazam stores hashes like these in a database. When you record audio, it computes hashes from the recording in the same way and looks them up in the database. Matching is efficient because it is a hash lookup rather than a comparison of full audio signals.
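
The lookup idea can be sketched with a plain dictionary. The example below is a toy illustration under stated assumptions, not Shazam's implementation: each hash is stored together with the time at which it occurs, and a match is scored by counting hashes that agree on the same time offset between song and query. `build_index` and `best_match` are hypothetical names, and all data values are made up.

```python
from collections import Counter, defaultdict

def build_index(songs):
    """Map each hash to the (song_id, time) pairs where it occurs.

    songs: dict of song_id -> list of (hash_value, time_in_song) tuples.
    """
    index = defaultdict(list)
    for song_id, fingerprints in songs.items():
        for hash_value, t_song in fingerprints:
            index[hash_value].append((song_id, t_song))
    return index

def best_match(index, query):
    """Return (song_id, votes) for the most consistent time alignment.

    query: list of (hash_value, time_in_query) tuples.
    """
    votes = Counter()
    for hash_value, t_query in query:
        for song_id, t_song in index.get(hash_value, []):
            # A true match yields the same offset for many hashes
            votes[(song_id, round(t_song - t_query, 2))] += 1
    if not votes:
        return None, 0
    (song_id, _offset), count = votes.most_common(1)[0]
    return song_id, count

# Toy data: integer hashes with times in seconds (all values are made up)
songs = {
    "song_a": [(101, 0.0), (102, 0.5), (103, 1.0), (104, 1.5)],
    "song_b": [(201, 0.0), (102, 0.2), (203, 1.0)],
}
index = build_index(songs)
query = [(102, 0.0), (103, 0.5), (104, 1.0)]   # excerpt starting 0.5 s into song_a
print(best_match(index, query))
```

Requiring a consistent offset is what separates a real match from a song that merely shares a few hashes by coincidence, as `song_b` does here with hash 102.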

Data Analysis & Results

Analysis Questions

1. How does the window size affect the spectrogram? Compare time resolution vs frequency resolution.

2. What happens to the fingerprint when you add noise to the signal? Test with different SNR values.

3. How many unique fingerprint hashes were generated from your test signal? How might this scale for a full song?
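
For questions 2 and 3, it is useful to quantify how much two fingerprints have in common. One possible measure is the Jaccard similarity of the two hash sets, sketched below. `hash_overlap` is a hypothetical helper and the hash values are placeholders, not results from the lab.

```python
def hash_overlap(reference_hashes, test_hashes):
    """Jaccard similarity between two collections of fingerprint hashes."""
    ref, test = set(reference_hashes), set(test_hashes)
    if not ref and not test:
        return 1.0
    return len(ref & test) / len(ref | test)

clean_hashes = [11, 12, 13, 14]      # made-up values for illustration
noisy_hashes = [11, 12, 15]
overlap = hash_overlap(clean_hashes, noisy_hashes)
unique_count = len(set(clean_hashes))
print(overlap, unique_count)
```

A value of 1.0 means the fingerprints are identical and 0.0 means they share no hashes. Repeating the measurement over several SNR values gives a simple robustness curve.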

Experimental Results

Test Condition	Peaks Found	Fingerprint Hashes	Computation Time (ms)
Clean Signal	-	-	-
With Noise (SNR=20dB)	-	-	-
Different Window Size	-	-	-
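
The computation-time column can be filled in with a small timing helper. The sketch below is one way to do it using `time.perf_counter`. `time_ms` is a hypothetical helper, and the built-in `sum` stands in for the function you actually want to time.

```python
import time

def time_ms(func, *args, repeats=5, **kwargs):
    """Return (best wall-clock time in ms, result of the last call)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best, result

elapsed_ms, total = time_ms(sum, range(100_000))
print(f"{elapsed_ms:.3f} ms, result = {total}")
```

Taking the best of several runs reduces the influence of unrelated system activity on the measurement.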

Lab Materials

Required Software

Python 3.8+
NumPy & SciPy
Matplotlib
Jupyter Notebook (optional)

Sample Audio Files

Test Signal

440 Hz + 880 Hz + 1320 Hz

Safety & Guidelines

Save your work frequently
Use headphones for audio playback
Keep volume at reasonable levels
Document all parameters and results

Lab Checklist

Pre-lab questions completed

Test signal generated and analyzed

FFT computed with/without window

Spectrogram generated

Peaks identified in spectrogram

Fingerprint hashes generated

Analysis questions answered

EE Concepts Applied

Sampling Theorem

Fs > 2Fmax to avoid aliasing. At the 44.1 kHz sampling rate used in this lab, frequencies up to 22.05 kHz can be represented.
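
Aliasing is easy to demonstrate numerically. The sketch below samples a tone that lies above the Nyquist frequency and locates the strongest FFT bin. The sampling rate (8 kHz) and tone frequency (5 kHz) are assumed values chosen for illustration.

```python
import numpy as np

fs = 8000                          # assumed sampling rate, Nyquist = 4000 Hz
f_tone = 5000                      # tone above Nyquist
n = np.arange(fs)                  # 1 second of sample indices
x = np.sin(2 * np.pi * f_tone * n / fs)

mag = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(x.size, d=1 / fs)
apparent = freqs[np.argmax(mag)]   # frequency of the strongest bin
print(f"{f_tone} Hz tone sampled at {fs} Hz appears at {apparent:.0f} Hz")
```

The tone shows up at Fs minus its true frequency, which is why content above Fs/2 must be filtered out before sampling.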

Fast Fourier Transform

FFT reduces DFT complexity from O(N²) to O(N log N), enabling real-time audio processing.
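
To see that the FFT computes the same result as the DFT definition, the sketch below evaluates the DFT sum directly as an N x N matrix product and compares it with `np.fft.fft`. `naive_dft` is a hypothetical helper written for this comparison only.

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.size)
    k = n.reshape(-1, 1)
    twiddle = np.exp(-2j * np.pi * k * n / x.size)   # N x N matrix
    return twiddle @ x

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
agree = np.allclose(naive_dft(x), np.fft.fft(x))
print("naive DFT matches np.fft.fft:", agree)
```

The direct version builds N² complex factors, so its cost and memory grow quickly with N, while the FFT reaches the same values with far fewer operations.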

Windowing

Reduces spectral leakage by tapering signal edges before FFT computation.
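
The effect of tapering can be measured directly. The sketch below compares a rectangular window with a Hann window for a tone placed halfway between two FFT bins, where leakage is most pronounced. `leakage_fraction` and its `guard_bins` parameter are hypothetical, and the sampling rate and FFT size are assumed values.

```python
import numpy as np

fs, n = 8000, 1024
f_tone = 128.5 * fs / n            # halfway between bins 128 and 129
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f_tone * t)

def leakage_fraction(signal, window, guard_bins=4):
    """Fraction of spectral energy farther than guard_bins from the peak."""
    power = np.abs(np.fft.rfft(signal * window)) ** 2
    peak = int(np.argmax(power))
    lo, hi = max(peak - guard_bins, 0), peak + guard_bins + 1
    return 1.0 - power[lo:hi].sum() / power.sum()

leak_rect = leakage_fraction(x, np.ones(n))
leak_hann = leakage_fraction(x, np.hanning(n))
print(f"rectangular: {leak_rect:.2e}, Hann: {leak_hann:.2e}")
```

The Hann window should leave far less energy outside the neighborhood of the peak, at the cost of a wider main lobe.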

Time-Frequency Trade-off

Analogous to the Heisenberg uncertainty principle (in signal processing, the Gabor limit): a longer analysis window gives finer frequency resolution but coarser time resolution, and vice versa.
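
The trade-off can be tabulated for a few STFT window sizes. The sketch below takes the bin spacing as the frequency resolution and the window duration as the time resolution, a simplification that ignores the widening caused by the window shape. The window sizes are chosen for illustration.

```python
fs = 44100                                    # sampling rate used in this lab

rows = []
for window_size in (256, 512, 1024, 2048):
    delta_f = fs / window_size                # frequency bin spacing in Hz
    delta_t = window_size / fs                # window duration in seconds
    rows.append((window_size, delta_f, delta_t))
    print(f"N={window_size:5d}  df={delta_f:7.2f} Hz  dt={delta_t * 1000:6.2f} ms")
```

Under these definitions the product of the two resolutions is always 1, so halving one necessarily doubles the other.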
