Independent Component Analysis

A Code-Centric Introduction to Independent Component Analysis (ICA)

This is the first in what I'm hoping to make a series of posts on representation learning and unsupervised methods in general. I've noticed that there are far fewer resources out there detailing these topics than there are for common supervised learning topics, and next to none that show them off in practice (i.e., with code) alongside the underlying math. I'd like these posts to be accessible to a wide audience while still providing mathematical intuition.

Part 1: Motivation and Introduction

What is ICA and why would we want to do it?

Suppose you are at a banquet with $n$ total attendees, all simultaneously engaged in conversation. Standing in the middle of this crowd, you can pick out individual voices to tune in and out of at will; however, any microphone positioned in the banquet hall will record an incomprehensible cacophony, all $n$ voices jumbled together, each weighted by the speaker's distance from the device. Say you would like to listen to the crowd of voices on a per-speaker basis. With only the one recording, you might be out of luck. But if you have recordings from $n$ microphones, each placed at a different position, how can you recover the individual voice signal of every attendee?
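The jumbling described above is typically modeled as a linear mixture: if $s(t)$ is the vector of $n$ source signals and $A$ is an unknown $n \times n$ mixing matrix, then each microphone observes a component of $x(t) = A s(t)$, and ICA tries to recover $s$ from $x$ alone. Here is a minimal sketch of that mixing step, using made-up signals (a sine and a square wave) rather than the recordings below:

```python
import numpy as np

t = np.linspace(0, 1, 1000)

# two "voices": a sine wave and a square wave
s = np.stack([np.sin(2 * np.pi * 5 * t),
              np.sign(np.sin(2 * np.pi * 3 * t))])  # shape (n_sources, n_samples)

# an arbitrary mixing matrix: each row holds one microphone's weights
A = np.array([[1.0, 0.5],
              [0.3, 0.8]])

# what the microphones record: each row is a different weighted sum of both voices
x = A @ s  # shape (n_mics, n_samples)

print(x.shape)  # (2, 1000)
```

ICA's job is to invert this mixing without knowing $A$, which is what the rest of the post builds toward.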

What better way to illustrate this than to listen to some recordings! (The individual source voice signals were created by Google Translate Text-to-Speech and then mixed by me.)

In [1]:
import numpy as np
from numpy import linalg
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile
from typing import Tuple
import os
import glob
from IPython.display import Audio, display
In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
In [3]:
%matplotlib inline
In [4]:
# create convenience function for plotting and playing audio
def show_audio(a: Tuple[int, np.ndarray]) -> None:  # a: (sample_rate, audio_array)
    sample_rate, audio = a
    fig, ax = plt.subplots()
    time_axis = np.linspace(start=0, stop=len(audio) / sample_rate, num=len(audio))
    ax.plot(time_axis, audio)
    ax.set_xlabel('Time (seconds)')
    ax.set_ylabel('Amplitude')
    display(Audio(audio, rate=sample_rate))
In [5]:
# collect all the wav files
files = glob.glob('./data/mixed_data/*.wav')
In [6]:
samp_rates = []
sound_list = []
In [7]:
# collect sampling frequencies and audio signals
for f in files:
    samp_rate, sound = wavfile.read(f)
    samp_rates.append(samp_rate)
    sound_list.append(sound)
In [8]:
# store as numpy array
audio_array = np.array(sound_list)
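Stacking the recordings this way gives a matrix whose rows are microphones and whose columns are time samples, matching the $x = As$ view of the problem (this assumes all recordings have the same length, as they do here). A tiny illustration with dummy arrays standing in for the wav files:

```python
import numpy as np

# two equal-length dummy "recordings" in place of the actual wav data
sound_list = [np.zeros(8), np.ones(8)]
audio_array = np.array(sound_list)

print(audio_array.shape)  # (2, 8): (n_microphones, n_samples)
```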
In [9]:
# listen and visualize sound waves as sanity check
for a in zip(samp_rates, sound_list):
    show_audio(a)