I have recently started learning about analyzing and modeling audio signals using Python.

A Google search led me to LibROSA, a Python package for music and audio analysis. It provides the building blocks necessary to create music information retrieval systems. I am quoting from here.

A fundamental and popular way to visualize an acoustic signal is to plot its amplitude or its frequency content over time. The frequency-time plot has a special name: spectrogram.

A spectrogram is a visual representation of the spectrum of frequencies of sound (or other signals) as they vary with time. Spectrograms can be used to identify spoken words phonetically and to analyze animal calls. They are also used in music, sonar, radar, speech processing, seismology, etc.
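To make the definition concrete, here is a minimal sketch of how a spectrogram can be computed with plain NumPy: slice the signal into short overlapping frames, apply a window to each, and take the magnitude of each frame's Fourier transform. The function name `spectrogram`, its parameters, and the synthetic two-tone test signal are my own assumptions for illustration; LibROSA has its own, more complete implementation.

```python
import numpy as np


def spectrogram(signal, sample_rate, frame_length=1024, hop_length=256):
    """Return (freqs_hz, times_s, magnitudes) for a 1-D signal.

    Illustrative helper, not part of LibROSA.
    """
    if len(signal) < frame_length:
        raise ValueError("signal is shorter than one frame")

    # A Hann window tapers each frame to reduce spectral leakage.
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length

    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])

    # One spectrum per frame; transpose to (frequency bins, frames).
    magnitudes = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs_hz = np.fft.rfftfreq(frame_length, d=1.0 / sample_rate)
    # Time stamp of each frame, taken at the frame's centre.
    times_s = (np.arange(n_frames) * hop_length + frame_length / 2) / sample_rate
    return freqs_hz, times_s, magnitudes


# Synthetic one-second signal: 440 Hz for the first half, 1760 Hz after that.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 1760 * t))

freqs_hz, times_s, magnitudes = spectrogram(tone, sample_rate)

# The strongest frequency bin in each frame should follow the tone change.
dominant_hz = freqs_hz[magnitudes.argmax(axis=0)]
print(dominant_hz[0], dominant_hz[-1])
```

Each column of `magnitudes` is the spectrum of one short slice of time, so drawing the array as an image with time on the x-axis and frequency on the y-axis gives the spectrogram. The frame length sets the trade-off: longer frames resolve frequencies more finely but blur when things happen.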

I decided to analyze animal calls: meows and woofs. Kaggle offers a dataset of short-duration audio clips of cats and dogs for classification. It serves two purposes: I get to play with audio signals and pursue my interest in machine learning and predictive modeling.

Below are the amplitude vs. time plot and the spectrogram (frequency vs. time) of a pretty loud cat (clip no. 78 in the training set).



This cat (let’s call it Cat78) makes seven distinct calls. Cat78 sounds stern, but makes a better spectrogram than cats and dogs with more melodic calls.

I haven’t included the audio clip of Cat78. If you are interested in finding out how it sounds, check out my GitHub repo AudioClassification_CatDog, where I have uploaded code for playing audio clips and visualizing audio signals.

Please note this is very much a work in progress. I will add more as I continue to learn about audio signal analysis and modeling.

Written on November 21, 2017