
Urban Sound Analysis using Python | Classification | Deep Learning Project Tutorial

Updated: Jun 2, 2023

Dive into the world of urban sound analysis with Python! This tutorial explores classification and deep learning techniques to analyze and classify urban sounds. Learn to build models that can distinguish between various sounds in urban environments, opening doors to applications in noise pollution monitoring, smart cities, and more. Enhance your skills in audio processing and machine learning, and unlock the potential of urban sound analysis. Join this comprehensive project tutorial to unravel the secrets hidden within the sounds of the city. #UrbanSoundAnalysis #Python #Classification #DeepLearning #AudioProcessing #MachineLearning #NoisePollution


Urban Sound Analysis

In this project tutorial we are going to analyze various audio files, classify each one into its corresponding class, and visualize the sounds through wave plots.



You can watch the step-by-step explanation video tutorial down below.


Dataset Information

This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes:

  • air_conditioner

  • car_horn

  • children_playing

  • dog_bark

  • drilling

  • engine_idling

  • gun_shot

  • jackhammer

  • siren

  • street_music

Download the dataset here


Mounting Drive


We mount Google Drive to access the sound dataset

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive

  • The files must be uploaded to your Google Drive account for this to work.

  • An authorization link is provided; click it, copy the authorization code, and paste it into the code box.

Let us verify which directory we are working in

!pwd

/content


Unzip data


Now we unzip the train dataset from the drive

!unzip 'drive/MyDrive/Colab Notebooks/train.zip'
  • The dataset file is around 3GB

Streaming output truncated to the last 5000 lines.

inflating: Train/1674.wav inflating: Train/1675.wav inflating: Train/1677.wav inflating: Train/1678.wav inflating: Train/1679.wav inflating: Train/168.wav inflating: Train/1680.wav inflating: Train/1681.wav inflating: Train/1686.wav inflating: Train/1687.wav

...

  • For this example, only ten of the extracted sound samples are listed above for a simple view; a quick sanity check is sketched below
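Before moving on, you can verify that the archive extracted correctly. This is a minimal sketch, assuming the files landed in the Train/ folder:

# quick sanity check: count and list a few extracted wav files
# (glob is also imported in the next section; imported here so this runs standalone)
import glob

wav_files = sorted(glob.glob('Train/*.wav'))
print(len(wav_files), 'files extracted')
print(wav_files[:10])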


Import modules

import pandas as pd
import numpy as np
import librosa
import librosa.display
import glob
import IPython.display as ipd
import random
%pylab inline

import warnings
warnings.filterwarnings('ignore')
  • pandas - used to perform data manipulation and analysis

  • numpy - used to perform a wide variety of mathematical operations on arrays

  • librosa - used to analyze music and sound files

  • librosa.display - used to display sound data as images

  • glob - used to find all pathnames matching a specific pattern

  • IPython.display - used to display and hear the audio

  • random - used to pick random samples from the data

  • %pylab inline - enables inline plotting and pulls numpy and matplotlib (e.g. plt) into the namespace; see the note below

  • warnings - to manipulate warnings details
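Note that %pylab inline is deprecated in recent IPython releases. If it is unavailable in your environment, an equivalent explicit setup is:

# explicit replacement for %pylab inline
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline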


Loading the dataset


Now we load the dataset for processing

df = pd.read_csv('Urban Sound Dataset.csv')
df.head()
Urban Sound Dataset
  • ID - Name of the audio file

  • Class - Name of the output class the audio file belongs to


Let us display an audio file

ipd.Audio('Train/1.wav')
  • An audio player is displayed so you can listen to the file from the data


Exploratory Data Analysis


In this step we will visualize different audio samples of the data through wave plots.


We will load the audio file into an array

data, sampling_rate = librosa.load('Train/1.wav')
  • sampling_rate - the number of audio samples per second (Hz); see the note below for keeping a file's native rate
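By default librosa resamples every file to 22050 Hz. If you want to keep a file's original rate instead, you can pass sr=None; a small sketch:

# load the clip at its native sampling rate instead of the 22050 Hz default
data_native, native_sr = librosa.load('Train/1.wav', sr=None)
print(native_sr)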


Now we will view the data array

data

array([-0.09602016, -0.14303702, 0.05203498, ..., -0.01646687, -0.00915894, 0.09742922], dtype=float32)

  • The audio file is loaded into an array of values

  • Each value is an amplitude sample of the waveform


Next we will view the sampling rate

sampling_rate

22050

  • librosa resamples audio to 22050 samples per second (Hz) by default
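Since the array holds amplitude samples taken at a fixed rate, the clip duration follows directly:

# duration in seconds = number of samples / samples per second
duration = len(data) / sampling_rate
print(round(duration, 2), 'seconds')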


Now we plot some graphs of the audio files

plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)
Wave Plot
  • figsize=(12,4) - size of the plot graph

  • librosa.display.waveplot() - displays the waveform of the data at the given sampling rate (renamed to librosa.display.waveshow in newer librosa versions)



index = random.choice(df.index)

print('Class:', df['Class'][index])
data, sampling_rate = librosa.load('Train/'+str(df['ID'][index]) + '.wav')

plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)

Class: dog_bark

Wave Plot of Dog Bark
  • A randomly picked audio file from the training data

  • librosa.load('Train/'+str(df['ID'][index]) + '.wav') - builds the full path to the file by appending the .wav extension to its ID

  • The graph displays the wave plot of a dog bark from the data


index = random.choice(df.index)

print('Class:', df['Class'][index])
data, sampling_rate = librosa.load('Train/'+str(df['ID'][index]) + '.wav')

plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)

Class: gun_shot

Wave Plot of Gun Shot
  • A different audio sample randomly picked

  • The graph displays a gun shot sample


index = random.choice(df.index)

print('Class:', df['Class'][index])
data, sampling_rate = librosa.load('Train/'+str(df['ID'][index]) + '.wav')

plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)

Class: car_horn

Wave Plot of Car Horn
  • The graph displays a car horn sample



Now we will view the class distribution of the dataset

import seaborn as sns
plt.figure(figsize=(12,7))
sns.countplot(df['Class'])
Distribution of Class Labels
  • seaborn - a visualization library built on top of matplotlib (newer seaborn versions require the keyword form sns.countplot(x=df['Class']))

  • The bar graph visualizes the number of samples for each class; the exact counts can also be printed, as shown below
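For an exact numeric view of the same distribution, pandas can tabulate the counts:

# number of samples per class
print(df['Class'].value_counts())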



Input Split


The data currently sits in audio files; we need to extract features from each file into an array so we can directly load the input and output data.

import os

def parser(row):
    # path of the file
    file_name = os.path.join('Train', str(row.ID) + '.wav')
    # load the audio file
    x, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    # extract features from the data
    mfccs = np.mean(librosa.feature.mfcc(y=x, sr=sample_rate, n_mfcc=40).T, axis=0)
    
    feature = mfccs
    label = row.Class
    
    return [feature, label]
  • import os - used to build and join directory paths

  • res_type='kaiser_fast' - a faster resampling method that speeds up loading

  • librosa.feature.mfcc() - Mel-frequency cepstral coefficients, a technique to extract features from audio (see the shape check below)

  • feature - array of the features extracted from the data

  • label - name of the class the extracted data belongs to
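To see why we average over time, you can inspect the raw MFCC matrix for a single file. Its shape is (n_mfcc, n_frames), where n_frames varies with clip length; taking the mean over the time axis collapses it to a fixed 40-value vector. A quick check, assuming 'Train/1.wav' exists:

x, sr = librosa.load('Train/1.wav', res_type='kaiser_fast')
mfcc = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=40)
print(mfcc.shape)                      # (40, n_frames), varies per clip
print(np.mean(mfcc.T, axis=0).shape)   # (40,), fixed-length feature vector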


Now we will load the data for training

data = df.apply(parser, axis=1)
data.columns = ['feature','label']
  • Each row now holds the extracted feature array and the corresponding label of the data


data[0]

[array([-82.12358939, 139.50591598, -42.43086489, 24.82786139, -11.62076447, 23.49708426, -12.19458986, 25.89713885, -9.40527728, 21.21042898, -7.36882138, 14.25433903, -8.67870015, 7.75023765, -10.1241154 , 3.2581183 , -11.35261914, 2.80096779, -7.04601346, 3.91331351, -2.3349743 , 2.01242254, -2.79394367, 4.12927394, -1.62076864, 4.32620082, -1.03440959, -1.23297714, -3.11085341, 0.32044827, -1.787786 , 0.44295495, -1.79164752, -0.76361758, -1.24246428, -0.27664012, 0.65718559, -0.50237115, -2.60428533, -1.05346291]), 'siren']

  • The first element is the 40-value MFCC feature array

  • The second element is the class label ('siren' here)


Now we split the data into input features and output labels

# input split
X = np.array(list(zip(*data))[0])
y = np.array(list(zip(*data))[1])
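A quick shape check confirms the split; the values below are what we expect given the y.shape output later in this tutorial:

print(X.shape)   # expected (5435, 40): 5435 clips, 40 MFCC features each
print(y.shape)   # expected (5435,): one class label string per clip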


Label encoder


We will transform the 10 class labels from text attributes to numerical attributes

from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

le = LabelEncoder()
y = np_utils.to_categorical(le.fit_transform(y))
  • Each class is converted to an integer and then one-hot encoded into categorical columns (on newer Keras versions, keras.utils.to_categorical replaces np_utils.to_categorical)


y.shape

(5435, 10)

  • Shape of the data set for training, indicating 5435 samples of training data with 10 classes


y[0]

array([0., 0., 0., 0., 0., 0., 0., 0., 1., 0.], dtype=float32)

  • A single sample of the data, one-hot encoded across the class columns

  • The column corresponding to the sample's class is set to 1 and the rest to 0; the encoding can be inverted as shown below
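To map a one-hot row back to its class name, take the argmax and invert the label encoder:

# recover the class name from a one-hot encoded row
idx = np.argmax(y[0])
print(le.inverse_transform([idx])[0])   # 'siren' for the sample above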


Model Training


Let us create the model for training

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten

num_classes = 10

# model creation
model = Sequential()

model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.3))

model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.3))

model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.3))

model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', metrics='accuracy', optimizer='adam')
  • Dense - fully connected linear layer

  • Dropout - adds regularization by randomly dropping a fraction of the units, helping to avoid overfitting

  • Activation - applies a non-linear activation function to the layer output

  • Flatten - converts a 2D array into a 1D array (imported but not used in this model)

  • loss='categorical_crossentropy' - loss function for one-hot encoded multi-class targets, minimized via gradient descent

  • optimizer='adam' - adaptively adjusts the learning rate for the model over the epochs
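Before training, you can inspect the resulting architecture:

# print layer output shapes and parameter counts
model.summary()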


Now we will train the data

# train the model
model.fit(X, y, batch_size=32, epochs=30, validation_split=0.25)

Epoch 1/30

1149/1149 [==============================] - 10s 3ms/step - loss: 0.4816 - accuracy: 0.8475 - val_loss: 0.1202 - val_accuracy: 0.9637

Epoch 2/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.1336 - accuracy: 0.9605 - val_loss: 0.0848 - val_accuracy: 0.9743

Epoch 3/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0863 - accuracy: 0.9732 - val_loss: 0.0807 - val_accuracy: 0.9742

Epoch 4/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0685 - accuracy: 0.9783 - val_loss: 0.0734 - val_accuracy: 0.9788

Epoch 5/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0543 - accuracy: 0.9825 - val_loss: 0.0690 - val_accuracy: 0.9809

Epoch 6/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0461 - accuracy: 0.9844 - val_loss: 0.0684 - val_accuracy: 0.9808

Epoch 7/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0360 - accuracy: 0.9873 - val_loss: 0.0743 - val_accuracy: 0.9798

Epoch 8/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0318 - accuracy: 0.9884 - val_loss: 0.0733 - val_accuracy: 0.9811

Epoch 9/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0319 - accuracy: 0.9891 - val_loss: 0.0658 - val_accuracy: 0.9838

Epoch 10/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0242 - accuracy: 0.9919 - val_loss: 0.0728 - val_accuracy: 0.9827



Epoch 11/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0218 - accuracy: 0.9926 - val_loss: 0.0815 - val_accuracy: 0.9818

Epoch 12/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0286 - accuracy: 0.9895 - val_loss: 0.0766 - val_accuracy: 0.9829

Epoch 13/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0199 - accuracy: 0.9928 - val_loss: 0.0762 - val_accuracy: 0.9820

Epoch 14/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0239 - accuracy: 0.9918 - val_loss: 0.0754 - val_accuracy: 0.9836

Epoch 15/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0160 - accuracy: 0.9938 - val_loss: 0.0865 - val_accuracy: 0.9820

Epoch 16/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0196 - accuracy: 0.9935 - val_loss: 0.0842 - val_accuracy: 0.9822

Epoch 17/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0152 - accuracy: 0.9951 - val_loss: 0.0825 - val_accuracy: 0.9828

Epoch 18/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0155 - accuracy: 0.9943 - val_loss: 0.0889 - val_accuracy: 0.9817

Epoch 19/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0207 - accuracy: 0.9930 - val_loss: 0.0886 - val_accuracy: 0.9822

Epoch 20/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0122 - accuracy: 0.9955 - val_loss: 0.0958 - val_accuracy: 0.9822



Epoch 21/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0135 - accuracy: 0.9957 - val_loss: 0.0986 - val_accuracy: 0.9824

Epoch 22/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0166 - accuracy: 0.9949 - val_loss: 0.0987 - val_accuracy: 0.9824

Epoch 23/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0153 - accuracy: 0.9949 - val_loss: 0.0917 - val_accuracy: 0.9832

Epoch 24/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0147 - accuracy: 0.9950 - val_loss: 0.0967 - val_accuracy: 0.9838

Epoch 25/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0112 - accuracy: 0.9957 - val_loss: 0.1057 - val_accuracy: 0.9816

Epoch 26/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0134 - accuracy: 0.9959 - val_loss: 0.1024 - val_accuracy: 0.9830

Epoch 27/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0085 - accuracy: 0.9968 - val_loss: 0.1256 - val_accuracy: 0.9795

Epoch 28/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0127 - accuracy: 0.9958 - val_loss: 0.1099 - val_accuracy: 0.9832

Epoch 29/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0136 - accuracy: 0.9952 - val_loss: 0.1043 - val_accuracy: 0.9824

Epoch 30/30

1149/1149 [==============================] - 4s 3ms/step - loss: 0.0132 - accuracy: 0.9959 - val_loss: 0.1162 - val_accuracy: 0.9827

  • The results after each training epoch are displayed

  • batch_size=32 - amount of data to process per iteration

  • epochs=30 - no. of iterations for training

  • validation_split=0.25 - fraction of the data held out for validation

  • The training accuracy and validation accuracy increase over the epochs

  • Both training and validation accuracy exceeded 98 percent; an inference sketch follows below
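After training, the same feature pipeline can be reused for inference. Here is a minimal sketch, assuming the trained model and the fitted label encoder le from above; the file path is just an illustrative example:

# classify a single audio clip with the trained model
def predict_class(file_name):
    x, sr = librosa.load(file_name, res_type='kaiser_fast')
    mfccs = np.mean(librosa.feature.mfcc(y=x, sr=sr, n_mfcc=40).T, axis=0)
    probs = model.predict(mfccs.reshape(1, -1))   # shape (1, 10)
    return le.inverse_transform([np.argmax(probs)])[0]

print(predict_class('Train/1.wav'))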


Final Thoughts

  • Deep learning models often give more accurate results than classical machine learning algorithms on tasks like this

  • Sound features are extracted and used for training

  • Training with more data will generally get you better accuracy

  • This model can be adapted to other datasets and parameters, including speech recognition or other sound-related tasks


In this project tutorial, we have explored the Urban Sound dataset as a classification project under deep learning. Different urban sounds were identified and classified with exploratory data analysis.


Get the project notebook from here


Thanks for reading the article!!!


Check out more project videos from the YouTube channel Hackers Realm
