Dive into the world of urban sound analysis with Python! This tutorial explores classification and deep learning techniques to analyze and classify urban sounds. Learn to build a model that can distinguish between various sounds in urban environments, opening doors to applications in noise pollution monitoring, smart cities, and more, while sharpening your skills in audio processing and machine learning.
In this project tutorial we are going to analyze various audio files, classify each one into its corresponding class, and visualize the sound waveforms through plots.
You can watch the step-by-step video explanation of the tutorial down below
Dataset Information
This dataset contains 8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes:
air_conditioner
car_horn
children_playing
dog_bark
drilling
engine_idling
gun_shot
jackhammer
siren
street_music
Download the dataset here
Mounting Drive
We are mounting the sound dataset from Google Drive
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
The files must be uploaded to your Google Drive account for this to work.
An authorization link is provided; click the link, copy the authorization code, and paste it into the code box.
Let us verify which directory we are working in
!pwd
/content
Unzip data
Now we unzip the train dataset from the drive
!unzip 'drive/MyDrive/Colab Notebooks/train.zip'
The dataset file is around 3GB
Streaming output truncated to the last 5000 lines.
inflating: Train/1674.wav inflating: Train/1675.wav inflating: Train/1677.wav inflating: Train/1678.wav inflating: Train/1679.wav inflating: Train/168.wav inflating: Train/1680.wav inflating: Train/1681.wav inflating: Train/1686.wav inflating: Train/1687.wav
...
For this example only 10 of the extracted sound samples are listed, for a simple view
Import modules
import pandas as pd
import numpy as np
import librosa
import librosa.display
import glob
import IPython.display as ipd
import random
%pylab inline
import warnings
warnings.filterwarnings('ignore')
pandas - used to perform data manipulation and analysis
numpy - used to perform a wide variety of mathematical operations on arrays
librosa - used to analyze music and sound files
librosa.display - used to display sound data as images
glob - used to find all pathnames matching a specific pattern
IPython.display - used to display and hear the audio
random - used to pick random samples from the data
%pylab inline - to enable the inline plotting
warnings - to manipulate warnings details
Loading the dataset
Now we load the dataset for processing
df = pd.read_csv('Urban Sound Dataset.csv')
df.head()
ID - Name of the audio file
Class - Name of the output class the audio file belongs to
Let us display an audio file
ipd.Audio('Train/1.wav')
An audio player bar is displayed to play the audio file from the data
Exploratory Data Analysis
In this step we will visualize different audio samples of the data through wave plots.
We will load the audio file into an array
data, sampling_rate = librosa.load('Train/1.wav')
sampling_rate - the number of audio samples per second
Now we will view the data array
data
array([-0.09602016, -0.14303702, 0.05203498, ..., -0.01646687, -0.00915894, 0.09742922], dtype=float32)
The audio file is loaded into an array of values
Each value is an amplitude sample of the waveform at a point in time
Next we will view the sampling rate
sampling_rate
22050
The output is the number of samples per second; librosa resamples audio to 22050 Hz by default
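A quick sanity check, as a minimal sketch using the data and sampling_rate loaded above: the clip duration is simply the number of samples divided by the sampling rate.
# duration in seconds = number of samples / samples per second
duration = len(data) / sampling_rate
print(round(duration, 2))  # should be <= 4.0 for these excerpts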
Now we plot some graphs of the audio files
plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)
figsize=(12,4) - width and height of the plot figure
librosa.display.waveplot() - plots the waveform (amplitude over time) of the data at the given sampling rate
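Note: librosa.display.waveplot was deprecated and removed in librosa 0.10 in favor of waveshow. If you are running a newer version, this small compatibility sketch picks whichever function is available:
# waveplot was removed in librosa 0.10; fall back to waveshow on newer versions
plt.figure(figsize=(12,4))
if hasattr(librosa.display, 'waveplot'):
    librosa.display.waveplot(data, sr=sampling_rate)   # librosa < 0.10
else:
    librosa.display.waveshow(data, sr=sampling_rate)   # librosa >= 0.10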
index = random.choice(df.index)
print('Class:', df['Class'][index])
data, sampling_rate = librosa.load('Train/'+str(df['ID'][index]) + '.wav')
plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)
Class: dog_bark
A randomly picked audio file from the dataset
librosa.load('Train/'+str(df['ID'][index]) + '.wav') - builds the full path to the audio file by appending the .wav extension to the file ID
Waveplot of a dog bark sample from the data
index = random.choice(df.index)
print('Class:', df['Class'][index])
data, sampling_rate = librosa.load('Train/'+str(df['ID'][index]) + '.wav')
plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)
Class: gun_shot
Another randomly picked audio sample
Graph display of a gun shot data sample
index = random.choice(df.index)
print('Class:', df['Class'][index])
data, sampling_rate = librosa.load('Train/'+str(df['ID'][index]) + '.wav')
plt.figure(figsize=(12,4))
librosa.display.waveplot(data, sr=sampling_rate)
Class: car_horn
Graph display of a car horn data sample
Now we will view the class distribution of the dataset
import seaborn as sns
plt.figure(figsize=(12,7))
sns.countplot(df['Class'])
seaborn - built on top of matplotlib with similar functionalities
The bar graph visualizes the number of samples in each class; the exact counts are printed below.
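For exact numbers rather than a chart, value_counts gives the same distribution as counts:
# number of samples per class, as plain numbers
print(df['Class'].value_counts())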
Input Split
The data is currently stored in audio files; we need to extract the audio features into an array so the input and output data can be loaded directly.
import os

def parser(row):
    # path of the file
    file_name = os.path.join('Train', str(row.ID) + '.wav')
    # load the audio file
    x, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    # extract MFCC features and average them over time
    mfccs = np.mean(librosa.feature.mfcc(y=x, sr=sample_rate, n_mfcc=40).T, axis=0)
    feature = mfccs
    label = row.Class
    return [feature, label]
import os - used to build and join file system paths
res_type='kaiser_fast' - a faster resampling mode that speeds up loading the audio
librosa.feature.mfcc() - extracts Mel-frequency cepstral coefficient (MFCC) features from the audio file; see the sketch after this list
feature - array of the features extracted from the data
label - name of the class of the extracted data
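To see why the parser averages over the time axis, inspect the raw MFCC matrix for one clip; a sketch using the first training file:
# the raw MFCC matrix has one 40-coefficient column per time frame
x, sr = librosa.load('Train/1.wav', res_type='kaiser_fast')
mfcc = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=40)
print(mfcc.shape)                      # (40, T) - T depends on the clip length
print(np.mean(mfcc.T, axis=0).shape)   # (40,) - one fixed-length vector per clip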
Now we will load the data for training
data = df.apply(parser, axis=1)
Each entry of data is a [feature, label] pair returned by the parser; the features and corresponding labels will be separated below
data[0]
[array([-82.12358939, 139.50591598, -42.43086489, 24.82786139, -11.62076447, 23.49708426, -12.19458986, 25.89713885, -9.40527728, 21.21042898, -7.36882138, 14.25433903, -8.67870015, 7.75023765, -10.1241154 , 3.2581183 , -11.35261914, 2.80096779, -7.04601346, 3.91331351, -2.3349743 , 2.01242254, -2.79394367, 4.12927394, -1.62076864, 4.32620082, -1.03440959, -1.23297714, -3.11085341, 0.32044827, -1.787786 , 0.44295495, -1.79164752, -0.76361758, -1.24246428, -0.27664012, 0.65718559, -0.50237115, -2.60428533, -1.05346291]), 'siren']
The first element of each entry is the feature array
The second element is the class label
Now we split the data into input features (X) and output labels (y)
# input split
X = np.array(list(zip(*data))[0])
y = np.array(list(zip(*data))[1])
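A quick shape check confirms there is one 40-dimensional feature vector per clip (assuming the full training CSV loaded above):
print(X.shape)  # (5435, 40) - samples x MFCC coefficients
print(y.shape)  # (5435,) - one class label per sample, still as text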
Label encoder
We will transform the 10 class labels from text attributes to numerical attributes
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
le = LabelEncoder()
y = np_utils.to_categorical(le.fit_transform(y))
Each class label is converted to an integer and then one-hot encoded into categorical columns
y.shape
(5435, 10)
Shape of the data set for training, indicating 5435 samples of training data with 10 classes
y[0]
array([0., 0., 0., 0., 0., 0., 0., 0., 1., 0.], dtype=float32)
Single sample of the data in numerical columns of the classes
The numerical column corresponding to the sample's class is set to 1 and the rest to 0
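To map a one-hot row back to its class name, the fitted encoder can be reused; a minimal sketch:
# le.classes_ holds the alphabetical class ordering used by the encoder
print(le.classes_)
print(le.classes_[np.argmax(y[0])])  # prints 'siren' for the sample above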
Model Training
Let us create the model for training
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
num_classes = 10
# model creation
model = Sequential()
model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
Dense - fully connected linear layer
Dropout - regularizes the model by randomly dropping a fraction of the units during training, reducing overfitting
Activation - applies a non-linear activation function (relu for the hidden layers, softmax for the output)
Flatten - converts a 2D array into a 1D array (imported here but not needed, since the MFCC input is already 1D)
loss='categorical_crossentropy' - loss function for multi-class targets in one-hot form
optimizer='adam' - adaptively adjusts the learning rate for the model over the epochs
A quick way to verify the architecture is model.summary(), shown below.
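As a small check before training, model.summary() prints each layer's output shape and parameter count; the layer sizes above work out to roughly 276K trainable parameters.
# prints each layer's output shape and parameter count
# (40*256+256) + (256*512+512) + (512*256+256) + (256*10+10) = 275,978 parameters
model.summary()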
Now we will train the data
# train the model
model.fit(X, y, batch_size=32, epochs=30, validation_split=0.25)
Epoch 1/30
1149/1149 [==============================] - 10s 3ms/step - loss: 0.4816 - accuracy: 0.8475 - val_loss: 0.1202 - val_accuracy: 0.9637
Epoch 2/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.1336 - accuracy: 0.9605 - val_loss: 0.0848 - val_accuracy: 0.9743
Epoch 3/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0863 - accuracy: 0.9732 - val_loss: 0.0807 - val_accuracy: 0.9742
Epoch 4/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0685 - accuracy: 0.9783 - val_loss: 0.0734 - val_accuracy: 0.9788
Epoch 5/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0543 - accuracy: 0.9825 - val_loss: 0.0690 - val_accuracy: 0.9809
Epoch 6/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0461 - accuracy: 0.9844 - val_loss: 0.0684 - val_accuracy: 0.9808
Epoch 7/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0360 - accuracy: 0.9873 - val_loss: 0.0743 - val_accuracy: 0.9798
Epoch 8/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0318 - accuracy: 0.9884 - val_loss: 0.0733 - val_accuracy: 0.9811
Epoch 9/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0319 - accuracy: 0.9891 - val_loss: 0.0658 - val_accuracy: 0.9838
Epoch 10/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0242 - accuracy: 0.9919 - val_loss: 0.0728 - val_accuracy: 0.9827
Epoch 11/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0218 - accuracy: 0.9926 - val_loss: 0.0815 - val_accuracy: 0.9818
Epoch 12/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0286 - accuracy: 0.9895 - val_loss: 0.0766 - val_accuracy: 0.9829
Epoch 13/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0199 - accuracy: 0.9928 - val_loss: 0.0762 - val_accuracy: 0.9820
Epoch 14/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0239 - accuracy: 0.9918 - val_loss: 0.0754 - val_accuracy: 0.9836
Epoch 15/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0160 - accuracy: 0.9938 - val_loss: 0.0865 - val_accuracy: 0.9820
Epoch 16/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0196 - accuracy: 0.9935 - val_loss: 0.0842 - val_accuracy: 0.9822
Epoch 17/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0152 - accuracy: 0.9951 - val_loss: 0.0825 - val_accuracy: 0.9828
Epoch 18/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0155 - accuracy: 0.9943 - val_loss: 0.0889 - val_accuracy: 0.9817
Epoch 19/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0207 - accuracy: 0.9930 - val_loss: 0.0886 - val_accuracy: 0.9822
Epoch 20/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0122 - accuracy: 0.9955 - val_loss: 0.0958 - val_accuracy: 0.9822
Epoch 21/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0135 - accuracy: 0.9957 - val_loss: 0.0986 - val_accuracy: 0.9824
Epoch 22/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0166 - accuracy: 0.9949 - val_loss: 0.0987 - val_accuracy: 0.9824
Epoch 23/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0153 - accuracy: 0.9949 - val_loss: 0.0917 - val_accuracy: 0.9832
Epoch 24/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0147 - accuracy: 0.9950 - val_loss: 0.0967 - val_accuracy: 0.9838
Epoch 25/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0112 - accuracy: 0.9957 - val_loss: 0.1057 - val_accuracy: 0.9816
Epoch 26/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0134 - accuracy: 0.9959 - val_loss: 0.1024 - val_accuracy: 0.9830
Epoch 27/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0085 - accuracy: 0.9968 - val_loss: 0.1256 - val_accuracy: 0.9795
Epoch 28/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0127 - accuracy: 0.9958 - val_loss: 0.1099 - val_accuracy: 0.9832
Epoch 29/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0136 - accuracy: 0.9952 - val_loss: 0.1043 - val_accuracy: 0.9824
Epoch 30/30
1149/1149 [==============================] - 4s 3ms/step - loss: 0.0132 - accuracy: 0.9959 - val_loss: 0.1162 - val_accuracy: 0.9827
Display of the results after training the data
batch_size=32 - amount of data to process per iteration
epochs=30 - no. of iterations for training
validation_split=0.25 - fraction of the data held out for validation (25%)
Training accuracy climbs steadily per epoch, while validation accuracy plateaus after the first few epochs
Both training and validation accuracy reached more than 98 percent, though the rising validation loss in later epochs hints at mild overfitting
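To close the loop, here is a minimal inference sketch that reuses the trained model, the parser-style feature extraction, and the fitted label encoder from above to classify a new clip (predict_class is a hypothetical helper name):
# extract the same 40 MFCC features and ask the model for the most likely class
def predict_class(path):
    x, sr = librosa.load(path, res_type='kaiser_fast')
    mfccs = np.mean(librosa.feature.mfcc(y=x, sr=sr, n_mfcc=40).T, axis=0)
    probs = model.predict(mfccs.reshape(1, -1))
    return le.classes_[np.argmax(probs)]

print(predict_class('Train/1.wav'))  # expected: 'siren'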
Final Thoughts
Deep learning models often give more accurate results than classical machine learning algorithms on audio tasks like this one
MFCC features are extracted from the sound files and used for training
Training on more data can improve accuracy further; adding more epochs alone risks overfitting, as the rising validation loss shows
This model can be adapted to other sound-related tasks, such as speech recognition, by changing the dataset and parameters
In this project tutorial, we explored the Urban Sound dataset as a deep learning classification project. Different urban sounds were visualized through exploratory data analysis and then classified with a neural network
Get the project notebook from here
Thanks for reading the article!!!
Check out more project videos from the YouTube channel Hackers Realm