
Deep CNN Autoencoder for Image Compression & Denoising | Deep Learning | Python Tutorial

By Hackers Realm

Updated: May 31, 2023

An autoencoder is an unsupervised learning model that is trained to reconstruct its input at the output layer. In doing so, it typically learns an efficient representation (encoding) of the data and learns to ignore signal “noise”. Autoencoders can be used for image denoising, image and data compression, anomaly detection, feature extraction and, in some cases, even generation of image data.


A deep CNN autoencoder is a powerful approach for both image compression and denoising tasks. In this project tutorial, we will explore how a deep CNN autoencoder can be used for each of them, starting with image compression.


Deep CNN Autoencoder - Image Compression


For image compression, the deep CNN autoencoder learns to encode the important features of an input image into a compressed representation in the latent space. The encoding process reduces the dimensionality of the input image while retaining the essential information.

Deep CNN Autoencoder for Image Compression


You can watch the video-based tutorial with a step-by-step explanation down below.


Flow of Autoencoder


Input Image -> Encoder -> Compressed Representation -> Decoder -> Reconstruct Input Image

  • The autoencoder takes an input sample (here, an image) and feeds it into the encoder network

  • The encoder network consists of several layers, typically convolutional and pooling layers and, in some architectures, fully connected layers. These layers progressively reduce the spatial dimensions and extract meaningful features from the input data

  • The final layer of the encoder network produces a compressed representation of the input data

  • The compressed representation from the encoding stage is passed into the decoder network

  • The decoder network mirrors the encoder network and consists of upsampling layers, convolutional or transposed convolutional layers and, in some architectures, fully connected layers. It takes the compressed representation and gradually increases the spatial dimensions to reconstruct the original input data

  • The final layer of the decoder network generates the reconstructed output, which aims to closely resemble the original input data
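
To make the shrinking and growing of the spatial dimensions concrete, here is a minimal NumPy sketch of 2x2 max pooling and nearest-neighbour upsampling on a tiny 4x4 array. The helper names max_pool_2x2 and upsample_2x2 are hypothetical, and the sketch assumes even height and width; it only imitates what pooling and upsampling layers do to the shape of the data and is not the Keras implementation.

```python
import numpy as np

def max_pool_2x2(image):
    """Downsample a 2D array by taking the max of each non-overlapping 2x2 block."""
    h, w = image.shape
    # Group pixels into (h/2, 2, w/2, 2) blocks, then reduce over the two block axes.
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x2(image):
    """Upsample a 2D array by repeating every pixel twice along each axis."""
    return image.repeat(2, axis=0).repeat(2, axis=1)

image = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool_2x2(image)      # encoder side: (4, 4) -> (2, 2)
restored = upsample_2x2(pooled)   # decoder side: (2, 2) -> (4, 4)

print(pooled)
print(restored)
```

Upsampling restores the shape but not the discarded pixel values, which is why the decoder also needs trainable convolutional layers to fill in the detail.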


Import Modules


import numpy as np
import matplotlib.pyplot as plt
from keras import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.datasets import mnist
  • numpy - used to perform a wide variety of mathematical operations on arrays

  • matplotlib - used for data visualization and graphical plotting

  • keras - used to provide a user-friendly and intuitive interface for designing, training, and evaluating deep learning models

  • keras.layers - provides a variety of pre-defined layers that can be used to construct neural network models

  • keras.datasets - provides pre-loaded datasets that can be used for training, testing, and evaluating machine learning models


Load the Dataset


The project uses the MNIST handwritten digits dataset

(x_train, _), (x_test, _) = mnist.load_data()
  • mnist.load_data() loads the MNIST dataset


Preprocess the image Data


Next we will have to normalize the input image data

# normalize the image data
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
  • astype('float32') converts the data type of the pixel values to float32. This step is performed to ensure compatibility with subsequent operations and to allow for decimal values

  • Then we will have to scale the pixel values by dividing them by 255. This step normalizes the pixel values to the range of 0 to 1, as the original pixel values are integers ranging from 0 to 255.

  • This normalization step is often applied to improve the training process and convergence of neural networks
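
The effect of the two operations can be checked on a tiny array. This is a small sketch with a made-up three-pixel array standing in for the MNIST images (the name raw is an assumption for illustration).

```python
import numpy as np

# Hypothetical stand-in for MNIST pixels: 8-bit integers in the range 0 to 255.
raw = np.array([[0, 128, 255]], dtype=np.uint8)

# Convert to float32 first, then divide, so the result stays float32.
scaled = raw.astype('float32') / 255

print(scaled.dtype, scaled.min(), scaled.max())
```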


Next we will reshape the input image data

# reshape the input data for the model
x_train = x_train.reshape(len(x_train), 28, 28, 1)
x_test = x_test.reshape(len(x_test), 28, 28, 1)
x_test.shape

(10000, 28, 28, 1)

  • reshape() is a NumPy function that reshapes the array

  • The test set contains 10000 samples (the training set contains 60000)

  • (28, 28, 1) is the shape of each image: the first two dimensions are the spatial dimensions (height and width), and the last dimension is the number of channels
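
As a quick illustration of what the reshape does, the sketch below adds the channel axis to a small stand-in array and removes it again for a single sample. The array images is a hypothetical placeholder for the MNIST data, and np.expand_dims is shown only as an equivalent alternative, not as what the tutorial uses.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 5 blank grayscale images of 28x28 pixels.
images = np.zeros((5, 28, 28), dtype=np.float32)

# Add a trailing channel axis so each sample is (height, width, channels).
with_channel = images.reshape(len(images), 28, 28, 1)

# np.expand_dims gives the same result without hard-coding the image size.
same_result = np.expand_dims(images, axis=-1)

# A single sample keeps its channel axis; drop it again to get a 2D array.
single = with_channel[0]              # shape (28, 28, 1)
plottable = single.reshape(28, 28)    # shape (28, 28)

print(with_channel.shape, single.shape, plottable.shape)
```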


Exploratory Data Analysis


Here we will explore what the input images look like

# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()
  • np.random.randint() generates a random integer index within the range of the length of the x_test array. This is used to randomly select an image from the test dataset

  • imshow() displays the image. It takes the reshaped image as input.

  • x_test[index] retrieves the image at the randomly generated index from the test dataset.

  • reshape() drops the trailing channel dimension, turning the (28, 28, 1) array stored in x_test into a plain 2D array of 28x28 pixels for plotting.

  • plt.gray() sets the color map of the plot to grayscale, so the image is displayed in black and white

MNIST Sample Image

  • This is the random image from the MNIST test dataset displayed using matplotlib, with the pixel values reshaped into a 2D grid and plotted in grayscale.


Next let us see one more image from the dataset

# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()

MNIST Sample Image
  • This is another random image from the MNIST test dataset displayed using matplotlib, with the pixel values reshaped into a 2D grid and plotted in grayscale


Model Creation


Next we will define a sequential model in Keras for a convolutional autoencoder

model = Sequential([
                    # encoder network
                    Conv2D(32, 3, activation='relu', padding='same', input_shape=(28, 28, 1)),
                    MaxPooling2D(2, padding='same'),
                    Conv2D(16, 3, activation='relu', padding='same'),
                    MaxPooling2D(2, padding='same'),
                    # decoder network
                    Conv2D(16, 3, activation='relu', padding='same'),
                    UpSampling2D(2),
                    Conv2D(32, 3, activation='relu', padding='same'),
                    UpSampling2D(2),
                    # output layer
                    Conv2D(1, 3, activation='sigmoid', padding='same')
])

model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
  • Sequential([]) creates a new sequential model object

  • Inside the sequential model, a series of layers are added in order. The model architecture follows an encoder-decoder structure, typical of autoencoders

  • The encoder network consists of convolutional and pooling layers. The input shape of the first layer is specified as (28, 28, 1), indicating grayscale images of size 28x28 pixels

  • The decoder network consists of convolutional and upsampling layers

  • The final layer is the output layer, which uses a convolutional layer with a single channel and a sigmoid activation function to reconstruct the image

  • model.compile() compiles the model and configures the training process. The optimizer is set to 'adam', which is a popular optimization algorithm for neural networks. The loss function is set to 'binary_crossentropy', which is commonly used for binary classification problems; here it is applied to each pixel, comparing the sigmoid output with the normalized pixel value in the range of 0 to 1

  • model.summary() prints a summary of the model architecture, including the number of parameters and the shape of each layer's output
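
To show how binary cross-entropy behaves as a per-pixel reconstruction loss, here is a minimal NumPy sketch. The function name binary_crossentropy, the eps value, and the three-pixel arrays are assumptions for illustration; this is not the Keras implementation, only the same formula averaged over pixels.

```python
import numpy as np

def binary_crossentropy(target, prediction, eps=1e-7):
    """Mean per-pixel binary cross-entropy for arrays with values in [0, 1]."""
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    prediction = np.clip(prediction, eps, 1.0 - eps)
    loss = -(target * np.log(prediction)
             + (1.0 - target) * np.log(1.0 - prediction))
    return float(loss.mean())

target = np.array([0.0, 1.0, 0.5])   # "true" pixel values
close = np.array([0.1, 0.9, 0.5])    # a reconstruction near the target
far = np.array([0.9, 0.1, 0.5])      # a reconstruction far from the target

loss_close = binary_crossentropy(target, close)
loss_far = binary_crossentropy(target, far)
print(loss_close, loss_far)
```

The closer reconstruction gets the lower loss. Note that for a gray pixel such as 0.5 the loss does not reach zero even for a perfect prediction, so the absolute loss values reported during training are best read relative to each other.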

Autoencoder Model Configuration
  • This is the overview of the model's structure and parameter counts
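
The output shapes and parameter counts in the summary can be reproduced by hand. The sketch below traces the layer list of the model in plain Python, assuming 3x3 kernels, padding='same', and one bias per filter; the names conv_params, layers, and bottleneck are hypothetical helpers, not part of Keras.

```python
import math

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters

# (layer kind, filters) in the order used by the model above.
layers = [("conv", 32), ("pool", None), ("conv", 16), ("pool", None),
          ("conv", 16), ("up", None), ("conv", 32), ("up", None), ("conv", 1)]

size, channels, total_params = 28, 1, 0
shapes = []
for kind, filters in layers:
    if kind == "conv":      # padding='same' keeps height and width
        total_params += conv_params(3, channels, filters)
        channels = filters
    elif kind == "pool":    # pool size 2 with padding='same' rounds up
        size = math.ceil(size / 2)
    else:                   # UpSampling2D(2) doubles height and width
        size *= 2
    shapes.append((size, size, channels))

bottleneck = shapes[3]      # output of the second pooling layer
print(shapes)
print("bottleneck:", bottleneck, "total parameters:", total_params)
```

By this count the bottleneck has shape (7, 7, 16), which is 784 values, the same number as the 28x28 input. With this particular layer configuration the encoder therefore reduces spatial resolution rather than the number of stored values; using fewer filters in the middle layers would be one way to obtain a smaller representation.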


Training the Model


Next we will train the model

# train the model
model.fit(x_train, x_train, epochs=20, batch_size=256, validation_data=(x_test, x_test))
  • model.fit() is used to train the model

  • x_train is the input training data, and x_train is also used as the target output since it's an autoencoder (reconstructing the input)

  • epochs=20 specifies the number of times the entire training dataset will be iterated during training

  • batch_size=256 determines the number of samples used in each training update. In this case, 256 samples will be processed before updating the model's weights

  • validation_data=(x_test, x_test) is used to specify the validation data to evaluate the model's performance during training. Here, the same dataset (x_test) is used as both the input and target output
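
As a small worked example of how epochs and batch_size translate into weight updates, the arithmetic below assumes the standard MNIST training split of 60000 images; the variable names are illustrative only.

```python
import math

train_samples = 60000   # size of the MNIST training split
batch_size = 256
epochs = 20

# The last batch of each epoch is smaller when the split is not evenly divisible.
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)
```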

You will see the following result :

Training Steps of the model


Visualize the results


First we will randomly select the image and display it

# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()

MNIST Sample Image
  • This is the random image from the MNIST test dataset displayed using matplotlib


Next we will predict the results from model

# predict the results from model (get reconstructed images)
pred = model.predict(x_test)
  • model.predict() is a method in Keras used to obtain predictions from a trained model

  • x_test is the input test data on which predictions will be made


Next we will visualize the reconstructed image obtained from model.predict()

# visualize reconstructed image
plt.imshow(pred[index].reshape(28,28))
plt.gray()
Compressed Image from Autoencoder
  • This is the image reconstructed from the compressed representation. We can clearly see that there is some difference between the original image and the reconstructed image


We can create subplots that display the original and the reconstructed image side by side, which makes the difference between the two images easier to see

index = np.random.randint(len(x_test))
plt.figure(figsize=(10, 4))
# display original image
ax = plt.subplot(1, 2, 1)
plt.imshow(x_test[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display reconstructed image
ax = plt.subplot(1, 2, 2)
plt.imshow(pred[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
  • First, np.random.randint() as seen earlier generates a random integer index within the range of the length of the x_test array.

  • plt.figure() creates a new figure object with a specified size of 10 inches wide and 4 inches tall. This sets the overall size of the plot

  • plt.subplot(1, 2, 1) creates a subplot grid with 1 row and 2 columns and selects the first subplot for displaying the original image

  • plt.imshow() displays the original image at the selected index from x_test. reshape(28, 28) drops the trailing channel dimension so that the (28, 28, 1) array becomes a 2D array of 28x28 pixels.

  • plt.gray() as seen earlier sets the color map of the plot to grayscale.

  • ax.get_xaxis().set_visible(False) and ax.get_yaxis().set_visible(False) hide the x-axis and the y-axis, which removes the ticks and axis labels

  • plt.subplot(1, 2, 2) selects the second subplot for displaying the compressed/reconstructed image.

  • Next we will display the reconstructed image at the selected index from pred using plt.imshow()

  • plt.show() displays the figure with both subplots showing the original and reconstructed images

Comparison between original and compressed image using autoencoder
  • This gives us a better visualization where we can clearly see the difference between the original image and the reconstructed image


We can check the result for one more image to get a further impression of the model's performance

index = np.random.randint(len(x_test))
plt.figure(figsize=(10, 4))
# display original image
ax = plt.subplot(1, 2, 1)
plt.imshow(x_test[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display reconstructed image
ax = plt.subplot(1, 2, 2)
plt.imshow(pred[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
Comparison between original and compressed image using autoencoder
  • This is one more example which shows the difference between the original image and the compressed image
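
The side-by-side plots give a visual impression; a numeric score can complement them. The sketch below computes the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR) between two image arrays. The function names mse and psnr and the synthetic arrays are assumptions for illustration; with the tutorial's variables you would pass x_test and pred instead.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error over all pixels."""
    return float(np.mean((original - reconstructed) ** 2))

def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    error = mse(original, reconstructed)
    if error == 0:
        return float("inf")   # identical images
    return float(10 * np.log10(max_value ** 2 / error))

# Hypothetical stand-ins for x_test and pred: every pixel is off by 0.1.
original = np.zeros((2, 28, 28, 1), dtype=np.float32)
reconstructed = original + 0.1

print("MSE:", mse(original, reconstructed))
print("PSNR (dB):", psnr(original, reconstructed))
```

A lower MSE and a higher PSNR both indicate a reconstruction that is closer to the original.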


So far we have seen how autoencoders can be used for image compression. Next we will see how to denoise images using autoencoders


Deep CNN Autoencoder - Denoising Image


The deep CNN autoencoder can be trained to remove noise from corrupted images. During the training process, the autoencoder is presented with pairs of clean and noisy images. The encoder network learns to extract meaningful features from the noisy images, while the decoder network reconstructs the clean version of the image from the encoded representation. By minimizing the difference between the reconstructed image and the clean image, the autoencoder learns to denoise the input images effectively.

Deep CNN Autoencoder for Denoising Image

You can watch the video-based tutorial with a step-by-step explanation down below.


Flow of Autoencoder


Noisy Image -> Encoder -> Compressed Representation -> Decoder -> Reconstruct Clear Image

  • The flow of the autoencoder is the same as in image compression. Here, instead of the clean input image, we feed a noisy image and reconstruct a clear image at the output


Import Modules


import numpy as np
import matplotlib.pyplot as plt
from keras import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.datasets import mnist
  • We will be using same modules for denoising the image that we have seen in image compression


Load the Dataset


Next we will load the dataset

(x_train, _), (x_test, _) = mnist.load_data()

Preprocess the image Data


Next we will have to normalize the input image data

# normalize the image data
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
  • This step normalizes the pixel values to the range of 0 to 1, as the original pixel values are integers ranging from 0 to 255.

  • This normalization step is often applied to improve the training process and convergence of neural networks


Next we will reshape the input image data

# reshape the input data for the model
x_train = x_train.reshape(len(x_train), 28, 28, 1)
x_test = x_test.reshape(len(x_test), 28, 28, 1)
x_test.shape

(10000, 28, 28, 1)

  • The test set contains 10000 samples

  • (28, 28, 1) is the shape of each image: the first two dimensions are the spatial dimensions (height and width), and the last dimension is the number of channels


Add Noise to the Image


Next we will have to add random noise to training and testing images

# add noise
noise_factor = 0.6
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
  • noise_factor is a scalar value that determines the intensity of the added noise. Higher values result in more pronounced noise

  • np.random.normal() generates random numbers from a normal distribution with a mean (loc) of 0.0 and a standard deviation (scale) of 1.0. The resulting array has the same shape as x_train or x_test

  • Next add the scaled random noise to the original x_train and x_test images, creating the noisy train and test set x_train_noisy and x_test_noisy respectively


Next we will limit the array values

# clip the values in the range of 0-1
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
  • np.clip() applies element-wise clipping to the x_train_noisy array and x_test_noisy array, limiting the values to the range of 0 to 1. Any values below 0 are set to 0, and any values above 1 are set to 1. This ensures that the pixel values remain within the valid range for image data
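
The two steps above (adding scaled Gaussian noise, then clipping) can be wrapped in one helper. This is a sketch under stated assumptions: the function name add_gaussian_noise and the seed parameter are hypothetical additions, and it uses np.random.default_rng so that the same noise can be regenerated, whereas the tutorial code calls np.random.normal without a seed. It also casts the result back to float32.

```python
import numpy as np

def add_gaussian_noise(images, noise_factor=0.6, seed=0):
    """Return a noisy copy of images with values clipped to the range [0, 1]."""
    rng = np.random.default_rng(seed)          # seeded generator for repeatable noise
    noise = rng.normal(loc=0.0, scale=1.0, size=images.shape)
    noisy = images + noise_factor * noise      # float64 after adding the noise
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)

# Hypothetical stand-in for a few normalized MNIST images (uniform gray).
clean = np.full((4, 28, 28, 1), 0.5, dtype=np.float32)
noisy_a = add_gaussian_noise(clean, seed=42)
noisy_b = add_gaussian_noise(clean, seed=42)

print(noisy_a.shape, noisy_a.dtype, noisy_a.min(), noisy_a.max())
```

With the tutorial's variables, this would be called once for x_train and once for x_test, ideally with different seeds so the two sets do not share a noise pattern.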


Exploratory Data Analysis


Here we will explore what the original and noisy input images look like

# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test[index].reshape(28,28))
plt.gray()
  • This displays a random image from the test set using imshow() from matplotlib. The image is shown in grayscale; axis ticks are drawn by default and no color bar is added. Each time the code is run, a different random image will be displayed due to the random selection of the index. This can be useful for visually inspecting individual images from the dataset.

Original MNIST Image
  • This is the original image from the test dataset


Next we will display the same image with some noise

# reuse the index selected above so that the same image is shown with noise
# plot the image
plt.imshow(x_test_noisy[index].reshape(28,28))
plt.gray()
MNIST Image with Added Noise
  • We can clearly see that after adding noise it is hard to identify the original image


Let us see some more images

# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test_noisy[index].reshape(28,28))
plt.gray()
MNIST Image with Added Noise
  • This is another noisy image from test dataset. By looking at this image it is hard to identify the original image


Let us see the original image

plt.imshow(x_test[index].reshape(28,28))
plt.gray()
Original MNIST Image


Model Creation


Next we will define a sequential model in Keras for a convolutional autoencoder as we did during image compression

model = Sequential([
                    # encoder network
                    Conv2D(32, 3, activation='relu', padding='same', input_shape=(28, 28, 1)),
                    MaxPooling2D(2, padding='same'),
                    Conv2D(16, 3, activation='relu', padding='same'),
                    MaxPooling2D(2, padding='same'),
                    # decoder network
                    Conv2D(16, 3, activation='relu', padding='same'),
                    UpSampling2D(2),
                    Conv2D(32, 3, activation='relu', padding='same'),
                    UpSampling2D(2),
                    # output layer
                    Conv2D(1, 3, activation='sigmoid', padding='same')
])

model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
Autoencoder model config for denoising image
  • This is the overview of the model's structure and parameter counts


Training the model


Next we will train the model

# train the model
model.fit(x_train_noisy, x_train, epochs=20, batch_size=256, validation_data=(x_test_noisy, x_test))
  • This trains the model for 20 epochs with a batch size of 256, using the x_train_noisy dataset as input and the x_train dataset as the target. The validation data is provided using the x_test_noisy dataset as input and the x_test dataset as the target

You will see the following result :

Training Steps for model

Visualize the results


First we will randomly select the image and display it

# randomly select input image
index = np.random.randint(len(x_test))
# plot the image
plt.imshow(x_test_noisy[index].reshape(28,28))
plt.gray()
MNIST Image with random noise
  • This is the random noisy image from the MNIST test dataset displayed using matplotlib


Next we will predict the results from model

# predict the results from model (get denoised images)
pred = model.predict(x_test_noisy)
  • x_test_noisy is the input test data on which predictions will be made


Next we will visualize the denoised image obtained from model.predict()

# visualize denoised image
plt.imshow(pred[index].reshape(28,28))
plt.gray()
Denoised MNIST image using Autoencoder
  • This is the denoised image. We can clearly see that most of the noise has been removed and the digit is much easier to recognize


We can create subplots that display the noisy input and the predicted denoised image side by side, which makes the difference between the two images easier to see

index = np.random.randint(len(x_test))
plt.figure(figsize=(10, 4))
# display noisy image
ax = plt.subplot(1, 2, 1)
plt.imshow(x_test_noisy[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display denoised image
ax = plt.subplot(1, 2, 2)
plt.imshow(pred[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
  • The above code snippet displays a comparison between a noisy image and its denoised reconstruction

Comparison between original noisy and denoised image using Autoencoder
  • This gives us a better visualization where we can clearly see the difference between the noisy image and the predicted denoised image


We can check the result for one more image to get a further impression of the model's performance

index = np.random.randint(len(x_test))
plt.figure(figsize=(10, 4))
# display noisy image
ax = plt.subplot(1, 2, 1)
plt.imshow(x_test_noisy[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
# display denoised image
ax = plt.subplot(1, 2, 2)
plt.imshow(pred[index].reshape(28,28))
plt.gray()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)
plt.show()
Comparison between original noisy and denoised image using Autoencoder
  • This is one more example which shows the difference between the noisy image and the denoised image


Final Thoughts

  • A deep CNN autoencoder combines the power of CNNs for spatial feature extraction and the reconstruction capabilities of autoencoders to learn efficient representations of input data. It can be used for various applications where learning compact and meaningful representations is essential. Here we have seen the application of autoencoder for image compression and denoising

  • The advantage of using a deep CNN in the autoencoder architecture for image compression is that it can capture spatial dependencies and extract meaningful features from the input image. The convolutional layers in the encoder network perform local feature extraction, capturing fine details and patterns. The decoder network in this tutorial uses UpSampling2D layers followed by convolutional layers (transposed convolutional layers are a common alternative) to upsample the compressed representation and reconstruct the image at its original resolution. The deep CNN architecture helps preserve important image features during compression and can yield good-quality reconstructions

  • Similar to image compression, the deep CNN architecture in the autoencoder is advantageous for image denoising as it can capture complex spatial patterns and extract hierarchical features. The convolutional layers in the network can identify noise patterns and suppress them, allowing the decoder network to reconstruct a cleaner version of the image

  • In summary, a deep CNN autoencoder is a powerful approach for both image compression and denoising tasks. It can learn efficient representations of images in the latent space, allowing for image compression with reduced memory or bandwidth requirements. Additionally, it can effectively remove noise from corrupted images, resulting in cleaner and higher-quality reconstructions

In this project tutorial, we have explored how Deep CNN Autoencoder can be used for image compression and denoising



Get the project notebook from here


Thanks for reading the article!!!


Check out more project videos from the YouTube channel Hackers Realm
