ANN Model to Classify Images

Satish Gunjal
6 min read · Oct 17, 2020
Source: Sophos Home

Introduction

In this guide we are going to create and train a neural network model to classify clothing images. It is based on the Basic classification tutorial from TensorFlow. We will use the TensorFlow deep learning framework along with the Keras high-level API to build and train the model.

Import Libraries
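A minimal set of imports for this guide (a sketch; NumPy and Matplotlib are assumed for the data exploration and plotting steps that follow):

# TensorFlow and the Keras high-level API
import tensorflow as tf

# Helper libraries for array handling and plotting
import numpy as np
import matplotlib.pyplot as plt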

Load Data

  • We are using Fashion MNIST dataset which contains 70,000 grayscale images in 10 categories.
  • We will use 60,000 images for training and 10,000 images for testing the model.
  • You can load the data directly from TensorFlow using fashion_mnist.load_data(), as shown below.
  • The images are 28x28 NumPy arrays, with pixel values ranging from 0 to 255. The labels are an array of integers, ranging from 0 to 9, each corresponding to the class of clothing the image represents.
  • The images show individual articles of clothing at low resolution (28 by 28 pixels).
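A sketch of loading the dataset and defining the class names used for display later:

fashion_mnist = tf.keras.datasets.fashion_mnist

# load_data() returns two tuples of NumPy arrays:
# (train_images, train_labels) and (test_images, test_labels)
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Human-readable names for the 10 integer labels
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']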

Exploratory Data Analysis

Let’s explore the format of the dataset before training the model. There are 60,000 images in the training set and 10,000 images in the testing set, with each image represented as 28 x 28 pixels.
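The shape checks below produce the following output:

print('Shape of train_images:', train_images.shape)
print('Shape of train_labels:', train_labels.shape)
print('Shape of test_images:', test_images.shape)
print('Shape of test_labels:', test_labels.shape)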

Shape of train_images: (60000, 28, 28)
Shape of train_labels: (60000,)
Shape of test_images: (10000, 28, 28)
Shape of test_labels: (10000,)

There are 10 labels in total, from 0 to 9, each representing a specific clothing class: 0 (T-shirt/top), 1 (Trouser), 2 (Pullover), 3 (Dress), 4 (Coat), 5 (Sandal), 6 (Shirt), 7 (Sneaker), 8 (Bag), 9 (Ankle boot).
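We can confirm this with np.unique:

print('Unique train labels:', np.unique(train_labels))
print('Unique test labels:', np.unique(test_labels))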

Unique train labels: [0 1 2 3 4 5 6 7 8 9]
Unique test labels: [0 1 2 3 4 5 6 7 8 9]

Data Visualization
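A sketch, following the TensorFlow tutorial this guide is based on, that displays the first 25 training images with their class names:

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()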

Preprocessing the Data

Scaling

  • Pixel values for each image fall in the range of 0 to 255.
  • Typically, zero is taken to be black and 255 is taken to be white; values in between make up the different shades of gray.
  • To scale the input, we divide every value by 255.0 so that the final values fall in the range of 0 to 1, as shown below.
  • It’s important that the training set and the testing set are preprocessed in the same way.
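Scaling both sets identically keeps the training and testing inputs consistent:

# Scale pixel values from the 0-255 range down to the 0-1 range.
# The training and testing sets must be preprocessed the same way.
train_images = train_images / 255.0
test_images = test_images / 255.0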

Model Building

Building the neural network model requires configuring the input, hidden and output layers.

Set up the Layers

  • The basic building block of a neural network is the layer. Layers extract representations from the data fed into them.
  • Most of the time we have to chain multiple layers together to solve the problem.
  • The first layer in this network, tf.keras.layers.Flatten, transforms the format of the images from a two-dimensional array (of 28 by 28 pixels) to a one-dimensional array (of 28 * 28 = 784 pixels).
  • This input layer does not do any learning; it only reformats the data.
  • Once we have flattened the input data, we can add dense layers to the network. Here we are using two Dense layers.
  • The first Dense layer has 128 nodes (or neurons) and uses the ‘relu’ activation function.
  • The second (and last) layer returns a logits array of length 10. Each node contains a score indicating how strongly the current image belongs to one of the 10 classes. Note that we are not using any activation function here, so by default the activation is linear. (See the model definition after this list.)
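Putting the layers together, following the description above:

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),  # reformat 28x28 images into 784-length vectors
    tf.keras.layers.Dense(128, activation='relu'),  # hidden layer with 128 nodes
    tf.keras.layers.Dense(10)                       # output layer: one logit per class, linear activation
])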

Compile the Model

  • In this step we add all the required settings for model training.
  • Loss Function: Measures how well the model is doing during training; the optimizer tries to minimize this value.
  • Optimizer: Updates the model weights based on the input data and the output of the loss function.
  • Metrics: Used to monitor the training and testing steps.
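A sketch of the compile step. Because the last layer outputs raw logits, the loss is constructed with from_logits=True; the ‘adam’ optimizer matches the TensorFlow tutorial this guide follows:

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])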

Train the Model

The steps involved in model training are as follows:

  • Feed the training images and their associated labels to the model.
  • The model learns the mapping between images and labels.
  • Then we ask the model to make predictions using test_images.
  • Finally, we verify the model’s predictions against test_labels.

Feed the Model

  • To start training, call the model.fit method; it is called fit because it “fits” the model to the training data.
  • As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 0.91 (or 91%) on the training data.
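Training for 10 epochs, which produces the log below:

model.fit(train_images, train_labels, epochs=10)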
Epoch 1/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.5038 - accuracy: 0.8237
Epoch 2/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3757 - accuracy: 0.8644
Epoch 3/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3367 - accuracy: 0.8778
Epoch 4/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3126 - accuracy: 0.8855
Epoch 5/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2963 - accuracy: 0.8912
Epoch 6/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2800 - accuracy: 0.8979
Epoch 7/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2698 - accuracy: 0.8993
Epoch 8/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2593 - accuracy: 0.9041
Epoch 9/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2493 - accuracy: 0.9074
Epoch 10/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2394 - accuracy: 0.9105
29.8 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Model Accuracy

In this step we compare the model’s performance against the test data:
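model.evaluate returns the loss and accuracy on the test set:

test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)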

313/313 - 0s - loss: 0.3319 - accuracy: 0.8810

Test accuracy: 0.8809999823570251

As you can see, the accuracy on the test dataset is lower than on the training dataset. This gap between training accuracy and test accuracy represents overfitting. For more detail, please refer to the TensorFlow guide on overfitting and underfitting.

Make Predictions

  • We can test the model’s accuracy on a few images from the test dataset.
  • But since our model’s last layer uses the default linear activation and outputs logits, we have to attach a softmax layer to convert the logits to probabilities, which are easier to interpret, as shown below.
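Attaching the softmax layer and predicting, which produces the probability array below for the first test image:

# Wrap the trained model with a softmax layer so outputs are probabilities
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

predictions = probability_model.predict(test_images)
predictions[0]  # probability scores for the first test image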
array([1.0144830e-08, 4.8488679e-14, 1.8175688e-11, 5.6300261e-13,
3.1431319e-11, 1.5152204e-03, 1.1492748e-08, 3.7524022e-02,
1.5029757e-07, 9.6096063e-01], dtype=float32)

Since we have 10 nodes in the last layer (one for each image class), we get 10 predictions for each image. Each number represents the confidence score for that class. We can choose the class with the highest confidence score as the model’s final prediction:
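np.argmax returns the index of the highest score:

np.argmax(predictions[0])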

9

So the model predicts that the image belongs to the class at index 9, class_names[9] -> ‘Ankle boot’. Let’s cross-check with the true value from test_labels:
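Checking the true label of the first test image:

test_labels[0]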

9

Similarly, to verify our predictions for other images, let’s write a function that plots the image along with its predicted and true labels.
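A sketch of such a function, following the TensorFlow tutorial: it shows the image with the predicted label and its confidence score, and the true label in parentheses (blue when the prediction is correct, red when it is wrong):

def plot_image(i, predictions_array, true_label, img):
    true_label, img = true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    color = 'blue' if predicted_label == true_label else 'red'

    plt.xlabel('{} {:2.0f}% ({})'.format(class_names[predicted_label],
                                         100 * np.max(predictions_array),
                                         class_names[true_label]),
               color=color)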

Let's write a function that can plot a bar graph for each class prediction.
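A sketch of the bar-graph function, again following the tutorial; it highlights the predicted class in red and the true class in blue:

def plot_value_array(i, predictions_array, true_label):
    true_label = true_label[i]
    plt.grid(False)
    plt.xticks(range(10))
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color='#777777')
    plt.ylim([0, 1])

    predicted_label = np.argmax(predictions_array)
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')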

Let's try a random sample and plot the results for verification.
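Plotting test example 12 (the index discussed below) side by side with its prediction scores:

i = 12
plt.figure(figsize=(6, 3))
plt.subplot(1, 2, 1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1, 2, 2)
plot_value_array(i, predictions[i], test_labels)
plt.show()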

As you can see from the above result, our prediction for test example 12 is Sandal with a confidence score of 83%, but the true label for this example is Sneaker. Remember that our model’s test accuracy is 88%, which means about 12% of predictions will go wrong. In this case, since sandals and sneakers look a lot alike, the prediction went wrong. Note that the model can be wrong even when the prediction confidence score is very high!

Now let’s plot a few more images and their predictions. We will use the list below for testing: test_list = [16, 17, 22, 23, 24, 25, 39, 40, 41, 42, 48, 49, 50, 51]
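A sketch of the grid plot, pairing each image with its prediction bar graph (the two-column layout is an assumption):

test_list = [16, 17, 22, 23, 24, 25, 39, 40, 41, 42, 48, 49, 50, 51]

num_cols = 2                            # assumption: two image/bar-graph pairs per row
num_rows = len(test_list) // num_cols
plt.figure(figsize=(4 * num_cols, 2 * num_rows))
for plot_index, i in enumerate(test_list):
    plt.subplot(num_rows, 2 * num_cols, 2 * plot_index + 1)
    plot_image(i, predictions[i], test_labels, test_images)
    plt.subplot(num_rows, 2 * num_cols, 2 * plot_index + 2)
    plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()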

Using the Trained Model

  • By default, our model is optimized to make predictions on a batch, or collection, of examples at once.
  • We can also use the model to make a prediction on a single image, as shown below.
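tf.keras models expect a batch of examples, so a single image has to be wrapped in a batch of one. The image index below is an assumption; the article does not show which test image it used:

img = test_images[0]          # assumption: any single test image works here
print(img.shape)

# Add the image to a batch where it is the only member
img = np.expand_dims(img, 0)
print(img.shape)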
(28, 28)
(1, 28, 28)

Now let’s predict the correct label for the above image (with shape (1, 28, 28)):
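Predicting on the single-image batch produces the output below (the print format is a sketch):

predictions_single = probability_model.predict(img)
print('Probability for all classes:', predictions_single)
print('Best confidence score for class:', np.argmax(predictions_single[0]))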

Probability for all classes: [[8.5143931e-03 1.0142570e-05 2.4879885e-01 1.4979002e-03 2.5186172e-02
2.2455691e-09 7.1554321e-01 2.1525864e-11 4.4930150e-04 8.0325089e-09]],
Best confidence score for class: 6

Now let’s plot the prediction value array for the above image.
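Reusing plot_value_array on the single prediction, with class names as tick labels (index 0 matches the image chosen above):

plot_value_array(0, predictions_single[0], test_labels)
plt.xticks(range(10), class_names, rotation=45)
plt.show()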

For quick testing of the above model, you can also refer to my Kaggle kernel.
