In this guide, we are going to create and train a neural network model to classify images of clothing. It is based on the Basic classification tutorial from TensorFlow. We will use the TensorFlow deep learning framework along with its high-level Keras API to build and train the model.
- We are using the Fashion MNIST dataset, which contains 70,000 grayscale images in 10 categories.
- We will use 60,000 images for training and 10,000 images for testing the model.
- You can load the data directly from TensorFlow using tf.keras.datasets.fashion_mnist.load_data().
- The images are 28x28 NumPy arrays, with pixel values ranging from 0 to 255. The labels are an array of integers ranging from 0 to 9, each corresponding to the class of clothing the image represents.
- The images show individual articles of clothing at low resolution (28 by 28 pixels).
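A minimal sketch of loading the dataset with the Keras datasets API. It assumes TensorFlow 2.x is installed and that the machine can download the files on first use (by default Keras caches them under ~/.keras/datasets).

```python
import numpy as np
import tensorflow as tf

# load_data() downloads Fashion MNIST on first use and returns two (images, labels) tuples
(train_images, train_labels), (test_images, test_labels) = (
    tf.keras.datasets.fashion_mnist.load_data()
)

# Images are uint8 arrays of shape (N, 28, 28); labels are integers 0-9
print(train_images.dtype, train_images.min(), train_images.max())
print(len(train_images), "training images,", len(test_images), "test images")
```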
Exploratory Data Analysis
Let’s explore the format of the dataset before training the model. There are 60,000 images in the training set and 10,000 images in the test set, with each image represented as 28 x 28 pixels.
Shape of train_images: (60000, 28, 28)
Shape of train_labels: (60000,)
Shape of test_images: (10000, 28, 28)
Shape of test_labels: (10000,)
There are 10 labels in total, from 0 to 9, each representing a specific clothing class: 0 (T-shirt/top), 1 (Trouser), 2 (Pullover), 3 (Dress), 4 (Coat), 5 (Sandal), 6 (Shirt), 7 (Sneaker), 8 (Bag), 9 (Ankle boot).
Unique train labels: [0 1 2 3 4 5 6 7 8 9]
Unique test labels: [0 1 2 3 4 5 6 7 8 9]
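The shape and label checks above can be reproduced with a few NumPy calls. This is a sketch that prints output in the same format; the class_names list simply encodes the label-to-name mapping given above, so that a label index can be turned into a readable name later on.

```python
import numpy as np
import tensorflow as tf

(train_images, train_labels), (test_images, test_labels) = (
    tf.keras.datasets.fashion_mnist.load_data()
)

# Human-readable names, indexed by label value 0-9
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

print("Shape of train_images:", train_images.shape)
print("Shape of train_labels:", train_labels.shape)
print("Shape of test_images:", test_images.shape)
print("Shape of test_labels:", test_labels.shape)

# np.unique returns the sorted distinct label values in each split
print("Unique train labels:", np.unique(train_labels))
print("Unique test labels:", np.unique(test_labels))
```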
Preprocessing the Data
- The pixel values of each image fall in the range 0 to 255.
- Typically, zero is taken to be black, and 255 is taken to be white. Values in between make up the different shades of gray.
- To scale the input, we divide every value by 255 so that the final values are in the range 0 to 1.
- It’s important that the training set and the testing set be preprocessed in the same way.
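A small NumPy sketch of the scaling step. The helper name scale_pixels is hypothetical; the point is that one function is applied to both splits, so training and test data are preprocessed identically.

```python
import numpy as np

def scale_pixels(images: np.ndarray) -> np.ndarray:
    """Map uint8 pixel values in [0, 255] to float32 values in [0.0, 1.0]."""
    return images.astype("float32") / 255.0

# With the real data you would apply the same function to both splits:
#   train_images = scale_pixels(train_images)
#   test_images = scale_pixels(test_images)

example = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = scale_pixels(example)
print(scaled)
```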
Building the neural network model requires configuring the input, hidden and output layers.
Set up the Layers
- The basic building block of a neural network is the layer. Layers extract representations from the data fed into them.
- Most of the time, we have to chain multiple layers together to solve a problem.
- The first layer in this network, tf.keras.layers.Flatten, transforms the format of the images from a two-dimensional array (of 28 by 28 pixels) to a one-dimensional array (of 28 * 28 = 784 pixels).
- This input layer does not learn anything: it has no parameters and only reformats the data.
- Once the input data is flattened, we can add Dense layers to the network. Here we use two Dense layers.
- The first Dense layer has 128 nodes (or neurons) and uses the ReLU activation function.
- The second (and last) layer returns a logits array of length 10. Each node outputs a score for one of the 10 classes, indicating how likely it is that the current image belongs to that class. Note that we do not specify an activation function here, so the layer uses the default linear activation.
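A minimal sketch of the layer stack described above, written with the Keras Sequential API. Declaring the 28 x 28 input with tf.keras.Input is an assumption, since the article's own code is not shown. The parameter count follows from the layer sizes: 784 × 128 + 128 = 100,480 for the hidden layer plus 128 × 10 + 10 = 1,290 for the output layer, 101,770 in total (Flatten adds none).

```python
import numpy as np
import tensorflow as tf

def build_model() -> tf.keras.Model:
    """Flatten -> Dense(128, relu) -> Dense(10), producing raw logits."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),                 # one 28x28 grayscale image per example
        tf.keras.layers.Flatten(),                      # reshape to 784 values, no weights
        tf.keras.layers.Dense(128, activation="relu"),  # hidden layer
        tf.keras.layers.Dense(10),                      # one logit per class, linear activation
    ])

model = build_model()
model.summary()

# A forward pass on a dummy batch of two images yields one row of 10 logits per image
logits = model(np.zeros((2, 28, 28), dtype="float32"))
```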
Compile the Model
- In this step, we add the settings the model needs for training:
- Loss function: measures how far the model's predictions are from the true labels during training. Training tries to minimize it.
- Optimizer: updates the model's weights based on the data it sees and the output of the loss function.
- Metrics: used to monitor the training and testing steps.
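The article does not name the optimizer or loss it uses, so the compile step below is a sketch assuming the choices from the TensorFlow Basic classification tutorial: Adam, and sparse categorical cross-entropy. The "sparse" variant fits integer labels 0 to 9, and from_logits=True is needed because the last layer has no softmax. On an all-zero input the untrained model outputs identical logits, so the loss equals ln(10), which the test checks.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),  # raw logits
])

model.compile(
    optimizer="adam",  # assumed; not specified in the article
    # Integer labels -> sparse variant; logits (no softmax) -> from_logits=True
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],  # fraction of correctly classified images
)

# evaluate() returns [loss, accuracy] because one metric was configured
results = model.evaluate(
    np.zeros((4, 28, 28), dtype="float32"), np.zeros(4, dtype="int32"), verbose=0
)
print(results)
```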
Train the Model
The steps involved in model training are as follows:
- Feed the training images and their associated labels to the model.
- The model learns to map images to labels.
- Ask the model to make predictions on test_images.
- Verify the model's predictions against test_labels.
Feed the Model
- To start training, call the model.fit method, so called because it “fits” the model to the training data.
- As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 0.91 (or 91%) on the training data.
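A sketch of the training call, wrapped in hypothetical helpers (build_and_compile, train) so it is self-contained; the optimizer and loss are the same assumptions as in the compile step. The log below appears to come from 10 epochs. Since model.fit uses a batch size of 32 unless told otherwise, one epoch over 60,000 images takes 60,000 / 32 = 1,875 steps, which matches the 1875/1875 counter in the log.

```python
import numpy as np
import tensorflow as tf

def build_and_compile() -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
    return model

def train(images, labels, epochs=10):
    """Fit a fresh model; the per-epoch loss/accuracy end up in history.history."""
    model = build_and_compile()
    history = model.fit(images, labels, epochs=epochs)
    return model, history

# Quick smoke run on random stand-in data (not Fashion MNIST)
demo_model, demo_history = train(
    np.random.rand(64, 28, 28).astype("float32"),
    np.random.randint(0, 10, size=64),
    epochs=1,
)
```

With the real, scaled data the call would be model, history = train(train_images, train_labels, epochs=10).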
1875/1875 [==============================] - 3s 2ms/step - loss: 0.5038 - accuracy: 0.8237
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3757 - accuracy: 0.8644
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3367 - accuracy: 0.8778
1875/1875 [==============================] - 3s 1ms/step - loss: 0.3126 - accuracy: 0.8855
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2963 - accuracy: 0.8912
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2800 - accuracy: 0.8979
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2698 - accuracy: 0.8993
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2593 - accuracy: 0.9041
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2493 - accuracy: 0.9074
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2394 - accuracy: 0.9105
29.8 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
In this step, we compare the model’s performance on the test data:
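A sketch of the evaluation step using model.evaluate. The helper name evaluate_on_test and the untrained stand-in model are illustrative only. With the default batch size of 32, 10,000 test images take ceil(10,000 / 32) = 313 batches, which is where the 313/313 counter comes from.

```python
import numpy as np
import tensorflow as tf

def evaluate_on_test(model, test_images, test_labels):
    """Return (loss, accuracy) on held-out data; verbose=2 prints one summary line."""
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
    print("Test accuracy:", test_acc)
    return test_loss, test_acc

# Untrained stand-in model and random data, only to show the call
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
demo_loss, demo_acc = evaluate_on_test(
    model, np.random.rand(100, 28, 28).astype("float32"), np.random.randint(0, 10, 100)
)
```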
313/313 - 0s - loss: 0.3319 - accuracy: 0.8810
Test accuracy: 0.8809999823570251
As you can see, the accuracy on the test dataset (about 88%) is lower than on the training dataset (about 91%). This gap between training and test accuracy is a sign of overfitting. For more detail please refer.
- We can check the model’s predictions on a few images from the test dataset.
- Since the model’s last layer uses the default linear activation, its outputs are logits. We attach a softmax layer to convert the logits to probabilities, which are easier to interpret.
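A sketch of wrapping the logits model with a Keras Softmax layer. An untrained model and random images stand in for the trained model and test_images, so the printed numbers will differ from the array below; what carries over is that each row becomes 10 non-negative scores summing to 1.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the trained logits model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Softmax turns each row of logits into probabilities: exp(z_i) / sum_j exp(z_j)
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

batch = np.random.rand(3, 28, 28).astype("float32")  # stand-in for test_images
predictions = probability_model.predict(batch, verbose=0)
print(predictions[0])
```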
array([1.0144830e-08, 4.8488679e-14, 1.8175688e-11, 5.6300261e-13,
3.1431319e-11, 1.5152204e-03, 1.1492748e-08, 3.7524022e-02,
1.5029757e-07, 9.6096063e-01], dtype=float32)
Since there are 10 nodes in the last layer (one for each class), we get 10 numbers for each image. Each number is the model’s confidence that the image belongs to the corresponding class. We choose the class with the highest confidence score as the model’s final prediction.
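Picking the final prediction is an argmax over these 10 scores. The sketch below copies the probabilities printed above for the first test image; sorting them also shows the runner-up classes.

```python
import numpy as np

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Softmax output for the first test image, copied from the array above
prediction = np.array([1.0144830e-08, 4.8488679e-14, 1.8175688e-11, 5.6300261e-13,
                       3.1431319e-11, 1.5152204e-03, 1.1492748e-08, 3.7524022e-02,
                       1.5029757e-07, 9.6096063e-01], dtype=np.float32)

predicted_label = int(np.argmax(prediction))  # index of the highest score
top3 = np.argsort(prediction)[::-1][:3]       # indices sorted by descending score

print(predicted_label, class_names[predicted_label])
print([class_names[i] for i in top3])
```

Here the three highest-scoring classes are Ankle boot, Sneaker and Sandal, all footwear.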
So the model predicts that this image belongs to the class at index 9.
class_names[9] is Ankle boot. Let’s cross-check this with the true value from test_labels.
Similarly, to verify our predictions for other images, let’s write a function that shows an image along with its predicted and true labels.
Let's also write a function that plots a bar graph of the prediction scores for all 10 classes.
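A possible version of the two helpers with Matplotlib. The names plot_image and plot_value_array and their signatures are illustrative (the article's own functions are not shown). The caption and bar colors follow a common convention: blue marks a correct prediction or the true label, red a wrong prediction. The Agg backend line only makes the sketch run without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def plot_image(ax, predictions_array, true_label, img):
    """Draw img captioned 'predicted confidence% (true)'; blue if correct, else red."""
    predicted_label = int(np.argmax(predictions_array))
    color = "blue" if predicted_label == true_label else "red"
    ax.imshow(img, cmap=plt.cm.binary)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_xlabel(f"{class_names[predicted_label]} "
                  f"{100 * np.max(predictions_array):.0f}% "
                  f"({class_names[true_label]})", color=color)
    return predicted_label

def plot_value_array(ax, predictions_array, true_label):
    """Bar chart of the 10 scores: predicted bar red, then true-label bar blue
    (so a correct prediction shows a single blue bar)."""
    bars = ax.bar(range(10), predictions_array, color="#777777")
    ax.set_ylim([0, 1])
    ax.set_xticks(range(10))
    bars[int(np.argmax(predictions_array))].set_color("red")
    bars[true_label].set_color("blue")
    return bars

# Demo: made-up scores favouring Sneaker (7) for an image whose true label is Sandal (5)
scores = np.full(10, 0.02)
scores[7] = 0.82
fig, (ax_img, ax_bar) = plt.subplots(1, 2, figsize=(6, 3))
demo_pred = plot_image(ax_img, scores, 5, np.zeros((28, 28)))
demo_bars = plot_value_array(ax_bar, scores, 5)
```

With real data, pass predictions[i], test_labels[i] and test_images[i] for a test index i.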
Let's try this on a sample and plot the results for verification.
As you can see from the result above, our prediction for test example 12 is Sandal, with a confidence score of 83%, but the true label is Sneaker. Remember that our model’s test accuracy is about 88%, which means roughly 12% of its predictions will be wrong. Here, since sandals and sneakers look a lot alike, the model got it wrong. Note that the model can be wrong even when its confidence score is high!
Now let's plot a few more images and their predictions, using the list below.
test_list= [16, 17, 22, 23, 24, 25, 39, 40, 41, 42, 48, 49, 50, 51]
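One way to lay out several predictions at once is a grid with an (image, bar chart) pair per test index. The function plot_predictions_grid and its num_cols parameter are hypothetical, and the random scores, labels and images in the demo only stand in for the real predictions, test_labels and test_images.

```python
import math

import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def plot_predictions_grid(indices, predictions, labels, images, num_cols=2):
    """One (image, bar chart) pair per index, num_cols pairs per row."""
    num_rows = math.ceil(len(indices) / num_cols)
    fig, axes = plt.subplots(num_rows, 2 * num_cols,
                             figsize=(4 * num_cols, 2 * num_rows), squeeze=False)
    for ax in axes.flat:
        ax.set_axis_off()  # unused cells stay blank
    for k, i in enumerate(indices):
        row, col = divmod(k, num_cols)
        img_ax, bar_ax = axes[row, 2 * col], axes[row, 2 * col + 1]
        img_ax.set_axis_on()
        bar_ax.set_axis_on()
        predicted, true = int(np.argmax(predictions[i])), int(labels[i])
        img_ax.imshow(images[i], cmap=plt.cm.binary)
        img_ax.set_xticks([])
        img_ax.set_yticks([])
        img_ax.set_xlabel(f"{class_names[predicted]} {100 * predictions[i][predicted]:.0f}% "
                          f"({class_names[true]})",
                          color="blue" if predicted == true else "red")
        bars = bar_ax.bar(range(10), predictions[i], color="#777777")
        bar_ax.set_ylim([0, 1])
        bars[predicted].set_color("red")
        bars[true].set_color("blue")
    fig.tight_layout()
    return fig

test_list = [16, 17, 22, 23, 24, 25, 39, 40, 41, 42, 48, 49, 50, 51]
rng = np.random.default_rng(0)
fig = plot_predictions_grid(test_list,
                            rng.dirichlet(np.ones(10), size=60),  # stand-in predictions
                            rng.integers(0, 10, 60),              # stand-in labels
                            rng.random((60, 28, 28)))             # stand-in images
```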
Using the Trained Model
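- By default, the model is optimized to make predictions on a batch, or collection, of examples at once.
- We can also use the model to make a prediction on a single image, but we first have to add the image to a batch of one, because the model expects input of shape (batch size, 28, 28).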
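Adding the batch axis is a one-line NumPy operation. In this sketch, a zero array stands in for test_images, and index 1 is an arbitrary choice for illustration.

```python
import numpy as np

# Stand-in for the preprocessed test set; replace with the real test_images
test_images = np.zeros((10, 28, 28), dtype=np.float32)

img = test_images[1]          # a single image: shape (28, 28)
img = np.expand_dims(img, 0)  # add a batch axis: shape (1, 28, 28)
print(img.shape)              # (1, 28, 28)
```

Slicing with test_images[1:2] gives the same (1, 28, 28) shape without a separate call.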
(1, 28, 28)
Now let’s predict the label for the image above (with shape (1, 28, 28)).
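A sketch of predicting on a batch of one with a softmax-wrapped model. An untrained stand-in model and a random image replace the trained model and the real test image, so the numbers will not match the output below. The result has shape (1, 10), so the scores for the single image are in row 0.

```python
import numpy as np
import tensorflow as tf

# Untrained stand-in for the trained model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

img = np.random.rand(1, 28, 28).astype("float32")               # a batch holding one image
predictions_single = probability_model.predict(img, verbose=0)  # shape (1, 10)
best = int(np.argmax(predictions_single[0]))                    # index into row 0

print("Probability for all classes:", predictions_single)
print("Best confidence score for class:", best)
```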
Probability for all classes: [[8.5143931e-03 1.0142570e-05 2.4879885e-01 1.4979002e-03 2.5186172e-02
2.2455691e-09 7.1554321e-01 2.1525864e-11 4.4930150e-04 8.0325089e-09]]
Best confidence score for class: 6
Now let’s plot the image and its prediction value array.
For quick testing of the above model, you can also refer to my Kaggle kernel.