The MNIST (Modified National Institute of Standards and Technology) database is a large collection of handwritten digits stored as monochrome images. The digits have been size-normalized and centered in a fixed-size image.
The goal of this experiment is to find a set of hyperparameters that yields an accurate, well-performing model, using GridSearchCV from scikit-learn as the tuning technique.
I selected 3 optimizers (Adam, RMSprop, and SGD) as a starting point to develop the MLP models; I then tuned the 3 models, each with its own hyperparameters, and chose the model that predicted the digits with the highest accuracy.
I also created my own dataset of 20 samples, preprocessing the images to resemble the original set, and used it to predict digits with the selected model.
In the end, the model showed high accuracy on the test set, but the accuracy decreased on the custom dataset.
The problem with MNIST is that the dataset is "too perfect"; in real life, we have to deal with lights and shadows in images, variations in the way people draw a digit, noise (shapes that are not part of the actual digit), off-center drawings, and other considerations. A digit recognition model that only works on its own dataset isn't that interesting, but it is good enough if you are new to this field and your goal is to learn, practice, and get familiar with different machine learning tools.
The complete Jupyter Notebook can be found on my GitHub.
1. Dataset
MNIST is a dataset of 60,000 grayscale training images (28x28 pixels) of the 10 digits (0-9), along with a test set of 10,000 images.
The digits have been size-normalized and centered in a fixed-size image of 28x28 pixels.
Every pixel of an image will be treated as an input of the Multilayer Perceptron (MLP) model, and the output will be a one-hot encoded array of the 10 digits.
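As a quick shape check (a standalone sketch, not a cell from the original notebook):
import numpy as np

# a 28x28 image flattens into a vector of 784 input values for the MLP
image = np.zeros((28, 28))
print(image.reshape(-1).shape)   # (784,)

# the label 5 as a one-hot output vector of length 10
print(np.eye(10)[5])             # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]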
References:
Yefeng Xia (August 19, 2020). From MNIST to the real-world, why the trained CNN model not works?
Engati Simply Intelligence (January 2021). MNIST Dataset
Keras.io. MNIST digits classification dataset
Mostafa Ibrahim (March 13, 2024). A Deep Dive Into Learning Curves in Machine Learning
Jason Brownlee (August 6, 2019). How to use Learning Curves to Diagnose Machine Learning Model Performance
2. Loading dataset
from keras.datasets import mnist
from sklearn.model_selection import train_test_split
# use Keras to import pre-shuffled MNIST database
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# extract validation dataset
X_train, X_validation, y_train, y_validation = train_test_split(
    X_train,
    y_train,
    test_size=0.2,
    random_state=42)
print("The MNIST database has a training set of %d examples." % len(X_train))
print("The MNIST database has a validation set of %d examples." % len(X_validation))
print("The MNIST database has a test set of %d examples." % len(X_test))
The MNIST database has a training set of 48000 examples.
The MNIST database has a validation set of 12000 examples.
The MNIST database has a test set of 10000 examples.
print("shape of the images:",X_train[0].shape)
print("label of the first image",y_train[0])
shape of the images: (28, 28)
label of the first image 5
3. Visualize dataset
import matplotlib.pyplot as plt
# plot the first 3 images:
fig = plt.figure(figsize=(15,15))
for i in range(3):
    ax = fig.add_subplot(1,3,i+1)
    ax.imshow(X_train[i], cmap='gray')
    ax.set_title("label: "+str(y_train[i]))
import numpy as np
section = X_train[0][10:20,10:20]
print("image section:\n",section,"\n")
print("pixel maximum value:", np.max(X_train[0]))
print("pixel minimum value:", np.min(X_train[0]))
image section:
 [[ 21   0   0   0   0   0   0   0   0   0]
 [223 223 193  71   6   0   0   0   0   0]
 [253 253 253 253 196 121   0   0   0   0]
 [144 144 217 251 253 253 170   4   0   0]
 [  0   0   0  53 236 253 253 215   3   0]
 [  0   0   0   0  34 180 253 253 128   0]
 [  0   0   0   0   0   2 140 253 236  36]
 [  0   0   0   0   0   0  13 215 253  62]
 [  0   0   0   0   0   0   0 105 253  62]
 [  0   0   0   0   0   0   0  99 253  62]]

pixel maximum value: 255
pixel minimum value: 0
4. Data preprocessing
Normalize images:
"When using the image as it is and passing through a Deep Neural Network, the computation of high numeric values may become more complex. To reduce this we can normalize the values to range from 0 to 1. In this way, the numbers will be small and the computation becomes easier and faster." Asha Ponraj.(Feb19,2021), A Tip A Day — Python Tip #8
X_train = X_train/255
X_validation = X_validation/255
X_test = X_test/255
Encode the labels:
The dataset labels are categorical variables (digits from 0 through 9). We need to encode these values before feeding them to a neural network. Since there are very few categories, we can use one-hot encoding.
One-hot encoding creates a vector of length equal to the total number of categories (in this case 10). Then, to represent a given label, the corresponding element of the encoding vector is set to 1 and all other elements to 0 (for example, [0 0 0 0 0 1 0 0 0 0] is equal to 5).
from keras.utils.np_utils import to_categorical  # in newer Keras versions: from keras.utils import to_categorical
print ("integer representation of first 5 labels:\n",y_train[0:5])
y_train = to_categorical(y_train,10)
y_validation = to_categorical(y_validation,10)
y_test = to_categorical(y_test,10)
print("one-hot representation of first 5 labels:\n",y_train[0:5])
integer representation of first 5 labels:
 [5 0 1 6 1]
one-hot representation of first 5 labels:
 [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
5. Model Architecture
Requirements:
scikeras makes it possible to use keras with scikit-learn. In this particular case, I'm going to use GridSearchCV from scikit-learn for model tuning, returning the hyperparameters that best fit the neural network model.
If you already have Keras and TensorFlow, install scikeras with no dependencies:
pip install --no-deps scikeras
Documentation available at: https://adriangb.com/scikeras/stable/index.html
Reproducibility:
In a reproducible model, the weights should be initialized with the same values in subsequent runs, whether for experimentation purposes or to debug a problem.
More about reproducibility can be found in the Keras documentation.
Design:
# To use CPU/GPU in training process
import tensorflow as tf
# Data analysis
import pandas as pd
# Neural network
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
# Optimizers
from keras.optimizers import Adam, SGD, RMSprop
# Grid Search
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV
# EarlyStopping callback to prevent overfitting during training
from keras.callbacks import EarlyStopping
# Save the best weights in a checkpoint file
from keras.callbacks import ModelCheckpoint
# reproducibility
# If using TensorFlow, this will make GPU ops as deterministic as possible,
# but it will affect the overall performance, so be mindful of that.
tf.config.experimental.enable_op_determinism()
# to control randomness
keras.utils.set_random_seed(42)
# control randomness, used with the ReLU activation function
he = keras.initializers.he_normal(seed=42)
# control randomness, used with softmax activation function
glorot = keras.initializers.glorot_normal(seed=42)
# control randomness, used with bias
zeros = keras.initializers.zeros()
ones = keras.initializers.ones()
Every pixel in an image will be treated as an input of the Multilayer Perceptron (MLP) model; the output, or prediction, is encoded with the one-hot algorithm and returned as a single array of length 10, where the position with the highest value represents the predicted digit.
def model(unitsHL1, unitsHL2,
          dropoutHL1, dropoutHL2,
          optimizer_learning_rate,
          optimizer_momentum,
          optimizer='Adam'):
    model = Sequential()
    # input layer
    # Flatten converts the 28x28 image into a 1-D array of 784 pixels
    model.add(Flatten(input_shape=X_train.shape[1:]))
    # hidden layer 1
    model.add(Dense(unitsHL1,
                    activation='relu',
                    kernel_initializer=he,
                    bias_initializer=ones,
                    ))
    # regularization
    if dropoutHL1 > 0:
        model.add(Dropout(dropoutHL1))
    # hidden layer 2
    model.add(Dense(unitsHL2,
                    activation='relu',
                    kernel_initializer=he,
                    bias_initializer=ones,
                    ))
    # regularization
    if dropoutHL2 > 0:
        model.add(Dropout(dropoutHL2))
    # output layer
    model.add(Dense(10,
                    activation='softmax',
                    kernel_initializer=glorot,
                    bias_initializer=ones,
                    ))
    # optimizer
    if optimizer == 'Adam':
        opt = Adam(
            learning_rate=optimizer_learning_rate
        )
    elif optimizer == 'SGD':
        opt = SGD(
            learning_rate=optimizer_learning_rate,
            momentum=optimizer_momentum,
        )
    elif optimizer == 'RMSprop':
        opt = RMSprop(
            learning_rate=optimizer_learning_rate,
            momentum=optimizer_momentum,
        )
    model.compile(
        loss='categorical_crossentropy',
        optimizer=opt,
        metrics=['accuracy'],
    )
    return model
def model_train_results(models,epochs,batch_size,X,Y,X_val,Y_val,callbacks,verbose=0):
    '''
    Save the history, epochs, loss and accuracy of each trained model into a dictionary
    Input:
        - models (list): Keras models to train
        - epochs (int or list): a single value, or one value per model
        - batch_size (int or list): a single value, or one value per model
        - X, Y, X_val, Y_val (array): training and validation sets
        - callbacks (list or None): callbacks passed to fit(), e.g. EarlyStopping
        - verbose (int): 0, 1 or 2, level of training-process detail
    Output:
        - results (dict): {model name: {'hist', 'stop epochs', 'val_loss', 'val_accuracy'}}
    '''
    results = {}
    for index, model in enumerate(models):
        results[model.name] = {}
        # check for a single epoch value or an epoch list
        if isinstance(epochs, (int, float)):
            n_epochs = epochs
        elif isinstance(epochs, list):
            if len(epochs) != len(models):
                print(f"epoch missing value, {len(epochs)} values were given but {len(models)} are needed")
                break
            n_epochs = epochs[index]
        else:
            print("epoch value error")
            break
        # check for a single batch size or a batch size list
        if isinstance(batch_size, (int, float)):
            n_batch_size = batch_size
        elif isinstance(batch_size, list):
            if len(batch_size) != len(models):
                print(f"batch size missing value, {len(batch_size)} values were given but {len(models)} are needed")
                break
            n_batch_size = batch_size[index]
        else:
            print("batch size value error")
            break
        # model fit with train and validation set
        results[model.name]['hist'] = model.fit(
            x=X,
            y=Y,
            batch_size=n_batch_size,
            epochs=n_epochs,
            verbose=verbose,
            callbacks=callbacks,
            validation_data=(X_val, Y_val),
            shuffle=True,
        )
        history = results[model.name]['hist']
        # look for an EarlyStopping callback that stopped the training early
        stopepoch = 0
        if callbacks:
            for cb in callbacks:
                if isinstance(cb, EarlyStopping) and cb.stopped_epoch > 0:
                    stopepoch = cb.stopped_epoch
        if stopepoch > 0:
            results[model.name]['stop epochs'] = stopepoch
            # validation loss and accuracy at the stopped epoch
            results[model.name]['val_loss'] = history.history['val_loss'][stopepoch]
            results[model.name]['val_accuracy'] = history.history['val_accuracy'][stopepoch]
        else:
            results[model.name]['stop epochs'] = n_epochs
            # the model trained for all the epochs
            results[model.name]['val_loss'] = history.history['val_loss'][n_epochs-1]
            results[model.name]['val_accuracy'] = history.history['val_accuracy'][n_epochs-1]
    return results
def plot_train_results(results):
    '''
    Input:
        - results (dict): model training results saved in a dictionary
          {'model': {'key1': 'result1', 'key2': 'result2', ...}}
    Output:
        - plot: loss (left plot), accuracy (right plot)
    '''
    n_plots = len(list(results.keys()))
    if n_plots == 1:
        fig, axes = plt.subplots(n_plots, 2, figsize=(13,5))
        # convert to a 2D array
        axes = axes.reshape(1, -1)
    else:
        fig, axes = plt.subplots(n_plots, 2, figsize=(13,13))
    for index, model in enumerate(list(results.keys())):
        history = results[model]['hist']
        # loss
        axes[index,0].plot(history.history['val_loss'])
        axes[index,0].plot(history.history['loss'])
        axes[index,0].set_title(model + ' loss')
        axes[index,0].set_xlabel('Epoch')
        axes[index,0].set_ylabel('Loss')
        axes[index,0].legend(['val_loss','loss'], loc='upper left')
        # accuracy
        axes[index,1].plot(history.history['val_accuracy'])
        axes[index,1].plot(history.history['accuracy'])
        axes[index,1].set_title(model + ' accuracy')
        axes[index,1].set_xlabel('Epoch')
        axes[index,1].set_ylabel('Accuracy')
        axes[index,1].legend(['val_accuracy','accuracy'], loc='upper left')
    plt.tight_layout()
    plt.show()
Grid search tuning
The code below was executed several times until parameters that returned good performance were found.
Grid search: Layers and Adam optimizer
# patience: number of epochs with no improvement before stopping
# mode: 'auto' infers the direction from the monitored metric; for "val_loss" training stops when it stops decreasing
# monitor: "val_loss" measures the loss on the validation set
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
adam_estimator = KerasClassifier(
    model,
    unitsHL1=550,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0,
    optimizer='Adam',
    callbacks=[earlystopping],
    epochs=20,
)
adam_param_grid = {
    'unitsHL1': [550, 750],
    'unitsHL2': [200, 400],
    'optimizer_learning_rate': [0.001, 0.01],
}
adam_grid = GridSearchCV(estimator=adam_estimator, param_grid=adam_param_grid)
with tf.device('GPU:0'):
    adam_grid_result = adam_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (adam_grid_result.best_score_, adam_grid_result.best_params_))
Best score: 0.974958 using {'optimizer_learning_rate': 0.001, 'unitsHL1': 750, 'unitsHL2': 200}
Grid search: Layers and RMSprop optimizer
# patience: number of epochs with no improvement before stopping
# mode: 'auto' infers the direction from the monitored metric; for "val_loss" training stops when it stops decreasing
# monitor: "val_loss" measures the loss on the validation set
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=5,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
RMSprop_estimator = KerasClassifier(
    model,
    unitsHL1=550,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
    callbacks=[earlystopping],
    epochs=20,
)
RMSprop_param_grid = {
    'unitsHL1': [550, 750, 850],
    'unitsHL2': [200, 400],
    'optimizer_learning_rate': [0.001, 0.01, 0.1],
    'optimizer_momentum': [0.0, 0.001, 0.01, 0.1],
}
RMSprop_grid = GridSearchCV(
    estimator=RMSprop_estimator,
    param_grid=RMSprop_param_grid
)
with tf.device('GPU:0'):
    RMSprop_grid_result = RMSprop_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (RMSprop_grid_result.best_score_, RMSprop_grid_result.best_params_))
Best score: 0.974958 using {'optimizer_learning_rate': 0.001, 'optimizer_momentum': 0.0, 'unitsHL1': 750, 'unitsHL2': 200}
Grid search: Layers and SGD optimizer
# patience: number of epochs with no improvement before stopping
# mode: 'auto' infers the direction from the monitored metric; for "val_loss" training stops when it stops decreasing
# monitor: "val_loss" measures the loss on the validation set
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=3,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
SGD_estimator = KerasClassifier(
    model,
    unitsHL1=550,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='SGD',
    callbacks=[earlystopping],
    epochs=20,
)
SGD_param_grid = {
    'unitsHL1': [550, 750, 850],
    'unitsHL2': [200, 400],
    'optimizer_learning_rate': [0.001, 0.01, 0.1],
    'optimizer_momentum': [0.0, 0.001, 0.01],
}
SGD_grid = GridSearchCV(
    estimator=SGD_estimator,
    param_grid=SGD_param_grid,
    cv=3,
)
with tf.device('GPU:0'):
    SGD_grid_result = SGD_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (SGD_grid_result.best_score_, SGD_grid_result.best_params_))
Best score: 0.974021 using {'optimizer_learning_rate': 0.001, 'optimizer_momentum': 0.0, 'unitsHL1': 750, 'unitsHL2': 400}
Grid search results: Layers and optimizer
Every grid search was set to epochs = 20 with an earlystopping callback to stop training when the val_loss metric stops improving. The default batch_size = 32 is used when no value is specified.
# training results
# model : optimizer adam
adam_model = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='Adam',
)
adam_model._name = 'adam_lr2'
# model : optimizer RMSprop
rmsprop_model = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
)
rmsprop_model._name = 'rmsprop_lr2'
# model : optimizer SGD
sgd_model = model(
    unitsHL1=750,
    unitsHL2=400,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='SGD',
)
sgd_model._name = 'sgd_lr2'
models = [adam_model,rmsprop_model,sgd_model]
# stop training if no improvement
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='min',
                              restore_best_weights=True,
                              verbose=0)
callbacks = [earlystopping]
train_results = model_train_results(
    models=models,
    epochs=20,
    batch_size=32,
    X=X_train,
    Y=y_train,
    X_val=X_validation,
    Y_val=y_validation,
    callbacks=callbacks,
    verbose=1,
)
plot_train_results(train_results)
# use a new name (m) so the model() builder function is not overwritten
for m in models:
    print(f"model {m.name} validation loss: {train_results[m.name]['val_loss']}")
    print(f"model {m.name} validation accuracy: {train_results[m.name]['val_accuracy']}")
    print('\n')
model adam_lr2 validation loss: 0.11157926172018051
model adam_lr2 validation accuracy: 0.9767500162124634

model rmsprop_lr2 validation loss: 0.16329512000083923
model rmsprop_lr2 validation accuracy: 0.9752500057220459

model sgd_lr2 validation loss: 0.2640928328037262
model sgd_lr2 validation accuracy: 0.9260833263397217
The models with the Adam and RMSprop optimizers show bad performance: they are good in training but bad at predicting (validation data). Their training loss (loss) curves are low while the validation loss (val_loss) curves are erratic and increasing, which indicates overfitting.
The model with the SGD optimizer has, in general, good performance in training and validation.
Ways to improve the performance of a model that overfits:
- Reduce the training time with a higher batch size
- Use regularization techniques such as dropout, L1, or L2 (see the sketch after this list)
- Use earlystopping to stop the training at a certain epoch number
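Dropout is the option explored later in this notebook. As a hedged illustration of the L2 alternative (not used in the final models; the penalty factor 1e-4 is an illustrative value, not a tuned hyperparameter), a weight penalty can be attached directly to a Dense layer:
from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense, Flatten

# sketch: the first hidden layer of model(), with an added L2 weight penalty
l2_model = Sequential()
l2_model.add(Flatten(input_shape=(28, 28)))
l2_model.add(Dense(750, activation='relu',
                   kernel_regularizer=regularizers.l2(1e-4)))
l2_model.add(Dense(10, activation='softmax'))
l2_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])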
Model Convergence:
Accuracy alone is not enough to select the right model; the number of epochs plays a significant role in determining the model's convergence and performance.
A way to find an optimal number of epochs is to use early stopping: this regularization technique stops the training as soon as the validation error reaches a minimum, preventing overfitting. Determining an appropriate number of epochs also helps manage computational resources effectively by avoiding unnecessary training iterations.
The batch size hyperparameter also has a significant impact on model performance and training time. In practice, models with a high batch size do not generalize as well as models with a low batch size.
Convergence tells us that the model has understood the patterns in the data and is making accurate predictions.
During the training of a machine learning model, the current state of the model at each step of the training algorithm can be evaluated. It can be evaluated on the training dataset to give an idea of how well the model is "learning." It can also be evaluated on a hold-out validation dataset that is not part of the training dataset. Evaluation on the validation dataset gives an idea of how well the model is "generalizing." Jason Brownlee (Aug 6, 2019). How to use Learning Curves to Diagnose Machine Learning Model Performance
NOTE
"The optimal learning rate depends on the other hyperparameters—especially the batch size—so if you modify any hyperparameter, make sure to update the learning rate as well." Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow
Model Convergence: Adam
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
adam_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='Adam',
    callbacks=[earlystopping],
    epochs=24,
)
adam_param_grid = {
    'optimizer_learning_rate': [0.001, 0.0015, 0.002],
    'batch_size': [275, 300, 325],
}
adam_grid = GridSearchCV(
    estimator=adam_estimator,
    param_grid=adam_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    adam_grid_result = adam_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (adam_grid_result.best_score_, adam_grid_result.best_params_))
Best score: 0.973729 using {'batch_size': 300, 'optimizer_learning_rate': 0.0015}
print("best estimator stopped epoch:",adam_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 13
Model Convergence: RMSprop
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=5,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
rmsprop_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
    callbacks=[earlystopping],
    epochs=25,
)
rmsprop_param_grid = {
    'optimizer_learning_rate': [0.001, 0.0015, 0.002],
    'optimizer_momentum': [0.0, 0.001],
    'batch_size': [350, 400, 450]
}
rmsprop_grid = GridSearchCV(
    estimator=rmsprop_estimator,
    param_grid=rmsprop_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    rmsprop_grid_result = rmsprop_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (rmsprop_grid_result.best_score_, rmsprop_grid_result.best_params_))
Best score: 0.974313 using {'batch_size': 400, 'optimizer_learning_rate': 0.0015, 'optimizer_momentum': 0.0}
print("best estimator stopped epoch:",rmsprop_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 20
Model Convergence: SGD
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=5,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
sgd_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=400,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='SGD',
    callbacks=[earlystopping],
    epochs=30,
)
sgd_param_grid = {
    'optimizer_learning_rate': [0.001, 0.0015, 0.002],
    'batch_size': [100, 200, 300]
}
sgd_grid = GridSearchCV(
    estimator=sgd_estimator,
    param_grid=sgd_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    sgd_grid_result = sgd_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (sgd_grid_result.best_score_, sgd_grid_result.best_params_))
Best score: 0.974250 using {'batch_size': 200, 'optimizer_learning_rate': 0.0015}
print("best estimator stopped epoch:",sgd_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 11
Models convergence results
# training convergence results
# model convergence with optimizer adam
adam_model_convg = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0.0,
    dropoutHL2=0.0,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='Adam',
)
adam_model_convg._name = 'adam_convg'
# model convergence with optimizer RMSprop
rmsprop_model_convg = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0.0,
    dropoutHL2=0.0,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
)
rmsprop_model_convg._name = 'rmsprop_convg'
# model convergence with optimizer SGD
sgd_model_convg = model(
    unitsHL1=750,
    unitsHL2=400,
    dropoutHL1=0.0,
    dropoutHL2=0.0,
    optimizer_learning_rate=0.002,
    optimizer_momentum=0.0,
    optimizer='SGD',
)
sgd_model_convg._name = 'sgd_convg'
# models list
models_list = [adam_model_convg, rmsprop_model_convg, sgd_model_convg]
# epochs list in order: [adam,rmsprop,sgd]
models_epochs_list = [13, 20, 14]
# batch size list in order: [adam,rmsprop,sgd]
models_batch_size_list = [300, 400, 375]
models_convg_results = model_train_results(
    models=models_list,
    epochs=models_epochs_list,
    batch_size=models_batch_size_list,
    X=X_train,
    Y=y_train,
    X_val=X_validation,
    Y_val=y_validation,
    callbacks=None,
    verbose=1
)
plot_train_results(models_convg_results)
for m in models_list:
    print(f"model {m.name} validation loss: {models_convg_results[m.name]['val_loss']}")
    print(f"model {m.name} validation accuracy: {models_convg_results[m.name]['val_accuracy']}")
    print('\n')
model adam_convg validation loss: 0.08512312173843384
model adam_convg validation accuracy: 0.9765833616256714

model rmsprop_convg validation loss: 0.12291974574327469
model rmsprop_convg validation accuracy: 0.9830833077430725

model sgd_convg validation loss: 0.425358384847641
model sgd_convg validation accuracy: 0.8804166913032532
The models improved their performance after finding a suitable batch size and using early stopping to monitor the validation loss and stop training when the metric stopped improving (to avoid overfitting), which resulted in smoother convergence and less variability (less erratic curves).
Also, I had to slightly tune the learning rate of every model due to the other hyperparameter modifications.
Models with the Adam and RMSprop optimizers continue to show signs of overfitting, with the validation loss higher than the training loss.
The model with the SGD optimizer converges at a certain point and shows no signs of overfitting, but it has the lowest accuracy of the models.
Dropout Regularization:
Model dropout with 'adam' optimizer
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
adam_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='Adam',
    callbacks=[earlystopping],
    epochs=30,
    batch_size=300,
)
adam_param_grid = {
    #'optimizer_learning_rate':[0.001,0.0015,0.002],
    #'batch_size':[200,300,400],
    'dropoutHL1': [0.6],
    'dropoutHL2': [0.3, 0.4],
}
adam_grid = GridSearchCV(
    estimator=adam_estimator,
    param_grid=adam_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    adam_grid_result = adam_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (adam_grid_result.best_score_, adam_grid_result.best_params_))
Best score: 0.978646 using {'dropoutHL1': 0.6, 'dropoutHL2': 0.3}
print("best estimator stopped epoch:",adam_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 19
Model dropout with 'RMSprop' optimizer
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
rmsprop_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
    callbacks=[earlystopping],
    epochs=25,
    batch_size=400,
)
rmsprop_param_grid = {
    #'optimizer_learning_rate':[0.001,0.0015,0.002],
    #'batch_size':[300,400,500],
    'dropoutHL1': [0.5, 0.6],
    'dropoutHL2': [0.0, 0.1, 0.2],
}
rmsprop_grid = GridSearchCV(
    estimator=rmsprop_estimator,
    param_grid=rmsprop_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    rmsprop_grid_result = rmsprop_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (rmsprop_grid_result.best_score_, rmsprop_grid_result.best_params_))
Best score: 0.977354 using {'dropoutHL1': 0.5, 'dropoutHL2': 0.1}
print("best estimator stopped epoch:",rmsprop_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 18
Model dropout with 'SGD' optimizer
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
sgd_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.002,
    optimizer_momentum=0.0,
    optimizer='SGD',
    callbacks=[earlystopping],
    epochs=25,
    batch_size=375,
)
sgd_param_grid = {
    #'optimizer_learning_rate':[0.001,0.0015,0.002],
    #'batch_size':[200,300,400],
    'dropoutHL1': [0.0, 0.1, 0.2],
    'dropoutHL2': [0.0, 0.1, 0.2],
}
sgd_grid = GridSearchCV(
    estimator=sgd_estimator,
    param_grid=sgd_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    sgd_grid_result = sgd_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (sgd_grid_result.best_score_, sgd_grid_result.best_params_))
Best score: 0.975833 using {'dropoutHL1': 0.1, 'dropoutHL2': 0.2}
print("best estimator stopped epoch:",sgd_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 13
Models dropout regularization results
# training regularization results
# model dropout with optimizer adam
adam_model_dropout = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0.6,
    dropoutHL2=0.3,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='Adam',
)
adam_model_dropout._name = 'adam_dropout'
# model dropout with optimizer RMSprop
rmsprop_model_dropout = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0.5,
    dropoutHL2=0.1,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
)
rmsprop_model_dropout._name = 'rmsprop_dropout'
# model dropout with optimizer SGD
sgd_model_dropout = model(
    unitsHL1=750,
    unitsHL2=400,
    dropoutHL1=0.1,
    dropoutHL2=0.2,
    optimizer_learning_rate=0.0020,
    optimizer_momentum=0.0,
    optimizer='SGD',
)
sgd_model_dropout._name = 'sgd_dropout'
# models list
models_list = [adam_model_dropout,rmsprop_model_dropout,sgd_model_dropout]
# epochs list in order: [adam,rmsprop,sgd]
models_epochs_list = [19,18,13]
# batch size list in order: [adam,rmsprop,sgd]
models_batch_size_list = [300,400,375]
models_dropout_results = model_train_results(
    models=models_list,
    epochs=models_epochs_list,
    batch_size=models_batch_size_list,
    X=X_train,
    Y=y_train,
    X_val=X_test,   # the held-out test set is used as validation data here
    Y_val=y_test,
    callbacks=None,
    verbose=1
)
plot_train_results(models_dropout_results)
for m in models_list:
    print("model:", m.name)
    print(f"model {m.name} validation loss: {models_dropout_results[m.name]['val_loss']}")
    print(f"model {m.name} validation accuracy: {models_dropout_results[m.name]['val_accuracy']}")
    print(f"model {m.name} stop epoch: {models_dropout_results[m.name]['stop epochs']}")
    print('\n')
model: adam_dropout
model adam_dropout validation loss: 0.06463388353586197
model adam_dropout validation accuracy: 0.98089998960495
model adam_dropout stop epoch: 19

model: rmsprop_dropout
model rmsprop_dropout validation loss: 0.07919890433549881
model rmsprop_dropout validation accuracy: 0.9801999926567078
model rmsprop_dropout stop epoch: 18

model: sgd_dropout
model sgd_dropout validation loss: 0.42107391357421875
model sgd_dropout validation accuracy: 0.8808000087738037
model sgd_dropout stop epoch: 13
Dropout regularization helps the models avoid overfitting and reach a convergence point; the models perform well on both the training and the validation data.
6. Training
# checkpointer files to save the best weights of each model
checkpointer_adam = ModelCheckpoint(
    filepath='mnist_adam.best.hdf5',
    save_best_only=True,
)
checkpointer_rmsprop = ModelCheckpoint(
    filepath='mnist_rmsprop.best.hdf5',
    save_best_only=True,
)
checkpointer_sgd = ModelCheckpoint(
    filepath='mnist_sgd.best.hdf5',
    save_best_only=True,
)
# model with adam optimizer
hist_adam = adam_model_dropout.fit(X_train, y_train, batch_size=300, epochs=19,
                                   validation_split=0.2, callbacks=[checkpointer_adam],
                                   verbose=1, shuffle=True)
# model with rmsprop optimizer
hist_rmsprop = rmsprop_model_dropout.fit(X_train, y_train, batch_size=400, epochs=18,
                                         validation_split=0.2, callbacks=[checkpointer_rmsprop],
                                         verbose=1, shuffle=True)
# model with sgd optimizer
hist_sgd = sgd_model_dropout.fit(X_train, y_train, batch_size=375, epochs=13,
                                 validation_split=0.2, callbacks=[checkpointer_sgd],
                                 verbose=1, shuffle=True)
7. Predictions
Predictions on test set
# load the weights that yielded the best validation accuracy
adam_model_dropout.load_weights('mnist_adam.best.hdf5')
rmsprop_model_dropout.load_weights('mnist_rmsprop.best.hdf5')
sgd_model_dropout.load_weights('mnist_sgd.best.hdf5')
# evaluate test accuracy
adam_score = adam_model_dropout.evaluate(X_test,y_test,verbose=0)
rmsprop_score = rmsprop_model_dropout.evaluate(X_test,y_test,verbose=0)
sgd_score= sgd_model_dropout.evaluate(X_test,y_test,verbose=0)
adam_test_accuracy = 100*adam_score[1]
adam_test_loss = adam_score[0]
rmsprop_test_accuracy = 100*rmsprop_score[1]
rmsprop_test_loss = rmsprop_score[0]
sgd_test_accuracy = 100*sgd_score[1]
sgd_test_loss = sgd_score[0]
print("adam model test accuracy:",adam_test_accuracy)
print("rmsprop model test accuracy:",rmsprop_test_accuracy)
print("sgd model test accuracy:",sgd_test_accuracy)
adam model test accuracy: 97.96000123023987
rmsprop model test accuracy: 98.36999773979187
sgd model test accuracy: 89.56000208854675
Prediction on custom data
Images saved in the new_digits folder consist of 20 instances (RGB images of 32x32 pixels), each labeled with the corresponding digit.
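The custom_dataset helper used below comes from my DigitsPreprocessing module (included in the repository). As a rough sketch of the kind of preprocessing involved (the steps and the digit-from-filename labeling below are illustrative assumptions, not the module's actual implementation):
import os
import cv2
import numpy as np

def custom_dataset_sketch(folder_path):
    # hypothetical sketch: load RGB digit images and convert them to
    # MNIST-like 28x28 grayscale arrays (white digit on a black background)
    images, labels = [], []
    for filename in sorted(os.listdir(folder_path)):
        if not filename.endswith('.png'):
            continue
        img = cv2.imread(os.path.join(folder_path, filename))
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, (28, 28))
        # invert if the background is light, so the images match MNIST
        if gray.mean() > 127:
            gray = 255 - gray
        images.append(gray)
        # assumes files are named after their digit, e.g. '0.png'
        labels.append(int(filename[0]))
    return np.array(images), labels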
from DigitsPreprocessing import custom_dataset
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# convert custom images into a set for the MLP model
folder_path = 'new_digits'
custom_set, custom_labels = custom_dataset(folder_path)
# preprocess custom set and labels
custom_set = np.reshape(custom_set,(len(custom_set),28,28))
custom_set = custom_set/255
categorical_labels = to_categorical(custom_labels,10)
# open a raw image
import cv2
raw_image_path = 'new_digits/0.png'
img = cv2.imread(raw_image_path)
imgRGB = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
# plot raw and preprocessed image
fig = plt.figure(figsize=(15,15))
ax = fig.add_subplot(1,2,1)
bx = fig.add_subplot(1,2,2)
ax.imshow(imgRGB)
ax.set_title("raw image")
bx.imshow(custom_set[0],cmap='gray')
bx.set_title("preprocesed image")
plt.show()
# evaluate accuracy of the model with the custom set
rmsprop_custom_score = rmsprop_model_dropout.evaluate(custom_set,categorical_labels,verbose=0)
print("accuracy prediction on custom set:",100*rmsprop_custom_score[1])
accuracy prediction on custom set: 80.0000011920929
# predictions as single digits
predicted_labels = []
predictions = rmsprop_model_dropout.predict(custom_set)
for prediction in predictions:
    # from categorical to single digit
    predicted_labels.append(list(prediction).index(max(list(prediction))))
1/1 [==============================] - 0s 90ms/step
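The same one-hot-to-digit conversion can be written more compactly with NumPy, equivalent to the loop above:
# vectorized: index of the highest score along each row
predicted_labels = np.argmax(predictions, axis=1).tolist()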
print("original digits:",custom_labels)
print("predicted digits:",predicted_labels)
original digits: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
predicted digits: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 2, 5, 8, 5, 9]
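The confusion_matrix and ConfusionMatrixDisplay utilities imported earlier can summarize which digits were confused; a minimal sketch using the labels above:
# confusion matrix of true vs. predicted digits on the custom set
cm = confusion_matrix(custom_labels, predicted_labels, labels=list(range(10)))
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=list(range(10)))
disp.plot(cmap='Blues')
plt.show()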
We can see that the MLP model with the RMSprop optimizer is the model with the highest accuracy, around 98.37%. The model's accuracy decreases when we make predictions on images that don't belong to the original dataset: the accuracy on the custom dataset was 80% (4 of the 20 images were misclassified).