The MNIST (Modified National Institute of Standards and Technology) database is a large collection of handwritten digits stored as monochrome images. The digits have been size-normalized and centered in a fixed-size image.
The goal of this experiment is to find a set of hyperparameters that yields an accurate, well-performing model, using GridSearchCV from scikit-learn as the tuning technique.
I selected 3 optimizers (Adam, RMSprop, and SGD) as a starting point to develop the MLP models; I then tuned the 3 models, each with its own hyperparameters, and chose the model that predicted the digits with the highest accuracy.
I also created my own dataset of 20 samples, preprocessing the images to resemble the original set, and used it to predict digits with the selected model.
In the end, the model showed high accuracy on the test set, but the accuracy decreased on the custom dataset.
The problem with MNIST is that the dataset is "too perfect"; in real life, we have to deal with lights and shadows in images, variations in the way people draw a digit, noise (shapes that are not part of the actual digit), off-center drawings, and other considerations. A digit recognition model that only works on its own dataset isn't that interesting, but it is good enough if you are new to this field and your goal is to learn, practice, and get familiar with different machine learning tools.
The complete Jupyter Notebook can be found on my GitHub.
1. Dataset
MNIST is a dataset of 60,000 grayscale training images (28x28 pixels) of the 10 digits (0-9), along with a test set of 10,000 images.
The digits have been size-normalized and centered in a fixed-size image of 28x28 pixels.
Every pixel of an image will be treated as an input of the Multilayer Perceptron (MLP) model, and the output will be a one-hot encoded array of the 10 digits.
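As a quick shape check (a standalone sketch, not a cell from the original notebook):
import numpy as np

# a 28x28 image flattens into a vector of 784 input values for the MLP
image = np.zeros((28, 28))
print(image.reshape(-1).shape)   # (784,)

# the label 5 as a one-hot output vector of length 10
print(np.eye(10)[5])             # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]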
References:
Yefeng Xia (August 19, 2020). From MNIST to the real-world, why the trained CNN model not works?
Engati Simply Intelligence (January 2021). MNIST Dataset
Keras.io. MNIST digits classification dataset
Mostafa Ibrahim (March 13, 2024). A Deep Dive Into Learning Curves in Machine Learning
Jason Brownlee (August 6, 2019). How to use Learning Curves to Diagnose Machine Learning Model Performance
2. Loading dataset
from keras.datasets import mnist
from sklearn.model_selection import train_test_split
# use Keras to import pre-shuffled MNIST database
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# extract validation dataset
X_train, X_validation, y_train, y_validation = train_test_split(
    X_train,
    y_train,
    test_size=0.2,
    random_state=42)
print("The MNIST database has a training set of %d examples." % len(X_train))
print("The MNIST database has a validation set of %d examples." % len(X_validation))
print("The MNIST database has a test set of %d examples." % len(X_test))
The MNIST database has a training set of 48000 examples.
The MNIST database has a validation set of 12000 examples.
The MNIST database has a test set of 10000 examples.
print("shape of the images:",X_train[0].shape)
print("label of the first image",y_train[0])
shape of the images: (28, 28)
label of the first image 5
3. Visualize dataset
import matplotlib.pyplot as plt
# plot the first 3 images:
fig = plt.figure(figsize=(15,15))
for i in range(3):
    ax = fig.add_subplot(1,3,i+1)
    ax.imshow(X_train[i], cmap='gray')
    ax.set_title("label: "+str(y_train[i]))
import numpy as np
section = X_train[0][10:20,10:20]
print("image section:\n",section,"\n")
print("pixel maximum value:", np.max(X_train[0]))
print("pixel minimum value:", np.min(X_train[0]))
image section:
 [[ 21   0   0   0   0   0   0   0   0   0]
 [223 223 193  71   6   0   0   0   0   0]
 [253 253 253 253 196 121   0   0   0   0]
 [144 144 217 251 253 253 170   4   0   0]
 [  0   0   0  53 236 253 253 215   3   0]
 [  0   0   0   0  34 180 253 253 128   0]
 [  0   0   0   0   0   2 140 253 236  36]
 [  0   0   0   0   0   0  13 215 253  62]
 [  0   0   0   0   0   0   0 105 253  62]
 [  0   0   0   0   0   0   0  99 253  62]]

pixel maximum value: 255
pixel minimum value: 0
4. Data preprocessing
Normalize images:
"When using the image as it is and passing through a Deep Neural Network, the computation of high numeric values may become more complex. To reduce this we can normalize the values to range from 0 to 1. In this way, the numbers will be small and the computation becomes easier and faster." Asha Ponraj.(Feb19,2021), A Tip A Day — Python Tip #8
X_train = X_train/255
X_validation = X_validation/255
X_test = X_test/255
Encode the labels:
The dataset labels are categorical variables (digits from 0 through 9). We need to encode these values before feeding them to a neural network. Since there are very few categories, we can use one-hot encoding.
One-hot encoding creates a vector of length equal to the total number of categories (in this case 10). Then, to represent a given label, the corresponding element of the encoding vector is set to 1 and all other elements to 0 (for example, [0 0 0 0 0 1 0 0 0 0] is equal to 5).
from keras.utils.np_utils import to_categorical  # in newer Keras versions: from keras.utils import to_categorical
print ("integer representation of first 5 labels:\n",y_train[0:5])
y_train = to_categorical(y_train,10)
y_validation = to_categorical(y_validation,10)
y_test = to_categorical(y_test,10)
print("one-hot representation of first 5 labels:\n",y_train[0:5])
integer representation of first 5 labels:
 [5 0 1 6 1]
one-hot representation of first 5 labels:
 [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]
5. Model Architecture
Requirements:
scikeras makes it possible to use keras with scikit-learn. In this particular case, I'm going to use GridSearchCV from scikit-learn for model tuning, returning the hyperparameters that best fit the neural network model.
If you already have Keras and TensorFlow, install scikeras with no dependencies:
pip install --no-deps scikeras
Documentation available at: https://adriangb.com/scikeras/stable/index.html
Reproducibility:
In a reproducible model, the weights should be initialized with the same values in subsequent runs, whether for experimentation purposes or to debug a problem.
More about reproducibility can be found in the Keras documentation.
Design:
# To use CPU/GPU in training process
import tensorflow as tf
# Data analysis
import pandas as pd
# Neural network
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
# Optimizers
from keras.optimizers import Adam, SGD, RMSprop
# Grid Search
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV
# EarlyStopping callback to prevent overfitting during training
from keras.callbacks import EarlyStopping
# Save the best weights in a checkpoint file
from keras.callbacks import ModelCheckpoint
# reproducibility
# If using TensorFlow, this will make GPU ops as deterministic as possible,
# but it will affect the overall performance, so be mindful of that.
tf.config.experimental.enable_op_determinism()
# to control randomness
keras.utils.set_random_seed(42)
# control randomness, used with the ReLU activation function
he = keras.initializers.he_normal(seed=42)
# control randomness, used with softmax activation function
glorot = keras.initializers.glorot_normal(seed=42)
# control randomness, used with bias
zeros = keras.initializers.zeros()
ones = keras.initializers.ones()
Every pixel in an image will be treated as an input of the Multilayer Perceptron (MLP) model; the output, or prediction, is encoded with the one-hot algorithm and returned as a single array of length 10, where the position with the highest value represents the predicted digit.
def model(unitsHL1, unitsHL2,
          dropoutHL1, dropoutHL2,
          optimizer_learning_rate,
          optimizer_momentum,
          optimizer='Adam'):
    model = Sequential()
    # input layer
    # Flatten converts the 28x28 image into a 1-D array of 784 pixels
    model.add(Flatten(input_shape=X_train.shape[1:]))
    # hidden layer 1
    model.add(Dense(unitsHL1,
                    activation='relu',
                    kernel_initializer=he,
                    bias_initializer=ones,
                    ))
    # regularization
    if dropoutHL1 > 0:
        model.add(Dropout(dropoutHL1))
    # hidden layer 2
    model.add(Dense(unitsHL2,
                    activation='relu',
                    kernel_initializer=he,
                    bias_initializer=ones,
                    ))
    # regularization
    if dropoutHL2 > 0:
        model.add(Dropout(dropoutHL2))
    # output layer
    model.add(Dense(10,
                    activation='softmax',
                    kernel_initializer=glorot,
                    bias_initializer=ones,
                    ))
    # optimizer
    if optimizer == 'Adam':
        opt = Adam(
            learning_rate=optimizer_learning_rate
        )
    elif optimizer == 'SGD':
        opt = SGD(
            learning_rate=optimizer_learning_rate,
            momentum=optimizer_momentum,
        )
    elif optimizer == 'RMSprop':
        opt = RMSprop(
            learning_rate=optimizer_learning_rate,
            momentum=optimizer_momentum,
        )
    model.compile(
        loss='categorical_crossentropy',
        optimizer=opt,
        metrics=['accuracy'],
    )
    return model
def model_train_results(models,epochs,batch_size,X,Y,X_val,Y_val,callbacks,verbose=0):
    '''
    Save the history, epochs, loss and accuracy of each trained model into a dictionary
    Input:
        - models (list): Keras models to train
        - epochs (int or list): a single value, or one value per model
        - batch_size (int or list): a single value, or one value per model
        - X, Y, X_val, Y_val (array): training and validation sets
        - callbacks (list or None): callbacks passed to fit(), e.g. EarlyStopping
        - verbose (int): 0, 1 or 2, level of training-process detail
    Output:
        - results (dict): {model name: {'hist', 'stop epochs', 'val_loss', 'val_accuracy'}}
    '''
    results = {}
    for index, model in enumerate(models):
        results[model.name] = {}
        # check for a single epoch value or an epoch list
        if isinstance(epochs, (int, float)):
            n_epochs = epochs
        elif isinstance(epochs, list):
            if len(epochs) != len(models):
                print(f"epoch missing value, {len(epochs)} values were given but {len(models)} are needed")
                break
            n_epochs = epochs[index]
        else:
            print("epoch value error")
            break
        # check for a single batch size or a batch size list
        if isinstance(batch_size, (int, float)):
            n_batch_size = batch_size
        elif isinstance(batch_size, list):
            if len(batch_size) != len(models):
                print(f"batch size missing value, {len(batch_size)} values were given but {len(models)} are needed")
                break
            n_batch_size = batch_size[index]
        else:
            print("batch size value error")
            break
        # model fit with train and validation set
        results[model.name]['hist'] = model.fit(
            x=X,
            y=Y,
            batch_size=n_batch_size,
            epochs=n_epochs,
            verbose=verbose,
            callbacks=callbacks,
            validation_data=(X_val, Y_val),
            shuffle=True,
        )
        history = results[model.name]['hist']
        # look for an EarlyStopping callback that stopped the training early
        stopepoch = 0
        if callbacks:
            for cb in callbacks:
                if isinstance(cb, EarlyStopping) and cb.stopped_epoch > 0:
                    stopepoch = cb.stopped_epoch
        if stopepoch > 0:
            results[model.name]['stop epochs'] = stopepoch
            # validation loss and accuracy at the stopped epoch
            results[model.name]['val_loss'] = history.history['val_loss'][stopepoch]
            results[model.name]['val_accuracy'] = history.history['val_accuracy'][stopepoch]
        else:
            results[model.name]['stop epochs'] = n_epochs
            # the model trained for all the epochs
            results[model.name]['val_loss'] = history.history['val_loss'][n_epochs-1]
            results[model.name]['val_accuracy'] = history.history['val_accuracy'][n_epochs-1]
    return results
def plot_train_results(results):
    '''
    Input:
        - results (dict): model training results saved in a dictionary
          {'model': {'key1': 'result1', 'key2': 'result2', ...}}
    Output:
        - plot: loss (left plot), accuracy (right plot)
    '''
    n_plots = len(list(results.keys()))
    if n_plots == 1:
        fig, axes = plt.subplots(n_plots, 2, figsize=(13,5))
        # convert to a 2D array
        axes = axes.reshape(1, -1)
    else:
        fig, axes = plt.subplots(n_plots, 2, figsize=(13,13))
    for index, model in enumerate(list(results.keys())):
        history = results[model]['hist']
        # loss
        axes[index,0].plot(history.history['val_loss'])
        axes[index,0].plot(history.history['loss'])
        axes[index,0].set_title(model + ' loss')
        axes[index,0].set_xlabel('Epoch')
        axes[index,0].set_ylabel('Loss')
        axes[index,0].legend(['val_loss','loss'], loc='upper left')
        # accuracy
        axes[index,1].plot(history.history['val_accuracy'])
        axes[index,1].plot(history.history['accuracy'])
        axes[index,1].set_title(model + ' accuracy')
        axes[index,1].set_xlabel('Epoch')
        axes[index,1].set_ylabel('Accuracy')
        axes[index,1].legend(['val_accuracy','accuracy'], loc='upper left')
    plt.tight_layout()
    plt.show()
Grid search tuning
The code below was executed several times until parameters that returned good performance were found.
Grid search: Layers and Adam optimizer
# patience: number of epochs with no improvement before stopping
# mode: 'auto' infers the direction from the monitored metric; for "val_loss" training stops when it stops decreasing
# monitor: "val_loss" measures the loss on the validation set
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
adam_estimator = KerasClassifier(
    model,
    unitsHL1=550,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0,
    optimizer='Adam',
    callbacks=[earlystopping],
    epochs=20,
)
adam_param_grid = {
    'unitsHL1': [550, 750],
    'unitsHL2': [200, 400],
    'optimizer_learning_rate': [0.001, 0.01],
}
adam_grid = GridSearchCV(estimator=adam_estimator, param_grid=adam_param_grid)
with tf.device('GPU:0'):
    adam_grid_result = adam_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (adam_grid_result.best_score_, adam_grid_result.best_params_))
Best score: 0.974958 using {'optimizer_learning_rate': 0.001, 'unitsHL1': 750, 'unitsHL2': 200}
Grid search: Layers and RMSprop optimizer
# patience: number of epochs with no improvement before stopping
# mode: 'auto' infers the direction from the monitored metric; for "val_loss" training stops when it stops decreasing
# monitor: "val_loss" measures the loss on the validation set
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=5,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
RMSprop_estimator = KerasClassifier(
    model,
    unitsHL1=550,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
    callbacks=[earlystopping],
    epochs=20,
)
RMSprop_param_grid = {
    'unitsHL1': [550, 750, 850],
    'unitsHL2': [200, 400],
    'optimizer_learning_rate': [0.001, 0.01, 0.1],
    'optimizer_momentum': [0.0, 0.001, 0.01, 0.1],
}
RMSprop_grid = GridSearchCV(
    estimator=RMSprop_estimator,
    param_grid=RMSprop_param_grid
)
with tf.device('GPU:0'):
    RMSprop_grid_result = RMSprop_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (RMSprop_grid_result.best_score_, RMSprop_grid_result.best_params_))
Best score: 0.974958 using {'optimizer_learning_rate': 0.001, 'optimizer_momentum': 0.0, 'unitsHL1': 750, 'unitsHL2': 200}
Grid search: Layers and SGD optimizer
# patience: number of epochs with no improvement before stopping
# mode: 'auto' infers the direction from the monitored metric; for "val_loss" training stops when it stops decreasing
# monitor: "val_loss" measures the loss on the validation set
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=3,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
SGD_estimator = KerasClassifier(
    model,
    unitsHL1=550,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='SGD',
    callbacks=[earlystopping],
    epochs=20,
)
SGD_param_grid = {
    'unitsHL1': [550, 750, 850],
    'unitsHL2': [200, 400],
    'optimizer_learning_rate': [0.001, 0.01, 0.1],
    'optimizer_momentum': [0.0, 0.001, 0.01],
}
SGD_grid = GridSearchCV(
    estimator=SGD_estimator,
    param_grid=SGD_param_grid,
    cv=3,
)
with tf.device('GPU:0'):
    SGD_grid_result = SGD_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (SGD_grid_result.best_score_, SGD_grid_result.best_params_))
Best score: 0.974021 using {'optimizer_learning_rate': 0.001, 'optimizer_momentum': 0.0, 'unitsHL1': 750, 'unitsHL2': 400}
Grid search results: Layers and optimizer
Every grid search was set to epochs = 20 with an earlystopping callback to stop training when the val_loss metric stops improving. The default batch_size = 32 is used when no value is specified.
# training results
# model : optimizer adam
adam_model = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='Adam',
)
adam_model._name = 'adam_lr2'
# model : optimizer RMSprop
rmsprop_model = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
)
rmsprop_model._name = 'rmsprop_lr2'
# model : optimizer SGD
sgd_model = model(
    unitsHL1=750,
    unitsHL2=400,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='SGD',
)
sgd_model._name = 'sgd_lr2'
models = [adam_model,rmsprop_model,sgd_model]
# stop training if no improvement
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='min',
                              restore_best_weights=True,
                              verbose=0)
callbacks = [earlystopping]
train_results = model_train_results(
    models=models,
    epochs=20,
    batch_size=32,
    X=X_train,
    Y=y_train,
    X_val=X_validation,
    Y_val=y_validation,
    callbacks=callbacks,
    verbose=1,
)
plot_train_results(train_results)
# use a new name (m) so the model() builder function is not overwritten
for m in models:
    print(f"model {m.name} validation loss: {train_results[m.name]['val_loss']}")
    print(f"model {m.name} validation accuracy: {train_results[m.name]['val_accuracy']}")
    print('\n')
model adam_lr2 validation loss: 0.11157926172018051
model adam_lr2 validation accuracy: 0.9767500162124634

model rmsprop_lr2 validation loss: 0.16329512000083923
model rmsprop_lr2 validation accuracy: 0.9752500057220459

model sgd_lr2 validation loss: 0.2640928328037262
model sgd_lr2 validation accuracy: 0.9260833263397217
The models with the Adam and RMSprop optimizers show bad performance: they are good in training but bad at predicting (validation data). Their training loss (loss) curves are low while the validation loss (val_loss) curves are erratic and increasing, which indicates overfitting.
The model with the SGD optimizer has, in general, good performance in training and validation.
Ways to improve the performance of a model that overfits:
- Reduce the training time with a higher batch size
- Use regularization techniques such as dropout, L1, or L2 (see the sketch after this list)
- Use earlystopping to stop the training at a certain epoch number
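Dropout is the option explored later in this notebook. As a hedged illustration of the L2 alternative (not used in the final models; the penalty factor 1e-4 is an illustrative value, not a tuned hyperparameter), a weight penalty can be attached directly to a Dense layer:
from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense, Flatten

# sketch: the first hidden layer of model(), with an added L2 weight penalty
l2_model = Sequential()
l2_model.add(Flatten(input_shape=(28, 28)))
l2_model.add(Dense(750, activation='relu',
                   kernel_regularizer=regularizers.l2(1e-4)))
l2_model.add(Dense(10, activation='softmax'))
l2_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])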
Model Convergence:
Accuracy alone is not enough to select the right model; the number of epochs plays a significant role in determining the model's convergence and performance.
A way to find an optimal number of epochs is to use early stopping: this regularization technique stops the training as soon as the validation error reaches a minimum, preventing overfitting. Determining an appropriate number of epochs also helps manage computational resources effectively by avoiding unnecessary training iterations.
The batch size hyperparameter also has a significant impact on model performance and training time. In practice, models with a high batch size do not generalize as well as models with a low batch size.
Convergence tells us that the model has understood the patterns in the data and is making accurate predictions.
During the training of a machine learning model, the current state of the model at each step of the training algorithm can be evaluated. It can be evaluated on the training dataset to give an idea of how well the model is "learning." It can also be evaluated on a hold-out validation dataset that is not part of the training dataset. Evaluation on the validation dataset gives an idea of how well the model is "generalizing." Jason Brownlee (Aug 6, 2019). How to use Learning Curves to Diagnose Machine Learning Model Performance
NOTE
"The optimal learning rate depends on the other hyperparameters—especially the batch size—so if you modify any hyperparameter, make sure to update the learning rate as well." Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow
Model Convergence: Adam
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
adam_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='Adam',
    callbacks=[earlystopping],
    epochs=24,
)
adam_param_grid = {
    'optimizer_learning_rate': [0.001, 0.0015, 0.002],
    'batch_size': [275, 300, 325],
}
adam_grid = GridSearchCV(
    estimator=adam_estimator,
    param_grid=adam_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    adam_grid_result = adam_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (adam_grid_result.best_score_, adam_grid_result.best_params_))
Best score: 0.973729 using {'batch_size': 300, 'optimizer_learning_rate': 0.0015}
print("best estimator stopped epoch:",adam_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 13
Model Convergence: RMSprop
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=5,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
rmsprop_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
    callbacks=[earlystopping],
    epochs=25,
)
rmsprop_param_grid = {
    'optimizer_learning_rate': [0.001, 0.0015, 0.002],
    'optimizer_momentum': [0.0, 0.001],
    'batch_size': [350, 400, 450]
}
rmsprop_grid = GridSearchCV(
    estimator=rmsprop_estimator,
    param_grid=rmsprop_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    rmsprop_grid_result = rmsprop_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (rmsprop_grid_result.best_score_, rmsprop_grid_result.best_params_))
Best score: 0.974313 using {'batch_size': 400, 'optimizer_learning_rate': 0.0015, 'optimizer_momentum': 0.0}
print("best estimator stopped epoch:",rmsprop_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 20
Model Convergence: SGD
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=5,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
sgd_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=400,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.001,
    optimizer_momentum=0.0,
    optimizer='SGD',
    callbacks=[earlystopping],
    epochs=30,
)
sgd_param_grid = {
    'optimizer_learning_rate': [0.001, 0.0015, 0.002],
    'batch_size': [100, 200, 300]
}
sgd_grid = GridSearchCV(
    estimator=sgd_estimator,
    param_grid=sgd_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    sgd_grid_result = sgd_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (sgd_grid_result.best_score_, sgd_grid_result.best_params_))
Best score: 0.974250 using {'batch_size': 200, 'optimizer_learning_rate': 0.0015}
print("best estimator stopped epoch:",sgd_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 11
Models convergence results
# training convergence results
# model convergence with optimizer adam
adam_model_convg = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0.0,
    dropoutHL2=0.0,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='Adam',
)
adam_model_convg._name = 'adam_convg'
# model convergence with optimizer RMSprop
rmsprop_model_convg = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0.0,
    dropoutHL2=0.0,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
)
rmsprop_model_convg._name = 'rmsprop_convg'
# model convergence with optimizer SGD
sgd_model_convg = model(
    unitsHL1=750,
    unitsHL2=400,
    dropoutHL1=0.0,
    dropoutHL2=0.0,
    optimizer_learning_rate=0.002,
    optimizer_momentum=0.0,
    optimizer='SGD',
)
sgd_model_convg._name = 'sgd_convg'
# models list
models_list = [adam_model_convg, rmsprop_model_convg, sgd_model_convg]
# epochs list in order: [adam,rmsprop,sgd]
models_epochs_list = [13, 20, 14]
# batch size list in order: [adam,rmsprop,sgd]
models_batch_size_list = [300, 400, 375]
models_convg_results = model_train_results(
    models=models_list,
    epochs=models_epochs_list,
    batch_size=models_batch_size_list,
    X=X_train,
    Y=y_train,
    X_val=X_validation,
    Y_val=y_validation,
    callbacks=None,
    verbose=1
)
plot_train_results(models_convg_results)
for m in models_list:
    print(f"model {m.name} validation loss: {models_convg_results[m.name]['val_loss']}")
    print(f"model {m.name} validation accuracy: {models_convg_results[m.name]['val_accuracy']}")
    print('\n')
model adam_convg validation loss: 0.08512312173843384
model adam_convg validation accuracy: 0.9765833616256714

model rmsprop_convg validation loss: 0.12291974574327469
model rmsprop_convg validation accuracy: 0.9830833077430725

model sgd_convg validation loss: 0.425358384847641
model sgd_convg validation accuracy: 0.8804166913032532
The models improved their performance after finding a suitable batch size and using early stopping to monitor the validation loss and stop training when the metric stopped improving (to avoid overfitting), which resulted in smoother convergence and less variability (less erratic curves).
Also, I had to slightly tune the learning rate of every model due to the other hyperparameter modifications.
Models with the Adam and RMSprop optimizers continue to show signs of overfitting, with the validation loss higher than the training loss.
The model with the SGD optimizer converges at a certain point and shows no signs of overfitting, but it has the lowest accuracy of the models.
Dropout Regularization:
Model dropout with 'adam' optimizer
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
adam_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='Adam',
    callbacks=[earlystopping],
    epochs=30,
    batch_size=300,
)
adam_param_grid = {
    #'optimizer_learning_rate':[0.001,0.0015,0.002],
    #'batch_size':[200,300,400],
    'dropoutHL1': [0.6],
    'dropoutHL2': [0.3, 0.4],
}
adam_grid = GridSearchCV(
    estimator=adam_estimator,
    param_grid=adam_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    adam_grid_result = adam_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (adam_grid_result.best_score_, adam_grid_result.best_params_))
Best score: 0.978646 using {'dropoutHL1': 0.6, 'dropoutHL2': 0.3}
print("best estimator stopped epoch:",adam_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 19
Model dropout with 'RMSprop' optimizer
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
rmsprop_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
    callbacks=[earlystopping],
    epochs=25,
    batch_size=400,
)
rmsprop_param_grid = {
    #'optimizer_learning_rate':[0.001,0.0015,0.002],
    #'batch_size':[300,400,500],
    'dropoutHL1': [0.5, 0.6],
    'dropoutHL2': [0.0, 0.1, 0.2],
}
rmsprop_grid = GridSearchCV(
    estimator=rmsprop_estimator,
    param_grid=rmsprop_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    rmsprop_grid_result = rmsprop_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (rmsprop_grid_result.best_score_, rmsprop_grid_result.best_params_))
Best score: 0.977354 using {'dropoutHL1': 0.5, 'dropoutHL2': 0.1}
print("best estimator stopped epoch:",rmsprop_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 18
Model dropout with 'SGD' optimizer
earlystopping = EarlyStopping(monitor="val_loss",
                              patience=4,
                              mode='auto',
                              restore_best_weights=True,
                              verbose=1)
sgd_estimator = KerasClassifier(
    model,
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0,
    dropoutHL2=0,
    optimizer_learning_rate=0.002,
    optimizer_momentum=0.0,
    optimizer='SGD',
    callbacks=[earlystopping],
    epochs=25,
    batch_size=375,
)
sgd_param_grid = {
    #'optimizer_learning_rate':[0.001,0.0015,0.002],
    #'batch_size':[200,300,400],
    'dropoutHL1': [0.0, 0.1, 0.2],
    'dropoutHL2': [0.0, 0.1, 0.2],
}
sgd_grid = GridSearchCV(
    estimator=sgd_estimator,
    param_grid=sgd_param_grid,
    cv=4,  # 4-fold cross-validation (the default is 5-fold)
)
with tf.device('GPU:0'):
    sgd_grid_result = sgd_grid.fit(
        X=X_train,
        y=y_train,
        validation_data=(X_validation, y_validation),
        verbose=1,
    )
print("Best score: %f using %s" % (sgd_grid_result.best_score_, sgd_grid_result.best_params_))
Best score: 0.975833 using {'dropoutHL1': 0.1, 'dropoutHL2': 0.2}
print("best estimator stopped epoch:",sgd_grid_result.best_estimator_.current_epoch)
best estimator stopped epoch: 13
Models dropout regularization results
# training regularization results
# model dropout with optimizer adam
adam_model_dropout = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0.6,
    dropoutHL2=0.3,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='Adam',
)
adam_model_dropout._name = 'adam_dropout'
# model dropout with optimizer RMSprop
rmsprop_model_dropout = model(
    unitsHL1=750,
    unitsHL2=200,
    dropoutHL1=0.5,
    dropoutHL2=0.1,
    optimizer_learning_rate=0.0015,
    optimizer_momentum=0.0,
    optimizer='RMSprop',
)
rmsprop_model_dropout._name = 'rmsprop_dropout'
# model dropout with optimizer SGD
sgd_model_dropout = model(
    unitsHL1=750,
    unitsHL2=400,
    dropoutHL1=0.1,
    dropoutHL2=0.2,
    optimizer_learning_rate=0.0020,
    optimizer_momentum=0.0,
    optimizer='SGD',
)
sgd_model_dropout._name = 'sgd_dropout'
# models list
models_list = [adam_model_dropout,rmsprop_model_dropout,sgd_model_dropout]
# epochs list in order: [adam,rmsprop,sgd]
models_epochs_list = [19,18,13]
# batch size list in order: [adam,rmsprop,sgd]
models_batch_size_list = [300,400,375]
models_dropout_results = model_train_results(
    models=models_list,
    epochs=models_epochs_list,
    batch_size=models_batch_size_list,
    X=X_train,
    Y=y_train,
    X_val=X_test,   # the held-out test set is used as validation data here
    Y_val=y_test,
    callbacks=None,
    verbose=1
)
plot_train_results(models_dropout_results)
for m in models_list:
    print("model:", m.name)
    print(f"model {m.name} validation loss: {models_dropout_results[m.name]['val_loss']}")
    print(f"model {m.name} validation accuracy: {models_dropout_results[m.name]['val_accuracy']}")
    print(f"model {m.name} stop epoch: {models_dropout_results[m.name]['stop epochs']}")
    print('\n')
model: adam_dropout
model adam_dropout validation loss: 0.06463388353586197
model adam_dropout validation accuracy: 0.98089998960495
model adam_dropout stop epoch: 19

model: rmsprop_dropout
model rmsprop_dropout validation loss: 0.07919890433549881
model rmsprop_dropout validation accuracy: 0.9801999926567078
model rmsprop_dropout stop epoch: 18

model: sgd_dropout
model sgd_dropout validation loss: 0.42107391357421875
model sgd_dropout validation accuracy: 0.8808000087738037
model sgd_dropout stop epoch: 13
Dropout regularization helps the models avoid overfitting and reach a convergence point; the models perform well on both the training and the validation data.
6. Training
# checkpointer files to save the best weights of each model
checkpointer_adam = ModelCheckpoint(
    filepath='mnist_adam.best.hdf5',
    save_best_only=True,
)
checkpointer_rmsprop = ModelCheckpoint(
    filepath='mnist_rmsprop.best.hdf5',
    save_best_only=True,
)
checkpointer_sgd = ModelCheckpoint(
    filepath='mnist_sgd.best.hdf5',
    save_best_only=True,
)
# model with adam optimizer
hist_adam = adam_model_dropout.fit(X_train, y_train, batch_size=300, epochs=19,
                                   validation_split=0.2, callbacks=[checkpointer_adam],
                                   verbose=1, shuffle=True)
# model with rmsprop optimizer
hist_rmsprop = rmsprop_model_dropout.fit(X_train, y_train, batch_size=400, epochs=18,
                                         validation_split=0.2, callbacks=[checkpointer_rmsprop],
                                         verbose=1, shuffle=True)
# model with sgd optimizer
hist_sgd = sgd_model_dropout.fit(X_train, y_train, batch_size=375, epochs=13,
                                 validation_split=0.2, callbacks=[checkpointer_sgd],
                                 verbose=1, shuffle=True)
7. Predictions
Predictions on test set
# load the weights that yielded the best validation accuracy
adam_model_dropout.load_weights('mnist_adam.best.hdf5')
rmsprop_model_dropout.load_weights('mnist_rmsprop.best.hdf5')
sgd_model_dropout.load_weights('mnist_sgd.best.hdf5')
# evaluate test accuracy
adam_score = adam_model_dropout.evaluate(X_test,y_test,verbose=0)
rmsprop_score = rmsprop_model_dropout.evaluate(X_test,y_test,verbose=0)
sgd_score= sgd_model_dropout.evaluate(X_test,y_test,verbose=0)
adam_test_accuracy = 100*adam_score[1]
adam_test_loss = adam_score[0]
rmsprop_test_accuracy = 100*rmsprop_score[1]
rmsprop_test_loss = rmsprop_score[0]
sgd_test_accuracy = 100*sgd_score[1]
sgd_test_loss = sgd_score[0]
print("adam model test accuracy:",adam_test_accuracy)
print("rmsprop model test accuracy:",rmsprop_test_accuracy)
print("sgd model test accuracy:",sgd_test_accuracy)
adam model test accuracy: 97.96000123023987
rmsprop model test accuracy: 98.36999773979187
sgd model test accuracy: 89.56000208854675
Prediction on custom data
Images saved in the new_digits folder consist of 20 instances (RGB images of 32x32 pixels), each labeled with the corresponding digit.
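The custom_dataset helper used below comes from my DigitsPreprocessing module (included in the repository). As a rough sketch of the kind of preprocessing involved (the steps and the digit-from-filename labeling below are illustrative assumptions, not the module's actual implementation):
import os
import cv2
import numpy as np

def custom_dataset_sketch(folder_path):
    # hypothetical sketch: load RGB digit images and convert them to
    # MNIST-like 28x28 grayscale arrays (white digit on a black background)
    images, labels = [], []
    for filename in sorted(os.listdir(folder_path)):
        if not filename.endswith('.png'):
            continue
        img = cv2.imread(os.path.join(folder_path, filename))
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        gray = cv2.resize(gray, (28, 28))
        # invert if the background is light, so the images match MNIST
        if gray.mean() > 127:
            gray = 255 - gray
        images.append(gray)
        # assumes files are named after their digit, e.g. '0.png'
        labels.append(int(filename[0]))
    return np.array(images), labels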
from DigitsPreprocessing import custom_dataset
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# convert custom images into a set for the MLP model
folder_path = 'new_digits'
custom_set, custom_labels = custom_dataset(folder_path)
# preprocess custom set and labels
custom_set = np.reshape(custom_set,(len(custom_set),28,28))
custom_set = custom_set/255
categorical_labels = to_categorical(custom_labels,10)
# open a raw image
import cv2
raw_image_path = 'new_digits/0.png'
img = cv2.imread(raw_image_path)
imgRGB = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
# plot raw and preprocessed image
fig = plt.figure(figsize=(15,15))
ax = fig.add_subplot(1,2,1)
bx = fig.add_subplot(1,2,2)
ax.imshow(imgRGB)
ax.set_title("raw image")
bx.imshow(custom_set[0],cmap='gray')
bx.set_title("preprocesed image")
plt.show()
# evaluate accuracy of the model with the custom set
rmsprop_custom_score = rmsprop_model_dropout.evaluate(custom_set,categorical_labels,verbose=0)
print("accuracy prediction on custom set:",100*rmsprop_custom_score[1])
accuracy prediction on custom set: 80.0000011920929
# predictions as single digits
predicted_labels = []
predictions = rmsprop_model_dropout.predict(custom_set)
for prediction in predictions:
    # from categorical to single digit
    predicted_labels.append(list(prediction).index(max(list(prediction))))
1/1 [==============================] - 0s 90ms/step
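The same one-hot-to-digit conversion can be written more compactly with NumPy, equivalent to the loop above:
# vectorized: index of the highest score along each row
predicted_labels = np.argmax(predictions, axis=1).tolist()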
print("original digits:",custom_labels)
print("predicted digits:",predicted_labels)
original digits: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]
predicted digits: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 2, 5, 8, 5, 9]
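The confusion_matrix and ConfusionMatrixDisplay utilities imported earlier can summarize which digits were confused; a minimal sketch using the labels above:
# confusion matrix of true vs. predicted digits on the custom set
cm = confusion_matrix(custom_labels, predicted_labels, labels=list(range(10)))
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=list(range(10)))
disp.plot(cmap='Blues')
plt.show()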
We can see that the MLP model with the RMSprop optimizer is the model with the highest accuracy, around 98.37%. The model's accuracy decreases when we make predictions on images that don't belong to the original dataset: the accuracy on the custom dataset was 80% (4 of the 20 images were misclassified).