Posts

codeBox.js

Digit recognition with Multi-layer perceptron (MLP) models

The MNIST (Modified National Institute of Standards and Technology) database is a large collection of handwritten digits stored as monochrome images. The digits have been size-normalized and centered in a fixed-size image. The goal of this experiment is to find a set of hyperparameters that yields an accurate model, using GridSearchCV from Scikit-learn as the tuning technique. I arbitrarily selected 3 optimizers (Adam, RMSprop, and SGD) as a starting point for the MLP models; I then tuned the 3 models, each with its own hyperparameters, and chose the one that predicted the digits with the highest accuracy. I also created my own dataset of 20 samples and preprocessed the images to resemble the original set, so that I could predict those digits with the selected model. In the end, the model showed high accuracy on the test set, but the accuracy decreased on the custom dataset. The problem with MNIST is that the dataset is "too perfect"; in rea...
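The tuning step described in this summary can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the setup used in the post: it uses Scikit-learn's own `MLPClassifier` (which offers the `adam` and `sgd` solvers but no RMSprop), the small 8x8 digits set bundled with Scikit-learn as a stand-in for MNIST, and a hypothetical grid of values chosen only to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled 8x8 digits set stands in for the 28x28 MNIST images.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale the pixel values, then fit a multi-layer perceptron.
pipeline = make_pipeline(
    StandardScaler(),
    MLPClassifier(max_iter=300, random_state=0),
)

# Hypothetical grid. Keys are prefixed with the pipeline step name,
# which make_pipeline derives from the lowercase class name.
param_grid = {
    "mlpclassifier__solver": ["adam", "sgd"],
    "mlpclassifier__hidden_layer_sizes": [(32,), (64,)],
    "mlpclassifier__learning_rate_init": [0.001, 0.01],
}

# Every combination in the grid is scored with 3-fold cross-validation;
# the best one is then refit on the whole training split.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)

print("Best parameters:", best_params)
print("Cross-validated accuracy:", round(search.best_score_, 3))
print("Held-out test accuracy:", round(test_accuracy, 3))
```

Scoring the refit model on a held-out split, as in the last step, only measures performance on data drawn from the same distribution; it says nothing about how the model behaves on differently prepared images such as a hand-made set.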

Analyzing IMDB data (movie reviews) for sentiment analysis

This is my first neural network project. It uses preprocessed data from IMDB to classify sentiment (positive or negative) in a dataset of 50,000 movie reviews (25,000 for training and 25,000 for testing). I tried to detail every step and decision I made while creating the model. In the end, the neural network classified the reviews with an accuracy of 88.1%, which means it misclassified 11.9% of the test data (around 3,000 movie reviews). This is a high error rate considering that an acceptable error should be between 3% and 5%, but overall the model helped me and gave me clues for developing a new version. Along the way, I also got a brief introduction to Natural Language Processing, a topic that was new to me.

1. The dataset: IMDB (Internet Movie Database)

References: Maas, A., Daly, R., Pham, P., Huang, D., Ng, A., & Potts, C. (2011). Learning Word Vectors for Sentiment Analysis. IMDB movie review sentiment classification dataset ...
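The error figures quoted in this summary can be cross-checked with a little arithmetic. The sketch below assumes the stated 11.9% misclassification rate applies to the 25,000 test reviews; the helper name `error_summary` is hypothetical and only serves the illustration.

```python
def error_summary(n_samples: int, error_rate: float) -> tuple[int, float]:
    """Return the misclassified count and the accuracy implied by an error rate."""
    misclassified = round(n_samples * error_rate)
    accuracy = 1.0 - error_rate
    return misclassified, accuracy


# Assumption: 11.9% of the 25,000 test reviews were misclassified.
misclassified, accuracy = error_summary(25_000, 0.119)

print(misclassified)       # 2975
print(f"{accuracy:.1%}")   # 88.1%
```

A count of 2,975 matches the "around 3,000 movie reviews" mentioned in the summary.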