
Why can our brain recognize different handwritten shapes of the number 3?
How does our brain work?
How can we create a machine that takes a 28×28-pixel handwritten image and classifies it as a digit from 0 to 9?
To answer these questions, we need to understand what a neural network is and how it works. In this article, we focus only on the simplest type of neural network, known as the Multilayer Perceptron.
What are neurons and how are they connected?
A neuron is an entity that holds a number, specifically a value between 0 and 1. In our task, the input image consists of 28×28 pixels, giving 784 input neurons, one per pixel. Each grayscale pixel value ranges from 0.0 for a black pixel to 1.0 for a white pixel. The value stored in a neuron is called its activation.
All 784 neurons form the first layer of the neural network.
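As a small sketch of this input layer (assuming NumPy and a placeholder random image in place of real handwriting data), the 28×28 pixel grid is normalized to [0, 1] and flattened into 784 activations:

```python
import numpy as np

# Hypothetical 28x28 grayscale image with raw pixel values in [0, 255].
image = np.random.randint(0, 256, size=(28, 28))

# Normalize to [0.0, 1.0] and flatten into 784 input activations.
activations = (image / 255.0).reshape(784)

print(activations.shape)  # (784,)
```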
The last layer contains 10 neurons, one for each digit from 0 to 9; the activation of each indicates how strongly the network believes the image shows that digit.
Between the input and output layers are hidden layers. These layers contain many neurons that define the logic for recognizing and classifying handwritten digits.
Patterns of activations in one layer produce specific patterns in the next layer, and this process continues until the final output layer.
Consider how we recognize the digit 9. The number 9 has a loop at the top and a line on the right. The digit 8 has two loops, one at the top and one at the bottom.
Each of these sub-components can be represented by individual neurons in hidden layers, which detect whether specific characteristics appear in the image.
However, detecting a loop directly is still difficult. Therefore, the loop is further broken down into smaller components.
This explains why neural networks are divided into multiple layers: each layer breaks a high-level, abstract problem into simpler, lower-level ones.
Each connection between two neurons carries a weight. To compute a neuron's activation, the activations from the previous layer are multiplied by their weights and summed. This process is repeated for every neuron in the next layer.
Since activations must remain in the range between 0 and 1, the sigmoid activation function is used to normalize values.
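The sigmoid function itself is a one-liner. A minimal NumPy version:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))  # 0.5
```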
Bias shifts the weighted sum before the activation function is applied, acting as a threshold that determines how large the sum must be for the neuron to activate meaningfully. This gives the model more flexibility during learning.
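Putting weights, bias, and sigmoid together, one layer's activations can be sketched as follows (the 16-neuron hidden layer size and random values are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

a_prev = rng.random(784)            # activations of the 784 input neurons
W = rng.standard_normal((16, 784))  # weights: 16 hidden neurons, 784 inputs each
b = rng.standard_normal(16)         # one bias per hidden neuron

# Each hidden activation is sigmoid(weighted sum of inputs + bias).
a_hidden = sigmoid(W @ a_prev + b)
```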
The main objective of training is to find optimal values for weights and biases using a large amount of data.
To find the best weights and biases, the model starts with random values. Many predictions will initially be incorrect. The model learns from these incorrect predictions by adjusting weights and biases, a process known as training.
To evaluate performance, the model is tested using ground-truth data and its predictions are compared with actual results.
The difference between predicted and actual values is measured using a cost function. Common cost functions include Mean Squared Error (MSE) and Cross-Entropy Loss.
Mean Squared Error is defined as:
MSE = (1 / 2N) × Σ(ŷᵢ − yᵢ)²
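This formula translates directly into code (the example predictions and labels are made up for illustration):

```python
import numpy as np

def mse(y_pred, y_true):
    # Mean squared error with the conventional 1/2 factor,
    # which cancels the 2 when differentiating.
    n = len(y_true)
    return np.sum((y_pred - y_true) ** 2) / (2 * n)

y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.8, 0.2])
```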
Cross-Entropy Loss is defined as:
L = −(1 / N) × Σᵢ Σₙ yᵢₙ log(ŷᵢₙ)
where i runs over the N samples and n over the classes.
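A NumPy sketch of this loss for one-hot labels (the sample values are assumptions, and the class count is reduced to 3 for brevity):

```python
import numpy as np

def cross_entropy(y_pred, y_true):
    # Average over N samples of -sum(y * log(y_hat)) across classes.
    # A small epsilon guards against log(0).
    eps = 1e-12
    return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

# One-hot labels and predicted probabilities for two samples.
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
```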
Feedforward is the most basic neural network architecture. Information flows in one direction, from the input layer to the output layer, without loops. Each layer processes data and passes it to the next until an output is produced.
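A full feedforward pass is just the single-layer computation applied layer by layer. A sketch for a hypothetical 784 → 16 → 10 network with random weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases):
    # Pass the input through each layer in turn; no loops back.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(1)
weights = [rng.standard_normal((16, 784)), rng.standard_normal((10, 16))]
biases = [rng.standard_normal(16), rng.standard_normal(10)]

output = feedforward(rng.random(784), weights, biases)
```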
Backpropagation is the algorithm used to train feedforward neural networks. It adjusts weights and biases to minimize the difference between predicted outputs and actual values.
Let the cost function be C. The direction in which C decreases fastest is the negative gradient, written −∇C.
The training process follows these steps:
First, input data is fed forward through the network using the current weights and biases to compute the output.
Second, the output is compared with the ground-truth data to calculate the cost function.
Third, the cost difference is used to compute the negative gradient of each weight and bias.
Fourth, weights and biases are updated by stepping in the direction of the negative gradient, scaled by a learning rate.
Finally, the next feedforward step begins.
By repeating this loop, the cost function decreases, meaning the model becomes more accurate.
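The whole loop can be sketched on a toy problem small enough to write the gradients by hand: a single sigmoid neuron learning whether its input exceeds 0.5. The data, learning rate, and iteration count are all assumptions for illustration, not values from the article:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)

# Toy data: one input feature, target is 1.0 when x > 0.5, else 0.0.
x = rng.random(100)
y = (x > 0.5).astype(float)

w = rng.standard_normal()  # start from a random weight and bias
b = rng.standard_normal()
lr = 2.0                   # learning rate

for _ in range(2000):
    # 1. Feedforward with the current weight and bias.
    y_hat = sigmoid(w * x + b)
    # 2. Cost: mean squared error with the 1/2 factor.
    cost = np.mean((y_hat - y) ** 2) / 2
    # 3. Gradients of the cost w.r.t. w and b (chain rule).
    delta = (y_hat - y) * y_hat * (1 - y_hat)
    grad_w = np.mean(delta * x)
    grad_b = np.mean(delta)
    # 4. Step in the direction of the negative gradient.
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean((y_hat > 0.5) == (y > 0.5))
```

For a real multilayer network, step 3 is where backpropagation applies the chain rule layer by layer instead of by hand.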
For a detailed explanation, refer to the following video:
Backpropagation – https://www.youtube.com/watch?v=tIeHLnjs5U8
Neural networks recognize and classify complex patterns such as handwritten digits through layered structures that process input data progressively.
The Multilayer Perceptron model breaks down complex visual patterns into simpler components, enabling the model to handle diverse handwriting styles.
Training and optimization are achieved through backpropagation, which adjusts weights and biases to minimize prediction errors.
By mimicking how the brain analyzes visual information, neural networks provide a powerful approach for automated recognition and classification tasks.
References:
- 3Blue1Brown – Neural Networks
- Backpropagation Explained
Nguyen Quang Huy