We covered the introduction to deep neural networks in the last blog post. In this final post of the series, we will learn about Convolutional Neural Networks.
Introduction to Convolution
The first step is to understand what convolution is. It is a key operation for processing images.
Convolution means taking a small square grid of values (the kernel) and sliding it across the image. At each position, every kernel value is multiplied by the pixel value underneath it, the products are summed, and that single number is written into another grid (the feature map). It looks like this:
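To make the sliding-window description concrete, here is a minimal sketch in plain Python with no libraries. The function name `convolve2d`, the toy image and the kernel are illustrative choices for this example, not taken from any library. Like most deep learning frameworks, it does not flip the kernel, so strictly speaking it computes a cross-correlation, which is what CNNs call convolution. The kernel is a classic vertical-edge detector, chosen here only for the demo.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Both arguments are lists of lists of numbers. The kernel is not flipped,
    matching how CNN libraries implement "convolution".
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = ih - kh + 1, iw - kw + 1  # positions where the kernel fits fully
    feature_map = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            total = 0
            # Multiply each kernel value by the pixel under it and sum the products.
            for m in range(kh):
                for n in range(kw):
                    total += image[i + m][j + n] * kernel[m][n]
            feature_map[i][j] = total
    return feature_map


# Toy 4x6 image: bright left half, dark right half (a vertical edge in the middle).
image = [[10, 10, 10, 0, 0, 0] for _ in range(4)]
# Vertical-edge kernel: responds when the left column is brighter than the right one.
kernel = [[1, 0, -1] for _ in range(3)]

edges = convolve2d(image, kernel)
print(edges)  # [[0, 30, 30, 0], [0, 30, 30, 0]]
```

The feature map is large only where the window straddles the boundary between the bright and dark halves, which is exactly what a vertical-edge kernel is meant to pick up. Notice also that the 4x6 image produced a 2x4 feature map.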
Let’s try to visualize this a bit. Have you ever seen one of these?
This is called magnetic field viewing paper. When you place it on a surface, it shows the magnets underneath, like this:
We can think of the iPad as the image, the magnetic paper as the kernel, and the pattern that appears on the paper as the feature map. It is a good illustration of how each convolutional kernel detects a particular kind of feature in an image.
Before we conclude the convolution section, there is one more thing we need to talk about. Remember the convolution operation figure? Do you notice something about it?
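You can see that the feature map is smaller than the image. Why? Because each output value needs the full 3x3 block of nine pixels around it, the kernel cannot be centered on the pixels along the border, so those positions produce no output. As a result, the output is smaller than the input. How do we fix this? We use padding.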
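The shrinkage can be written down directly. Assuming a square n x n image, a square k x k kernel, a stride of 1 and no padding, the kernel fits in n - k + 1 positions along each dimension, so the feature map has side length:

```latex
n_{\text{out}} = n - k + 1
\qquad\text{e.g.}\quad n = 5,\; k = 3 \;\Rightarrow\; n_{\text{out}} = 5 - 3 + 1 = 3
```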
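Padding means adding extra values (usually zeros) around the border of the image to make it slightly larger. For a 3x3 kernel, one extra row or column on each side is enough, so after the convolution the feature map has the same size as the original image.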
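Zero padding is easy to sketch as well. The helper names `zero_pad` and `output_size` below are hypothetical, written only for this example, and the size formula assumes a square input, a square kernel and a stride of 1. With p rows and columns of padding on every side, the output side length becomes n + 2p - k + 1, so choosing p = (k - 1) / 2 (p = 1 for a 3x3 kernel) keeps the feature map the same size as the input.

```python
def zero_pad(image, p):
    """Return a copy of a 2D grid surrounded by p rows/columns of zeros on every side."""
    width = len(image[0])
    blank = [0] * (width + 2 * p)
    padded = [list(blank) for _ in range(p)]          # p zero rows on top
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # zeros left and right
    padded.extend(list(blank) for _ in range(p))      # p zero rows at the bottom
    return padded


def output_size(n, k, p=0):
    """Feature-map side length for an n x n input, k x k kernel, padding p, stride 1."""
    return n + 2 * p - k + 1


image = [[1, 2, 3, 4] for _ in range(4)]  # toy 4x4 image
padded = zero_pad(image, 1)

print(len(padded), len(padded[0]))  # 6 6
print(output_size(4, 3))            # 2  (shrinks without padding)
print(output_size(4, 3, p=1))       # 4  (same size as the input)
```

Padding with zeros is the most common choice, but it is only one option; other schemes (for example, repeating the border pixels) exist too.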
Introduction to Convolutional Neural Networks
Convolutional neural networks (CNNs) are probably among the most famous DNN architectures in use today. Thanks to the convolution operation, they are very well suited to processing images.
If you recall the iPad example, there was a piece of paper that could detect the magnets underneath, and we compared it to the kernel. If we had another piece of paper that could detect plastic, then our feature map would show the plastic parts of the iPad (image).
Imagine we have many different papers, each able to detect a different material. Now imagine several objects hidden under a black curtain, and we want to find out which one is the iPad. We could slide the magnet-detection paper over the curtain, and once we see the iPad's familiar magnetic pattern, we can say we have found it.
This is similar to how convolutional neural networks work. In each layer, multiple convolution kernels are applied to the layer's input, and each kernel learns to detect a different feature (e.g., edges or corners). When these layers are stacked one after another, later layers combine the features found by earlier ones into more complex patterns, and together they can decide what the input is.
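A layer with several kernels, and a stack of such layers, can be sketched in plain Python. This is a simplified illustration under several assumptions: stride 1, no padding, no bias, a ReLU non-linearity after each convolution (a common choice not discussed above), and random kernel values standing in for the ones a real network would learn during training. The names `conv_layer` and `random_kernels` are hypothetical. Each kernel spans all channels of its input, so a layer turns its input feature maps into one new feature map per kernel.

```python
import random


def conv_layer(inputs, kernels):
    """One convolutional layer: stride 1, no padding, no bias, ReLU activation.

    inputs:  list of C_in channels, each an H x W grid (list of lists)
    kernels: list of C_out kernels, each a list of C_in k x k grids
    returns: list of C_out feature maps, each (H - k + 1) x (W - k + 1)
    """
    h, w = len(inputs[0]), len(inputs[0][0])
    k = len(kernels[0][0])
    out_h, out_w = h - k + 1, w - k + 1
    outputs = []
    for kernel in kernels:  # each kernel produces its own feature map
        fmap = [[0.0] * out_w for _ in range(out_h)]
        for i in range(out_h):
            for j in range(out_w):
                total = 0.0
                # Sum the multiply-and-add over every input channel.
                for channel, grid in zip(inputs, kernel):
                    for m in range(k):
                        for n in range(k):
                            total += channel[i + m][j + n] * grid[m][n]
                fmap[i][j] = max(total, 0.0)  # ReLU keeps only positive responses
        outputs.append(fmap)
    return outputs


def random_kernels(c_out, c_in, k, rng):
    """Random stand-in weights; a trained CNN would learn these values instead."""
    return [[[[rng.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]


rng = random.Random(0)
image = [[[rng.random() for _ in range(8)] for _ in range(8)]]  # 1 channel, 8x8

layer1 = conv_layer(image, random_kernels(4, 1, 3, rng))   # 4 feature maps
layer2 = conv_layer(layer1, random_kernels(8, 4, 3, rng))  # 8 feature maps

print(len(layer1), len(layer1[0]), len(layer1[0][0]))  # 4 6 6
print(len(layer2), len(layer2[0]), len(layer2[0][0]))  # 8 4 4
```

The first layer turns the single-channel 8x8 input into 4 feature maps of size 6x6, and the second combines those 4 maps into 8 new ones of size 4x4. Real frameworks such as PyTorch or TensorFlow implement the same idea with much faster vectorized code and learn the kernel values through training.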
A bit of history
Do you remember the MNIST dataset, the one with lots of handwritten digits? It matters for CNNs because handwritten digit recognition was the task that first made them famous.
LeNet, introduced in 1998, was one of the first models to show that CNNs could outperform the other methods of the time on this task.
Then AlexNet was proposed in 2012. In the ImageNet (ILSVRC) 2012 image classification challenge, its top-5 error was roughly 10 percentage points lower than that of the runner-up (15.3% versus 26.2%). A key ingredient was depth: AlexNet was deeper and much larger than earlier CNNs. Training such a model had been computationally too expensive, BUT its authors trained it on GPUs, which made it feasible. This was the starting point of the deep learning BOOM of the last decade.
After AlexNet, deep neural networks attracted serious attention, and the number of studies exploded as researchers looked for ways to build deeper and better networks. Thanks to improvements in GPUs, computations that had been impractical became possible, and deep learning was applied almost everywhere.
And that’s it! I hope you enjoyed this very brief introduction to deep neural networks, and I hope this series managed to get you excited about the topic! Thanks for reading, and feel free to contact me if you would like pointers to further reading!