In this article, I will cover what a CNN is, why it is needed, and take a high-level look at how it works.
What is a Convolutional Neural Network?
A Convolutional Neural Network (CNN) is a neural network that takes an image as input and outputs a 1-dimensional array. This process is also known as feature extraction. Its goal is to generate features, in the form of a 1-dimensional array, that can be fed into another model such as an Artificial Neural Network.
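To make the "image in, 1-dimensional array out" idea concrete, here is a minimal NumPy sketch of feature extraction. It is an illustration under stated assumptions, not a real CNN: the helper names (`conv2d_valid`, `max_pool_2x2`, `extract_features`), the fixed hand-written kernel, and the choice of one convolution followed by ReLU and 2 x 2 max pooling are all hypothetical choices made for this example. A real network learns its kernels from data and stacks many such layers.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, this is technically a
    cross-correlation: the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 block."""
    h = feature_map.shape[0] // 2 * 2
    w = feature_map.shape[1] // 2 * 2
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def extract_features(image, kernel):
    """Image (2-D array) in, 1-dimensional feature array out."""
    feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU
    pooled = max_pool_2x2(feature_map)
    return pooled.flatten()

# Toy 6 x 6 "image" whose values increase from left to right.
image = np.arange(36, dtype=float).reshape(6, 6)

# Hypothetical fixed kernel that responds to left-to-right increases.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = extract_features(image, kernel)
print(features.shape)  # 6x6 image -> 4x4 feature map -> 2x2 pooled -> 4 values
```

Unlike plain flattening, each value in `features` summarizes a small 2-dimensional neighborhood of the image, so the spatial arrangement of the pixels influences the result.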
Question: If a Convolutional Neural Network creates a 1-dimensional array for another model, why don’t we just flatten the image and feed the flattened image into that model? Here, flattening means taking a matrix and turning it into a 1-dimensional array by appending each row to the end of the previous one. For example, take a 3 x 3 matrix with Row 1 = [a,b,c], Row 2 = [d,e,f] and Row 3 = [g,h,i]. Flattening this matrix gives the 1-dimensional array [a,b,c,d,e,f,g,h,i].

So why don’t we do this with an image and then feed it into a machine learning model? A flattened image loses the spatial relationships between pixels: pixels that were vertical neighbors in the image (such as a and d above) end up far apart in the array. For example, given only an array of length 16, the shape of the matrix it came from is ambiguous. Was it a 4 x 4 matrix? How about a 2 x 8 or an 8 x 2 matrix? Our model wouldn’t know. This is why Convolutional Neural Networks exist: to generate features based upon an image (or other similarly structured data)…
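The flattening example and the shape ambiguity can be checked with a short NumPy sketch. The variable names here (`matrix`, `flat`, `values`, `below_top_left`) are chosen for illustration only.

```python
import numpy as np

# The 3 x 3 matrix from the example above.
matrix = np.array([
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
])

# Flattening appends each row to the end of the previous one.
flat = matrix.flatten()
print(flat.tolist())  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

# "a" and "d" were vertical neighbors; after flattening they are 3 apart.
print(flat.tolist().index("d") - flat.tolist().index("a"))  # 3

# A length-16 array is consistent with several original shapes.
values = np.arange(16)
below_top_left = {}
for shape in [(4, 4), (2, 8), (8, 2)]:
    candidate = values.reshape(shape)
    # Which element sits directly below the top-left element?
    below_top_left[shape] = int(candidate[1, 0])

print(below_top_left)  # {(4, 4): 4, (2, 8): 8, (8, 2): 2}
```

The same 16 numbers give a different "pixel below the top-left corner" for each candidate shape, which is exactly the information a model cannot recover from the flattened array alone.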