Post 3: How Neural Nets Learn

The past few weeks have been intense. I’ve spent so many hours reading, watching, and doing whatever I can to learn more about neural nets. In my first post, I explained the basics of machine learning, and in my second post, I explained what a neural net actually looks like. Now it’s time to delve into what actually allows a neural net to learn from its data.

The first step is to randomly initialize the weights that control the connections between the neurons. These weights must be different from one another so that the neurons can learn different things. If the weights were all identical, every neuron in a layer would compute the same output and receive the same adjustment during training, so the algorithms would have no way to tell the connections apart or to work out which weight needs to change to improve the network’s prediction or classification accuracy.
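
Here’s a minimal sketch of what random initialization could look like in Python. The function name init_weights, the layer sizes, the scale of 0.1, and the fixed seed are all my own illustrative choices, not a standard recipe.

```python
import random

def init_weights(n_inputs, n_outputs, scale=0.1, seed=0):
    """Return an n_outputs x n_inputs grid of small random weights."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    return [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
            for _ in range(n_outputs)]

# Hypothetical layer: 2 inputs feeding 3 neurons, so 6 connections in total.
hidden_weights = init_weights(n_inputs=2, n_outputs=3)
for row in hidden_weights:
    print(row)
```

Each row holds the weights for one neuron, and because every value is drawn separately, no two connections start out the same.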

The next step is to implement forward propagation. Forward propagation is simply the computation of the functions in the neurons using the outputs from the previous layers. This continues until the net gives an output. You might be thinking that this can’t be accurate – after all, the weights are all random. You’d be right, too – this first stage really just exists to give the neural net a starting place.
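
To make that concrete, here is a rough sketch of forward propagation through a tiny net. I’m assuming each neuron takes a weighted sum of its inputs, adds a bias, and applies a sigmoid function; the layer sizes and the weight values below are made up for illustration.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each neuron: weighted sum of the inputs plus a bias, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the outputs of each layer on as the inputs to the next one."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Made-up weights: 2 inputs -> 3 hidden neurons -> 1 output neuron.
layers = [
    ([[0.1, -0.2], [0.05, 0.3], [-0.15, 0.2]], [0.0, 0.0, 0.0]),
    ([[0.2, -0.1, 0.05]], [0.0]),
]
prediction = forward([1.0, 0.0], layers)
print(prediction)
```

With untrained weights like these, the output is just a number between 0 and 1 that doesn’t mean much yet.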

The third step is to compute the cost function. The cost function measures how far the result given by the neural net is from the true answer. Remember, when training a machine learning algorithm, we have both the predictors and the dependent variables. By using a cost function, we’ll be able to evaluate how well the neural net is performing.
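
As an example, one common cost function is the mean squared error. I’m using it here as an assumption, since other cost functions exist, and the predictions and true answers below are invented numbers.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared gaps between predictions and true answers."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Invented example: three predictions from the net versus the true answers.
cost = mean_squared_error([0.9, 0.2, 0.4], [1.0, 0.0, 0.0])
print(cost)
```

The closer the predictions get to the true answers, the closer the cost gets to zero.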

The next and probably most important step is backpropagation. Backpropagation works backward from the output layer, using the chain rule from calculus to work out how much each weight contributed to the error measured by the cost function. The algorithm then updates the weights that connect the neurons with each other according to a specific equation. I won’t get into the details on that here, but if you’re interested in learning more about these equations, the Wikipedia article on backpropagation is actually a great place to see them in all their glorious notation.
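
For the curious, here is a small sketch of backpropagation on the tiniest net I could think of: one input, one hidden neuron, and one output neuron. The sigmoid neurons, the squared-error cost, and the starting weights are my assumptions; real nets do the same thing with many more weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w1, w2, x, target):
    """Squared error of a net with one hidden neuron and one output neuron."""
    hidden = sigmoid(w1 * x)
    output = sigmoid(w2 * hidden)
    return (output - target) ** 2

def backprop(w1, w2, x, target):
    """Return how much the cost changes per unit change in w1 and in w2."""
    # Forward pass, keeping the intermediate values.
    hidden = sigmoid(w1 * x)
    output = sigmoid(w2 * hidden)
    # Backward pass: start from the error at the output...
    delta_out = 2.0 * (output - target) * output * (1.0 - output)
    grad_w2 = delta_out * hidden
    # ...then pass it back through w2 to the hidden neuron.
    delta_hidden = delta_out * w2 * hidden * (1.0 - hidden)
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2

grad_w1, grad_w2 = backprop(w1=0.5, w2=-0.3, x=1.0, target=1.0)
print(grad_w1, grad_w2)
```

The two numbers that come out say which direction, and roughly how strongly, each weight should move to reduce the error.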

From here, the process repeats, except instead of having randomly initialized weights, the neural net has the weights from the previous run. An algorithm called gradient descent decides how the weights between neurons change: each weight is nudged in the direction that backpropagation says will lower the cost, and a setting called the learning rate controls how big each nudge is. At a basic level, gradient descent is an algorithm that iterates until it settles on a combination of weights that results in a low error, though not necessarily the lowest one possible. Training stops once the cost stops improving or after a set number of iterations. After that, test data can be put into the neural net.
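
Putting the steps together, here is a hedged sketch of a full training loop for a single sigmoid neuron learning the logical OR of two inputs. The toy data, the learning rate of 0.5, the epoch limit, and the stopping tolerance are all assumptions I picked for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, learning_rate=0.5, max_epochs=5000, tolerance=1e-9, seed=0):
    """Gradient descent for one neuron; returns weights, bias, last measured cost."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in range(2)]  # random start
    bias = 0.0
    previous_cost = float("inf")
    for _ in range(max_epochs):
        grad_w, grad_b, cost = [0.0, 0.0], 0.0, 0.0
        for inputs, target in data:
            # Forward propagation and cost for this example.
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            cost += (output - target) ** 2 / len(data)
            # Backpropagation for a single neuron.
            delta = 2.0 * (output - target) * output * (1.0 - output) / len(data)
            grad_w = [g + delta * x for g, x in zip(grad_w, inputs)]
            grad_b += delta
        if previous_cost - cost < tolerance:
            break  # the cost has stopped improving
        previous_cost = cost
        # Gradient descent step: move each weight against its gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias, cost

or_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights, bias, final_cost = train(or_data)
print(weights, bias, final_cost)
```

A larger learning rate takes bigger steps and can overshoot, while a smaller one is steadier but needs more repetitions.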

In the future, I think I’d like to get my own neural net up and running. In addition, I’d really like to make some videos that explain how neural nets work at a high level – I’ve found that many of the explanations online are targeted toward those with a strong background in math and programming. While it’s necessary to have this background to fully understand the reasons why a neural net works, I don’t think it’s necessary to know the math to get an understanding of why neural nets are so powerful. I think I want to target these videos at the average person – I’d like to touch on each component without introducing intimidating notation or code snippets. I think machine learning has a lot of potential for being explained to the average person, but I don’t think anyone is really trying to get the word out. I realize that my posts on this blog are probably not accessible enough to serve in this capacity, and I know it’d be a pretty fair time commitment.

I’d also like to read more about other types of neural nets. One of the most popular and powerful types of neural nets is the convolutional neural net. Convnets are used largely in image and video processing. On the data science competition site Kaggle, convnets are consistently the best-performing methods applied to image recognition. I hope to someday be able to implement a convnet for image recognition – I think it would be very rewarding to apply a cutting-edge technique to a novel problem.

This concludes my blogging, and I’ve actually enjoyed it quite a bit. I hope I was informative and that I didn’t get too bogged down in tough math. In any case, I’m so glad I spent a few weeks researching neural nets.


  1. Hello,

    It’s interesting to see how your initial idea for your project changed into the project that you actually ended up performing. This is usually the case, especially in scientific fields where a specific method for gathering data turns out not to be the most reliable, or where useful data cannot be collected using that method. It’s nice to see work that involves both the theory of why things work (such as what you have done with your research in neural networks) and how this can be practical in the future. I do agree that, especially for math-based theories, it is very difficult to show why they work to people without the proper background without causing confusion. However, I think you did a nice job of keeping the subject matter straightforward.