Neural Networks: The Foundation of Machine Learning

Ishaana Misra
Nov 18, 2020

Everybody hears about artificial intelligence from the latest tech news and sci-fi movies. AI is expected to change almost every industry, but not many people know exactly how it works. It’s not magic, it’s math! Artificial neural networks (ANNs) are the building blocks of much of today’s AI.

What Does a Neural Network Do?

This is one of the most important questions to answer. I’ll try to explain this in the easiest way possible using a simple example. Let’s say that you have just applied to MIT and you immediately want to know whether you will be accepted or rejected even though MIT hasn’t made a decision yet. You want to predict the outcome with very high accuracy.

Something you will probably do is look at, say, the math and science test scores of accepted and rejected students. If you want to be really accurate, then it would be a good idea to graph the science and math test scores, differentiating the accepted students from the others.

Figure 1

You’d notice that there is a trend: the accepted students tend to have high math and science scores, while the opposite is true for the rejected ones. In fact, you could probably separate these two groups with a line, where students above the line are likely to be accepted and students below it likely won’t be (note that I said likely; there can be outliers).

This is essentially what a neural network does (I know it sounds simple right now, but we’ll get into the details later). When you train a neural network, you give it data that has already been sorted into categories (here, accepted and rejected students), and it works out where to draw the line between them. Then, when you give it data that hasn’t been categorized (in this case, your own math and science test scores, since it’s unknown whether you will be accepted), it can categorize it with high accuracy.
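To make the "above or below the line" idea concrete, here is a minimal sketch in Python. The slope and intercept are made-up numbers chosen for illustration (they are not learned from real admissions data), and `side_of_line` is a hypothetical helper name.

```python
def side_of_line(math_score, science_score, slope=-1.0, intercept=12.0):
    """Classify a student by which side of a straight line their scores fall on.

    The line is science = slope * math + intercept. The default slope and
    intercept are hypothetical values, not a trained model.
    """
    boundary = slope * math_score + intercept
    if science_score > boundary:
        return "likely accepted"
    return "likely rejected"


print(side_of_line(8, 9))  # likely accepted
print(side_of_line(3, 4))  # likely rejected
```

Training a neural network amounts to finding good values for that slope and intercept automatically instead of picking them by hand.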

You’re probably thinking, what if I have more than two inputs? In this article, I’m only using examples with two inputs because they are easier to visualize and explain. With three inputs, the line becomes a plane, and with more than three it becomes a hyperplane (the higher-dimensional version of a plane).

The Equation

As you’ve most likely learned in your algebra class, all lines have equations, and the lines neural networks use are no exception. It’s time for some math!

There are three main variables we’ll want to familiarize ourselves with:

X = inputs (the values you provide to your neural network; for example, the math and science test scores from the example above)

W = weights (what we multiply the inputs by; there is one weight for every input)

b = bias (there is only one bias)

This is how the variables are put together in order to form our equation:

Figure 2

As you can see in the second example, if you have n inputs, you will have n weights, but still just one bias.
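Written out in the usual notation (assuming the standard form with everything on the left and zero on the right, where x_1, x_2 are the inputs, w_1, w_2 the weights, and b the bias), the two-input and n-input versions look like this:

```latex
w_1 x_1 + w_2 x_2 + b = 0
\qquad \text{and, for } n \text{ inputs,} \qquad
w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b = 0
```

With two inputs this is just another way of writing the familiar y = mx + c line from algebra class.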

This is the equation of the line separating the data, but to figure out where a new data point lies, we can’t use this exact equation. Instead, we’re only going to use the left-hand side of it. We’ll multiply our inputs by their respective weights, add those up, add our bias, and then use the resulting number to classify the data. More about how we’re gonna use this later!
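Here is a small Python sketch of that left-hand side. The function name `weighted_sum` and the example weights and bias are assumptions made for illustration; a real network would learn those numbers during training.

```python
def weighted_sum(inputs, weights, bias):
    """Multiply each input by its weight, add the products, then add the bias."""
    if len(inputs) != len(weights):
        raise ValueError("there must be exactly one weight per input")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical weights and bias, chosen only for illustration.
score = weighted_sum(inputs=[8, 9], weights=[1.0, 1.0], bias=-12.0)
print(score)  # 5.0
```

A positive result means the point is on one side of the line, a negative result means it is on the other side, and zero means it sits exactly on the line. The same function works for any number of inputs, as long as each one has a weight.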

Perceptrons

Now that we’ve covered the equation of the line, it's time to get into neural networks.

An artificial neural network is basically a computing system vaguely based on biological neural networks found in animals.

You may or may not have seen images of really complex neural networks, but don’t worry, we’ll get started with the simplest of them all: a perceptron.

Figure 3

Some vocabulary:

Circles = nodes

Lines = edges

Notice how the image above has the very same variables I just introduced. We multiply the value at each input node by the weight on its edge and add the results up, which is exactly the same as multiplying each input by its respective weight. This is all great, but where’s the bias? In this perceptron, the bias isn’t drawn; it is simply added onto the sum of the inputs times their weights. It’s time to answer the question everyone is asking: what does the number produced by our sum actually mean? How does all of this math lead to determining whether or not you’re going to get into MIT?

Probabilities

Even though it would be a lot easier to simply classify something as a yes-or-no sort of a thing, it’s a lot more useful to determine a probability between 0 and 1.

Figure 4 (not drawn to scale)

Shown above is the probability space for the line we drew earlier. Right on the line, the probability is 0.5: if your point falls exactly there, it’s a 50–50 chance whether you’re accepted or rejected. The farther above the line you go, the closer the probability gets to 1, and the farther below the line, the closer it gets to 0.

But how do we turn a very large or very small number into a value between 0 and 1? We can use the sigmoid function, which does exactly this! We just have to apply the sigmoid function to the sum and there we go! We now have a probability that represents where our input sits in relation to the line.
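The sigmoid function is usually written as 1 / (1 + e^(-z)), where z is the sum we computed. Below is a minimal Python sketch of it. Splitting it into two branches is my own choice, made so that very large negative sums don't overflow.

```python
import math


def sigmoid(z):
    """Squash any real number z into a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids computing a huge exp().
    e = math.exp(z)
    return e / (1.0 + e)


print(sigmoid(0))   # 0.5, a point exactly on the line
print(sigmoid(5))   # close to 1, well above the line
print(sigmoid(-5))  # close to 0, well below the line
```

Notice that a sum of 0 (a point exactly on the line) maps to 0.5, which matches the 50–50 probability described above.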

Example

Let’s do a quick practice problem to solidify your understanding. The example below gives us the equation that a well-trained neural network has produced, and we plug new inputs into it to find the probability of this student getting accepted to MIT.

Let’s say that a student has gotten a math score of 8, and a science score of 9.

Figure 5
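If you'd like to try the same kind of calculation in code, here is a rough Python sketch of the whole pipeline: weighted sum, plus bias, then sigmoid. Important: the weights and bias below are made up for illustration and are not the ones from Figure 5, so the resulting probability is not the article's answer. The `Perceptron` class and its method names are hypothetical too.

```python
import math
from dataclasses import dataclass


def sigmoid(z):
    """Squash any real number into a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class Perceptron:
    weights: list  # one weight per input
    bias: float    # a single bias

    def score(self, inputs):
        """Left-hand side of the line equation: sum of weight * input, plus bias."""
        return sum(w * x for w, x in zip(self.weights, inputs)) + self.bias

    def probability(self, inputs):
        """Turn the score into a probability with the sigmoid function."""
        return sigmoid(self.score(inputs))


# Hypothetical "trained" values, NOT the ones from the figure.
model = Perceptron(weights=[1.0, 1.0], bias=-12.0)

student = [8, 9]  # math score 8, science score 9
print(f"score = {model.score(student)}")              # score = 5.0
print(f"probability = {model.probability(student):.3f}")  # probability = 0.993
```

With these made-up numbers, the student lands well above the line, so the probability comes out close to 1.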

This was just an introductory article on how neural networks work, and I didn’t go into how neural network training happens (how we find the line in the first place), but understanding the basics of how they work is the first step! There’s a lot more where this came from!

Ishaana Misra is an 8th grader interested in AI and genomics, especially the intersection of the two. She is also an Innovator at The Knowledge Society.

Ishaana Misra

Student at Stuyvesant learning about cryptography and Bitcoin.