Neural networks are my favourite AI application, which is great for you, because they’re the most important development in computing since the internet, and I’m about to give you the keys to the neural net kingdom. In just a few short steps you’re going to build a neural network. I’m going to explain neural networks to you the same way I do all my training: hands-on, by doing it!
Key this line into your notebook and execute it by pressing <shift+enter>, so the cursor drops down into the next cell, ready for you to execute the next line.
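The line in question is the numpy import; it’s the first line of the complete listing at the end of this post:

```python
# make numpy available under the short name np
import numpy as np
```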
Did it work, or did you get an error? If you got an error like “numpy not found”, you need to install numpy. Go back to this page and read how to do that. Don’t get frustrated this early in the game. Be patient; go back, read that page and get it to work. Take breaks if you feel you need to. You need to let your brain internalise all this new information that is coming its way.
Numpy is a very useful library. Click on this link to open a tab in your browser with information about numpy in it. Just have a quick look around; don’t bother your head about it too much. You’re going to learn all about numpy in the days to come, but not in the usual manner of lectures and a lecturer standing in front of you trying to drum in lines of code. With me teaching you, you’re going to learn the HOW of using tools such as numpy. You’re going to start using them right away, and give yourself a framework on which to hang the knowledge of these tools as and when you need it. I call it just-in-time learning! Then, as you use a library, you’ll learn the parts you need, and you’ll also learn how to get information on the things just at the edge of what you’re using, so that you don’t miss out on any functionality that numpy (or any other library you use) can give you to improve your code and your day.
OK. Numpy is working. Now key this in.
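The code to key in is our nonlinearity function; it’s reproduced in the complete listing at the end of this post (the import line is repeated here only so the cell stands alone):

```python
import numpy as np

# sigmoid function
def nonlinear(x,deriv=False):
    if(deriv==True):
        return x*(1-x)
    return 1/(1+np.exp(-x))
```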
… and again press <shift+enter>. The cursor should drop into the empty cell that opens below this one, with no error reported. (Once again: if you do get an error, please leave a comment on this post so we can all jump to your rescue and everyone can learn as a result of your feedback.)
What are we doing here? The line which starts with a # symbol is a comment. You’re telling the Python interpreter that this line is not code to be executed. It’s a note to yourself: something you’ve put here to remind yourself, or explain to yourself, what you were thinking as you wrote this code. If you come back to it days or weeks later, this comment will jog your memory. Being a comment, it has no effect on the interpreter; it is simply ignored, rather than executed as code would be.
Next is the line that begins with def. def is a Python keyword, a word reserved for Python’s use, that tells the interpreter you are about to define a function. The function you’re defining here has been given the name nonlinear, and it takes two parameters, x and deriv. deriv has a default value of False, so you can leave it out when you call the function, but x has no default and must always be given a value. Now read the rest of the function and ask me questions in class or in the comments section below. I can guess that x times 1 minus x won’t give you too much to worry about, but what is np.exp?
Remember when we asked Python to import numpy and let us use it as np? Well, that’s exactly what we’re using here. We’re saying to the interpreter: give us the function in numpy that is called exp. We do that by writing np.exp (np dot exp), and we give the exp function the value -x (the negative of x) so it can do its magic on x.
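You can check what np.exp does right in your notebook. Here’s a quick sanity check of my own (not part of the network code):

```python
import numpy as np

# np.exp raises Euler's number e to the given power
print(np.exp(0))    # 1.0
print(np.exp(1))    # e, about 2.718
print(np.exp(-1))   # 1/e, about 0.368
```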
Go to this link on the numpy website now and browse through the list. Take your time. In the search box at the right margin (as in the image I’ve captured below) …
… type in exp, click the <search> button, and then click on the numpy.exp link. The top one or the second one will do; it doesn’t matter which you click. This is what you’ll get …
Look at the line that says: Calculate the exponential of all elements in the input array. I know: there’s a lot to parse here. What is an exponential? What is an array? Take it slowly. At first don’t worry at all; just key in the code and make sure it works.
This is our “nonlinearity”. There are several kinds of function we could have used here. The one we’re using is called a “sigmoid”. A sigmoid function maps any value to a value between 0 and 1. We use it here to convert numbers to probabilities. It also has several other desirable properties for training neural networks.
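You can see the squashing in action by feeding our nonlinear() function a few values. This is a small experiment of my own, not part of the network code (the function is repeated here so the cell stands alone):

```python
import numpy as np

# sigmoid function
def nonlinear(x,deriv=False):
    if(deriv==True):
        return x*(1-x)
    return 1/(1+np.exp(-x))

# big negative inputs squash towards 0,
# big positive inputs squash towards 1
print(nonlinear(-10))  # very close to 0
print(nonlinear(0))    # exactly 0.5
print(nonlinear(10))   # very close to 1
```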
Now key in …
# input dataset
X = np.array([ [0,0,1],
               [0,1,1],
               [1,0,1],
               [1,1,1] ])
… and again press <shift+enter>. (From now on I won’t ask you to execute the code; I’ll just assume that you’ll enter and execute it. Remember that you should KEY IT IN and not just cut and paste. Keying in the code has real value in getting your brain to process what you’re doing at a much deeper level than just watching stuff happen, like you’re watching a movie.)
We’ll talk more about what that code is doing, but it’s quite easy: we’re creating an array which will become the data we input into our neural network to train it to make predictions.
# output dataset
y = np.array([[0,0,1,1]]).T
This is the result we expect the neural net to come up with: one output for each of the four rows of our input dataset X. The .T transposes the row of four values into a column, so each row of X lines up with its expected output.
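If you’re curious what that .T did, print the shapes before and after. This little check is my own aside, not part of the network code:

```python
import numpy as np

y = np.array([[0,0,1,1]]).T

# without .T the array is one row of four values, shape (1, 4);
# .T transposes it into four rows of one value, shape (4, 1),
# one expected output per training example
print(np.array([[0,0,1,1]]).shape)  # (1, 4)
print(y.shape)                      # (4, 1)
```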
# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)
This is some good practice. We’d get away with not executing this code, but it’s good practice to seed our random number generator: it means you get the same “random” numbers every time you run the code, so your results are reproducible. Again: we’ll talk much more in class about random number generators. You can google the term if you’re too eager to wait for class.
# initialise weights randomly with mean 0
synapse0 = 2*np.random.random((3,1)) - 1
This is our weight matrix for this neural network. I called it “synapse0” as it feels to me like a synapse, a meeting point between two biological nerve cells (neurons). Since we only have 2 layers (input and output), we only need one matrix of weights to connect them. Its dimension is (3,1) because we have 3 inputs and 1 output. Another way of looking at it is that layer0 is of size 3 and layer1 is of size 1. Thus, we want to connect every node in layer0 to every node in layer1, which requires a matrix of dimensionality (3,1).
Also notice that it is initialized randomly with a mean of zero. There is quite a bit of theory that goes into weight initialization. For now, just take it as a best practice that it’s a good idea to have a mean of zero in weight initialization.
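You can convince yourself of the mean-zero trick with a quick check of my own (not part of the network code): np.random.random gives numbers in [0, 1), so doubling them and subtracting 1 shifts them into [-1, 1), centred on zero.

```python
import numpy as np

np.random.seed(1)
# a bigger sample than (3,1) so the average is easy to see
w = 2*np.random.random((1000,1)) - 1

print(w.min() >= -1)        # True: nothing below -1
print(w.max() < 1)          # True: nothing at or above 1
print(abs(w.mean()) < 0.1)  # True: the average is near zero
```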
Another note is that the “neural network” is really just this matrix. We have “layers” layer0 and layer1, but they are transient values computed from the dataset; we don’t save them. All of the learning is stored in the synapse0 matrix.
for iter in range(10000):

    # forward propagation
    layer0 = X
    layer1 = nonlinear(np.dot(layer0,synapse0))

    # how much did we miss?
    layer1_error = y - layer1

    # multiply how much we missed by the
    # slope of the sigmoid at the values in layer1
    layer1_delta = layer1_error * nonlinear(layer1,True)

    # update weights
    synapse0 += np.dot(layer0.T,layer1_delta)
This is the meat of our neural network. Read through the comments first; don’t worry about the code. After you’ve read the comments and got a feel for where we’re going with this, go back and read through the code line by line.
Can you now look up on the internet what np.dot is about? Can you see how we’re calling the function nonlinear() that we defined earlier? Can you see how we find the error and store it in layer1_error, and then pass layer1 back into our nonlinear() function with deriv set to True, storing the result in another variable, layer1_delta?
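If np.dot is new to you, here’s a tiny illustration of my own using our input array and a made-up weight column (the 0.5 weights are just for illustration, not the network’s real weights):

```python
import numpy as np

A = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])
w = np.array([[0.5],
              [0.5],
              [0.5]])

# np.dot on 2-D arrays is matrix multiplication:
# each row of A is combined with the weight column,
# so a (4,3) array times a (3,1) array gives a (4,1) result
print(np.dot(A, w))         # [[0.5] [1. ] [1. ] [1.5]]
print(np.dot(A, w).shape)   # (4, 1)
```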
Hey, you ‘sneaked in’ the word variable! Yes I did, and those are the kinds of tricks I have up my sleeve. Instead of telling you that Python has something called variables, which are places in memory where you can store a value for use later (yada, yada, yada), I sneak things in as we go and explain them on the fly, so you REALLY get to see these things in action and thoroughly internalise your learning.
print("Output After Training:")
print(layer1)
So finally let’s get some output to see how well our neural net did.
… and this is what I get when I execute that last cell. The result: four values very close to 0, 0, 1, 1, which is just what we wanted to see! (The sigmoid never outputs exactly 0 or 1, so the network’s predictions are very small or very large probabilities rather than exact 0s and 1s.)
Here’s all the code in one place:
import numpy as np

# sigmoid function
def nonlinear(x,deriv=False):
    if(deriv==True):
        return x*(1-x)
    return 1/(1+np.exp(-x))

# input dataset
X = np.array([ [0,0,1],
               [0,1,1],
               [1,0,1],
               [1,1,1] ])

# output dataset
y = np.array([[0,0,1,1]]).T

# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialise weights randomly with mean 0
synapse0 = 2*np.random.random((3,1)) - 1

for iter in range(10000):

    # forward propagation
    layer0 = X
    layer1 = nonlinear(np.dot(layer0,synapse0))

    # how much did we miss?
    layer1_error = y - layer1

    # multiply how much we missed by the
    # slope of the sigmoid at the values in layer1
    layer1_delta = layer1_error * nonlinear(layer1,True)

    # update weights
    synapse0 += np.dot(layer0.T,layer1_delta)

print("Output After Training:")
print(layer1)