IT (is) Explained
Information Technology Explained

The Maths behind Neural Networks

Alex Punnen
© All Rights Reserved


Chapter 6: Implementation of a two layered Neural Network

With the derivative of the cost function derived in the last chapter, we can now code the network.

We will use matrices to represent the inputs and the weights.

x = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])

This is a 4*3 matrix. Note that each row is an input. Let’s take all four rows as the ‘training set’.

y = np.array([[0,1,1,0]]).T

Note that you can change the output and try to train the neural network on it.

This is a 4*1 matrix that represents the expected output. That is, for input [0,0,1] the output is [0], for [0,1,1] the output is [1], and so on.

A neural network is implemented as a set of matrices representing the weights of the network.

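Let’s create a two layered network. Before that, please note the formula for the neural network, written here in the form the code in this chapter uses (without a bias term):

\[a^l = \sigma(z^l), \qquad z^l = a^{l-1} \cdot w^l\]

So basically the output at layer l is the dot product of the input from the previous layer and the weight matrix of layer l, passed through the activation function $\sigma$.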

Now let’s see how the matrix dot product works based on the shape of matrices.

[m*n].[n*x] = [m*x]
[m*x].[x*y] = [m*y]
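
The shape rule above can be checked directly in NumPy. The following is a minimal sketch; the array names `a`, `b` and `c` are placeholders chosen for this illustration, and the sizes are the ones used for the network in this chapter.

```python
import numpy as np

# Placeholder matrices; only their shapes matter here.
a = np.ones((4, 3))   # [m*n] = [4*3]
b = np.ones((3, 4))   # [n*x] = [3*4]
c = np.ones((4, 1))   # [x*y] = [4*1]

ab = np.dot(a, b)     # [4*3].[3*4] = [4*4]
abc = np.dot(ab, c)   # [4*4].[4*1] = [4*1]

print(ab.shape)   # (4, 4)
print(abc.shape)  # (4, 1)
```

The inner dimensions must agree for each product; the result keeps the outer ones.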

We take the $[mn]$ as the input matrix this is a $[43]$ matrix.

Similarly the output $y$ is a $[41]$ matrix; so we have $[my] =[4*1]$

So we have


Let’s then create our two weight matrices of the above shapes, which represent the two layers of the neural network.

w0 = x
w1 = np.random.random((3,4))
w2 = np.random.random((4,1))

We could keep the weights in an array and loop through them, but for the time being let’s hard-code them. Note that ‘np’ is the conventional alias for NumPy, the popular array library in Python.
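
For reference, looping through an array of weights could look like the sketch below. The helper name `forward` and the list name `weights` are assumptions introduced for this illustration; the rest of the chapter keeps the two weight matrices hard-coded. The sigmoid used here is the same one defined in the next step of the chapter.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x, weights):
    """Return the activations of every layer, starting with the input."""
    activations = [x]
    for w in weights:
        # each layer: dot product with the previous activation, then sigmoid
        activations.append(sigmoid(np.dot(activations[-1], w)))
    return activations

x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
weights = [np.random.random((3, 4)), np.random.random((4, 1))]

activations = forward(x, weights)
print([a.shape for a in activations])  # [(4, 3), (4, 4), (4, 1)]
```

Adding a third layer would then only mean appending another matrix of a compatible shape to `weights`.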

We also need to code in our non-linearity. We will use the sigmoid function here.

def sigmoid(x):
    return 1/(1+np.exp(-x))

# derivative of the sigmoid
def derv_sigmoid(x):
    return sigmoid(x)*(1-sigmoid(x))
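
One way to gain confidence in `derv_sigmoid` is to compare it with a numerical estimate of the slope. The sketch below uses a central finite difference; the step size `h` and the sample points are arbitrary choices made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derv_sigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

points = np.array([-2.0, 0.0, 0.5, 3.0])
h = 1e-5  # small step for the finite difference (arbitrary choice)

# slope estimated from two nearby function values
numeric = (sigmoid(points + h) - sigmoid(points - h)) / (2 * h)
analytic = derv_sigmoid(points)

print(np.allclose(numeric, analytic))  # True
print(derv_sigmoid(0.0))               # 0.25
```

The derivative peaks at 0.25 when the input is 0 and shrinks towards zero for large positive or negative inputs.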

With this we can compute the output of each layer (a0, a1 and a2) using our equation for neural network forward propagation.

a0 = x
a1 = sigmoid(np.dot(a0,w1))

a2 = sigmoid(np.dot(a1,w2))

a2 is the output calculated from the randomly initialized weights. So let’s calculate the error by subtracting it from the expected value and taking the squared error, as in the MSE cost:

\[C = \frac{1}{2} \|y-a^l\|^2\]
c0 = ((y-a2)**2)/2
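
Putting the forward pass and the cost together, a small sketch can show the shapes involved and reduce the per-row errors to a single number. Summing the per-row values into `total_cost` is an assumption of this sketch; the chapter itself keeps `c0` as a 4*1 array.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T

w1 = np.random.random((3, 4))
w2 = np.random.random((4, 1))

a0 = x
a1 = sigmoid(np.dot(a0, w1))  # hidden layer, shape (4, 4)
a2 = sigmoid(np.dot(a1, w2))  # output layer, shape (4, 1)

c0 = ((y - a2) ** 2) / 2      # one halved squared error per training row
total_cost = c0.sum()         # single scalar for the whole training set

print(a1.shape, a2.shape, c0.shape)  # (4, 4) (4, 1) (4, 1)
print(total_cost >= 0)               # True
```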

Now we need to use the back-propagation algorithm to calculate how each weight has influenced the error and reduce it proportionally.

We use this to update the weights in all the layers, do the forward pass again, re-calculate the error and loss, then re-calculate the error gradient $\frac{\partial C}{\partial w}$, and repeat.

\[\begin{aligned} w^2 = w^2 - (\frac {\partial C}{\partial w^2} )*learningRate \\ \\ w^1 = w^1 - (\frac {\partial C}{\partial w^1} )*learningRate \end{aligned}\]

Let’s update the weights as per formulas (3) and (5) from the last chapter.

\[\mathbf{ \frac {\partial C}{\partial w^1} = \sigma'(z^1) * (a^{0})^T*\delta^{2}*w^2.\sigma'(z^2) \quad \rightarrow \mathbb Eq \; (5) }\]
\[\delta^2 = (a^2-y)\]
\[\mathbf{ \frac {\partial C}{\partial w^2}= \delta^{2}*\sigma^{'}(z^2) * (a^{1})^T \quad \rightarrow \mathbb Eq \; (3) }\]
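
A common way to check gradient formulas such as Eq (3) and Eq (5) is to compare them with finite-difference estimates of the cost. The sketch below does that for the matrix form used in the code later in this chapter. The helper names `cost`, `numeric_grad` and `current_cost`, the seed, and the step size are assumptions made for this illustration, and the cost is summed over the four training rows.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derv_sigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

def cost(w1, w2, x, y):
    a2 = sigmoid(np.dot(sigmoid(np.dot(x, w1)), w2))
    return np.sum((y - a2) ** 2) / 2

def numeric_grad(f, w, h=1e-6):
    """Central finite-difference estimate of df/dw, one entry at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + h
        plus = f()
        w[idx] = old - h
        minus = f()
        w[idx] = old  # restore the original weight
        grad[idx] = (plus - minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0, 1, 1, 0]], dtype=float).T
w1 = rng.random((3, 4))
w2 = rng.random((4, 1))

# Analytic gradients in matrix form
z1 = np.dot(x, w1)
a1 = sigmoid(z1)
z2 = np.dot(a1, w2)
a2 = sigmoid(z2)
d2 = (a2 - y) * derv_sigmoid(z2)                             # delta2 * sigma'(z2)
dC_dw2 = np.dot(a1.T, d2)                                    # Eq (3)
dC_dw1 = np.dot(x.T, np.dot(d2, w2.T) * derv_sigmoid(z1))    # Eq (5)

def current_cost():
    return cost(w1, w2, x, y)

print(np.allclose(dC_dw2, numeric_grad(current_cost, w2), atol=1e-6))  # True
print(np.allclose(dC_dw1, numeric_grad(current_cost, w1), atol=1e-6))  # True
```

If either comparison printed False, it would point to a mistake in the order of the products or in a missing transpose.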

A Two layered Neural Network in Python

Below is a two layered network. I have used the code from (1) as the basis, with minor changes to fit how we derived the equations.

import numpy as np
# optionally seed the random numbers to make the calculation deterministic,
# for example with np.random.seed(1)

# pretty print numpy array
np.set_printoptions(formatter={'float': '{: 0.3f}'.format})

# let us code our sigmoid function
def sigmoid(x):
    return 1/(1+np.exp(-x))

# let us add a function that returns the derivative of the sigmoid at x
def derv_sigmoid(x):
    return sigmoid(x)*(1-sigmoid(x))


# Two layered NW. Using the code from (1) and the equations we derived as explanations
# (1)

# set learning rate as 1 for this toy example
learningRate = 1

# input x, also used as the training set here
x = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])

# desired output for each of the training set above
y = np.array([[0,1,1,0]]).T

# Explanation - as long as the input has two ones, but not three, the output is one
# Input [0,0,1]  Output = 0
# Input [0,1,1]  Output = 1
# Input [1,0,1]  Output = 1
# Input [1,1,1]  Output = 0

# Randomly initialised weights
weight1 = np.random.random((3,4))
weight2 = np.random.random((4,1))

# Activation to layer 0 is taken as input x
a0 = x

iterations = 1000
for iter in range(0,iterations):

  # Forward pass - Straight Forward
  z1 = np.dot(a0,weight1)
  a1 = sigmoid(z1)
  z2 = np.dot(a1,weight2)
  a2 = sigmoid(z2)
  if iter == 0:
    print("Initial Output \n",a2)

  # Backward Pass - Backpropagation
  delta2  = (a2-y)
  # Calculating change of Cost/Loss wrt weight of 2nd/last layer
  # Eq (A) ---> dC_dw2 = delta2*derv_sigmoid(z2)*a1.T

  dC_dw2_1  = delta2*derv_sigmoid(z2)
  dC_dw2  = np.dot(a1.T,dC_dw2_1)
  # Calculating change of Cost/Loss wrt weight of 1st layer
  # Eq (B)---> dC_dw1 = derv_sigmoid(z1)*delta2*derv_sigmoid(z2)*weight2*a0.T
  # dC_dw1 = derv_sigmoid(z1)*dC_dw2_1*weight2*a0.T

  dC_dw1 =  np.multiply(dC_dw2_1,weight2.T) * derv_sigmoid(z1)
  # note - weight2 is transposed here so that the shapes line up: (4,1)*(1,4) -> (4,4)
  dC_dw1 = np.dot(a0.T,dC_dw1)

  # Gradient descent
  weight2 = weight2 - learningRate*(dC_dw2)
  weight1 = weight1 - learningRate*(dC_dw1)

print("New output",a2)

# Training is done, weight1 and weight2 are primed for output y

# Let's test it out: two ones and one zero in the input, so the output should be one
x = np.array([[1,0,1]])
z1 = np.dot(x,weight1)
a1 = sigmoid(z1)
z2 = np.dot(a1,weight2)
a2 = sigmoid(z2)
print("Output after Training is \n",a2)


Initial Output 
 [[ 0.758]
 [ 0.771]
 [ 0.791]
 [ 0.801]]
New output [[ 0.028]
 [ 0.925]
 [ 0.925]
 [ 0.090]]
Output after Training is 
 [[ 0.925]]

We have trained the network to produce an output close to $y$, that is, [0,1,1,0].
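
To see whether training is actually reducing the error, the cost can be recorded at every iteration. The sketch below wraps the same update rules in a function. The name `train`, its parameters, and the seed are assumptions introduced for this illustration, and the exact numbers will differ from the output shown above because the random initial weights differ.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derv_sigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

def train(x, y, iterations=1000, learning_rate=1.0, seed=0):
    """Train the two layered network; return the weights and the cost history."""
    rng = np.random.default_rng(seed)
    w1 = rng.random((3, 4))
    w2 = rng.random((4, 1))
    costs = []
    for _ in range(iterations):
        # forward pass
        z1 = np.dot(x, w1)
        a1 = sigmoid(z1)
        z2 = np.dot(a1, w2)
        a2 = sigmoid(z2)
        costs.append(np.sum((y - a2) ** 2) / 2)
        # backward pass (both gradients use the weights from before the update)
        d2 = (a2 - y) * derv_sigmoid(z2)
        d1 = np.dot(d2, w2.T) * derv_sigmoid(z1)
        # gradient descent step
        w2 = w2 - learning_rate * np.dot(a1.T, d2)
        w1 = w1 - learning_rate * np.dot(x.T, d1)
    return w1, w2, costs

x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T

w1, w2, costs = train(x, y)
print("first cost:", costs[0])
print("last cost: ", costs[-1])
```

Plotting or printing `costs` at intervals gives a quick view of how fast the network converges for a given learning rate.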