
## Introduction

Welcome to part 4, the final part of our series **Neural Network Primitives**, where we have been exploring the primitive forms of the artificial neuron right from its historical roots.

We started this journey with the McCulloch Pitts Neuron (1943) in part 1, moved on to the Perceptron (1957) in part 2, and then saw the Sigmoid Neuron in part 3. As we progressed from part 1 to part 3, we saw how each neuron improved on its predecessor.

In this final part, we will summarize the first 3 parts and understand how all these predecessor neurons built up to the artificial neurons that are used today in modern deep learning. I strongly suggest reading the previous 3 parts first to retain the continuity and understand this final part better.

- Read More – Neural Network Primitives Part 1 – McCulloch Pitts Neuron Model (1943)
- Read More – Neural Network Primitives Part 2 – Perceptron Model (1957)
- Read More – Neural Network Primitives Part 3 – Sigmoid Neuron

## Summary of historical neurons so far…

So far we have seen three different types of artificial neurons in the previous three parts. Let us quickly summarize each of them. Notice how the neuron's characteristics evolved along the way.

### McCulloch Pitts Neuron (1943)

This was the first artificial neuron ever conceptualized, way back in 1943, and was actually an attempt to represent the biological neuron with a mathematical model. It used a step function as its activation function.

The input and output are both binary in nature. The inputs do not have weights associated with them. It is not a true machine learning model because it does not have any weights to learn; the threshold value of the step function is instead chosen manually. It works only with linearly separable data and cannot work with non-linear data.
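To make the summary concrete, here is a minimal sketch of such a neuron. It is a simplification that assumes only excitatory binary inputs and a hand-picked threshold; the function name `mcculloch_pitts` is hypothetical and chosen just for this illustration.

```python
def mcculloch_pitts(inputs, threshold):
    """Return 1 when the number of active binary inputs reaches the threshold."""
    if any(x not in (0, 1) for x in inputs):
        raise ValueError("McCulloch Pitts inputs must be binary (0 or 1)")
    # No weights: every input counts equally, and the step happens at the threshold.
    return 1 if sum(inputs) >= threshold else 0


# The threshold is chosen by hand, not learned: 2 behaves like AND, 1 like OR.
for a in (0, 1):
    for b in (0, 1):
        print(a, b,
              "AND:", mcculloch_pitts([a, b], threshold=2),
              "OR:", mcculloch_pitts([a, b], threshold=1))
```

Nothing in this sketch is trained, which is exactly why the model does not count as machine learning.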

### Perceptron (1957)

The Perceptron also uses the step activation function but has some improvements over the McCulloch Pitts Neuron.

The inputs can be any real numbers and have weights associated with them. The weight values can be learnt from training data, so it is a true machine learning model.

Since it uses the step activation function, the output is still binary, 0 or 1. Also because of the step activation function, there is a sudden change in decision from 0 to 1 at the threshold value. This sudden change may not be desirable in real-world problems. It still cannot work with non-linear data.
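As a rough sketch of what "learning the weights" means here, the snippet below applies the classic perceptron update rule to the AND gate. The function names, the learning rate of 1.0, and the 20 epochs are assumptions picked for this illustration, not values from the series.

```python
def step(z):
    """Step activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """samples is a list of (inputs, target) pairs with targets 0 or 1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - step(z)  # -1, 0 or +1
            # Nudge each weight in the direction that reduces the error.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_data)
predictions = [
    step(sum(w * x for w, x in zip(weights, inputs)) + bias)
    for inputs, _ in and_data
]
print(predictions)  # [0, 0, 0, 1]
```

The output is still a hard 0 or 1, which is the limitation the next neuron addresses.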

### Sigmoid Neuron

This neuron uses the sigmoid function for activation and is a further improvement over the Perceptron.

By the nature of the sigmoid function, there is a smooth transition from 0 to 1. There is no sudden jump from 0 to 1 as with the step function, which was an issue with the previous neurons. And because of the sigmoid function, the output is not binary but ranges from 0 to 1. The sigmoid function is non-linear, so this neuron can work with non-linear data to a certain extent: a single sigmoid neuron still draws a linear decision boundary, but networks of such neurons can model non-linear data.

*It is interesting to note here that the Sigmoid neuron achieves a major improvement over the Perceptron just by changing the activation function from the step function to the sigmoid function.*
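The difference between the sudden jump and the smooth transition is easy to see by evaluating both functions around the threshold. This is a small sketch that assumes a threshold of 0 for the step function:

```python
import math


def step(z, threshold=0.0):
    """Jumps straight from 0 to 1 at the threshold."""
    return 1 if z >= threshold else 0


def sigmoid(z):
    """Moves smoothly from 0 to 1 and equals 0.5 at z = 0."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-4, -1, -0.1, 0, 0.1, 1, 4):
    print(f"z={z:5}  step={step(z)}  sigmoid={sigmoid(z):.3f}")
```

Just below and just above the threshold the step output flips from 0 to 1, while the sigmoid output barely moves away from 0.5.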

## Is Sigmoid Neuron the best artificial neuron?

Looking at the comparison above, the Sigmoid Neuron comes out as the winner among the three neurons. So should we say that the sigmoid neuron is the best artificial neuron to work with in neural networks? Well, not really!

Neural networks contain multiple artificial neurons; in fact, a deep neural network can stack hundreds of artificial neurons. The weights of all these neurons are trained using the Backpropagation algorithm, which uses the Gradient Descent technique at its core.

The problem with the sigmoid activation function is that it suffers from the Vanishing Gradient problem, where the gradients become so small as they are propagated back through the layers that the weights are updated very slowly or not at all. The network effectively stops learning.

Because of this issue, the Sigmoid Neuron has lost popularity over the years and is rarely used in the modern deep learning era. At most, it is used in the output layer.
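A back-of-the-envelope sketch shows why this happens. The derivative of the sigmoid never exceeds 0.25, and backpropagation multiplies one such factor per sigmoid layer. The snippet below deliberately ignores the weights and assumes the best case (every neuron sitting at z = 0), so it is an illustration of the trend rather than a model of a real network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid: s * (1 - s), which peaks at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


print(sigmoid_derivative(0.0))  # 0.25, the largest value it can take
print(sigmoid_derivative(5.0))  # much smaller once the neuron saturates

# Best-case gradient factor after passing back through `depth` sigmoid layers.
for depth in (1, 5, 10, 20):
    print(depth, 0.25 ** depth)
```

Even in this best case the factor drops below one in a million after 10 layers, which is why the early layers of a deep sigmoid network hardly learn.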

## Modern Artificial Neuron

First of all, let me clarify that by modern artificial neurons I mean the neurons that are used in this modern deep learning era. So what are these modern neurons, and how are they different from the three neurons that we have been discussing in this series? First, let me explain the DNA of artificial neurons.

### Activation function is the DNA of artificial neurons

The activation function is like DNA that defines the characteristics of a neuron. If you remember from above, the only difference between the Perceptron and the Sigmoid Neuron was the activation function. Apart from this difference, the architecture of the two was the same. And it was just this change in activation function that made the Sigmoid Neuron better than the Perceptron.

Activation functions are responsible for the output of the artificial neuron. You can think of it this way: the activation function decides whether the neuron should fire or not, and with what intensity it should fire.

Over the years many activation functions have been used. Some gained popularity, while others fell out of use. In our series so far, we only saw two types of activation functions, the step function and the sigmoid function, which were used in the early artificial neurons. Below are some other activation functions that can be used in artificial neurons.

- Tanh function
- ReLU function
- Leaky ReLU function
- Softmax function
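For reference, the four functions listed above can be sketched in a few lines of plain Python. The leaky slope of 0.01 is a commonly used default and is an assumption here; note also that softmax works on a whole list of values (typically the outputs of a layer) rather than on a single number.

```python
import math


def tanh(z):
    """Squashes z into the range (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Passes positive values through and clips negatives to 0."""
    return max(0.0, z)


def leaky_relu(z, slope=0.01):
    """Like ReLU, but lets a small fraction of negative values through."""
    return z if z > 0 else slope * z


def softmax(zs):
    """Turns a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtracting the max keeps math.exp from overflowing
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


print(tanh(0.5), relu(-2.0), leaky_relu(-2.0))
print(softmax([1.0, 2.0, 3.0]))
```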

Read More – Animated guide to Activation Functions in Neural Network

Remember that if we change the activation function of a Perceptron from the step function to the sigmoid function, it becomes a Sigmoid Neuron. So basically, all artificial neurons share a common architecture and differ only in their activation function.

### Anatomy of Artificial Neuron

So let us understand the anatomy of an artificial neuron in more detail.

Below are the various parts of an artificial neuron:

- Inputs
- Bias
- Weights
- Neuron
- Output

**Inputs**

The inputs to an artificial neuron can be any real numbers.

**Bias**

The bias of a neuron is an additional learnable parameter that adds an offset to the input of the activation function. Think of it this way: the bias shifts the graph of the activation function sideways so that it fits the data better. The bias does not depend on the input.

**Weights**

Each input has a weight associated with it in the artificial neuron. These weights are initially unknown and are learnt during the training phase.

For ease of mathematical representation, the bias is treated as a weight whose input value is always 1.
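The following sketch shows that treating the bias as "a weight whose input is 1" gives exactly the same weighted sum as adding the bias separately. The function names and the sample numbers are made up for this illustration.

```python
def weighted_sum_with_bias(inputs, weights, bias):
    """Bias kept as a separate term: w1*x1 + ... + wn*xn + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def weighted_sum_folded(inputs, weights_with_bias):
    """Bias stored as the first weight, paired with a constant input of 1."""
    extended_inputs = [1.0] + list(inputs)
    return sum(w * x for w, x in zip(weights_with_bias, extended_inputs))


inputs = [2.0, 3.0]
weights = [0.5, -1.0]
bias = 0.25

print(weighted_sum_with_bias(inputs, weights, bias))    # -1.75
print(weighted_sum_folded(inputs, [bias] + weights))    # -1.75
```

Folding the bias into the weight list means a single dot product covers everything, which keeps both the formulas and the code shorter.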

**Neuron**

The neuron is a computational unit that receives the incoming input signals, processes them, and fires an output. The neuron itself consists of the following two elements:

##### Summation Function

This simply calculates the sum of the incoming inputs, each multiplied by its respective weight.

##### Activation Function

Here the activation function can be any of the functions that we discussed in the earlier section. The activation function determines the nature of the output and the characteristics of the neuron.

**Output**

This is simply the output of the neuron, and its range is determined by the activation function used in the neuron.

## How Artificial Neuron works

After understanding the anatomy, let us now see how an artificial neuron (which is already trained) works.

- The input data is fed into the artificial neuron.
- The dot product of the inputs and their respective weights is computed, that is, each input is multiplied by its weight and the results are summed up along with the bias.
- The activation function is applied to this sum to produce the output.
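The three steps above can be sketched as a single function. The weights and bias below stand in for values a trained neuron might hold; they are invented for this example. Passing the activation function in as an argument also shows how swapping it changes the character of the neuron while the rest stays the same.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation):
    """Forward pass of one neuron: weighted sum, then activation."""
    # Step 2: multiply each input by its weight, sum up, and add the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 3: the activation function turns the sum into the output.
    return activation(z)


# Step 1: feed the input data into the (already trained) neuron.
inputs = [1.0, 2.0]
weights = [0.5, 0.25]   # hypothetical learnt weights
bias = -2.0             # hypothetical learnt bias

# Here z = 0.5*1.0 + 0.25*2.0 - 2.0 = -1.0
print(neuron_output(inputs, weights, bias, sigmoid))  # about 0.269
print(neuron_output(inputs, weights, bias, relu))     # 0.0
```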

## In the End…

So here we are at the end of our four-part series **Neural Network Primitives**. The main intention of this series was to make people aware of artificial neurons and their history. Artificial neurons are the building blocks of neural networks and deep learning, so it is very important to learn about these primitive elements first, before diving into the ocean of deep learning.

If you have not read the previous 3 parts of the series, I strongly suggest reading them to build a strong foundation.

- Read More – Neural Network Primitives Part 1 – McCulloch Pitts Neuron Model (1943)
- Read More – Neural Network Primitives Part 2 – Perceptron Model (1957)
- Read More – Neural Network Primitives Part 3 – Sigmoid Neuron

Do share your feedback about this post in the comments section below. If you found this post informative, please share it and subscribe to us by clicking on the bell icon for quick notifications of new upcoming posts. And yes, don't forget to join our new community MLK Hub and Make AI Simple together.