## Introduction

In data science and machine learning projects, you will often need to carry out statistical operations. This tutorial covers the statistical functions **numpy mean, numpy mode, numpy median and numpy standard deviation**. These **statistical functions** help you understand your data better and decide what further actions to take on it.

### Importing Numpy Library

We will start by importing the numpy library.

```
import numpy as np
```

We begin this tutorial with the **mean** function.

**Numpy Mean : np.mean()**

The **numpy mean** function is used for computing the arithmetic mean of the input values. **Arithmetic mean** is the sum of the elements along the axis divided by the number of elements.
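As a quick check of this definition, the following sketch compares an explicit "sum divided by count" calculation with np.mean(). The array `values` is an illustrative assumption and not part of the tutorial's own examples.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
values = np.array([[7, 2], [5, 4]])

# Arithmetic mean by hand: sum of the elements divided by their count
manual_mean = values.sum() / values.size

# The same quantity via np.mean()
library_mean = np.mean(values)

print(manual_mean, library_mean)  # 4.5 4.5
```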

We will now look at the syntax of **numpy.mean()** or **np.mean()**.

### Syntax

**numpy.mean(a, axis=None, dtype=None, out=None, keepdims=\<no value\>)**

**a : array-like** – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.

**axis : None or int or tuple of ints** (optional) – This is the axis or axes along which the means are computed. The default is to compute the mean of the flattened array.

**dtype : data-type** (optional) – It is the type used in computing the mean. For integer inputs, the default is **float64**; for floating point inputs, it is the same as the input dtype.

**out : ndarray** (optional) – This is the alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output.

**keepdims : bool** (optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray.

The **numpy mean function** returns the mean values. If **out=None**, a new array is returned (a single scalar when the mean is taken over the whole array); otherwise a reference to the output array is returned.
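The parameters described above can be tried out with a short sketch. The variable names (`row_means`, `centered`, `result`, `returned`) are illustrative assumptions; only the np.mean() arguments come from the syntax above.

```python
import numpy as np

a = np.array([[7, 2], [5, 4]])

# Integer input: the mean is computed and returned as float64
print(np.mean(a).dtype)  # float64

# keepdims=True keeps the reduced axis with size one, giving shape (2, 1)
row_means = np.mean(a, axis=1, keepdims=True)
print(row_means.shape)  # (2, 1)

# Because of that shape, the result broadcasts against the input array
centered = a - row_means

# out: write the column means into a preallocated array
result = np.empty(2)
returned = np.mean(a, axis=0, out=result)
print(returned is result)  # True
```

Keeping the reduced axis is mostly useful when the means are subtracted from or divided into the original array, as in `centered` here.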

### Example 1 : Basic example of np.mean() function

Here we have used a multi-dimensional array to find the mean.

```
a = np.array([[7, 2], [5, 4]])
a
```

```
np.mean(a)
```

### Example 2 : Using ‘axis’ parameter of np.mean() function as ‘0’

In this example, when the axis value is ‘0’, the mean is computed down each column: first the mean of **7 and 5**, then the mean of **2 and 4**.

```
np.mean(a, axis=0)
```

### Example 3 : Using ‘axis’ parameter of np.mean() function as ‘1’

When the axis value is ‘1’, the mean is computed across each row: first the mean of **7 and 2**, then the mean of **5 and 4**.

```
np.mean(a, axis=1)
```

### Example 4: Striving for more accurate results

Here we will look at how changing the **dtype** value helps achieve more precise results.

First we create a 2-D array of zeros with 2 rows and 512*512 columns, stored as float32.

```
a = np.zeros((2, 512*512), dtype=np.float32)
a
```

**We use slicing to set every column of the first row to 1.0**

```
a[0, :] = 1.0
a
```

**Slicing is used again to set every column of the second row to 0.1**

```
a[1, :] = 0.1
a
```

```
np.mean(a)
```

**Finding mean through dtype value as float64. The answers are more accurate through this.**

```
np.mean(a, dtype=np.float64)
```

The next statistical function which we’ll learn is **mode for numpy array**.

**Numpy Mode**

Note that numpy has **no built-in function for finding the mode**. For this, we will use the **scipy** library. First we will create a numpy array, and then we will apply the scipy function to it.
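For a flattened array, the mode can also be approximated with numpy alone by counting distinct values. This is only a sketch of an alternative, not the approach this tutorial follows; the array `data` and the names `mode_value` and `mode_count` are illustrative assumptions.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
data = np.array([2, 3, 3, 5, 3, 2])

# np.unique returns the sorted distinct values and how often each occurs
values, counts = np.unique(data, return_counts=True)

# argmax picks the first maximum, so ties resolve to the smallest value
mode_value = values[np.argmax(counts)]
mode_count = counts.max()

print(mode_value, mode_count)  # 3 3
```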

### Syntax

Now we will go over **scipy mode function syntax** and understand how it operates over a numpy array.

**scipy.stats.mode(a, axis=0, nan_policy='propagate')**

**a : array-like** – This is the n-dimensional array whose mode(s) we want to find.

**axis : int or None** (optional) – This is the axis along which to operate. Default is 0. If None, the mode is computed over the whole array **a**.

**nan_policy : {'propagate', 'raise', 'omit'}** (optional) – This defines how to handle input that contains nan. The following options are available: **propagate** (the default) returns nan, **raise** throws an error, and **omit** performs the calculations ignoring nan values.
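The effect of nan_policy can be sketched as follows. The array `data` is an illustrative assumption, and np.ravel() is used only so that the snippet does not depend on the exact output shape of a particular scipy version.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data containing one nan
data = np.array([1.0, 1.0, np.nan, 2.0])

# 'omit' ignores the nan and finds the mode of the remaining values
result = stats.mode(data, nan_policy='omit')
omit_mode = float(np.ravel(result.mode)[0])

# 'raise' refuses to process input that contains nan
try:
    stats.mode(data, nan_policy='raise')
    raised = False
except ValueError:
    raised = True

print(omit_mode, raised)
```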

The function returns two values. The first is **mode**, an ndarray of modal values. The second is **count**, an ndarray holding the count for each mode.
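A minimal sketch of reading both outputs is shown here. The array `data` is an illustrative assumption; np.ravel() is applied because the exact shape of the returned arrays has differed between scipy versions.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, chosen only for illustration
data = np.array([[1, 2],
                 [1, 3],
                 [4, 3]])

result = stats.mode(data)  # default axis=0: one mode per column

# The two outputs are available as attributes of the result
modes = np.ravel(result.mode)
counts = np.ravel(result.count)

print(modes, counts)  # [1 3] [2 2]
```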

### Example 1: Basic example of finding mode of numpy array

Here we use the default axis value, ‘0’.

```
a = np.array([[7, 1, 1, 7],
              [9, 4, 3, 8],
              [6, 1, 9, 7],
              [9, 7, 2, 5],
              [5, 1, 5, 9]])
a
```

In this example, the mode is calculated over columns, which is why we get 4 values, one for each column. In the first column **9** appears **2** times, so it is the mode. Similarly, **1** is the mode of the second column and **7** is the mode of the fourth column. In the third column every value appears only once, so the smallest value, **1**, is returned as the mode.

```
from scipy import stats
stats.mode(a)
```

### Example 2 : Putting axis=None in scipy mode function

When we set the axis value to **None** in the scipy mode function, the mode is calculated over the complete array. Here **1**, **7** and **9** each appear **4** times; the smallest of them is returned, so **1** is the **mode value** with a count of **4**.

```
stats.mode(a, axis=None)
```

Continuing our statistical operations tutorial, we will now look at the numpy **median** function.

**Numpy Median : np.median()**

The **numpy median** function helps in finding the middle value of a sorted array.
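To make "middle value of a sorted array" concrete, here is a small sketch for 1-D input. The helper `manual_median` is hypothetical and written only for illustration; it is compared against np.median() for an odd and an even number of elements.

```python
import numpy as np

def manual_median(values):
    """Median of a 1-D sequence: sort, then take the middle value(s)."""
    ordered = np.sort(np.asarray(values, dtype=float))
    n = ordered.size
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle element
        return ordered[middle]
    # Even count: average of the two middle elements
    return (ordered[middle - 1] + ordered[middle]) / 2

odd = [5, 8, 1]
even = [5, 8, 1, 7]

print(manual_median(odd), np.median(odd))    # 5.0 5.0
print(manual_median(even), np.median(even))  # 6.0 6.0
```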

### Syntax

**numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)**

**a : array-like** – Input array or object that can be converted to an array, values of this array will be used for finding the median.

**axis : int or sequence of int or None** (optional) – Axis or axes along which the medians are computed. The default is to compute the median along a flattened version of the array.

**out : ndarray** (optional) – This is the alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output.

**overwrite_input : bool** (optional) – If True, then allow use of memory of input array a for calculations. The default value is False.

**keepdims : bool** (optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.

**Numpy median function** returns a new array holding the result. If the input contains integers or floats smaller than float64, then the output data-type is np.float64. Otherwise, the data-type of the output is the same as that of the input.
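The sketch below tries out the keepdims and overwrite_input parameters and the output data type described above. The names `row_medians`, `scratch` and `overall` are illustrative assumptions.

```python
import numpy as np

a = np.array([[5, 8, 1], [7, 9, 6]])

# Integer input yields a float64 result
row_medians = np.median(a, axis=1, keepdims=True)
print(row_medians.dtype)  # float64
print(row_medians.shape)  # (2, 1)

# overwrite_input=True lets numpy reorder the input in place, so work
# on a copy when the original ordering still matters
scratch = a.copy()
overall = np.median(scratch, overwrite_input=True)
print(overall)  # 6.5
```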

### Example 1 : Basic example of np.median() function

When we use the **default values of the numpy median function**, the median is computed over the flattened version of the array. The array below is flattened to 1-D and sorted, giving [1,5,6,7,8,9]. The two middle values are 6 and 7, so the final result is their average, **6.5**.

```
a = np.array([[5, 8, 1], [7, 9, 6]])
a
```

```
np.median(a)
```

### Example 2 : Using ‘axis’ parameter value as ‘0’

Here, with **axis = 0**, the median is computed for each column, i.e. for the pairs **5 and 7**, **8 and 9** and **1 and 6**.

```
np.median(a, axis=0)
```

### Example 3 : Using ‘axis’ parameter value as ‘1’

For **axis=1**, the median is computed for each row. Sorted, the two rows are **[1,5,8]** and **[6,7,9]**.

```
np.median(a, axis=1)
```

The last statistical function which we’ll cover in this tutorial is **standard deviation**.

**Numpy Standard Deviation : np.std()**

**Numpy standard deviation function** is useful in finding the spread of a distribution of array values. Let’s look at the syntax of numpy.std() to understand its parameters.

### Syntax

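**numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=\<no value\>)**

**a : array-like** – Input array or object that can be converted to an array. The values of this array will be used for finding the standard deviation.

**axis : int or sequence of int or None** (optional) – Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.

**dtype : data-type** (optional) – It is the type used in computing the standard deviation. For integer inputs, the default is **float64**; for floating point inputs, it is the same as the input dtype.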

**out : ndarray** (optional) – Alternative output array in which to place the result. It must have the same shape as the expected output.

**ddof : int** (optional) – This means delta degrees of freedom. The divisor used in calculations is **N – ddof**, where N represents the number of elements. By default ddof is zero.

**keepdims : bool** (optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.

**np.std()** returns the **standard deviation** as a new array if the **out** parameter is None; otherwise it returns a reference to the output array.
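The role of ddof can be sketched by computing the standard deviation by hand, as the square root of the summed squared deviations from the mean divided by **N – ddof**. The variable names below are illustrative assumptions.

```python
import numpy as np

a = np.array([[7, 9], [8, 4]])

# Population standard deviation by hand (ddof=0): divide by N
deviations = a - a.mean()
manual_std = np.sqrt((deviations ** 2).sum() / a.size)

# Sample standard deviation (ddof=1): divide by N - 1
manual_sample_std = np.sqrt((deviations ** 2).sum() / (a.size - 1))

print(np.isclose(manual_std, np.std(a)))                 # True
print(np.isclose(manual_sample_std, np.std(a, ddof=1)))  # True
```

Setting ddof=1 is the usual choice when the array is a sample drawn from a larger population rather than the whole population.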

### Example 1 : Basic example of np.std() function

In this example, we use a 2-dimensional array for finding the standard deviation. The default value of axis is used, so the standard deviation is computed over the flattened array.

```
a = np.array([[7, 9], [8, 4]])
a
```

```
np.std(a)
```

### Example 2: Using axis parameter value as ‘0’

Here the standard deviation is calculated **column-wise**. So the pairs created are **7 and 8** and **9 and 4**.

```
np.std(a, axis=0)
```

### Example 3: Using axis parameter value as ‘1’

Here the standard deviation is calculated **row-wise**. So the pairs created are **7 and 9** and **8 and 4**.

```
np.std(a, axis=1)
```

## Conclusion

To summarize, we looked at how to carry out different statistical operations using numpy. We also saw how **numpy mean, numpy mode, numpy median and numpy standard deviation** are used in different scenarios, with examples.

*Reference-* https://numpy.org/doc/
