Introduction

In machine learning and data science projects that use the Pandas library, most of our work revolves around handling DataFrames. In many scenarios, apart from the default view of a DataFrame, we need alternative views of the data, or we have to sort it. In this tutorial, we will learn about the pandas functions crosstab(), sample() and sort_values(), which help us represent DataFrames in alternative ways to gain more insights.

Importing Pandas Library

We will commence this tutorial by importing the pandas library.

import pandas as pd

Pandas Crosstab : crosstab()

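The crosstab() function computes a simple cross-tabulation of two (or more) factors. By default, it builds a frequency table that counts how often each combination of factor values occurs, unless an array of values and an aggregation function are passed.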

Syntax

pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

index : array-like, Series, or list of arrays/Series – These values are used for grouping in the rows.

columns : array-like, Series, or list of arrays/Series – The values are used for grouping in the columns.

values : array-like, optional – An array of values to aggregate according to the factors. It requires aggfunc to be specified.

rownames : sequence, default None – Names for the row levels. If passed, it must match the number of row arrays passed.

colnames : sequence, default None – Names for the column levels. If passed, it must match the number of column arrays passed.

dropna : bool, default True – If True, columns whose entries are all NaN are not included in the result.

The result of this function is a dataframe with the cross-tabulation of data.
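The values parameter only takes effect together with an aggregation function. As a minimal sketch, assuming a small made-up sales dataset (the fruit, size and sold arrays below are hypothetical), passing values with aggfunc="sum" fills each cell with the total of the matching values instead of a count:

```python
import numpy as np
import pandas as pd

# Hypothetical sales records, invented only for this illustration
fruit = np.array(["mango", "mango", "orange", "orange", "mango"], dtype=object)
size = np.array(["small", "large", "small", "small", "large"], dtype=object)
sold = np.array([10, 5, 7, 3, 2])

# values + aggfunc: each cell holds the summed 'sold' for that (fruit, size) pair
table = pd.crosstab(fruit, size, values=sold, aggfunc="sum",
                    rownames=["fruit"], colnames=["size"])
print(table)
```

Combinations that never occur in the data (here orange with large) come out as NaN rather than 0, because no fill value is applied when values is given.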

Example 1: Simple example of pandas crosstab function

Here we build three NumPy arrays and then use crosstab() to count how often each combination of their values occurs.

import numpy as np

a = np.array(["mango", "mango", "mango", "mango", "orange", "orange",
              "orange", "orange", "mango", "mango", "mango"], dtype=object)

a

array(['mango', 'mango', 'mango', 'mango', 'orange', 'orange', 'orange',
       'orange', 'mango', 'mango', 'mango'], dtype=object)

b = np.array(["one", "one", "one", "two", "one", "one",
               "one", "two", "two", "two", "one"], dtype=object)

b

array(['one', 'one', 'one', 'two', 'one', 'one', 'one', 'two', 'two',
       'two', 'one'], dtype=object)

c = np.array(["watermelon", "watermelon", "strawberry", "watermelon",
              "watermelon", "strawberry", "strawberry", "watermelon",
              "strawberry", "strawberry", "strawberry"], dtype=object)

c

array(['watermelon', 'watermelon', 'strawberry', 'watermelon',
       'watermelon', 'strawberry', 'strawberry', 'watermelon',
       'strawberry', 'strawberry', 'strawberry'], dtype=object)

Here, a supplies the rows, while b and c together form a two-level column header. Each cell holds the number of positions at which that combination of values occurs in the three arrays.

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
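To make the counting concrete, here is a minimal sketch that cross-tabulates only a and b (leaving out c) and checks that the counts account for every observation; the variable name counts is our own.

```python
import numpy as np
import pandas as pd

# The same a and b arrays as in Example 1
a = np.array(["mango", "mango", "mango", "mango", "orange", "orange",
              "orange", "orange", "mango", "mango", "mango"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one",
              "one", "two", "two", "two", "one"], dtype=object)

# Rows come from a, columns from b; each cell counts co-occurrences
# (mango: one=4, two=3; orange: one=3, two=1)
counts = pd.crosstab(a, b, rownames=["a"], colnames=["b"])
print(counts)

# Each of the 11 observations falls into exactly one cell
print(counts.to_numpy().sum())  # 11
```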

Example 2: Using Categorical and NaN values with pandas crosstab

Here crosstab() is applied to two Categorical objects whose declared categories include values that never occur in the data (“r” and “z”).

first = pd.Categorical(['p', 'q'], categories=['p', 'q', 'r'])

first

[p, q]
Categories (3, object): [p, q, r]

second = pd.Categorical(['x', 'y'], categories=['x', 'y', 'z'])

second

[x, y]
Categories (3, object): [x, y, z]

Since the dropna parameter of crosstab() defaults to True, the unobserved categories “r” and “z” are dropped from the result.

pd.crosstab(first, second)

To keep “r” and “z” in the output, pass dropna=False. The result then contains a row for “r” and a column for “z”, both filled with zeros.

pd.crosstab(first, second, dropna=False)
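A quick way to see what dropna changes is to compare the shapes of the two results. This minimal sketch reuses the first and second Categoricals from this example; the names kept and full are our own.

```python
import pandas as pd

first = pd.Categorical(['p', 'q'], categories=['p', 'q', 'r'])
second = pd.Categorical(['x', 'y'], categories=['x', 'y', 'z'])

kept = pd.crosstab(first, second)                # dropna=True: unobserved r and z omitted
full = pd.crosstab(first, second, dropna=False)  # every declared category kept

print(kept.shape)  # (2, 2)
print(full.shape)  # (3, 3)

# The extra row "r" and column "z" contain only zeros
print(full.loc['r'].sum(), full['z'].sum())
```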


Pandas Sample : sample()

The pandas sample() function is used for returning a random sample of items from an axis of the object.

Syntax

pandas.DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)

n : int, optional – The number of items to return from the chosen axis. It cannot be used together with frac. If neither n nor frac is given, one item is returned.

frac : float, optional – The fraction of axis items to return. It cannot be used together with n.

replace : bool, default False – Whether to allow sampling the same row more than once (sampling with replacement).

random_state : int or numpy.random.RandomState, optional – A seed or random number generator. Passing a fixed integer makes the sample reproducible.

axis : {0 or ‘index’, 1 or ‘columns’, None}, default None – The axis to sample from. None means the default axis for the object, which is 0 (rows) for a DataFrame.
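The interplay of n, frac and random_state can be checked directly. This minimal sketch builds the same fruit DataFrame used in the examples; the seed 7 is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({'seed_count': [16, 40, 0, 2],
                   'water_content': [20, 50, 10, 30],
                   'quantity': [10, 2, 1, 8]},
                  index=['orange', 'watermelon', 'pineapple', 'apple'])

by_count = df.sample(n=2, random_state=7)          # exactly 2 rows
by_fraction = df.sample(frac=0.5, random_state=7)  # half of the 4 rows, i.e. 2 rows
print(len(by_count), len(by_fraction))  # 2 2

# Re-using the same integer seed reproduces the same sample
again = df.sample(n=2, random_state=7)
print(again.index.equals(by_count.index))  # True

# n and frac are mutually exclusive: passing both raises ValueError
try:
    df.sample(n=2, frac=0.5)
except ValueError as err:
    print("ValueError:", err)
```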

Example 1: Simple example of pandas sample function

Let us now look at some examples of the pandas sample function. In this first example, we create a DataFrame and then draw 3 random values from its water_content column by setting n=3.

df = pd.DataFrame({'seed_count': [16, 40, 0, 2],
                   'water_content': [20, 50, 10, 30],
                   'quantity': [10, 2, 1, 8]},
                  index=['orange', 'watermelon', 'pineapple', 'apple'])

df

df['water_content'].sample(n=3, random_state=1)

apple        30
pineapple    10
orange       20
Name: water_content, dtype: int64
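sample() can also draw columns instead of rows. As a minimal sketch on the same DataFrame, setting axis=1 (or axis='columns') keeps every row but returns a random subset of the columns; the name two_cols is our own.

```python
import pandas as pd

df = pd.DataFrame({'seed_count': [16, 40, 0, 2],
                   'water_content': [20, 50, 10, 30],
                   'quantity': [10, 2, 1, 8]},
                  index=['orange', 'watermelon', 'pineapple', 'apple'])

# axis=1 samples column labels; all four rows are kept
two_cols = df.sample(n=2, axis=1, random_state=1)
print(two_cols.shape)          # (4, 2)
print(list(two_cols.columns))  # two of the three original column names
```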

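Example 2: Using random and replace parameters in pandas sample

In this second example, we sample the whole DataFrame with frac=0.5, which returns half of the rows, together with replace=True, which allows the same row to be drawn more than once, and random_state=1, which makes the draw reproducible. Because different parameters are used, the output differs from the first example.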

df.sample(frac=0.5, replace=True, random_state=1)
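Because replace=True lets the same row be drawn repeatedly, the sample can even be larger than the DataFrame, which is the basis of bootstrap resampling. A minimal sketch, where n=6 and the seed are arbitrary choices:

```python
import pandas as pd

df = pd.DataFrame({'seed_count': [16, 40, 0, 2],
                   'water_content': [20, 50, 10, 30],
                   'quantity': [10, 2, 1, 8]},
                  index=['orange', 'watermelon', 'pineapple', 'apple'])

# With replacement, n may exceed the 4 available rows
boot = df.sample(n=6, replace=True, random_state=1)
print(len(boot))  # 6

# Without replacement (the default), the same request is rejected
try:
    df.sample(n=6)
except ValueError as err:
    print("ValueError:", err)
```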

Pandas Sort_Values : sort_values()

This pandas function sorts the values of a DataFrame along either axis.

Syntax

pandas.DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

by : str or list of str – The name, or list of names, to sort by. With axis=0 these are column labels; with axis=1 they are index labels.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0 – The axis to be sorted.

ascending : bool or list of bool, default True – Sort in ascending (True) or descending (False) order. Pass a list of booleans to choose the order separately for each entry in by.

inplace : bool, default False – If True, the DataFrame is sorted in place and the function returns None.

kind : {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’ – The sorting algorithm to use. This option is only applied when sorting on a single column or label.

na_position : {‘first’, ‘last’}, default ‘last’ – Puts all NaNs at the beginning (‘first’) or at the end (‘last’).

ignore_index : bool, default False – If True, the resulting axis is labeled 0, 1, …, n - 1 instead of keeping the original labels.

The function returns a new DataFrame with the sorted values, or None if inplace=True.
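The inplace and ignore_index parameters are easy to mix up, so here is a minimal sketch on the DataFrame used in the examples below (ignore_index assumes pandas 1.0 or later; the names renumbered and result are our own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['P', 'Q', 'A', np.nan, 'R', 'C'],
                   'col2': [7, np.nan, 9, 2, 8, 5],
                   'col3': [2, 9, 7, 9, np.nan, 1]})

# ignore_index=True: rows come back sorted but relabelled 0, 1, ..., n - 1
renumbered = df.sort_values(by='col1', ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]

# inplace=True: df itself is reordered and the method returns None
result = df.sort_values(by='col1', inplace=True)
print(result)             # None
print(df.index.tolist())  # [2, 5, 0, 1, 4, 3]
```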

Example 1: Simple example of sort_values() function in pandas

In this example, after creating a DataFrame, we sort it by the column named “col1”.

df = pd.DataFrame({'col1': ['P', 'Q', 'A', np.nan, 'R', 'C'],
                   'col2': [7, np.nan, 9, 2, 8, 5],
                   'col3': [2, 9, 7, 9, np.nan, 1]})

df

Sorting by col1 arranges the letters in alphabetical order, and the row whose col1 value is NaN is placed last by default.

df.sort_values(by=['col1'])
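col1 contains no repeated letters, so the choice of kind makes no visible difference here. When the sort key does contain ties, a stable algorithm such as kind='mergesort' keeps tied rows in their original order. A minimal sketch with a small hypothetical DataFrame (scores and its columns are invented for illustration):

```python
import pandas as pd

# Hypothetical data: two pairs of tied 'grade' values
scores = pd.DataFrame({'name': ['Ann', 'Bob', 'Cid', 'Dee'],
                       'grade': ['B', 'A', 'B', 'A']})

# mergesort is stable: within each grade, rows keep their original relative order
stable = scores.sort_values(by='grade', kind='mergesort')
print(stable['name'].tolist())  # ['Bob', 'Dee', 'Ann', 'Cid']
```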

Example 2: Using ascending parameter as false in pandas sort

In this example, ascending=False is passed, so col1 is sorted in reverse alphabetical order. The NaN row still comes last, because na_position defaults to ‘last’ regardless of the sort direction.

df.sort_values(by='col1', ascending=False)
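ascending also accepts a list with one flag per column in by, which is handy when sorting on several columns. A minimal sketch, using a hypothetical team/score DataFrame:

```python
import pandas as pd

# Hypothetical data with a repeated 'team' key
people = pd.DataFrame({'team': ['red', 'blue', 'red', 'blue'],
                       'score': [3, 8, 9, 1]})

# team in ascending order, then score in descending order within each team
ordered = people.sort_values(by=['team', 'score'], ascending=[True, False])
print(ordered)
```

The first flag applies to team and the second to score, so both blue rows come first, each team's rows ordered from the highest score down.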

Example 3: Using na_position parameter in pandas sort

With the na_position parameter, we can move the NaN values to the beginning instead of the end, which is the default (‘last’).

NOTE – This parameter, like the other options of sort_values(), only considers the column(s) specified in the by parameter. NaN values in other columns do not affect where a row is placed.

df.sort_values(by='col1', ascending=False, na_position='first')

For the same reason, sorting by col2 with na_position='first' places the row whose col2 value is NaN at the start, and the remaining rows follow in descending order of col2.

df.sort_values(by='col2', ascending=False, na_position='first')
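The NOTE above can be checked directly. In this minimal sketch, na_position='first' moves only the row whose col1 is NaN (row 3) to the top, while the rows holding NaN in col2 (row 1) and col3 (row 4) stay at their alphabetical positions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['P', 'Q', 'A', np.nan, 'R', 'C'],
                   'col2': [7, np.nan, 9, 2, 8, 5],
                   'col3': [2, 9, 7, 9, np.nan, 1]})

# Only NaN in the 'by' column (col1) is affected by na_position
out = df.sort_values(by='col1', na_position='first')
print(out.index.tolist())  # [3, 2, 5, 0, 1, 4]
```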

Conclusion

We will end this article here. In this tutorial, we discussed pandas functions that are useful for getting a different view or a subset of a DataFrame: crosstab(), sample() and sort_values(). These functions help us look at DataFrames from different angles to extract information.

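Output of df:

  col1  col2  col3
0    P   7.0   2.0
1    Q   NaN   9.0
2    A   9.0   7.0
3  NaN   2.0   9.0
4    R   8.0   NaN
5    C   5.0   1.0

Output of df.sort_values(by=['col1']):

  col1  col2  col3
2    A   9.0   7.0
5    C   5.0   1.0
0    P   7.0   2.0
1    Q   NaN   9.0
4    R   8.0   NaN
3  NaN   2.0   9.0

Output of df.sort_values(by='col1', ascending=False):

  col1  col2  col3
4    R   8.0   NaN
1    Q   NaN   9.0
0    P   7.0   2.0
5    C   5.0   1.0
2    A   9.0   7.0
3  NaN   2.0   9.0

Output of df.sort_values(by='col1', ascending=False, na_position='first'):

  col1  col2  col3
3  NaN   2.0   9.0
4    R   8.0   NaN
1    Q   NaN   9.0
0    P   7.0   2.0
5    C   5.0   1.0
2    A   9.0   7.0

Output of df.sort_values(by='col2', ascending=False, na_position='first'):

  col1  col2  col3
1    Q   NaN   9.0
2    A   9.0   7.0
4    R   8.0   NaN
0    P   7.0   2.0
5    C   5.0   1.0
3  NaN   2.0   9.0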

Pandas Tutorial – crosstab(), sample() and sort_values()


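Output of pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c']):

b             one                   two
c      strawberry watermelon strawberry watermelon
a
mango           2          2          2          1
orange          2          1          0          1

Output of pd.crosstab(first, second):

col_0  x  y
row_0
p      1  0
q      0  1

Output of pd.crosstab(first, second, dropna=False):

col_0  x  y  z
row_0
p      1  0  0
q      0  1  0
r      0  0  0

Output of df.sample(frac=0.5, replace=True, random_state=1):

            seed_count  water_content  quantity
watermelon          40             50         2
apple                2             30         8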