Contents

## Introduction

Levenshtein Distance, also known as Edit Distance, is used to calculate the difference between two strings. In this article, we will learn how to calculate Levenshtein Distance in Python in three different ways along with examples. First, we shall see its implementation in Python from scratch, then cover two Python libraries pyenchant and Levenshtein that has inbuilt modules to calculate Levenshtein Distance. Before jumping into Python code, let us first take a brief overview of Levenshtein Distance or Edit Distance.

**Levenshtein Distance**

Levenshtein Distance or Edit Distance is a method to measure the difference between two strings. It also denotes the minimum number of operations required to transform one string to another by performing a combination of the following operations – *i) Insertion, ii) Deletion, ii) Substitution.*

**Levenshtein Distance Examples**

**Example 1 –** Levenshtein distance between “cat” and “cut” is 1 because it requires one operation of {substitution of ‘a’ with ‘u’} to transform “Cat” to “Cut”.

**Example 2 –** Levenshtein distance between “cat” and “cute” is 2 because it requires two operations of {substitution of “a” with “u”} and {insertion of “e”} to transform “Cat” to “Cute”.

**Example 3 –** Levenshtein distance between “kitten” and “sitting” is 3 because it requires three operations of {substitution of “k” with “s”}, {substitution of “e” with “i”}, and {insertion of “g”} to transform “kitten” to “sitting”.

**Applications of Levenshtein Distance or Edit Distance**

Levenshtein Distance or Edit Distance is useful for spell checks, autocorrection, correcting OCR errors, detecting plagiarism, etc. Besides its use in natural language processing, it is also used in bioinformatics to compare DNA sequences and identify mutations or similarities.

**Levenshtein Distance in Python**

Let us now see how we can implement Levenshtein Distance in Python in three ways.

**1. Using Python Code from Scratch**

In this approach, we are calculating Levenshtein Distance by using the dynamic program paradigm where the problem is broken down into smaller sub-problems, it is solved and reused further in the calculation.

In the below code, we have defined a function ‘*levenshtein_distance*‘ which takes two strings as input. Inside the function, we first initialize a 2D list ‘d’ with dimension (m+1)x(n+1) where ‘m’ and ‘n’ are respective lengths of two strings ‘s’ and ‘t’. Next, we set the values of the first row and column of ‘d’ to the indices of the corresponding characters of the two strings.

We then iterate over the remaining cells of ‘d’ to calculate the minimum cost of transforming the substring s[0:i] into the substring t[0:j]. If the characters at position ‘i-1’ and ‘j-1’ are the same, then the cost is zero, otherwise, the cost is one.

Next, we calculate the minimum cost by considering three operations of deletion, insertion, and substitution. Finally, we return the value in the bottom-right cell of d which represents the Levenshtein distance between ‘s’ and ‘t’.

In [0]:

def levenshtein_distance(s, t): m = len(s) n = len(t) d = [[0] * (n + 1) for i in range(m + 1)] for i in range(1, m + 1): d[i][0] = i for j in range(1, n + 1): d[0][j] = j for j in range(1, n + 1): for i in range(1, m + 1): if s[i - 1] == t[j - 1]: cost = 0 else: cost = 1 d[i][j] = min(d[i - 1][j] + 1, # deletion d[i][j - 1] + 1, # insertion d[i - 1][j - 1] + cost) # substitution return d[m][n]

**Examples**

Below are a couple of examples of the Levenshtein distance results by using our above custom function.

In [1]:

string1 = "algorithm" string2 = "rhythm" l_dist = levenshtein_distance(string1, string2) print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))

Out[1]:

Levenshtein Distance between algorithm & rhythm is 6

In [2]:

string1 = "prime" string2 = "time" l_dist = levenshtein_distance(string1, string2) print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))

Levenshtein Distance between prime & time is 2

**2. Using PyEnchant Library**

In this section, we will use a library called PyEnchant to calculate Levenshtein Distance or Edit Distance in Python.

**Install PyEnchant Library**

PyEnchant library can be installed by using the pip command.

In [3]:

pip install pyenchant

**Import PyEnchant Library**

Next, let us import the library as shown below. If you are installing PyEnchant in Google Colab, you may run into issues while importing the library. For this, you can refer to this solution for proper installation.

In [4]:

import enchant

**Examples**

Below are a couple of examples of the Levenshtein distance by using the Python PyEnchant library. For this, we are making use of enchant.utils.levenshtein module.

In [5]:

string1 = "algorithm" string2 = "rhythm" l_dist = enchant.utils.levenshtein(string1, string2) print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))

Levenshtein Distance between algorithm & rhythm is 6

string1 = "prime" string2 = "time" l_dist = enchant.utils.levenshtein(string1, string2) print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))

Levenshtein Distance between prime & time is 2

**3. Using Levenshtein Library**

There is another Python library called ‘Levenshtein’ that can be used for calculating edit distance.

**Install Levenshtein Library**

To install the Python Levenshtein library you can use the pip command as shown below.

In [7]:

pip install Levenshtein

**Import Levenshtein Distance Module**

Next, we import the distance module from the Levenshtein library.

In [8]:

from Levenshtein import distance

**Examples**

Below are a couple of examples of using the Levenshtein Python library.

In [9]:

string1 = "algorithm" string2 = "rhythm" l_dist = distance(string1, string2) print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))

Levenshtein Distance between algorithm & rhythm is 6

string1 = "prime" string2 = "time" l_dist = distance(string1, string2) print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))

Levenshtein Distance between prime & time is 2