3 Ways to Calculate Levenshtein Distance in Python

Introduction

Levenshtein Distance, also known as Edit Distance, is used to calculate the difference between two strings. In this article, we will learn how to calculate Levenshtein Distance in Python in three different ways along with examples. First, we shall see its implementation in Python from scratch, then cover two Python libraries pyenchant and Levenshtein that has inbuilt modules to calculate Levenshtein Distance. Before jumping into Python code, let us first take a brief overview of Levenshtein Distance or Edit Distance.

Levenshtein Distance

Levenshtein Distance or Edit Distance is a method to measure the difference between two strings. It also denotes the minimum number of operations required to transform one string to another by performing a combination of the following operations – i) Insertion, ii) Deletion, ii) Substitution.

Levenshtein Distance Examples

Example 1 – Levenshtein distance between “cat” and “cut” is 1 because it requires one operation of {substitution of ‘a’ with ‘u’} to transform “Cat” to “Cut”.

Example 2 – Levenshtein distance between “cat” and “cute” is 2 because it requires two operations of {substitution of “a” with “u”} and {insertion of “e”} to transform “Cat” to “Cute”.

Example 3 – Levenshtein distance between “kitten” and “sitting” is 3 because it requires three operations of {substitution of “k” with “s”}, {substitution of “e” with “i”}, and {insertion of “g”} to transform “kitten” to “sitting”.

Applications of Levenshtein Distance or Edit Distance

Levenshtein Distance or Edit Distance is useful for spell checks, autocorrection, correcting OCR errors, detecting plagiarism, etc. Besides its use in natural language processing, it is also used in bioinformatics to compare DNA sequences and identify mutations or similarities.

Levenshtein Distance in Python

Let us now see how we can implement Levenshtein Distance in Python in three ways.

1. Using Python Code from Scratch

In this approach, we are calculating  Levenshtein Distance by using the dynamic program paradigm where the problem is broken down into smaller sub-problems, it is solved and reused further in the calculation.

In the below code, we have defined a function ‘levenshtein_distance‘ which takes two strings as input. Inside the function, we first initialize a 2D list ‘d’ with dimension (m+1)x(n+1) where ‘m’ and ‘n’ are respective lengths of two strings ‘s’ and ‘t’. Next, we set the values of the first row and column of  ‘d’ to the indices of the corresponding characters of the two strings.

We then iterate over the remaining cells of ‘d’ to calculate the minimum cost of transforming the substring s[0:i] into the substring t[0:j]. If the characters at position ‘i-1’ and ‘j-1’ are the same, then the cost is zero, otherwise, the cost is one.

Next, we calculate the minimum cost by considering three operations of deletion, insertion, and substitution. Finally, we return the value in the bottom-right cell of d which represents the Levenshtein distance between ‘s’ and ‘t’.

In [0]:

def levenshtein_distance(s, t):
    m = len(s)
    n = len(t)
    d = [[0] * (n + 1) for i in range(m + 1)]  

    for i in range(1, m + 1):
        d[i][0] = i

    for j in range(1, n + 1):
        d[0][j] = j
    
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                cost = 0
            else:
                cost = 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost) # substitution   

    return d[m][n]

Examples

Below are a couple of examples of the Levenshtein distance results by using our above custom function.

In [1]:

string1 = "algorithm"
string2 = "rhythm"

l_dist = levenshtein_distance(string1, string2)

print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))

Out[1]:

Levenshtein Distance between algorithm & rhythm is 6

In [2]:

string1 = "prime"
string2 = "time"

l_dist = levenshtein_distance(string1, string2)

print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))
Out[2]:
Levenshtein Distance between prime & time is 2

 

2. Using PyEnchant Library

In this section, we will use a library called PyEnchant to calculate Levenshtein Distance or Edit Distance in Python.

Install PyEnchant Library

PyEnchant library can be installed by using the pip command.

In [3]:

pip install pyenchant

Import PyEnchant Library

Next, let us import the library as shown below. If you are installing PyEnchant in Google Colab, you may run into issues while importing the library. For this, you can refer to this solution for proper installation.

In [4]:

import enchant

Examples

Below are a couple of examples of the Levenshtein distance by using the Python PyEnchant library. For this, we are making use of enchant.utils.levenshtein module.

In [5]:

string1 = "algorithm"
string2 = "rhythm"

l_dist = enchant.utils.levenshtein(string1, string2)

print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))
Out[5]:
Levenshtein Distance between algorithm & rhythm is 6
In[6]:
string1 = "prime"
string2 = "time"

l_dist = enchant.utils.levenshtein(string1, string2)

print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))
Out[6]:
Levenshtein Distance between prime & time is 2

3. Using Levenshtein Library

There is another Python library called ‘Levenshtein’ that can be used for calculating edit distance.

Install Levenshtein Library

To install the Python Levenshtein library you can use the pip command as shown below.

In [7]:

pip install Levenshtein

Import Levenshtein Distance Module

Next, we import the distance module from the Levenshtein library.

In [8]:

from Levenshtein import distance

Examples

Below are a couple of examples of using the Levenshtein Python library.

In [9]:

string1 = "algorithm"
string2 = "rhythm"

l_dist = distance(string1, string2)

print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))
Out[9]:
Levenshtein Distance between algorithm & rhythm is 6
In [10]:
string1 = "prime"
string2 = "time"

l_dist = distance(string1, string2)

print("Levenshtein Distance between "+string1+" & "+string2+" is " + str(l_dist))
Out[10]:
Levenshtein Distance between prime & time is 2
References:

Follow Us

Leave a Reply

Your email address will not be published. Required fields are marked *