Beginner’s Guide to Named Entity Recognition (NER) in NLTK Library

Afham Fardeen
Last Updated On June 3, 2021
Natural Language Processing

Introduction

In this tutorial, we will see how to perform Named Entity Recognition (NER) with Python's NLTK library, with the help of examples. We will also briefly look at how NER works and why it is used, and finally compare POS tagging with NER.

So let us get started.

What is Named Entity Recognition?

To understand what Named Entity Recognition is in NLP, it helps to first understand the concept of a named entity.

i) Named Entity

Named entities are words or phrases, typically proper nouns, that refer to a specific entity such as a person, organization, location, or date. Consider this example: “Mount Everest is the tallest mountain”. Here, Mount Everest is a named entity of type LOCATION because it refers to one specific place.

Some other examples of named entities are listed in the table below.

	Named Entity Type	Examples
1	ORGANIZATION	SEI, BCCI, Pakistan Cricket Board
2	PERSON	Barack Obama, Narendra Modi, Kohli
3	MONEY	7 million dollars, INR 7 Crore
4	GPE	India, Australia, South East Asia
5	LOCATION	Mount Everest, River Nile
6	DATE	8th June 1998, 7 April
7	TIME	8:45 A.M., two-fifty am

ii) Named Entity Recognition

In information retrieval and natural language processing, Named Entity Recognition (NER) is the process of locating named entities in text and classifying them into categories such as PERSON, ORGANIZATION, or LOCATION.

In NLTK, NER is a two-step process: we first perform Part of Speech (POS) tagging on the text, and then extract the named entities from the POS-tagged tokens.

Uses of Named Entity Recognition

Named Entity Recognition is useful in the following areas:

Academia: students and researchers can extract relevant information from search data faster and more easily.
Question answering systems: the machine can find answers in the data, which reduces human effort.
Content classification: identifying the theme and subject of content makes classification faster and easier, and helps suggest the content of most interest.
Customer service: user complaints, requests, and questions can be categorized into the appropriate fields and filtered by priority keywords.
E-libraries: books and articles can be categorized by subject, which keeps the collection organized.

Example of Named Entity Extraction in NLTK

Example 1

In the example below, we take a sentence from the Times of India and apply tokenization and POS tagging to it.

NLTK provides the function nltk.ne_chunk(), which uses a pre-trained classifier to recognize named entities in a POS-tagged sentence.

In the output, we can see that the classifier has added category labels such as PERSON and ORGANIZATION wherever it found a named entity. GPE (geo-political entity) is another label it can assign, for example to countries and cities.

In [1]:

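import nltk
from nltk import word_tokenize, pos_tag

# One-time setup: the tokenizer, tagger, and chunker data must be downloaded
# first, e.g. nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('maxent_ne_chunker'), and nltk.download('words').
# The exact resource names can vary with the NLTK version.

text = "NASA awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander."
tokens = word_tokenize(text)  # split the sentence into tokens
tag = pos_tag(tokens)         # attach a POS tag to every token
print(tag)

ne_tree = nltk.ne_chunk(tag)  # group the tagged tokens into named-entity chunks
print(ne_tree)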

[Out] :

[('NASA', 'NNP'), ('awarded', 'VBD'), ('Elon', 'NNP'), ('Musk', 'NNP'), ('’', 'NNP'), ('s', 'VBD'), ('SpaceX', 'NNP'), ('a', 'DT'), ('$', '$'), ('2.9', 'CD'), ('billion', 'CD'), ('contract', 'NN'), ('to', 'TO'), ('build', 'VB'), ('the', 'DT'), ('lunar', 'NN'), ('lander', 'NN'), ('.', '.')]
(S
  (ORGANIZATION NASA/NNP)
  awarded/VBD
  (PERSON Elon/NNP Musk/NNP)
  ’/NNP
  s/VBD
  (ORGANIZATION SpaceX/NNP)
  a/DT
  $/$
  2.9/CD
  billion/CD
  contract/NN
  to/TO
  build/VB
  the/DT
  lunar/NN
  lander/NN
  ./.)
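
The tree printed above can also be walked programmatically. The following is a minimal sketch and not part of the original example: extract_entities is a hypothetical helper name, and it relies on the common idiom that named-entity chunks in the ne_chunk() result are subtrees with a label() method, while ordinary tokens are plain (word, tag) tuples.

```python
def extract_entities(tree):
    """Collect (entity text, entity label) pairs from a chunk tree.

    Hypothetical helper: `tree` is assumed to be the result of nltk.ne_chunk().
    """
    entities = []
    for node in tree:
        # Entity chunks are subtrees exposing label(); other tokens are tuples.
        if hasattr(node, "label"):
            words = [word for word, tag in node.leaves()]
            entities.append((" ".join(words), node.label()))
    return entities


# Usage with the tree from Example 1 (assumes ne_tree is already defined):
# print(extract_entities(ne_tree))
```

For the tree shown above, this would return NASA and SpaceX as ORGANIZATION and Elon Musk as PERSON.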

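Example 2

Let us see one more example, this time using sentences from the treebank corpus that ships with NLTK. These sentences are already POS-tagged, so we can pass them straight to nltk.ne_chunk(). Note that the pre-trained chunker is not perfect: in the output below, it splits the name Pierre Vinken and labels Vinken as an ORGANIZATION.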

In [2]:

# Requires the treebank corpus: nltk.download('treebank')
sent = nltk.corpus.treebank.tagged_sents()
print(nltk.ne_chunk(sent[0]))

[Out] :

(S
  (PERSON Pierre/NNP)
  (ORGANIZATION Vinken/NNP)
  ,/,
  61/CD
  years/NNS
  old/JJ
  ,/,
  will/MD
  join/VB
  the/DT
  board/NN
  as/IN
  a/DT
  nonexecutive/JJ
  director/NN
  Nov./NNP
  29/CD
  ./.)

NER using spaCy (Bonus)

As a bonus, we will also see an example of NER using spaCy.

In the example below, we use token.text, token.ent_iob_, and token.ent_type_ to print each token, its IOB entity annotation, and its entity type. In the IOB scheme, B marks the first token of an entity, I marks a token inside an entity, and O marks a token outside any entity.

In [3]:

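import spacy

# Load the small English model (install it once with:
# python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

doc = nlp("NASA awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander.")
for token in doc:
    # token text, IOB tag (B, I, or O), and entity type (empty for O tokens)
    print(token.text, token.ent_iob_, token.ent_type_)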

[Out] :

NASA B ORG
awarded O 
Elon B ORG
Musk I ORG
’s I ORG
SpaceX B CARDINAL
a O 
$ B MONEY
2.9 I MONEY
billion I MONEY
contract O 
to O 
build O 
the O 
lunar O 
lander O 
. O
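
Token-level IOB tags like these can be merged back into whole entities. The sketch below is a plain-Python illustration that assumes the (text, IOB tag, entity type) triples printed above; group_iob is a hypothetical helper, not a spaCy function. In spaCy itself, the merged entities are also available through doc.ents.

```python
def group_iob(triples):
    """Merge (text, iob, ent_type) triples into (entity text, ent_type) pairs."""
    entities = []
    current_words, current_type = [], None
    for text, iob, ent_type in triples:
        if iob == "B":
            # A new entity starts: close the previous one, if any.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [text], ent_type
        elif iob == "I" and current_words:
            # Continue the entity that is currently open.
            current_words.append(text)
        else:
            # "O" (or a stray "I"): close any open entity.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


# The first tokens of the output above, copied by hand.
triples = [
    ("NASA", "B", "ORG"), ("awarded", "O", ""),
    ("Elon", "B", "ORG"), ("Musk", "I", "ORG"), ("’s", "I", "ORG"),
    ("SpaceX", "B", "CARDINAL"), ("a", "O", ""),
    ("$", "B", "MONEY"), ("2.9", "I", "MONEY"), ("billion", "I", "MONEY"),
    ("contract", "O", ""),
]
print(group_iob(triples))
```

The grouped entities simply reproduce the labels the model assigned above, including its ORG label for Elon Musk’s and its CARDINAL label for SpaceX. Joining tokens with a space is a simplification and does not restore the original spacing of the text.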

POS Tagging vs NER

POS tagging identifies the grammatical category a word belongs to, such as noun, adjective, verb, or adverb, whereas Named Entity Recognition tries to find out whether a word or phrase is a named entity. Named entities are persons, locations, organizations, time expressions, etc.
A POS tagger labels each word with its grammatical role and does not decide whether neighbouring words belong together, whereas NER looks at how adjacent words relate in order to decide whether they form one entity.
The output of POS tagging is used as an input for NER. In NLTK, the POS-tagged tokens are passed to the NER step, and the entities it finds are typically built around words tagged as nouns.
A POS tagger assigns one tag to each word, whereas NER can group several words into a single named entity, detecting both the entity type and the word boundaries.
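
The difference between per-word tags and multi-word entities can be illustrated with a toy example. The sketch below is only a heuristic for illustration and is not how nltk.ne_chunk() works: it takes POS-tagged tokens (copied from the output of Example 1) and groups consecutive proper nouns (NNP) into candidate spans, showing how a span can cover more than one word. group_proper_nouns is a hypothetical helper name.

```python
from itertools import groupby


def group_proper_nouns(tagged_tokens):
    """Group runs of consecutive NNP tokens into candidate entity spans."""
    spans = []
    # groupby splits the sequence into runs of NNP and non-NNP tokens.
    for is_nnp, run in groupby(tagged_tokens, key=lambda pair: pair[1] == "NNP"):
        if is_nnp:
            spans.append(" ".join(word for word, tag in run))
    return spans


# POS-tagged tokens copied from the output of Example 1 (first part only).
tagged = [
    ("NASA", "NNP"), ("awarded", "VBD"), ("Elon", "NNP"), ("Musk", "NNP"),
    ("’", "NNP"), ("s", "VBD"), ("SpaceX", "NNP"), ("a", "DT"),
]
print(group_proper_nouns(tagged))
```

The heuristic pulls in the apostrophe that the tagger marked as NNP and assigns no entity type to any span, which suggests why NER relies on a trained classifier rather than on POS tags alone.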

Conclusion

In this tutorial, we saw examples of how to perform Named Entity Recognition (NER) with Python's NLTK library. We also looked at the uses of NER and compared POS tagging with NER.
