Beginner’s Guide to Named Entity Recognition (NER) in NLTK Library

Introduction

In this tutorial, we will see how to perform Named Entity Recognition (NER) with Python's NLTK library, using worked examples. We will also briefly look at how NER works and why it is used, and finally compare POS tagging with NER.

So let us get started.

What is Named Entity Recognition?

To understand what Named Entity Recognition is in NLP, it helps to first understand the concept of a named entity.

i) Named Entity

Named entities are words or phrases, typically proper nouns, that refer to a specific person, organization, location, date, and so on. Consider this example: “Mount Everest is the tallest mountain”. Here Mount Everest is a named entity of type location, as it refers to one specific place.

Some other examples of named entities are listed below in the table.

No.  Named Entity    Examples
1    ORGANIZATION    SEI, BCCI, Pakistan Cricket Board
2    PERSON          Barack Obama, Narendra Modi, Kohli
3    MONEY           7 million dollars, INR 7 Crore
4    GPE             India, Australia, South East Asia
5    LOCATION        Mount Everest, River Nile
6    DATE            8th June 1998, 7 April
7    TIME            8:45 A.M., two-fifty am

ii) Named Entity Recognition

In information retrieval and natural language processing, Named Entity Recognition (NER) is the process of extracting Named Entities from the text.

In NLTK, NER is a two-step process: we first perform Part of Speech (POS) tagging on the text, and then extract the named entities using the POS tags as input.
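To make the two steps concrete, here is a minimal toy sketch. It is not how NLTK works internally (NLTK uses a trained classifier); it only shows why POS tags are a useful input. The tags are hand-written Penn Treebank tags for the Mount Everest sentence, and the function name candidate_entities is made up for this illustration.

```python
# Toy illustration only: this rule is an assumption for teaching purposes,
# not the classifier that NLTK actually uses.

# Step 1 (POS tagging) is assumed to be done already; tags are hand-written.
tagged = [
    ("Mount", "NNP"), ("Everest", "NNP"), ("is", "VBZ"),
    ("the", "DT"), ("tallest", "JJS"), ("mountain", "NN"),
]


def candidate_entities(tagged_tokens):
    """Step 2 (simplified): group consecutive proper nouns into candidates."""
    candidates, current = [], []
    for word, tag in tagged_tokens:
        if tag in ("NNP", "NNPS"):      # proper noun, singular or plural
            current.append(word)
        elif current:                   # a run of proper nouns just ended
            candidates.append(" ".join(current))
            current = []
    if current:                         # flush a run that ends the sentence
        candidates.append(" ".join(current))
    return candidates


print(candidate_entities(tagged))  # ['Mount Everest']
```

A rule like this can find where a candidate entity starts and ends, but it cannot say whether it is a PERSON, a LOCATION, or an ORGANIZATION. That is the part a trained model adds.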

Uses of Named Entity Recognition

Named Entity Recognition is useful in the following areas:

  • Academia: students and researchers can extract information from search data faster and more easily.
  • Question answering systems: the machine can pull answers out of the data, which reduces human effort.
  • Content classification: identifying the theme and subject of content makes classification faster and helps suggest the most relevant content.
  • Customer service: user complaints, requests, and questions can be categorized into the respective fields and filtered by priority keywords.
  • E-libraries: books and articles can be categorized by subject, which keeps the library organized.

Example of Named Entity Extraction in NLTK

Example 1

In the example of named entity recognition in NLTK below, we take a sentence from the Times of India and apply tokenization and POS tagging to it.

NLTK provides the function nltk.ne_chunk(), which uses a pre-trained classifier to recognize named entities, taking POS-tagged tokens as input.

In the output, we can see that the classifier has added category labels (here PERSON and ORGANIZATION; other labels include GPE, short for geo-political entity) wherever it found a named entity. Note that the typographic apostrophe in “Musk’s” is split into the two tokens ’ and s, which the tagger then mislabels as NNP and VBD.

In [1]:
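import nltk
from nltk import word_tokenize, pos_tag

# One-time data downloads. Resource names vary between NLTK versions,
# so adjust them if a LookupError asks for a different name.
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
# nltk.download('maxent_ne_chunker'); nltk.download('words')

text = "NASA awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander."
tokens = word_tokenize(text)   # split the sentence into tokens
tags = pos_tag(tokens)         # attach a POS tag to each token
print(tags)

ne_tree = nltk.ne_chunk(tags)  # group tagged tokens into named-entity chunks
print(ne_tree)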
[Out] :
[('NASA', 'NNP'), ('awarded', 'VBD'), ('Elon', 'NNP'), ('Musk', 'NNP'), ('’', 'NNP'), ('s', 'VBD'), ('SpaceX', 'NNP'), ('a', 'DT'), ('$', '$'), ('2.9', 'CD'), ('billion', 'CD'), ('contract', 'NN'), ('to', 'TO'), ('build', 'VB'), ('the', 'DT'), ('lunar', 'NN'), ('lander', 'NN'), ('.', '.')]
(S
  (ORGANIZATION NASA/NNP)
  awarded/VBD
  (PERSON Elon/NNP Musk/NNP)
  ’/NNP
  s/VBD
  (ORGANIZATION SpaceX/NNP)
  a/DT
  $/$
  2.9/CD
  billion/CD
  contract/NN
  to/TO
  build/VB
  the/DT
  lunar/NN
  lander/NN
  ./.)
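The printed tree is easy to read, but in practice we usually want a plain list of entities. The sketch below walks the top level of a chunk tree and collects each labelled subtree. So that it runs without NLTK or its data, it uses a small stand-in class (StandInTree, hypothetical) that copies only the two behaviours relied on here: a tree can be iterated like a list, and it has a label() method. The function name extract_entities is also made up for this illustration; with NLTK installed, the same function should accept the tree returned by nltk.ne_chunk().

```python
def extract_entities(tree):
    """Return (entity text, label) pairs from a chunk tree.

    Assumes the layout printed above: each child is either a (word, tag)
    tuple or a subtree that has .label() and contains (word, tag) tuples.
    """
    entities = []
    for child in tree:
        if hasattr(child, "label"):     # plain tuples have no label attribute
            words = [word for word, _tag in child]
            entities.append((" ".join(words), child.label()))
    return entities


class StandInTree(list):
    """Hypothetical stand-in for an NLTK tree, for demonstration only."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


# Hand-built copy of the first part of the tree printed above.
ne_tree = StandInTree("S", [
    StandInTree("ORGANIZATION", [("NASA", "NNP")]),
    ("awarded", "VBD"),
    StandInTree("PERSON", [("Elon", "NNP"), ("Musk", "NNP")]),
    ("’", "NNP"),
    ("s", "VBD"),
    StandInTree("ORGANIZATION", [("SpaceX", "NNP")]),
    ("a", "DT"),
])

print(extract_entities(ne_tree))
# [('NASA', 'ORGANIZATION'), ('Elon Musk', 'PERSON'), ('SpaceX', 'ORGANIZATION')]
```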

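Example 2

Let us see one more example, this time using the pre-tagged sentences that ship with NLTK's treebank corpus. As the output shows, the classifier is not perfect: it splits the name Pierre Vinken into a PERSON and an ORGANIZATION.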

In [2]:
import nltk
# Needs the 'treebank' corpus (nltk.download('treebank')) and the NE chunker data.
sent = nltk.corpus.treebank.tagged_sents()
print(nltk.ne_chunk(sent[0]))
[Out] :
(S
  (PERSON Pierre/NNP)
  (ORGANIZATION Vinken/NNP)
  ,/,
  61/CD
  years/NNS
  old/JJ
  ,/,
  will/MD
  join/VB
  the/DT
  board/NN
  as/IN
  a/DT
  nonexecutive/JJ
  director/NN
  Nov./NNP
  29/CD
  ./.)

NER using spaCy (Bonus)

As a bonus, we will also see an example of NER using spaCy.

In the example below, we use token.text, token.ent_iob_, and token.ent_type_ to print each token, its IOB entity annotation (B for the beginning of an entity, I for inside an entity, O for outside any entity), and its entity type.

In [3]:
import spacy

# Load the small English pipeline (install it first: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

doc = nlp("NASA awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander.")
for token in doc:
    print(token.text, token.ent_iob_, token.ent_type_)
[Out] :
NASA B ORG
awarded O 
Elon B ORG
Musk I ORG
’s I ORG
SpaceX B CARDINAL
a O 
$ B MONEY
2.9 I MONEY
billion I MONEY
contract O 
to O 
build O 
the O 
lunar O 
lander O 
. O
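The B/I/O column is what tells us where each entity starts and ends. As a rough sketch, the rows printed above can be merged into whole entities as follows. The rows are copied by hand from the output, and the function name iob_to_spans is made up for this illustration.

```python
def iob_to_spans(rows):
    """Merge (text, iob, ent_type) rows into (entity text, ent_type) spans."""
    spans, words, label = [], [], None
    for text, iob, ent_type in rows:
        if iob == "B":                  # a new entity starts here
            if words:
                spans.append((" ".join(words), label))
            words, label = [text], ent_type
        elif iob == "I" and words:      # continue the current entity
            words.append(text)
        else:                           # "O" (or a stray "I"): close any open entity
            if words:
                spans.append((" ".join(words), label))
            words, label = [], None
    if words:                           # flush an entity that ends the text
        spans.append((" ".join(words), label))
    return spans


# Rows copied by hand from the spaCy output above.
rows = [
    ("NASA", "B", "ORG"), ("awarded", "O", ""),
    ("Elon", "B", "ORG"), ("Musk", "I", "ORG"), ("’s", "I", "ORG"),
    ("SpaceX", "B", "CARDINAL"), ("a", "O", ""),
    ("$", "B", "MONEY"), ("2.9", "I", "MONEY"), ("billion", "I", "MONEY"),
    ("contract", "O", ""), ("to", "O", ""), ("build", "O", ""),
    ("the", "O", ""), ("lunar", "O", ""), ("lander", "O", ""), (".", "O", ""),
]

for entity, label in iob_to_spans(rows):
    print(entity, "->", label)
```

The tokens are joined with single spaces, so the original spacing is not restored. In spaCy itself there is no need to do this by hand: the same spans are available through doc.ents, where each entity has a text and a label_ attribute.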

POS Tagging vs NER

  • POS tagging identifies which grammatical group a word belongs to (noun, adjective, verb, adverb, etc.), whereas Named Entity Recognition tries to find out whether or not a word is part of a named entity such as a person, location, organization, or time expression.
  • A POS tagger labels the grammatical role of each token and does not try to link words together, whereas NER considers how neighbouring words relate to each other in order to decide whether they form one entity.
  • In NLTK, the output of POS tagging is used as the input for NER: the tagged tokens are passed to the chunker, and words tagged as proper nouns are the typical candidates for named entities.
  • A POS tagger assigns one label per word, whereas NER works over multiple words, detecting both the type of the named entity and where it begins and ends.

Conclusion

In this tutorial, we saw examples of how to perform Named Entity Recognition (NER) with Python's NLTK library. We also looked at the uses of NER and compared POS tagging with NER.

  • Afham Fardeen

    This is Afham Fardeen, who loves the field of Machine Learning and enjoys reading and writing on it. The idea of enabling a machine to learn strikes me.
