In this tutorial, we will see how to perform Named Entity Recognition (NER) with Python's NLTK library, using an example. We will also briefly look at how NER works and why it is used, and finally compare POS tagging with NER.
So let us get started.
What is Named Entity Recognition?
To understand what Named Entity Recognition is in NLP, it helps to first understand the concept of a named entity.
i) Named Entity
Named entities are words or phrases, typically proper nouns, that refer to a specific entity such as a person, organization, location, or date. Consider this example: “Mount Everest is the tallest mountain”. Here, Mount Everest is a named entity of type location because it refers to a specific place.
Some other examples of named entities are listed below in the table.
| No. | Entity type | Examples |
|---|---|---|
| 1 | ORGANIZATION | SEI, BCCI, Pakistan Cricket Board |
| 2 | PERSON | Barack Obama, Narendra Modi, Kohli |
| 3 | MONEY | 7 million dollars, INR 7 Crore |
| 4 | GPE | India, Australia, South East Asia |
| 5 | LOCATION | Mount Everest, River Nile |
| 6 | DATE | 8th June 1998, 7 April |
| 7 | TIME | 8:45 A.M., two-fifty am |
ii) Named Entity Recognition
In information retrieval and natural language processing, Named Entity Recognition (NER) is the process of extracting Named Entities from the text.
In NLTK, NER is a two-step process: we first perform Part of Speech (POS) tagging on the text, and then extract the named entities using the POS tag information.
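The two-step idea can be illustrated with a deliberately simplified sketch. It assumes the POS tags have already been produced (here they are written by hand) and applies a toy rule that merges consecutive proper nouns (tags NNP and NNPS) into one candidate entity. The function name group_proper_nouns is hypothetical, and this heuristic is not how NLTK's trained classifier works; it only shows how POS tags can feed a later entity-extraction step.

```python
def group_proper_nouns(tagged_tokens):
    """Toy rule (an assumption for illustration): merge runs of
    consecutive proper-noun tokens into candidate entities."""
    candidates = []
    current = []
    for word, tag in tagged_tokens:
        if tag in ("NNP", "NNPS"):
            # Still inside a run of proper nouns: keep collecting words.
            current.append(word)
        elif current:
            # The run ended: store it as one candidate entity.
            candidates.append(" ".join(current))
            current = []
    if current:
        # Flush a run that reaches the end of the sentence.
        candidates.append(" ".join(current))
    return candidates


# Step 1 (assumed already done): hand-written POS tags, not tagger output.
tagged = [
    ("Barack", "NNP"), ("Obama", "NNP"), ("visited", "VBD"),
    ("Mount", "NNP"), ("Everest", "NNP"), ("in", "IN"),
    ("2015", "CD"), (".", "."),
]

# Step 2: use the POS tags to pick out candidate entities.
print(group_proper_nouns(tagged))  # ['Barack Obama', 'Mount Everest']
```

Note that this rule finds only the boundaries of candidates. Deciding whether a candidate is a PERSON or a LOCATION is the part that needs a trained model.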
Uses of Named Entity Recognition
Named Entity Recognition is useful in the following areas:
- Academia: students and researchers can extract information from search data more easily and quickly.
- Question answering systems: the machine can provide answers from the data, reducing human effort.
- Content classification: identifying the theme and subject of content makes classification faster and easier, and helps suggest the most relevant content.
- Customer service: user complaints, requests, and questions can be categorized into the respective fields and filtered by priority keywords.
- E-libraries: books and articles can be categorized by subject, keeping the collection organized.
Example of Named Entity Extraction in NLTK
In the example below of named entity recognition in NLTK, we take a text from the Times of India and apply tokenization and POS tagging to it.
NLTK provides the function nltk.ne_chunk(), which uses a pre-trained classifier to recognize named entities, taking POS-tagged tokens as input.
In the output, we can see that the classifier has added category labels such as PERSON and ORGANIZATION wherever it found a named entity; GPE (geo-political entity) is another label it can assign.
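import nltk
from nltk import word_tokenize, pos_tag

# Requires the NLTK tokenizer, POS tagger, and NE chunker resources
# (they can be installed with nltk.download()).
text = "NASA awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander."

tokens = word_tokenize(text)  # split the sentence into word tokens
tag = pos_tag(tokens)         # attach a POS tag to every token
print(tag)

ne_tree = nltk.ne_chunk(tag)  # group tagged tokens into named-entity chunks
print(ne_tree)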
[('NASA', 'NNP'), ('awarded', 'VBD'), ('Elon', 'NNP'), ('Musk', 'NNP'), ('’', 'NNP'), ('s', 'VBD'), ('SpaceX', 'NNP'), ('a', 'DT'), ('$', '$'), ('2.9', 'CD'), ('billion', 'CD'), ('contract', 'NN'), ('to', 'TO'), ('build', 'VB'), ('the', 'DT'), ('lunar', 'NN'), ('lander', 'NN'), ('.', '.')]

(S (ORGANIZATION NASA/NNP) awarded/VBD (PERSON Elon/NNP Musk/NNP) ’/NNP s/VBD (ORGANIZATION SpaceX/NNP) a/DT $/$ 2.9/CD billion/CD contract/NN to/TO build/VB the/DT lunar/NN lander/NN ./.)
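The result of nltk.ne_chunk() is a tree, so to work with the entities we usually need to pull them out as plain (text, label) pairs. Below is a minimal sketch of one way to do that. To avoid needing any model downloads, it rebuilds a shortened copy of the tree shown above by hand; the helper name extract_entities is hypothetical and not part of NLTK.

```python
from nltk.tree import Tree

# Hand-built, shortened copy of the chunk tree printed above
# (an assumption for illustration; normally this comes from nltk.ne_chunk).
ne_tree = Tree("S", [
    Tree("ORGANIZATION", [("NASA", "NNP")]),
    ("awarded", "VBD"),
    Tree("PERSON", [("Elon", "NNP"), ("Musk", "NNP")]),
    ("’", "NNP"),
    ("s", "VBD"),
    Tree("ORGANIZATION", [("SpaceX", "NNP")]),
    ("a", "DT"),
    ("contract", "NN"),
])


def extract_entities(tree):
    """Collect (entity text, entity label) pairs from a chunk tree."""
    entities = []
    for node in tree:
        # Named entities are nested Tree nodes; ordinary tokens are
        # plain (word, tag) tuples and are skipped.
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities


print(extract_entities(ne_tree))
# [('NASA', 'ORGANIZATION'), ('Elon Musk', 'PERSON'), ('SpaceX', 'ORGANIZATION')]
```

The same helper should work on the tree returned by nltk.ne_chunk(), since that is also a Tree whose entity chunks are nested one level below the root.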
Let us see one more example, this time using the pre-tagged sentences that ship with the NLTK library. Since nltk.ne_chunk() expects a single tagged sentence, we pick the first one.
>>> sent = nltk.corpus.treebank.tagged_sents()[0]
>>> print(nltk.ne_chunk(sent))
(S (PERSON Pierre/NNP) (ORGANIZATION Vinken/NNP) ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.)
NER using spaCy (Bonus)
In the example below, we use token.text, token.ent_iob_, and token.ent_type_ to print each token, its IOB entity annotation, and its entity type.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("NASA awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander.")

for token in doc:
    print(token.text, token.ent_iob_, token.ent_type_)
NASA B ORG
awarded O
Elon B ORG
Musk I ORG
’s I ORG
SpaceX B CARDINAL
a O
$ B MONEY
2.9 I MONEY
billion I MONEY
contract O
to O
build O
the O
lunar O
lander O
.
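In this output, B marks the first token of an entity, I marks a token that continues it, and O marks a token outside any entity. To show how these per-token tags turn into whole entities, here is a small pure-Python sketch that groups them into spans. The rows are copied by hand from the output above, and the function name iob_to_spans is hypothetical. In spaCy itself, doc.ents already provides the entity spans directly, so this is only to show how the scheme works.

```python
def iob_to_spans(rows):
    """Group (text, iob, ent_type) rows into (span text, ent_type) pairs."""
    spans = []
    words, label = [], None
    for text, iob, ent_type in rows:
        if iob == "B" or (iob == "I" and not words):
            # A new entity starts: close the previous one, if any.
            if words:
                spans.append((" ".join(words), label))
            words, label = [text], ent_type
        elif iob == "I":
            # Continuation of the current entity.
            words.append(text)
        else:
            # "O": outside any entity, so close the open span, if any.
            if words:
                spans.append((" ".join(words), label))
            words, label = [], None
    if words:
        # Flush an entity that runs to the end of the input.
        spans.append((" ".join(words), label))
    return spans


# Rows copied by hand from the spaCy output above (first 11 tokens).
rows = [
    ("NASA", "B", "ORG"), ("awarded", "O", ""), ("Elon", "B", "ORG"),
    ("Musk", "I", "ORG"), ("’s", "I", "ORG"), ("SpaceX", "B", "CARDINAL"),
    ("a", "O", ""), ("$", "B", "MONEY"), ("2.9", "I", "MONEY"),
    ("billion", "I", "MONEY"), ("contract", "O", ""),
]

for span_text, ent_type in iob_to_spans(rows):
    print(span_text, "->", ent_type)
```

The grouping makes the model's mistakes on this sentence easier to see: “Elon Musk ’s” comes out as one ORG span and “SpaceX” is labeled CARDINAL. The sketch joins tokens with single spaces, so the span text does not reproduce the original spacing exactly.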
POS Tagging vs NER
- POS tagging identifies the grammatical category a word belongs to, such as noun, adjective, verb, or adverb, whereas Named Entity Recognition tries to determine whether a word is part of a named entity. Named entities are persons, locations, organizations, time expressions, etc.
- A POS tagger assigns a tag to each word without linking words into larger units, whereas NER considers how adjacent words relate so that they can be grouped into one entity.
- In NLTK, the output of POS tagging is used as the input for NER: the whole POS-tagged sentence is passed to the NER step, and words tagged as proper nouns are the typical candidates for named entities.
- A POS tagger outputs one tag per word, whereas NER can span multiple words, detecting both the type of the named entity and its word boundaries.
In this tutorial, we saw examples of how to perform Named Entity Recognition (NER) with Python's NLTK library. We also looked at the uses of NER and compared POS tagging with NER.