Named Entity Recognition (NER) in Spacy Library

In this tutorial, we will cover how to perform Named Entity Recognition (NER) with the Spacy library. We will first look at what NER is and why it is used. Then we will walk through some examples of NER in Spacy and see how to access entity annotations and labels, set new entity annotations, and visualize the results.

What is a Named Entity?


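A named entity is a real-world object, usually referred to by a proper noun, such as a person, location, or organization. For example, in the sentence “Elon Musk is the owner of Tesla”, Elon Musk and Tesla are named entities. In practice, NER systems also treat expressions such as dates, times, and amounts of money as entities.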

These are some more examples of named entities –

Named entity type                 Examples
ORGANIZATION                      Microsoft, Facebook
PERSON                            Rafael Nadal, Nelson Mandela
MONEY                             9 million dollars, INR 4 Crore
GPE (countries, cities, states)   India, Australia
LOCATION                          Mount Everest, River Ganga, South East Asia
DATE                              9th May 1987, 4 AUG
TIME                              7:23 A.M., three-forty am

What is Named Entity Recognition (NER)?

In NLP, named entity recognition (NER) is the task of locating named entities in text and classifying them into categories such as person, organization, or location. NER is useful in areas like information retrieval, content classification, and question answering systems.

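In classical NLP pipelines (for example, NLTK's ne_chunk), named entity recognition is a two-step process: i) the text is first part-of-speech (POS) tagged, and ii) the named entities are then extracted from the POS-tagged tokens. Statistical NER models such as Spacy's predict entity spans directly from the tokens and do not need POS tags as input.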


Named Entity Recognition (NER) in Spacy

Performing named entity recognition in Spacy is fast and easy. Spacy's pre-trained English pipelines can recognize entity types such as people, organizations, locations, dates, monetary values, and products (the full label list is shown later in this tutorial). The models come pre-trained to recognize these entities. We can also add our own arbitrary classes to the entity recognition system and update the model with new examples.
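Updating the statistical model requires training, which this tutorial does not cover. A lighter, rule-based way to add custom entity classes is the entity ruler. The sketch below makes three assumptions: spaCy v3, where components are added by name with nlp.add_pipe; a blank English pipeline, so no model download is needed; and a made-up LIBRARY label and made-up patterns, which are purely illustrative.

```python
import spacy

# Blank English pipeline: it has a tokenizer but no trained components
nlp = spacy.blank("en")

# The entity ruler matches hand-written patterns and writes the matches
# to doc.ents under whatever labels we choose (spaCy v3 API)
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Phrase pattern: matches the exact token text "spaCy"
    {"label": "LIBRARY", "pattern": "spaCy"},
    # Token pattern: "iphone" in any casing, followed by a number
    {"label": "PRODUCT", "pattern": [{"LOWER": "iphone"}, {"IS_DIGIT": True}]},
])

doc = nlp("We ran spaCy on an iPhone 13.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

Rules like these work well for fixed names and formats. The statistical model remains the better choice for entities that only context can identify.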

Example 1

In the example below, we first load the small English pipeline en_core_web_sm into the nlp object. We then call nlp on the sample text, which returns a processed Doc object that we assign to the doc variable. The named entities can be extracted by iterating over doc.ents. In each iteration, the entity text is printed with ent.text and the entity label with ent.label_.

In [1]:
import spacy 
nlp = spacy.load("en_core_web_sm")

doc = nlp("NASA awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander.")
for ent in doc.ents:
    print(ent.text,  ent.label_)
[Out] :
NASA ORG
Elon Musk PERSON
$2.9 billion MONEY

Example 2

This example works the same way as the one above, only with a different sample text.

In [2]:
import spacy 
nlp = spacy.load("en_core_web_sm")

doc = nlp("Warren Edward Buffett is an American investor, business tycoon, philanthropist, and the chairman and CEO of Berkshire Hathaway.")
for ent in doc.ents:
    print(ent.text,  ent.label_)
[Out] :
Warren Edward Buffett PERSON
American NORP
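Berkshire Hathaway PERSON

Note that the model labels Berkshire Hathaway as PERSON, even though it is a company (ORG). Pre-trained models are statistical and can make mistakes like this, particularly the small en_core_web_sm model. Results may also differ between model versions.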

Spacy NER Labels List

We can get the list of entity labels that the pipeline's NER component can predict by using nlp.pipe_labels['ner'].

In [2]:

import spacy

nlp = spacy.load("en_core_web_sm")
ner_lst = nlp.pipe_labels['ner']

print(len(ner_lst))
print(ner_lst)

[Out] :

18
['CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART']
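Several of these label names are abbreviations, for example NORP, FAC and GPE. spacy.explain returns a short description for a label, so a small loop can print a readable glossary. The label list below is copied from the output above, on the assumption that it matches your pipeline version. The descriptions come from Spacy's built-in glossary.

```python
import spacy

# The 18 labels printed above for en_core_web_sm, copied here so the sketch
# runs without the model; with the model loaded, nlp.pipe_labels["ner"]
# provides the list instead
ner_labels = ['CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY',
              'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME',
              'WORK_OF_ART']

# spacy.explain looks up a short human-readable description in Spacy's glossary
descriptions = {label: spacy.explain(label) for label in ner_labels}
for label, description in descriptions.items():
    print(f"{label:<12} {description}")
```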

Accessing Entity Annotations and Labels

The standard way to access entity annotations in Spacy is doc.ents, which returns a tuple of Span objects, one for each entity in the doc. The entity type is available as an integer hash via ent.label or as a string via ent.label_. Each entity span also exposes other useful attributes, such as:

  • the entity text, via ent.text,
  • the start and end character offsets of the entity in the text, via ent.start_char and ent.end_char,
  • the index of the entity's first token in the doc, via ent.start,
  • the entity ID, via ent.ent_id_ (this is empty unless it has been set, for example by an entity ruler pattern with an "id", which is why it prints as an empty string in the output below),
  • the vector norm of the entity, via ent.vector_norm.
In [3]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_, ent.start, ent.ent_id_, ent.label, ent.vector_norm)
[Out] :
Apple 0 5 ORG 0  383 21.796299
U.K. 27 31 GPE 5  384 21.744804
$1 billion 44 54 MONEY 8  394 17.723335
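The sketch below makes the difference between character offsets and token offsets concrete. It rebuilds the same three entities by hand on a blank English pipeline, so no trained model is needed and the labels are assigned manually rather than predicted. doc.char_span turns character offsets into a token-aligned Span. The assertions check that start_char/end_char index into doc.text, while start/end index into the doc's tokens.

```python
import spacy

# Blank English pipeline: tokenizer only; the entities are set by hand below
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# char_span converts character offsets into a token-aligned Span
doc.ents = [
    doc.char_span(0, 5, label="ORG"),
    doc.char_span(27, 31, label="GPE"),
    doc.char_span(44, 54, label="MONEY"),
]

for ent in doc.ents:
    # start_char/end_char index into doc.text (end is exclusive)
    assert doc.text[ent.start_char:ent.end_char] == ent.text
    # start/end index into the doc's tokens (end is exclusive)
    assert doc[ent.start:ent.end].text == ent.text
    print(ent.text, ent.start_char, ent.end_char, ent.start, ent.end, ent.label_)
```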

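We can also access entity annotations at the token level with the token.ent_iob and token.ent_type attributes, or with their string versions token.ent_iob_ and token.ent_type_, which are used below. token.ent_iob_ returns one of the IOB tags ‘B’, ‘I’ or ‘O’:

  • ‘B’ means the token begins an entity.
  • ‘I’ means it is inside an entity.
  • ‘O’ means it is outside an entity.

An empty string “” means that no entity tag has been set for the token at all. For tokens outside an entity, token.ent_type_ is an empty string, which is why the entity type column is blank for the ‘O’ rows below.

In [5]: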

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("New York is the most populous city in the United States")

for token in doc:
    print(token.text, token.ent_iob_, token.ent_type_)
[Out] :
New B GPE
York I GPE
is O 
the O 
most O 
populous O 
city O 
in O 
the B GPE
United I GPE
States I GPE
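The IOB scheme is what lets Spacy rebuild multi-token entities such as “the United States” from per-token tags. As a rough illustration, the hypothetical helper iob_to_spans below (plain Python, not part of Spacy) groups the B/I/O tags printed above back into entity spans. It is a simplified sketch: it assumes space-separated tokens and takes each entity's label from its first token.

```python
def iob_to_spans(tokens, iob_tags, ent_types):
    """Group token-level IOB tags into (text, label, start, end) tuples.

    Hypothetical helper for illustration only; `end` is exclusive, like
    Span.end, and tokens are re-joined with single spaces.
    """
    spans = []
    start = None
    for i, tag in enumerate(iob_tags):
        # Any tag other than "I" closes the entity that is currently open
        if start is not None and tag != "I":
            spans.append((start, i))
            start = None
        # "B" opens a new entity; a stray "I" with no open entity is ignored
        if tag == "B":
            start = i
    # Close an entity that runs to the end of the text
    if start is not None:
        spans.append((start, len(tokens)))
    return [(" ".join(tokens[s:e]), ent_types[s], s, e) for s, e in spans]


# Token-level output from the "New York" example above
tokens = ["New", "York", "is", "the", "most", "populous", "city", "in", "the", "United", "States"]
iob = ["B", "I", "O", "O", "O", "O", "O", "O", "B", "I", "I"]
types = ["GPE", "GPE", "", "", "", "", "", "", "GPE", "GPE", "GPE"]

spans = iob_to_spans(tokens, iob, types)
print(spans)
```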

Adding New Named Entities in Spacy

Spacy lets us set entity annotations at the document level. To keep the token-level IOB annotations consistent, however, we can’t write directly to the token.ent_iob or token.ent_type attributes. Instead, entities can be set with the methods below.

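Method 1:

Create the new entity as a Span and add it to the document with doc.set_ents. Passing default="unmodified" keeps the existing entity annotations of all other tokens. Keep in mind that a token can belong to only one entity, so the new spans must not overlap one another. Overlapping spans cause an error, reported in some Spacy versions as “Trying to set conflicting doc.ents”. It is also safest to add only spans that do not overlap entities the model has already found.

In the example below, the default Spacy model does not recognize “facebook” as an entity. We create a new span for it with the ORG label and set it on the doc, after which “facebook” appears in doc.ents. Note that this only annotates this particular doc. The model itself is unchanged and will still miss “facebook” in new texts.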

In [21]:
import spacy
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")
doc = nlp("facebook was founded by Mark Zuckerberg and his fellow roommates at Harvard College")
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print('Before : ', ents)
# The model didn't recognize 'facebook' as an entity

# Creating a span for the new entity
facebook_ent = Span(doc, 0, 1, label="ORG")
doc.set_ents([facebook_ent], default="unmodified")

# Print the new entity list (token offsets this time, not character offsets)
ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print('After : ', ents)
[Out] :
Before :  [('Mark Zuckerberg', 24, 39, 'PERSON'), ('Harvard College', 68, 83, 'ORG')]
After :  [('facebook', 0, 1, 'ORG'), ('Mark Zuckerberg', 4, 6, 'PERSON'), ('Harvard College', 11, 13, 'ORG')]
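When you know an entity's character offsets rather than its token indices, doc.char_span can build the span for doc.set_ents. It returns None when the offsets do not line up with token boundaries, so it is worth checking the result before passing it on. This sketch uses a blank English pipeline (tokenizer only) in place of en_core_web_sm, so the doc starts without entities.

```python
import spacy

# Blank English pipeline: tokenizer only, so the doc starts without entities
nlp = spacy.blank("en")
doc = nlp("facebook was founded by Mark Zuckerberg and his fellow roommates at Harvard College")

# "face" (characters 0-4) does not end on a token boundary, so None is returned
print(doc.char_span(0, 4, label="ORG"))

# Characters 0-8 cover the whole token "facebook"
facebook_ent = doc.char_span(0, 8, label="ORG")
if facebook_ent is not None:
    doc.set_ents([facebook_ent], default="unmodified")

ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print(ents)
```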

Method 2:

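Here we create the new span, convert the original doc.ents tuple to a list, concatenate the two lists, and assign the result back to doc.ents. Assigning to doc.ents replaces all entity annotations on the doc, and tokens not covered by any of the assigned spans are marked ‘O’ (outside). Spacy returns doc.ents in document order, so facebook comes first in the output even though it was added last.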

In [18]:
import spacy
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")
doc = nlp("facebook was founded by Mark Zuckerberg and his fellow roommates at Harvard College")
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print('Before : ', ents)
# The model didn't recognize 'facebook' as an entity

# Creating a span for the new entity
facebook_ent = Span(doc, 0, 1, label="ORG")

orig_ents = list(doc.ents)
doc.ents = orig_ents + [facebook_ent] 

# Print the new entity list (token offsets this time, not character offsets)
ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print('After : ', ents)
[Out] :
Before :  [('Mark Zuckerberg', 24, 39, 'PERSON'), ('Harvard College', 68, 83, 'ORG')]
After :  [('facebook', 0, 1, 'ORG'), ('Mark Zuckerberg', 4, 6, 'PERSON'), ('Harvard College', 11, 13, 'ORG')]
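Because a token can belong to only one entity, concatenating lists as above fails if the new span overlaps an existing one. The sketch below uses a blank English pipeline and hand-made spans, which are illustrative assumptions. It shows the ValueError and one way around it: spacy.util.filter_spans, which drops overlapping spans and keeps the longest one.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

nlp = spacy.blank("en")
doc = nlp("Mark Zuckerberg founded Facebook")

person = Span(doc, 0, 2, label="PERSON")      # "Mark Zuckerberg"
overlapping = Span(doc, 1, 2, label="ORG")    # "Zuckerberg", overlaps person
facebook = Span(doc, 3, 4, label="ORG")       # "Facebook"

doc.ents = [person]

# Token 1 would belong to two entities, so the assignment is rejected
try:
    doc.ents = list(doc.ents) + [overlapping]
    overlap_rejected = False
except ValueError:
    overlap_rejected = True
print("Overlap rejected:", overlap_rejected)

# filter_spans keeps the longest span among overlapping ones,
# so the filtered list can be assigned safely
doc.ents = filter_spans([person, overlapping, facebook])
print([(e.text, e.label_) for e in doc.ents])
```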

Method 3:

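We create a NumPy array of zeros with one row per token and one column per attribute, that is, shape (len(doc), 2), to hold each token's entity IOB code and entity type. We then fill in the new entities and load the array into the doc with doc.from_array. The IOB codes are integers: 3 means ‘B’ (begin), 1 means ‘I’ (inside), 2 means ‘O’ (outside), and 0 means no tag is set. In the example, we assign “London” and “U.K.” as “GPE”. The doc is created with nlp.make_doc, which only tokenizes the text, so it starts with no entities.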

In [23]:
import numpy
import spacy
from spacy.attrs import ENT_IOB, ENT_TYPE

nlp = spacy.load("en_core_web_sm")
doc = nlp.make_doc("London is a big city in the U.K.")
ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print('Before :', ents) # []

header = [ENT_IOB, ENT_TYPE]
attr_array = numpy.zeros((len(doc), len(header)), dtype="uint64")

attr_array[0, 0] = 3  # B: "London" (token 0) begins an entity
attr_array[0, 1] = doc.vocab.strings["GPE"]

attr_array[7, 0] = 3  # B: "U.K." (token 7) begins an entity
attr_array[7, 1] = doc.vocab.strings["GPE"]
doc.from_array(header, attr_array)

ents = [(e.text, e.start, e.end, e.label_) for e in doc.ents]
print('After :', ents)
[Out] :
Before : []
After : [('London', 0, 1, 'GPE'), ('U.K.', 7, 8, 'GPE')]
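In spaCy v3, a Doc can also be constructed directly from a list of words together with token-level IOB entity tags. This achieves the same result as the NumPy approach without computing hashes by hand. The sketch assumes spaCy v3 or later, since the ents argument of Doc does not exist in v2. It also supplies the tokens manually instead of running a tokenizer.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["London", "is", "a", "big", "city", "in", "the", "U.K."]
# One IOB tag per token: "B-<label>" begins an entity, "I-<label>" continues it,
# and "O" is outside any entity
ents = ["B-GPE", "O", "O", "O", "O", "O", "O", "B-GPE"]

doc = Doc(Vocab(), words=words, ents=ents)
print([(e.text, e.start, e.end, e.label_) for e in doc.ents])
```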

Visualizing Named Entities in Spacy

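We can use Spacy's displacy module to display a visualization of the entities in a doc. displacy.serve starts a local web server (on port 5000 by default) that shows the rendered page. Inside a Jupyter notebook, displacy.render(doc, style="ent", jupyter=True) displays it inline instead.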

In [19]:
import spacy
from spacy import displacy

text = "When Sebastian Thrun started working on self-driving cars at Google in 2007, few people outside of the company took him seriously."

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.serve(doc, style="ent")

[Out] :

(Image: Spacy NER visualization of the sample text with its entities highlighted)
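When displacy.render is not running inside a notebook, it returns the visualization as an HTML string. This is handy for saving it to a file or embedding it in a web page. The sketch below builds the entities by hand on a blank pipeline, so no model is needed. It uses the options argument to highlight only PERSON and ORG entities and passes jupyter=False to force the string return value. The span_for helper is a hypothetical convenience function, not part of Spacy.

```python
import spacy
from spacy import displacy

text = ("When Sebastian Thrun started working on self-driving cars at Google "
        "in 2007, few people outside of the company took him seriously.")

nlp = spacy.blank("en")  # tokenizer only; entities are set by hand
doc = nlp(text)


def span_for(phrase, label):
    """Hypothetical helper: build a labeled span from the phrase's first occurrence."""
    start = text.index(phrase)
    return doc.char_span(start, start + len(phrase), label=label)


doc.ents = [
    span_for("Sebastian Thrun", "PERSON"),
    span_for("Google", "ORG"),
    span_for("2007", "DATE"),
]

# Highlight only PERSON and ORG; page=True wraps the markup in a full HTML page
html = displacy.render(doc, style="ent", page=True, jupyter=False,
                       options={"ents": ["PERSON", "ORG"]})
print(html[:200])
```

The returned string can be written to a .html file and opened in any browser.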

Reference – Spacy Documentation

 

  • Afham Fardeen

    This is Afham Fardeen, who loves the field of Machine Learning and enjoys reading and writing on it. The idea of enabling a machine to learn strikes me.
