In this tutorial, we cover how to perform Named Entity Recognition (NER) with the spaCy library. We first explain what NER is and why it is used. Then we walk through some examples of NER in spaCy and see how to access entity annotations and labels, and how to set new entity annotations.
What is a Named Entity?
A named entity is typically a proper noun that refers to a specific real-world object, such as a location, person, or organization. For example, in the sentence “Elon Musk is the owner of Tesla”, Elon Musk and Tesla are named entities.
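What is Named Entity Recognition (NER)?
In NLP, named entity recognition (NER) is the process of identifying named entities in text and classifying them into categories such as person, organization, or location. NER is useful in areas like information retrieval, content classification, and question answering systems.
Conceptually, named entity recognition can be described as a two-step process: (i) first, POS (part-of-speech) tagging is done; (ii) based on the POS tags, the named entities are extracted from the text. This describes a classical pipeline; statistical models such as spaCy's predict entity boundaries and labels with a trained component rather than with hand-written rules over POS tags.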
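To make the two-step description above concrete, here is a toy sketch in plain Python. It assumes the POS tags are already available (they are hard-coded here, using the Universal POS tag PROPN for proper nouns) and groups consecutive proper nouns into candidate entities. The function name extract_candidates is made up for this illustration, and this is not how spaCy's trained NER component works internally.

```python
# Step 1 (assumed already done): each token is paired with a POS tag.
# The tags below are hard-coded for illustration, not produced by a tagger.
tagged_tokens = [
    ("Elon", "PROPN"), ("Musk", "PROPN"), ("is", "AUX"), ("the", "DET"),
    ("owner", "NOUN"), ("of", "ADP"), ("Tesla", "PROPN"),
]

def extract_candidates(tagged):
    """Step 2: group runs of consecutive proper nouns into candidate entities."""
    candidates, current = [], []
    for word, pos in tagged:
        if pos == "PROPN":
            current.append(word)
        elif current:
            # A non-proper-noun token closes the current run.
            candidates.append(" ".join(current))
            current = []
    if current:  # flush a run that ends the sentence
        candidates.append(" ".join(current))
    return candidates

print(extract_candidates(tagged_tokens))  # ['Elon Musk', 'Tesla']
```

A rule like this finds candidate spans but cannot tell a person from a company, which is one reason trained models are used in practice.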
Named Entity Recognition (NER) in spaCy
Performing named entity recognition in spaCy is fast and easy. The entity types that spaCy can recognize include companies, locations, organizations, and products. The spaCy model is pre-trained to recognize these entities; however, we can also add our own arbitrary classes to the entity recognition system and update the model with new examples.
Example 1
In the example below, we first load a spaCy pipeline, run it on the sample text, and assign the result to the doc variable. The named entities can then be extracted by iterating over doc.ents. In each iteration, the entity text is printed using ent.text and the entity label using ent.label_.
import spacy
# Load the small English pipeline (install it with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
doc = nlp("NASA awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander.")
# Print the text and label of every entity found in the document
for ent in doc.ents:
    print(ent.text, ent.label_)
Example 2
This example is the same as the previous one, but with a different sample text.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Warren Edward Buffett is an American investor, business tycoon, philanthropist, and the chairman and CEO of Berkshire Hathaway.")
for ent in doc.ents:
    print(ent.text, ent.label_)
spaCy NER Labels
We can get the list of NER labels that a spaCy pipeline supports by using nlp.pipe_labels['ner'].
import spacy
nlp = spacy.load("en_core_web_sm")
ner_lst = nlp.pipe_labels['ner']
print(len(ner_lst))
print(ner_lst)
Output:
18
['CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART']
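The label names are terse, so it can help to print a short description next to each one. The sketch below uses spacy.explain, which looks a label up in spaCy's built-in glossary and does not need a model download. The label list is copied from the output above rather than read from a loaded pipeline, and the variable names are chosen for this illustration.

```python
import spacy

# Labels copied from the en_core_web_sm output shown above.
ner_labels = ['CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW',
              'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON',
              'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART']

# spacy.explain returns a short description string, or None for unknown terms.
descriptions = {label: spacy.explain(label) for label in ner_labels}

for label, description in descriptions.items():
    print(f"{label:12} {description}")
```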
Accessing Entity Annotations and Labels
The standard way to access entity annotations in spaCy is doc.ents, which returns a tuple containing all the entities of the doc. The entity type can be accessed as a hash value with ent.label or as a string with ent.label_. Each entity in doc.ents exposes a range of information, such as:
- Entity text by using ent.text,
- Starting and ending character offsets of an entity by using ent.start_char and ent.end_char,
- Index of the entity’s first token by using ent.start,
- Entity ID by using ent.ent_id_ (or its hash value by using ent.ent_id),
- Vector norm of an entity by using ent.vector_norm.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_, ent.start, ent.ent_id_, ent.label, ent.vector_norm)
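The character offsets are half-open, so slicing the original string with ent.start_char and ent.end_char gives back the entity text. The sketch below checks this in plain Python without loading a model. The three (text, label) pairs are assumed for illustration rather than taken from a real run, and the offsets are computed with str.find.

```python
text = "Apple is looking at buying U.K. startup for $1 billion"

# Assumed entities for illustration; a real pipeline may predict differently.
assumed_entities = [("Apple", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY")]

spans = []
for ent_text, label in assumed_entities:
    start_char = text.find(ent_text)       # offset of the first character
    end_char = start_char + len(ent_text)  # offset one past the last character
    spans.append((start_char, end_char, label))

for start_char, end_char, label in spans:
    # Slicing with [start_char:end_char] recovers the entity text.
    print(start_char, end_char, label, repr(text[start_char:end_char]))
```

Note that ent.start is different: it counts tokens, not characters, so it cannot be used to slice the raw string.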
However, we can also access entity annotations at the token level by using the token.ent_iob_ and token.ent_type_ attributes. token.ent_iob_ returns one of three tags: ‘B’, ‘I’ or ‘O’. ‘B’ means the token begins an entity, ‘I’ means it is inside an entity, and ‘O’ means it is outside any entity. If no entity tag has been set for the token at all, token.ent_iob_ is an empty string “”. token.ent_type_ returns the entity label of the token, or an empty string “” if the token is not part of an entity.
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("New York is the most populous city in the United States")
for token in doc:
    print(token.text, token.ent_iob_, token.ent_type_)
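The token-level tags and doc.ents carry the same information in two forms. As a rough sketch of that relationship, the plain-Python function below rebuilds entity spans from (text, IOB tag, entity type) triples. The function name iob_to_entities is hypothetical, and the triples are written by hand under the assumption that the model tags "New York" and "the United States" as GPE; actual output depends on the model.

```python
def iob_to_entities(tagged_tokens):
    """Group (text, iob, ent_type) triples into (entity_text, label) pairs."""
    entities, words, label = [], [], None
    for text, iob, ent_type in tagged_tokens:
        if iob == "I" and words:
            words.append(text)  # continue the open entity
            continue
        if words:  # any other tag closes the open entity
            entities.append((" ".join(words), label))
            words, label = [], None
        if iob == "B":  # start a new entity
            words, label = [text], ent_type
    if words:  # flush an entity that ends the sentence
        entities.append((" ".join(words), label))
    return entities

# Hand-written tags (an assumption, not real model output).
tagged = [
    ("New", "B", "GPE"), ("York", "I", "GPE"), ("is", "O", ""),
    ("the", "O", ""), ("most", "O", ""), ("populous", "O", ""),
    ("city", "O", ""), ("in", "O", ""), ("the", "B", "GPE"),
    ("United", "I", "GPE"), ("States", "I", "GPE"),
]

print(iob_to_entities(tagged))
# [('New York', 'GPE'), ('the United States', 'GPE')]
```

Joining tokens with a single space is a simplification; spaCy's own spans keep the original whitespace of the text.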