Tutorial on Spacy Part of Speech (POS) Tagging

Tutorial on Spacy Part of Speech or POS Tagging

Introduction

In this article, we will take you through the tutorial for Part of Speech or POS Tagging in Spacy library of Python. We will first understand what is POS tagging and why it is used and finally, see some examples of it in Spacy.

What is POS Tagging?

The Part of speech tagging or POS tagging is the process of marking a word in the text to a particular part of speech based on both its context and definition. In simple language, we can say that POS tagging is the process of identifying a word as nouns, pronouns, verbs, adjectives, etc.

Spacy Part of Speech (POS) Tagging

Why POS tag is used

Some words can function in more than one way when used in different circumstances. The POS Tagging here plays a crucial role to understand in what context the word is used in the sentence. POS Tagging is useful in sentence parsing, information retrieval, sentiment analysis, etc.

POS Tagging in Spacy Library

Spacy provides a bunch of POS tags such as NOUN (noun), PUNCT (punctuation), ADJ(adjective), ADV(adverb), etc. It has a trained pipeline and statistical models which enable spaCy to make classification of which tag or label a token belongs to. For example, a word following “the” in English is most likely a noun.

Ad
Deep Learning Specialization on Coursera

Spacy POS Tags List

Every token is assigned a POS Tag in Spacy from the following list:

POS DESCRIPTION EXAMPLES
ADJ adjective *big, old, green, incomprehensible, first*
ADP adposition *in, to, during*
ADV adverb *very, tomorrow, down, where, there*
AUX auxiliary *is, has (done), will (do), should (do)*
CONJ conjunction *and, or, but*
CCONJ coordinating conjunction *and, or, but*
DET determiner *a, an, the*
INTJ interjection *psst, ouch, bravo, hello*
NOUN noun *girl, cat, tree, air, beauty*
NUM numeral *1, 2017, one, seventy-seven, IV, MMXIV*
PART particle *’s, not,*
PRON pronoun *I, you, he, she, myself, themselves, somebody*
PROPN proper noun *Mary, John, London, NATO, HBO*
PUNCT punctuation *., (, ), ?*
SCONJ subordinating conjunction *if, while, that*
SYM symbol *$, %, §, ©, +, −, ×, ÷, =, :), 😝*
VERB verb *run, runs, running, eat, ate, eating*
X other *sfpksdpsxmsa*
SPACE space

 

Spacy POS Tagging Example

POS Tagging in Spacy library is quite easy as seen in the below example. We just instantiate a Spacy object as doc. We iterate over doc object and use pos_ , tag_, to print the POS tag. Spacy also lets you access the detailed explanation of POS tags by using spacy.explain() function which is also printed in the same iteration along with POS tags.

In [1]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Get busy living or get busy dying.")

print(f"{'text':{8}} {'POS':{6}} {'TAG':{6}} {'Dep':{6}} {'POS explained':{20}} {'tag explained'} ")
for token in doc:
print(f'{token.text:{8}} {token.pos_:{6}} {token.tag_:{6}} {token.dep_:{6}} {spacy.explain(token.pos_):{20}} {spacy.explain(token.tag_)}')

[Out] :

text     POS    TAG    Dep    POS explained        tag explained 
Get      AUX    VB     ROOT   auxiliary            verb, base form
busy     ADJ    JJ     amod   adjective            adjective
living   NOUN   NN     dobj   noun                 noun, singular or mass
or       CCONJ  CC     cc     coordinating conjunction conjunction, coordinating
get      AUX    VB     conj   auxiliary            verb, base form
busy     ADJ    JJ     acomp  adjective            adjective
dying    VERB   VBG    xcomp  verb                 verb, gerund or present participle
.        PUNCT  .      punct  punctuation          punctuation mark, sentence closer

Fine Grained POS Tag

Spacy also provides a fine-grained tag that further categorizes a token in different sub-categories. For example, when a word is an adjective it further categorizes it as JJR (comparative adjective), JJS (superlative adjective), or AFX (affix adjective). We can get the list of fine grained tags in Spacy by using nlp.pipe_labels[‘tagger’] as shown in the below example.

In [2]

import spacy

nlp = spacy.load("en_core_web_sm")
tag_lst = nlp.pipe_labels['tagger']

print(len(tag_lst))
print(tag_lst)

[Out] :

50
['$', "''", ',', '-LRB-', '-RRB-', '.', ':', 'ADD', 'AFX', 'CC', 'CD', 'DT', 'EX', 'FW', 'HYPH', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NFP', 'NN', 'NNP', 'NNPS', 'NNS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'SYM', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB', 'XX', '_SP', '``']

Fine Grained POS Tag list

Below is a POS tag list, their description, Fine-grained Tag, their description, Morphology, and some examples.

POS POS_Description Fine-grained Tag Description Morphology EXAMPLE
0 ADJ adjective AFX affix Hyph=yes The Flintstones were a **pre**-historic family.
1 ADJ adjective JJ adjective Degree=pos This is a **good** sentence.
2 ADJ adjective JJR adjective, comparative Degree=comp This is a **better** sentence.
3 ADJ adjective JJS adjective, superlative Degree=sup This is the **best** sentence.
4 ADJ adjective PDT predeterminer AdjType=pdt PronType=prn Waking up is **half** the battle.
5 ADJ adjective PRP$ pronoun, possessive PronType=prs Poss=yes **His** arm hurts.
6 ADJ adjective WDT wh-determiner PronType=int rel It’s blue, **which** is odd.
7 ADJ adjective WP$ wh-pronoun, possessive Poss=yes PronType=int rel We don’t know **whose** it is.
8 ADP adposition IN conjunction, subordinating or preposition It arrived **in** a box.
9 ADV adverb EX existential there AdvType=ex **There** is cake.
10 ADV adverb RB adverb Degree=pos He ran **quickly**.
11 ADV adverb RBR adverb, comparative Degree=comp He ran **quicker**.
12 ADV adverb RBS adverb, superlative Degree=sup He ran **fastest**.
13 ADV adverb WRB wh-adverb PronType=int rel **When** was that?
14 CONJ conjunction CC conjunction, coordinating ConjType=coor The balloon popped **and** everyone jumped.
15 DET determiner DT determiner **This** is **a** sentence.
16 INTJ interjection UH interjection **Um**, I don’t know.
17 NOUN noun NN noun, singular or mass Number=sing This is a **sentence**.
18 NOUN noun NNS noun, plural Number=plur These are **words**.
19 NOUN noun WP wh-pronoun, personal PronType=int rel **Who** was that?
20 NUM numeral CD cardinal number NumType=card I want **three** things.
21 PART particle POS possessive ending Poss=yes Fred**’s** name is short.
22 PART particle RP adverb, particle Put it **back**!
23 PART particle TO infinitival to PartType=inf VerbForm=inf I want **to** go.
24 PRON pronoun PRP pronoun, personal PronType=prs **I** want **you** to go.
25 PROPN proper noun NNP noun, proper singular NounType=prop Number=sign **Kilroy** was here.
26 PROPN proper noun NNPS noun, proper plural NounType=prop Number=plur The **Flintstones** were a pre-historic family.
27 PUNCT punctuation -LRB- left round bracket PunctType=brck PunctSide=ini rounded brackets **(**also called parentheses)
28 PUNCT punctuation -RRB- right round bracket PunctType=brck PunctSide=fin rounded brackets (also called parentheses**)**
29 PUNCT punctuation , punctuation mark, comma PunctType=comm I**,**me and myself.
30 PUNCT punctuation : punctuation mark, colon or ellipsis colon **:** is a punctuation mark
31 PUNCT punctuation . punctuation mark, sentence closer PunctType=peri Punctuation at the end of sentence**.**
32 PUNCT punctuation closing quotation mark PunctType=quot PunctSide=fin machine learning**”**
33 PUNCT punctuation “” closing quotation mark PunctType=quot PunctSide=fin **””**
34 PUNCT punctuation opening quotation mark PunctType=quot PunctSide=ini **”**machine learning”
35 PUNCT punctuation HYPH punctuation mark, hyphen PunctType=dash ML site **-** machinelearningknowledge.ai
36 PUNCT punctuation LS list item marker NumType=ord
37 PUNCT punctuation NFP superfluous punctuation
38 SYM symbol # symbol, number sign SymType=numbersign This is hash**#** symbol.
39 SYM symbol $ symbol, currency SymType=currency Dollar **$** is the name of more than 20 curre…
40 SYM symbol SYM symbol this is a symbol **$**
41 VERB verb BES auxiliary “be” Let it **be**.
42 VERB verb HVS forms of “have” I**’ve** seen the Queen
43 VERB verb MD verb, modal auxiliary VerbType=mod This **could** work.
44 VERB verb VB verb, base form VerbForm=inf I want to **go**.
45 VERB verb VBD verb, past tense VerbForm=fin Tense=past This **was** a sentence.
46 VERB verb VBG verb, gerund or present participle VerbForm=part Tense=pres Aspect=prog I am **going**.
47 VERB verb VBN verb, past participle VerbForm=part Tense=past Aspect=perf The treasure was **lost**.
48 VERB verb VBP verb, non-3rd person singular present VerbForm=fin Tense=pres I **want** to go.
49 VERB verb VBZ verb, 3rd person singular present VerbForm=fin Tense=pres Number=sing Person=3 He **wants** to go.
50 X other ADD email [email protected]
51 X other FW foreign word Foreign=yes Hello in spanish is **Hola**
52 X other GW additional word in multi-word expression
53 X other XX unknown
54 SPACE space _SP space
55 NIL missing tag

Morphology

In linguistics, morphology is defined as the process of analyzing a word, how they are formed, and their relationship to other words in the same language, the structure of words, and parts of words such as stems, root words, prefixes, and suffixes. Morphology also looks at parts of speech, intonation, and stress, and the ways of context can change a word’s pronunciation and meaning.

Spacy uses the token text and fine-grained part-of-speech tags to produce morphological features.

In Spacy, the morphological features are stored in the MorphAnalysis under Token.morph, which allows us to access individual morphological features. In the example below, we are iterating the tokens of doc object and printing all the morphological features by using token.morph attributes. However, we can also access any particular type of morphological features by using morph.get() function. token.morph.to_dict() function returns all the morphological features in a dictionary format.

In [2]:
import spacy

nlp = spacy.load("en_core_web_sm")
print("Pipeline:", nlp.pipe_names)
doc = nlp("I was heading towards North.")
for token in doc:  
    print(token.text)
    print(token.morph)   ## Printing all the morphological features.
    print(token.morph.get("Number"))   ## Printing a particular type of morphological 
                                       ## features such as Number(Singular or plural).
    print(token.morph.to_dict())       ## Prining the morphological features in dictionary format.
    print('\n\n')

[Out] :

Pipeline: ['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']
I
Case=Nom|Number=Sing|Person=1|PronType=Prs
['Sing']
{'Case': 'Nom', 'Number': 'Sing', 'Person': '1', 'PronType': 'Prs'}

was
Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin
['Sing']
{'Mood': 'Ind', 'Number': 'Sing', 'Person': '3', 'Tense': 'Past', 'VerbForm': 'Fin'}

heading
Aspect=Prog|Tense=Pres|VerbForm=Part
[]
{'Aspect': 'Prog', 'Tense': 'Pres', 'VerbForm': 'Part'}

towards
[]
{}

North
NounType=Prop|Number=Sing
['Sing']
{'NounType': 'Prop', 'Number': 'Sing'}

.
PunctType=Peri
[]
{'PunctType': 'Peri'}

Counting POS Tags in Spacy

In the example below, we are passing the POS token attribute to Doc.count() function which returns a frequency dictionary with key as POS attribute value and its frequency as the value. With the help of for loop, we are printing the POS tag and its count.

In [3]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"Don't be afraid to give up the good to go for the great")

# Counting the frequencies of different POS tags:
POS_counts = doc.count_by(spacy.attrs.POS)
print(POS_counts)

for k,v in sorted(POS_counts.items()):
    print(f'{k:{4}}. {doc.vocab[k].text:{5}}: {v}')
[Out] :
{87: 2, 94: 3, 84: 2, 100: 2, 85: 2, 90: 2, 92: 1}
  84. ADJ  : 2
  85. ADP  : 2
  87. AUX  : 2
  90. DET  : 2
  92. NOUN : 1
  94. PART : 3
 100. VERB : 2

Counting fine-grained tags

In the example below, we are passing the TAG token attribute to Doc.count() and it is returning a frequency dictionary with key as TAG attribute value and its frequency as the value. With the help of for loop, we are printing the POS tag and its count.

In [4]:
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp(u"The quick brown fox jumped over the lazy dog's back.")

# Counting the frequencies of different fine-grained tags:
TAG_counts = doc.count_by(spacy.attrs.TAG)

print(TAG_counts)
for k,v in sorted(TAG_counts.items()):
    print(f'{k}. {doc.vocab[k].text:{4}}: {v}')
[Out] :
{15267657372422890137: 2, 10554686591937588953: 3, 15308085513773655218: 3, 17109001835818727656: 1, 1292078113972184607: 1, 74: 1, 12646065887601541794: 1}
74. POS : 1
1292078113972184607. IN  : 1
10554686591937588953. JJ  : 3
12646065887601541794. .   : 1
15267657372422890137. DT  : 2
15308085513773655218. NN  : 3
17109001835818727656. VBD : 1

Visualizing the POS Tags in Spacy

In Spacy we can visualize the part-of-speech tags and syntactic dependencies using displacy.serve() function which takes a single Doc or list of Doc objects and returns a nice visualization.

In [6 ]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("You only live once, but if you do it right, once is enough.")
displacy.serve(doc, style="dep")
[Out] :
However, We can have different visualization by tuning the display function by passing a list of parameters whose significance is explained below.

Parameters

distance : Distance between token dipendencies.
compact : Compactness of color.
color : Color of the font.
bg : Background color of the visualization.
font : Style of the font in the visualization.

In [7]:

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I was reading the paper")
options = {'distance': 110, 'compact': 'True', 'color': 'yellow', 'bg': '#09a3d5', 'font': 'Times'}

displacy.serve(doc, style="dep",options=options)
[Out] :
Spacy POS Tagging

Visualizing POS Tags in Long Texts in Spacy

Long texts can become difficult to read when displayed in one row, so it’s often better to visualize them sentence-by-sentence instead. Displacy supports rendering both Doc and Span objects, as well as lists of Docs or Spans. Instead of passing the full Doc to displacy.serve, we can also pass in a list doc.sents. This will create one visualization for each sentence.

In [8]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
text = "Life is a beautiful journey that is meant to be embraced to the fullest every day.However, that doesn’t mean you always wake up ready to seize the day, and sometimes need a reminder that life is a great gift."
doc = nlp(text)
sentence_spans = list(doc.sents)
displacy.serve(sentence_spans, style="dep")

[Out] :

Reference – Spacy Documentation

 

LEAVE A REPLY

Please enter your comment!
Please enter your name here