NLP - Harry Potter and the Sentiment Scores (Part 1)
I analyzed the text of the famous Harry Potter book series using natural language processing (NLP) techniques in Python.
But why? Out of curiosity!
Harry Potter is full of twists - characters appear and disappear, plots change rapidly, and then there are spells and enchantments (the natural language of the wizarding world, I suppose). I wondered whether I could find patterns in something that appears so unstructured and incoherent. With that in mind, I performed sentiment analysis and topic modeling, and built a content-based recommender system that recommends sections of the book series based on topic words.
In this post, I am going to talk about sentiment analysis (keeping the technical rigor aside). Sentiment analysis is the process of determining whether a piece of text is positive, negative or neutral. It’s also known as opinion mining, that is, deriving the opinion or attitude of a speaker. Sentiment is quantified by two scores - polarity and subjectivity. Polarity measures whether the attitude of a text is positive or negative, while subjectivity measures how strongly the text expresses opinion rather than fact.
Show time! Polarity of Book 1.
There are 17 chapters in Book 1. I have broken the chapters of the book into segments, and applied sentiment analysis to each segment. Each colored rectangle represents a chapter, and each dot represents a text segment. The plot above shows how rapidly the story changes and how many twists in sentiment there are.
I read each segment of the book, and the polarity scores seem very reasonable. The most positive segment of the book happens to be the Quidditch commentary by Lee Jordan. The most negative segment is the scene in which a troll enters a bathroom in Hogwarts castle, and Harry, Ron and Hermione fight it.
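As a rough illustration (not the actual code behind the plot), picking out the most positive and most negative segments comes down to taking the maximum and minimum over the polarity scores. The function name and the scores below are made up for the example.

```python
from typing import List, Tuple


def extreme_segments(
    scored: List[Tuple[str, float]],
) -> Tuple[Tuple[str, float], Tuple[str, float]]:
    """Return the (most positive, most negative) entries by polarity."""
    if not scored:
        raise ValueError("need at least one scored segment")
    most_positive = max(scored, key=lambda pair: pair[1])
    most_negative = min(scored, key=lambda pair: pair[1])
    return most_positive, most_negative


# Hypothetical (label, polarity) pairs; NOT the values from the analysis.
example = [
    ("Quidditch commentary", 0.45),
    ("Troll in the bathroom", -0.30),
    ("Sorting ceremony", 0.10),
]
best, worst = extreme_segments(example)
print("most positive:", best[0])
print("most negative:", worst[0])
```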
Most positive of Book 1
Harry clambered onto his Nimbus Two Thousand. Madam Hooch gave a loud blast on her silver whistle. Fifteen brooms rose up, high, high into the air. They were off. And the Quaffle is taken immediately by Angelina Johnson of Gryffindor – what an excellent Chaser that girl is, and rather attractive, too -- JORDAN! "Sorry, Professor." The Weasley twins' friend, Lee Jordan, was doing the commentary for the match, closely watched by Professor McGonagall.
"And she's really belting along up there, a neat pass to Alicia Spinnet, a good find of Oliver Wood's, last year only a reserve -- back to Johnson and -- no, the Slytherins have taken the Quaffle, Slytherin Captain Marcus Flint gains the Quaffle and off he goes -- Flint flying like an eagle up there -- he's going to sc- no, stopped by an excellent move by Gryffindor Keeper Wood and the Gryffindors take the Quaffle -- that's Chaser Katie Bell of Gryffindor there, nice dive around Flint …
Most negative of Book 1
"Oy, pea-brain!" yelled Ron from the other side of the chamber, and he threw a metal pipe at it. The troll didn't even seem to notice the pipe hitting its shoulder, but it heard the yell and paused again, turning its ugly snout toward Ron instead, giving Harry time to run around it. "Come on, run, run!" Harry yelled at Hermione, trying to pull her toward the door, but she couldn't move, she was still flat against the wall, her mouth open with terror. The shouting and the echoes seemed to be driving the troll berserk. It roared again and started toward Ron, who was nearest and had no way to escape. Harry then did something that was both very brave and very stupid: He took a great running jump and managed to fasten his arms around the troll's neck from behind. The troll couldn't feel Harry hanging there, but even a troll will notice if you stick a long bit of wood up its nose, and Harry's wand had still been in his hand when he'd jumped – it had gone straight up one of the troll's nostrils. Howling with pain, the troll twisted and flailed its club, with Harry clinging on for dear life;
One thing I haven’t talked about yet is subjectivity. Subjectivity quantifies how strongly a text expresses opinion rather than objective fact. In a way, it tells us how much weight to give the polarity of the text. Now, let’s take a look at polarity and subjectivity together.
Polarity scores range from -1 to +1 (-1: maximally negative, +1: maximally positive, 0: neutral), and subjectivity ranges from 0 to 1 (0: very objective, 1: very subjective).
In the plot above, subjectivity varies within a tight spread around the neutral line of 0.5. So what happened in the last segment of Chapter 5? I segmented the book chapters sequentially so that each segment would have roughly 400 words, and the last segment of Chapter 5 ended up with only 4 words. I merged such segments with the previous one for the sentiment analysis, but in this plot I left it as is, as a proof of concept - when a text segment is not informative enough, its subjectivity does not carry much weight. The corresponding polarity of this segment is 0, meaning neutral.
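The segmentation described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the plots: the function name and the `min_tail` threshold (how short a final segment must be before it gets merged) are assumptions, since the post only says that very short leftovers were merged into the previous segment.

```python
from typing import List


def segment_words(text: str, size: int = 400, min_tail: int = 50) -> List[str]:
    """Split text into consecutive segments of about `size` words.

    A final segment shorter than `min_tail` words (an assumed cutoff)
    is merged into the segment before it.
    """
    words = text.split()
    chunks = [words[i:i + size] for i in range(0, len(words), size)]
    if len(chunks) > 1 and len(chunks[-1]) < min_tail:
        tail = chunks.pop()
        chunks[-1].extend(tail)
    return [" ".join(chunk) for chunk in chunks]


# A stand-in "chapter" of 804 words: two full segments plus a 4-word leftover.
chapter = " ".join(f"word{i}" for i in range(804))
segments = segment_words(chapter)
print([len(s.split()) for s in segments])  # [400, 404]
```

Setting `min_tail=0` turns the merging off, which reproduces the situation in the plot where the 4-word segment is kept on its own.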
Is there a pattern?
The sentiments of the individual text segments make sense. But there’s no apparent trend, just a lot of fluctuation. That is understandable because there are multiple storylines in the book series. That got me thinking - would I see a trend if I followed a particular storyline or character interaction? I did find trends when I followed different topics, stories and character interactions.
The plot below follows the interaction between Harry and Dumbledore throughout the book series, in sequence.
The most striking trend comes from Book 5. At the beginning of Book 5, Harry starts to misunderstand Dumbledore. His resentment toward Dumbledore grows stronger, hence the downward trend in polarity. This was an aha moment for me. I carefully compared the text and the trends from other sections, and I could follow the upward and downward trends. But visually Book 5 wins!
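One possible way to follow a character interaction, sketched under my own assumptions rather than taken from the actual analysis: keep only the segments that mention both names, then fit a straight line to their polarity scores in reading order. The helper names, the simple substring matching, and the scores are all hypothetical.

```python
from typing import List, Sequence, Tuple


def mentions_all(segment: str, names: Sequence[str]) -> bool:
    """True if every name occurs in the segment (crude substring match)."""
    lowered = segment.lower()
    return all(name.lower() in lowered for name in names)


def trend_slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical (segment, polarity) pairs in reading order; not real scores.
scored: List[Tuple[str, float]] = [
    ("Harry spoke warmly with Dumbledore.", 0.3),
    ("Ron ate lunch.", 0.5),
    ("Harry glared at Dumbledore.", 0.1),
    ("Dumbledore would not look at Harry.", -0.1),
]
shared = [p for s, p in scored if mentions_all(s, ["Harry", "Dumbledore"])]
slope = trend_slope(shared)
print("downward trend" if slope < 0 else "upward or flat trend")
```

A negative slope corresponds to the kind of downward trend seen in Book 5; a real analysis would likely need smarter matching than substrings (nicknames, pronouns, and so on).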
Tools and techniques
I used TextBlob, a Python library for processing textual data, for sentiment analysis. The package has built-in sentiment classification models, which made it very easy to generate sentiment scores.
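For a flavor of what that looks like, here is a minimal sketch (not the code from the analysis). `TextBlob(text).sentiment` returns a polarity and a subjectivity score; the wrapper functions and the `scorer` parameter are my own illustrative additions, so that the scoring step can be swapped out or tried without TextBlob installed.

```python
from typing import Callable, List, Tuple

Score = Tuple[float, float]  # (polarity, subjectivity)


def textblob_score(text: str) -> Score:
    """Score one piece of text with TextBlob's default sentiment analyzer."""
    from textblob import TextBlob  # third-party: pip install textblob

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def score_segments(
    segments: List[str],
    scorer: Callable[[str], Score] = textblob_score,
) -> List[Score]:
    """Apply the scorer to every segment, keeping the original order."""
    return [scorer(segment) for segment in segments]


try:
    for polarity, subjectivity in score_segments(
        ["What an excellent Chaser that girl is!"]
    ):
        print(f"polarity={polarity:.2f} subjectivity={subjectivity:.2f}")
except ImportError:
    print("TextBlob is not installed; run `pip install textblob` first.")
```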
In the next part of this post, I will talk about how I did the analysis and share snippets of my code. Stay tuned!