The Netflix Black Mirror TV series by Charlie Brooker is undoubtedly one of my favorite shows. It’s fascinating to see how Charlie Brooker envisions things going wrong once technology advances to the point where it challenges our morals as humans.
Now, it’s not really a secret that the Netflix Black Mirror TV series is super downbeat: Charlie Brooker’s characters are usually broken people, the music choices are ominous, and the world it builds is generally gray and dreary. Since Netflix Black Mirror explores the dark side of modern tech, that’s to be expected. It was hard for me to binge-watch the show, since certain aspects are really disturbing, but I couldn’t keep my eyes off the TV.
Browsing the Netflix Black Mirror TV series thread on Reddit, I saw a lot of threads asking people to rank each Black Mirror episode according to how disturbing or depressing it is. Apart from ‘how disturbing’ or ‘how depressing’ the Black Mirror episodes are, another interesting thing to look at is the sentiment or mood of the episodes – how positive or negative are they? If you were to rank Netflix Black Mirror TV series episodes according to how downbeat and negative they are, what would the ranking look like? Obviously, rankings are subjective and differ from person to person; and this is precisely what’s interesting about rankings. Consequently, this made me think: Can I use the tools of data science and text analytics, specifically text sentiment analysis, to obtain an objective ranking of the (inherently subjective) sentiment of Netflix Black Mirror TV series episodes? And if so, will I be able to justify the resulting ranking?
Text Sentiment Analysis of Netflix Black Mirror
In this post, I use sentiment analysis in Python to analyze Netflix Black Mirror TV series episode subtitles (i.e. character dialog) and quantitatively assess their sentiment. From this analysis, I will be able to compute a sentiment score and rank the Black Mirror episodes according to how negative they are. Since I’m analyzing dialog, what I’m actually dealing with is the sentiment of Black Mirror characters. By solely considering character dialog, I’m implicitly assuming that the sentiment score of the episode is the sentiment score of the characters. There are a lot of factors not considered in this text analytics exercise, such as the music and the visuals.
I perform sentiment analysis in Python in this article. Python is a really easy programming language to use for text analytics and data science since it is chock-full of ready-to-use software packages. The full code and IPython notebook can be accessed on my GitHub.
For text sentiment analysis, I am going to focus on a specific lexical approach called VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon and rule-based text sentiment analysis tool. The nice thing about VADER is that it returns a sentiment score in the range -1 to 1, that is, most negative to most positive. This will enable us to rank different texts according to their sentiment intensity.
Conveniently, VADER is an open-source tool for sentiment analysis in Python. Check out the GitHub repo of Hutto for the complete implementation.
For an explanation of how the VADER sentiment analysis tool calculates the sentiment score, check out my other article, VADER Sentiment Analysis Explained.
Let’s now turn our attention to the data for this text analytics exercise – what is it, where do we get it, and how can we analyze it?
Text Analytics: Data Extraction and Processing
For the data source, I used subtitle (.srt) files from http://tvsubs.net. Each file is a transcript of each episode of Netflix Black Mirror TV series. For each subtitle file, I did some pre-processing in order to return a single string which contains the whole dialog of the Black Mirror episode as a paragraph. Note that each subtitle file has time stamps, and so we need to remove these.
First, I loaded some dependencies, which include text analytics functions as well as Python tools pandas and matplotlib. Then, I wrote a function (return_dialog) which takes as input the string name of the subtitle file and removes the time stamps to output the dialog as a single string.
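A minimal sketch of what such a function could look like, using only the standard library. The name return_dialog comes from the text above; the helper dialog_from_srt_text, the regular expressions, and the choice to strip inline tags such as <i> are my assumptions about a typical .srt layout, not the notebook's actual code.

```python
import re

# Matches SRT time-stamp lines such as "00:01:02,345 --> 00:01:04,567".
TIMESTAMP = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
# Inline markup that some subtitle files carry, e.g. <i>...</i> (assumption).
TAG = re.compile(r"<[^>]+>")


def dialog_from_srt_text(srt_text):
    """Return the spoken lines of an .srt transcript joined into one string."""
    kept = []
    for line in srt_text.splitlines():
        line = line.strip().lstrip("\ufeff")  # drop a byte-order mark if present
        # Skip blank lines, cue counters ("1", "2", ...) and time stamps.
        # Caveat: a dialog line made only of digits is dropped as well.
        if not line or line.isdigit() or TIMESTAMP.search(line):
            continue
        line = TAG.sub("", line).strip()
        if line:
            kept.append(line)
    return " ".join(kept)


def return_dialog(filename, encoding="utf-8"):
    """Read a subtitle file and return its dialog as a single string."""
    with open(filename, encoding=encoding, errors="replace") as handle:
        return dialog_from_srt_text(handle.read())
```

Splitting the work into a text-level helper and a thin file wrapper makes it easy to try the cleaning step on a short string before pointing it at real subtitle files.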
Next, I tried to obtain the VADER sentiment score of each subtitle file considered as a single piece of text. This didn’t work that well, since the returned sentiment scores were all very close to either -1 or 1, which are the minimum and maximum values. It seems that VADER doesn’t really work that well when evaluating the sentiment of very large texts.
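One likely reason for the saturation: as I understand the reference implementation, the compound score is the sum of the word valences x squashed into the range -1 to 1 by x / sqrt(x² + alpha), with alpha = 15. The sum grows with the length of the text, so a whole episode ends up pinned near one of the extremes. The sketch below reproduces only that normalization step; the function name and the sample sums are mine, chosen for illustration.

```python
import math


def normalize(valence_sum, alpha=15):
    """Squash a raw sum of word valences into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


# Illustrative sums: a short sentence, a longer one, and a whole transcript.
for total in (1, 5, 50, -50):
    print(total, round(normalize(total), 3))
```

A sum of 1 maps to 0.25, while a sum of 50 in either direction already lands within 0.01 of the limit, which matches what I saw when scoring entire subtitle files.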
Applying VADER to the whole dialog doesn’t really work out. Because of this, I resorted to applying VADER to each sentence. To split the paragraph into a list of sentences, I used the NLTK sentence tokenizer tokenize.sent_tokenize(). The sentiment score of the whole dialog, in this case, is the average of the sentiment scores of its individual sentences. To make analysis easier, I built a pandas data frame of sentences, wherein the column data consist of the episode name, subtitle file name, sentiment score, and whether it is a positively labeled sentence or a negatively labeled sentence. A positively labeled sentence is defined as one which has a positive VADER sentiment score.
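The per-sentence step can be sketched as follows. To keep the sketch runnable without any downloads, the scorer is passed in as a function and a rough regex splitter stands in for tokenize.sent_tokenize(); both choices, along with every function and column name here, are my assumptions rather than the notebook's code. The vader_scorer helper shows how the real analyzer from the vaderSentiment package could be plugged in, provided that package is installed.

```python
import re


def naive_split(text):
    """Rough stand-in for NLTK's sent_tokenize: split after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def label(score):
    """Label a sentence by the sign of its sentiment score."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentence_rows(episode, dialog, score_fn, split_fn=naive_split):
    """Build one record per sentence, ready to be turned into a data frame."""
    rows = []
    for sentence in split_fn(dialog):
        score = score_fn(sentence)
        rows.append({
            "episode": episode,
            "sentence": sentence,
            "score": score,
            "label": label(score),
        })
    return rows


def vader_scorer():
    """Return a scorer backed by VADER (requires the vaderSentiment package)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    return lambda sentence: analyzer.polarity_scores(sentence)["compound"]
```

With the packages installed, sentence_rows("Nosedive", dialog, vader_scorer(), split_fn=nltk.tokenize.sent_tokenize) would give a list of records that pandas.DataFrame can consume directly.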
Note that there are a lot of sentences without sentiment, i.e. having a sentiment score of zero. Including these pulls the mean polarity to almost zero, so I decided to ignore non-sentiment-bearing sentences in the analysis.
From the data frame, we can easily plot the data for each episode. The following code plots a swarmplot and a violinplot for each Netflix Black Mirror TV series episode in Season 1. The same can be done for other seasons of Charlie Brooker’s saga.
Black Mirror Sentiment Score Distributions
After constructing a data frame of sentences per episode and their associated VADER sentiment scores, we can use the Seaborn library in Python to visualize the distribution of sentiment scores per episode. Below are plots that show just that. These plots are actually two superimposed plots: a swarmplot (aka the dots) and a violinplot (aka the smooth outer layer).
Swarmplots are similar to histograms in the way that they show the distribution of scores. The larger the number of dots corresponding to a sentiment score, the higher the number of sentences with that sentiment score. Violinplots, on the other hand, show the kernel density estimate (KDE) of the data. Intuitively, this is a smooth estimate of the underlying distribution of the data.
Note that the KDE plots actually extend beyond the -1 to 1 range. This is because KDE does not recognize the sentiment range; its aim is to smooth out the data and approximate the underlying distribution. Actually, I only drew the KDE in order to make the shapes of the sentiment distributions more apparent.
After admiring how cute the plots are, we can see that all of them show a bimodal distribution: one peak at a positive score and another at a negative score. What is interesting to see is which peak actually dominates: the positive or the negative? For some episodes, like The Entire History of You, Nosedive, and The Waldo Moment, the positive peak clearly dominates; while for some, like The National Anthem and Hated in the Nation, it’s less obvious.
We can quantitatively compare the episodes by comparing the means of the underlying sentiment distributions. We consider three different means:
- the mean sentiment score of the positively labeled sentences (MP)
- the mean sentiment score of the negatively labeled sentences (MN)
- the mean of the sentiment distribution (MA)
The mean sentiment score of the positively labeled sentences (MP) only considers the positively labeled sentences, that is, the sentences which have VADER polarity that is positive. MP quantifies the intensity of the positive sentences in the script, without regard for the negative ones. The interpretation of the mean sentiment score of the negatively labeled sentences (MN) is analogous to this, with positive and negative interchanged.
On the other hand, the mean of the sentiment distribution (MA) is the mean polarity of all sentiment-bearing sentences, which includes both positive and negative. Note that when calculating MA, we are adding a bunch of positive quantities and negative quantities, so we most likely end up with a number near zero, if the positive and negative are evenly matched. We can interpret MA as a quantity which characterizes the overall sentiment of the episode – with the sign containing information on which sentiment, positive or negative, prevailed.
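The three means can be sketched in plain Python as below. This is a standard-library stand-in for the data frame computation, and the function names, the (episode, score) input format, and the toy numbers in the usage note are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def episode_means(rows):
    """Compute MP, MN and MA per episode from (episode, score) pairs."""
    by_episode = defaultdict(list)
    for episode, score in rows:
        if score != 0:  # ignore non-sentiment-bearing sentences
            by_episode[episode].append(score)

    summary = {}
    for episode, scores in by_episode.items():
        positive = [s for s in scores if s > 0]
        negative = [s for s in scores if s < 0]
        summary[episode] = {
            "MP": mean(positive) if positive else None,
            "MN": mean(negative) if negative else None,
            "MA": mean(scores),
        }
    return summary


def rank(summary, key):
    """Order episodes by the chosen mean, lowest value first."""
    scored = [e for e in summary if summary[e][key] is not None]
    return sorted(scored, key=lambda e: summary[e][key])
```

For example, with the toy input [("A", 0.5), ("A", -0.25), ("A", 0.0), ("B", 0.75), ("B", 0.25), ("B", -0.5), ("B", -0.5)], episode A gets MP = 0.5, MN = -0.25 and MA = 0.125, while episode B gets MA = 0.0, so rank(summary, "MA") puts B ahead of A as the more negative episode.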
Again, as I’ve mentioned earlier, we are using the sentiment of character dialog as a proxy for the sentiment of the episode. A lot of important things – such as the audio and visuals – are not included. It’s best to keep that in mind.
We can obtain these rankings using a simple group-by on the data frame of episodes.
To add some excitement, let’s first look at how the episodes rank with respect to the mean sentiment score of the positively labeled sentences (MP) and the mean sentiment score of the negatively labeled sentences (MN). Let’s save the ranking with respect to the mean of the sentiment distribution (MA), which represents an overall ranking, for last.
Intensity of Netflix Black Mirror Positive Moments
OK. First, let’s look at how the episodes rank with respect to the mean sentiment score of the positively labeled sentences (MP). This ranking answers the question: How (positively) intense are the positive moments?
The complete ranking is:
Men Against Fire, the episode on population cleansing, and Hated in the Nation, the episode on killer bees, appear at the top of the ranking. This means that when these episodes are positive, they’re just mildly so. Thinking about it, it is not really that surprising to see these two episodes at the top since their storylines and characters are completely downbeat, with no real happy moments, as far as I can remember. The appearance of San Junipero in third place is quite surprising to me. After watching the episode, I felt that it was really positive. Well, maybe the ending just made me feel that way?
Nosedive appearing at the bottom is really expected since its world, wherein people ‘rate’ each other according to perception, demands that citizens be superficially cheery. Lacie Pound, the main character of Nosedive, is the embodiment of sunshine from the start of the episode up until just before the very end. Fifteen Million Merits appears second from the bottom. Because the episode revolves around a talent show, which is a lot of fun (until that moment with Abi!), and a love link between its two leads, its place near the bottom is justifiable. The rank of The Entire History of You is also expected since a large chunk of the episode is set at a lively dinner party.
Intensity of Netflix Black Mirror Negative Moments
Next, let’s look at how the episodes rank with respect to the mean sentiment score of the negatively labeled sentences (MN). This ranking answers the question: How (negatively) intense are the negative moments?
The complete ranking is:
I completely agree with White Bear being at the top of the list. When the episode is upsetting, it is REALLY upsetting. Victoria Skillane, the protagonist, is a sympathetic monster. However, the punishment that she faces due to her crime is inhumane, and it reflects on society as well. Again, Men Against Fire appears near the top. Fifteen Million Merits appears in third place. This is interesting, given that in the ranking for MP, it appears near the bottom. This means that when Fifteen Million Merits is positive, it’s really positive; when it’s negative, it’s really negative. Even though the show revolves around a fun talent show, it doesn’t take long before it shows its dark underbelly.
The bottom three are San Junipero, The Entire History of You, and Be Right Back. The presence of San Junipero at the bottom is justified, since nothing very upsetting or disturbing really happens in the episode. The episode simply focuses on a love story that we can all get behind. The placing of The Entire History of You is quite interesting. I expected it to appear higher in the MN ranking, since the scenes leading to the ending are quite upsetting. I only used the dialog of the show’s characters as the basis, so maybe this isn’t captured? Be Right Back’s placing is understandable. The sad parts of the episode are mild compared to the other episodes.
Overall (Negative) Polarity of Netflix Black Mirror
Now, let’s look at the overall ranking – the ranking with respect to the mean of the sentiment distribution (MA):
The complete ranking is:
It is interesting to see that Hated in the Nation and Men Against Fire, the last two episodes of the series, appear at the top of the list and are the only ones that have a negative MA score. This means that, overall, these episodes have the most negative sentiment. Well, apparently, the writer of the show, Charlie Brooker, really knows how to end with a bang! It’s not really difficult to see why these two are at the top. They’re just really sad and negative episodes, plain and simple. The National Anthem appearing in third place is expected as well. Even though there are subtle comedic elements, the episode is pretty gray overall.
On the other hand, it is not really surprising to see Nosedive at the bottom of the list. Basically, the lead, Lacie Pound, is happiness incarnate. The Entire History of You at second from the bottom is quite unexpected. Playtest’s position at third from the bottom isn’t questionable. The episode’s lead, Cooper, is a very cheery and outgoing man, and imparts a lot of positivity to the episode.
The only thing I didn’t really expect is for San Junipero to appear around the middle – I expected it to appear near the bottom. Overall, I found the episode to be positive. Again, maybe it’s just because of the ending?
Overall, the text sentiment analysis rankings I obtained are quite reasonable. I think VADER is a legit tool for text sentiment analysis at the sentence level. However, this exercise of sentiment analysis in Python is really, really basic – a lot of things can be done to improve it.
There are a lot of factors which affect the negativity of the Netflix Black Mirror TV series that are not included here, primarily the audio and the visuals. More complicated analytics tools, such as audio and image processing, can be used in coordination with text analytics. However, I do not have these tools in my arsenal as of this moment.
Originally, I wanted to do a ranking of how depressing each Netflix Black Mirror TV series episode is, but I realized that this is difficult to quantify. I do not really know of any metric to quantify this emotion. Instead, I resorted to ranking the episodes according to how downbeat or negative they are.
It would also be interesting to see how the results would change with a different algorithm for text sentiment analysis. A future study can try to use Textblob, another tool for open-source sentiment analysis in Python.
A project that I intend to do is to look at the sentiment of each episode over time. As each episode – well, in Seasons 1 and 2 anyway – is divided into acts, it would be interesting to see the progression of sentiment across Charlie Brooker’s saga.