Digging deep into the YouTube algorithm
Who hasn’t come across an absurd YouTube recommendation? You’re watching CrashCourse Philosophy and end up on a conspiracy video about Donald Trump being a lizard. Sometimes videos seem completely irrelevant, and there is a whole channel on Google Support for user problems.
YouTube is so powerful that even young children are obsessed with it. The platform’s recommendations section is constantly trying to find what we would like to watch next whilst scanning massive amounts of personal data. With great power comes great responsibility, and we had better think twice before blindly trusting the platform.
Trusting the algorithm?
YouTube profiles are designed for crafting and personalisation, using affordances such as subscribing, upvoting, and creating lists. AI then scans user activity (likes, dislikes, previously viewed videos) and all sorts of other personal information, like phone number, home and work address, and recently visited places, and suggests potentially likeable video content. YouTube uses this information as a “baseline” and builds up recommendations linked to users’ viewing history.
In 2016, Google published an official paper on the deep learning processes embedded in the YouTube algorithm. The algorithm, they write, combines gathered data based on factors such as scale, freshness, and noise – features linked to viewership, the constant flow of new videos, and previous content liked by the user. They provide analysis of the computation processes, but they still cannot explain the glitches commonly found in the system – for instance, why is the algorithm always pushing towards extremes?
The dark side of the algorithm
Adapting to one’s preferences might be useful, but it seems like YouTube is prompting radicalism, as if you are never “hard core” enough for it.
Guillaume Chaslot, founder of AlgoTransparency, a project aiming towards web transparency of data, claims that recommendations are in fact pointless: they are designed to waste your time on YouTube, increasing your watch time. Chances are you will either get hooked on the platform or end up clicking on one of the ads, thus generating revenue. Chaslot says that the algorithm’s goal is to increase your watch time, or in other words the time spent on the platform, and that it doesn’t necessarily follow user preferences.
It seems like YouTube’s algorithms are promoting whatever is both viral and engaging, and are using wild claims and hate speech in the process. Perhaps this is why the platform has been targeted by multiple extremist and conspiracy theory channels. However, it is important to acknowledge that YouTube has taken measures against that problem.
Inspired by recent research on this topic, we conducted our own expedition down the YouTube rabbit hole. The project aims to examine the YouTube recommendation algorithm, so we started with a simple YouTube search on ‘Jeremy Corbyn’ and ‘anti-Semitism’. The topic is completely random and provoked solely by the fact that we are London residents familiar with the news. For clarity’s sake, here is a visual representation of the data (Figure 1.0).
In Figure 1.0, we can see the network formed by all videos related to the key terms which will end up in the recommendations section. The network has 1803 nodes and 38 732 edges: the nodes represent political videos on current global events, and the edges show how they relate to one another.
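To put those two figures in perspective, here is a minimal sketch that derives the average number of links per video and the density of the network from the node and edge counts alone. The text does not say whether the edges are directed (video A recommends video B) or undirected, so the function names and the `directed` flag are our own assumptions, and both readings are computed.

```python
# Figures reported in the text for the network in Figure 1.0.
NODES = 1803
EDGES = 38732


def average_degree(nodes: int, edges: int, directed: bool) -> float:
    """Mean number of links per video.

    Assumption: 'directed' is our own switch; the article does not
    state which kind of graph was built.
    """
    # In an undirected graph every edge touches two nodes.
    return edges / nodes if directed else 2 * edges / nodes


def density(nodes: int, edges: int, directed: bool) -> float:
    """Share of all possible links that are actually present."""
    possible = nodes * (nodes - 1)  # ordered pairs of distinct videos
    if not directed:
        possible //= 2  # unordered pairs
    return edges / possible


if __name__ == "__main__":
    for directed in (True, False):
        kind = "directed" if directed else "undirected"
        print(
            f"{kind}: average degree "
            f"{average_degree(NODES, EDGES, directed):.2f}, "
            f"density {density(NODES, EDGES, directed):.4f}"
        )
```

Under either reading, each video is linked to a few dozen others while only a small share of all possible links exists, which is what we would expect from a recommendation network.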
Alongside the expected titles including key words such as ‘Jeremy Corbyn’, ‘Theresa May’, ‘Hebrew’, ‘Jewish’, one may notice a miniature cluster far on the left-hand side. It has three components, or YouTube videos, that are, to say the least, hilarious. Let’s zoom in.
At first glance, they seem completely random: they are positioned at the far edge of the network and are unrelated to whether Jeremy Corbyn is an anti-Semite or not. So there must be something hidden in the underlying meaning of the videos that relates them to the rest. I will refer to the videos in this cluster as ‘random’; however, in the following lines, the reader will be persuaded that there is no randomness whatsoever.
The three videos (Figure 3.0) vary vividly in content: from a teenage girl who bought Justin Bieber’s old iPhone filled with R-rated personal material; through a woman who got pregnant by her boyfriend’s grandpa; all the way to the story of a daughter who tried to surprise her mother in jail only to end up in prison not being able to recognise her own mother who had gone through plastic surgery to become a secret spy (???).
It is easy to spot the production similarities between the three ‘random’ videos; nevertheless, they would not usually appear in the same context, as they have different topics and keywords and are produced by different channels. All videos are animated and have a cartoon protagonist who guides viewers through a supposedly fascinating life story, and all of it seems made up. The producers used visual effects to affect human perception: animation, fast-moving transitions, exciting background music.
The ‘random’ videos and some commentary. Snapshots: YouTube. Edited by the author.
Caricature is the artists’ way of presenting personal opinion on a more radical case. It’s therefore understandable why caricatures often include political figures and international affairs. Further, humour renders the brutality of life easier to handle. Animation has become a tool for distribution and reproduction and is associated with conditions of conflict, both national and international. Since the foregoing videos are associated with extremes, the YouTube algorithm suggests what it finds extreme – apparently ‘Jeremy Corbyn’ and ‘anti-Semitism’.
After observing the visual part of the content, we moved on to linguistic and semantic investigation. We found that words such as ‘scandal’, ‘very important people’, ‘controversial situations’, ‘jail’, and ‘accusations’ might be the reason why those videos appear in the network related to the key words ‘Jeremy Corbyn’ and ‘Anti-Semitism’.
Interestingly, all three comment sections in the ‘random’ cluster are filled with jokes and general opinion of the videos being fake. Very little of the public believes in the validity of the stories. If we browse comments from nodes with political videos, we can find similar language. That proves that AI not only scans language but detects opinion and irony and links common themes together.
The reader now understands why my area of interest is focused on this particular cluster, as it is a metaphorical representation of the whole network. Eventually, the research proved that Jeremy Corbyn is not perceived as an anti-Semite by the online public (or by the algorithm).
What is the algorithm suggesting?
To get a better grasp of the common assets in the network, we observe the nodes that are closest to the ‘random’ cluster (Figure 2.0). Following logical conclusions, can we say that the algorithm suggests all those political events are either a scam or a mockery? As the three videos are linked in a network with other, definitely not-so-humorous videos, this means they share keywords, topics, creators, or audience. The algorithm appears to find a similarity between the absurdity of the animated YouTube videos and the nodes closest to the cluster. Could this be the algorithm manifesting its opinion?
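One way to make “closest to the cluster” concrete is to count hops in the graph rather than reading distances off the picture. The sketch below is an illustration under our own assumptions, not the procedure used for Figure 2.0: the toy graph, its node labels, and the function names are hypothetical, and the edges are treated as undirected.

```python
from collections import deque


def hops_from_cluster(graph, cluster):
    """Breadth-first search that starts from every cluster node at once.

    Returns a dict mapping each reachable node to its hop distance
    from the nearest cluster node (cluster nodes themselves get 0).
    """
    distance = {node: 0 for node in cluster}
    queue = deque(cluster)
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


def closest_outside(graph, cluster):
    """Nodes outside the cluster with the smallest hop distance to it."""
    distance = hops_from_cluster(graph, cluster)
    outside = {n: d for n, d in distance.items() if n not in cluster}
    if not outside:
        return []
    nearest = min(outside.values())
    return sorted(n for n, d in outside.items() if d == nearest)


# Hypothetical toy network: three animated videos and three political ones.
toy_graph = {
    "animated_1": ["animated_2", "animated_3", "politics_a"],
    "animated_2": ["animated_1", "animated_3"],
    "animated_3": ["animated_1", "animated_2", "politics_b"],
    "politics_a": ["animated_1", "politics_c"],
    "politics_b": ["animated_3", "politics_c"],
    "politics_c": ["politics_a", "politics_b"],
}
random_cluster = {"animated_1", "animated_2", "animated_3"}

if __name__ == "__main__":
    print(closest_outside(toy_graph, random_cluster))
```

Hop distance and position in a drawn layout do not have to agree, so a check like this complements the visual reading of the network rather than replacing it.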
Of course, these are all speculations, and factors such as viewership and watch time are not to be neglected. As both viewers and producers, we should also remember that content may be interpreted differently in diverse social groups.