There Are 14 Billion Videos on YouTube

Until last month, nobody outside of YouTube had a solid estimate for just how many videos are currently on the site. Eight hundred million? One billion? It turns out that the figure is more like 14 billion—more than one and a half videos for every person on the planet—and that’s counting only videos that are publicly visible.

I have that number not because YouTube maintains a public counter and not because the company issued a press release announcing it. I’m able to share it with you now only because I’m part of a small team of researchers at the University of Massachusetts at Amherst who spent a year figuring out how to calculate it. Our team’s paper, which was published last month, provides what we believe is the most comprehensive analysis of the world’s most important video-sharing platform to date. The viral videos and popular conspiracy theorists are, of course, important. But in sheer numbers, and perhaps even in importance, those videos are dwarfed by hours-long church services, condo-board meetings, and other miscellaneous clips that you’ll probably never see.

Unlike stereotypical YouTube videos—personality-driven and edited to engage the broadest possible audience—these videos aren’t uploaded with profit in mind. Instead, they illustrate some of the ways that people rely on YouTube for a much wider range of activities than you would find while casually scrolling through its algorithmically driven recommendations. YouTube may have started as a video platform, but it has since become the backbone of one of the 21st century’s core forms of communication.

Despite its global popularity, YouTube (which is owned by Google) veils its inner workings. When someone studies, for example, the proliferation of extreme speech on YouTube, they can tell us about a specific sample of videos—their content, view count, what other videos they link to, and so on. But that information exists in isolation; they cannot tell us how popular those videos are relative to the rest of YouTube. To make claims about YouTube in its entirety, we either need key information from YouTube’s databases, which isn’t realistic, or the ability to produce a large enough random sample of videos to represent the website.

That is what we did. We used a complicated process that boils down to making billions upon billions of random guesses at YouTube IDs (the identifiers you see in the URL when watching videos). We call it “dialing for videos,” inspired by the “random digit dialing” used in polling. It took a sophisticated cluster of powerful computers at the University of Massachusetts months to collect a representative sample; we then spent another few months analyzing those videos to paint what we think is the best portrait to date of YouTube as a whole. (We use a related, slightly faster method at this website to keep regularly updated data.)
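To make the method concrete, here is a minimal sketch, in Python, of what “dialing for videos” could look like. To be clear, this is an illustration rather than our production pipeline: the 64-character ID alphabet and the 16 legal final characters are well-known properties of YouTube’s 64-bit video IDs, and the public oEmbed endpoint is just one convenient way to check whether an ID exists.

```python
# A minimal sketch (not the production pipeline) of "dialing for
# videos": draw uniformly random 11-character video IDs and test
# whether each one resolves to a public video.
import random
import string

import requests

ID_CHARS = string.ascii_letters + string.digits + "-_"  # base64url alphabet
LAST_CHARS = "AEIMQUYcgkosw048"  # IDs encode 64 bits, so the final
                                 # character's low two bits must be zero

def random_video_id() -> str:
    """Draw one uniformly random candidate video ID."""
    return "".join(random.choices(ID_CHARS, k=10)) + random.choice(LAST_CHARS)

def is_public_video(video_id: str) -> bool:
    """Check existence via YouTube's public oEmbed endpoint.

    A 200 response indicates a public, embeddable video; 404 means no
    such video. (Videos with embedding disabled return 401, a small
    source of undercounting that this sketch ignores.)
    """
    resp = requests.get(
        "https://www.youtube.com/oembed",
        params={"url": f"https://www.youtube.com/watch?v={video_id}",
                "format": "json"},
        timeout=10,
    )
    return resp.status_code == 200

# With roughly 14 billion public videos in a 2**64 keyspace, a single
# hit takes about 1.3 billion guesses on average, which is why the
# real collection needed months on a computing cluster. The estimator
# itself is simple: hits / guesses * keyspace.
guesses = 1_000_000
hits = sum(is_public_video(random_video_id()) for _ in range(guesses))
print("estimated total videos:", hits / guesses * 2**64)
```

The improbability of any single guess is the whole story of the method: the estimate falls out of elementary probability, and all of the difficulty lies in making enough guesses fast enough.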

So much of YouTube is effectively dark matter. Videos with 10,000 or more views account for nearly 94 percent of the site’s traffic overall but less than 4 percent of total uploads. Just under 5 percent of videos have no views at all, nearly three-quarters have no comments, and even more have no likes. Popularity is almost entirely algorithmic: We found little correlation between subscribers and views, reflecting how YouTube recommendations, and not subscriptions, are the primary drivers of traffic on the site. In other words, people tend to watch just a sliver of what YouTube has to offer, and, on the whole, they follow what the algorithm serves to them.
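For readers who want to see how findings like these are computed, here is a hedged sketch over a hypothetical CSV of sampled videos; the file name and column names (“views,” “subscribers”) are illustrative, not our paper’s actual schema.

```python
# Summary statistics of the kind described above, computed over a
# hypothetical sample file; column names are illustrative.
import pandas as pd

sample = pd.read_csv("video_sample.csv")  # hypothetical sample

popular = sample["views"] >= 10_000
total_views = sample["views"].sum()
print("share of traffic from 10k+ view videos:",
      sample.loc[popular, "views"].sum() / total_views)
print("share of uploads with 10k+ views:", popular.mean())
print("share of videos with zero views:", (sample["views"] == 0).mean())

# Rank correlation is a reasonable choice here because view counts are
# wildly skewed; a value near zero suggests subscriptions do not drive views.
print("subscriber/view rank correlation:",
      sample["subscribers"].corr(sample["views"], method="spearman"))
```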

The much larger share of videos, never surfaced by the algorithm, belongs to what I call “Deep YouTube.” These videos sometimes resemble their more professional counterparts, but many—family slideshows, homework assignments—are apparently not intended for the “creator economy.” This is YouTube put to a different purpose, for a different audience—an archive for friends and family, a public record, a tertiary channel for some other type of media.

Deep YouTube is almost impossible to fully grasp. Our paper sought to remedy that: It does not offer a single powerful takeaway but is instead meant to help researchers, journalists, and legislators contextualize other studies about YouTube. Gathering, analyzing, and publishing these data took us a year; had YouTube decided to share this kind of information, it might have taken an afternoon. But YouTube, like Facebook and Instagram and X (formerly Twitter), has no mandate and little incentive to do so—we’ve come to accept that the most basic information about the platforms organizing our lives is unavailable. (Reached for comment about the issues raised by our research, a spokesperson for YouTube said, in part, “We’re always looking for new avenues to deepen transparency and expand collaborations with the research community, while ensuring the necessary privacy and security protection of our systems.”)

The word platform has become the most common way to reference YouTube and other popular websites, though it is flawed. Tarleton Gillespie, a media scholar, has argued that the term serves as strategic branding for tech companies to describe their products as egalitarian places where anyone with the inclination can speak. Even now, YouTube’s stated mission is “to give everyone a voice and show them the world.” The term also offers these companies some protection: Merely providing a platform implies that companies aren’t culpable for how people use it.

The framing has been broadly successful—YouTube brings in tens of billions of dollars in revenue each year and enjoys a legal status, under Section 230 of the Communications Decency Act, that distinguishes “providers” from “publishers” and effectively insulates sites from liability for the content their users upload. It took a wave of reporting and outrage about YouTube recommendations leading users toward extremist content to pressure YouTube to change those algorithms in 2019. We don’t know exactly what those changes were, but they appear to have been at least somewhat effective. Most of the time, though, companies give no indication that they’ve updated their recommendation algorithms.

Platforms present opportunities; they’re something you can choose to use in order to communicate. But for many people, YouTube is now less an opportunity than a requirement—something you have to use, because basic elements of society have organized around it. The terms with which YouTube’s trillion-dollar owner defines its product should no longer be our default. The website is infrastructure.

Changing the language we use to talk about YouTube allows a discussion based on its real place in the world. Perhaps when we talk about YouTube, instead of referencing MrBeast and Cocomelon, we should think about the December 18 meeting of the Amherst Board of Education. It provides a good example of YouTube used for recordkeeping, not for virality, with roughly 50 views at the time of writing. Those may be exactly the views it was intended to get. Indeed, if such a video goes viral, there’s a good chance something is terribly wrong.

When a platform becomes natural, transparent, and essential because of the ways people use it, it becomes infrastructure. Similarly, if large parts of society organize themselves around something such that a breakdown would have profound economic, social, and operational effects, it probably makes sense to use that term. Infrastructures are, after all, most visible when they stop working properly. Talking about YouTube as infrastructure doesn’t automatically mean, as many pundits recommend, treating it the same way we do public utilities such as gas and water, and doesn’t mean we must repeal Section 230. Rather, we have to think about a new set of principles for the site, free of partisan outrage.

An excellent example of YouTube as infrastructure, as my colleague Ethan Zuckerman has written about, comes from local journalists and activists: Because their personal sites are sometimes subject to hacking or takedown requests, they rely on YouTube to host video evidence of various human-rights violations. Google is a massive company with a global presence, extraordinary technical resources, and a large legal team—it can resist takedown attempts that a personal website cannot.

YouTube is no less essential for viewers. Last year, the Washington Post editorial board published an opinion piece titled “YouTube Must Hold the Line in Russia,” which highlights how YouTube has become “the surest way for Russian citizens to keep abreast of what’s happening in the world.” Even after YouTube blocked Russian state-funded media and stopped selling ads in the country, even as other state censorship increased significantly and Google was issued fines for not complying with takedown requests, even after the late Yevgeny Prigozhin publicly asked for the site to be restricted, YouTube remains accessible in Russia. Perhaps it survives because too many citizens use YouTube every day for it to be banned entirely. Or maybe it survives because there is no suitable alternative in Russia for video hosting. When something becomes the last hope for obtaining information without a government filter, we are no longer talking about platforms where you might choose to speak. We are talking about fundamental infrastructure of human communication.

Up to now, in the absence of robust laws governing privacy, accountability, and reliability when it comes to internet-based infrastructure, we have relied on the tech industry to self-regulate. X, under Elon Musk, provides a timely cautionary tale about the abject failure of industry self-regulation. Since Musk’s takeover, overhauled content-moderation policies, reduced transparency, and hostility to researchers have degraded the website’s former function as a source of information and informed opinion. Self-regulation works, until someone in power simply decides to stop doing it.

The same might apply to YouTube. One of the tools researchers have used to study the site’s recommendations, “Related Videos,” was removed from the application programming interface in August. Although imperfect, the tool was one of the only ways for researchers to study connections between videos at scale. With large platforms restricting APIs, this is yet another impediment to studying the internet’s basic infrastructure, and yet another sign that we may be entering a dark age of internet research.
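For context, the removed tool corresponded to the relatedToVideoId parameter of the YouTube Data API’s search endpoint. A sketch of that now-defunct call, with a placeholder API key, illustrates what researchers lost:

```python
# What a "related videos" lookup looked like in the YouTube Data API v3
# before the relatedToVideoId parameter was retired in August 2023.
# This request no longer works; the key below is a placeholder.
import requests

resp = requests.get(
    "https://www.googleapis.com/youtube/v3/search",
    params={
        "part": "snippet",
        "type": "video",                    # required with relatedToVideoId
        "relatedToVideoId": "dQw4w9WgXcQ",  # any public video's ID
        "maxResults": 25,
        "key": "YOUR_API_KEY",              # placeholder credential
    },
    timeout=10,
)
for item in resp.json().get("items", []):
    print(item["id"]["videoId"], item["snippet"]["title"])
```

Repeated calls like this let researchers map which videos YouTube connected to which, at scale; no comparable official interface has replaced it.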

How to move forward from here? Recent laws in Europe provide one way, creating new vocabularies while implementing new rules. Although there’s still plenty of uncertainty about their implementation, the European Union’s Digital Services Act and Digital Markets Act provide an ambitious framework that will be worth watching. Google has already rolled out researcher programs applicable to YouTube, in part to comply with DSA requirements, but they are currently limited and come with significant up-front restrictions. Still, these laws have the potential to spread regulatory responsibility around, bringing a range of expertise to our digital infrastructure for the good of the public—and of the sites themselves, which effectively receive free, independent quality assurance.

We expect the companies that manage our infrastructure to be reliable and subject to independent scrutiny. We might expect something similar from the default video arm of the internet: robust access for researchers, who should not fear lawsuits for doing their job; regular audits of recommendation algorithms; reports sharing information about systems we rely on every day. If we think about YouTube as essential infrastructure rather than as a platform, and shift our expectations accordingly, maybe we won’t have to spend the better part of a year figuring out how many videos it hosts. Maybe we’ll be able to spend our time working to ensure that this digital infrastructure serves the public interest for the next 14 billion videos to come.


This article was adapted from a blog post by Ryan McGrady that was previously published by the Media Ecosystems Analysis Group.
