Reddit, the self-anointed “front page of the internet,” sits atop a huge archive of original content. It contains more than a billion posts created by its 73 million average daily unique users, who have organized themselves into more than 100,000 interest-based communities, or subreddits, ranging from sports to politics, technology, pets, movies, music & TV, health & nutrition, business, philosophy and home & garden. You name it, there’s likely to be a subreddit for it.
The scale and diversity of the Reddit archive, replete with uncounted links to all corners of the World Wide Web and made freely accessible via API, have long made it a highly valued resource for researchers, academics and developers building third-party applications for accessing Reddit communities. More recently, it has also been eagerly mined by developers of generative AI tools in need of large troves of natural-language text on which to train their models.