How to Scrape Reddit in 2026
Reddit uses rate limiting, anti-bot detection, and JavaScript rendering on its modern interface. This guide uses old.reddit.com, which provides a cleaner HTML structure that is easier to parse. With Browser7, you get fully rendered subreddit pages in a single API call.
What makes Reddit hard to scrape
Rate limiting and IP blocking
Reddit enforces strict rate limits on both its API and web interface. Datacenter IPs are quickly blocked, and even residential proxies can be throttled if request patterns appear automated.
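If you do roll your own requests instead of using a managed API, plan for 429 responses rather than treating them as fatal. A minimal retry-with-exponential-backoff sketch; the `fetch` callable here is a placeholder for your own HTTP client, not part of any SDK:

```python
import time

def fetch_with_backoff(fetch, url, max_retries=5, base_delay=2.0):
    """Retry a request with exponential backoff on rate-limit responses.

    `fetch` is any callable returning an object with a `.status_code`
    attribute (e.g. a thin wrapper around requests.get) -- a placeholder,
    not a specific library's API.
    """
    for attempt in range(max_retries):
        response = fetch(url)
        if response.status_code != 429:  # not rate limited
            return response
        # Wait base_delay, 2x, 4x, ... before the next attempt
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Still rate limited after {max_retries} retries")
```

Backoff alone won't defeat IP-level blocking, but it keeps a polite scraper from escalating a temporary throttle into a ban.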
JavaScript-heavy modern interface
The new Reddit interface (new.reddit.com) is a React single-page application that requires full JavaScript execution to render. Old Reddit provides server-rendered HTML, but still has anti-bot protections.
Login walls and interstitials
Reddit increasingly shows login prompts, app download banners, and cookie consent dialogs that can interfere with automated scraping. These overlays can hide the actual content you need.
Scrape Reddit subreddit posts
Browser7 handles proxy rotation, browser fingerprinting, CAPTCHA solving, and JavaScript rendering automatically. This example scrapes the r/webdev subreddit using old.reddit.com for cleaner HTML structure.
from browser7 import Browser7

client = Browser7(
    api_key="b7_your_api_key",
    base_url="https://ca-api.browser7.com/v1"
)

result = client.render(
    "https://old.reddit.com/r/webdev/",
    country_code="US",
)

print(result.html)

That is the complete code. No proxy configuration, no browser setup, no CAPTCHA handling logic. The response contains the fully rendered HTML with post titles, scores, authors, and comment counts.
Data you can extract
The rendered HTML from old.reddit.com contains structured data that is straightforward to parse. Common data points to extract:
Post details
- Post title and full URL
- Link destination (for link posts)
- Self-text content
- Post flair and tags
- Media attachments
Engagement
- Upvote score
- Comment count
- Upvote ratio (when available)
- Award count
- Crosspost information
Author info
- Author username
- Author flair
- Post timestamp
- Distinguished status (mod, admin)
- Stickied/pinned status
Subreddit metadata
- Subscriber count
- Active users online
- Subreddit description
- Sorting mode (hot, new, top)
- Pagination tokens
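Taken together, these fields map naturally onto a small record type, which keeps downstream code honest about what may be missing. A sketch; the field names are our own, not an official Reddit schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RedditPost:
    # Post details
    title: str
    url: str
    selftext: Optional[str] = None
    flair: Optional[str] = None
    # Engagement
    score: Optional[int] = None
    comment_count: Optional[int] = None
    # Author info
    author: Optional[str] = None
    posted_at: Optional[str] = None  # ISO date, e.g. "2026-04-11"
    stickied: bool = False

# Fields that old.reddit.com sometimes hides simply stay None
post = RedditPost(
    title="Finally launched my side project after 2 years",
    url="https://old.reddit.com/r/webdev/comments/def456/",
    score=89,
    author="shipping_dev",
)
```

Making everything beyond title and URL optional reflects reality: hidden scores, deleted authors, and link posts without self-text are all common.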
Complete example: render and parse subreddit posts
Here is a complete example that renders a Reddit subreddit page and extracts structured data from the HTML. The Python example uses BeautifulSoup, Node.js uses Cheerio, and PHP uses DOMDocument - the standard HTML parsing approach for each language.
from browser7 import Browser7
from bs4 import BeautifulSoup
import json

client = Browser7(
    api_key="b7_your_api_key",
    base_url="https://ca-api.browser7.com/v1"
)

result = client.render(
    "https://old.reddit.com/r/webdev/",
    country_code="US",
)

soup = BeautifulSoup(result.html, "html.parser")
posts = []

for thing in soup.select("div.thing"):
    post = {
        "title": None,
        "score": None,
        "author": None,
        "comments": None,
        "date": None,
        "url": None,
    }

    title_el = thing.select_one("a.title")
    if title_el:
        post["title"] = title_el.get_text(strip=True)
        href = title_el.get("href", "")
        post["url"] = f"https://old.reddit.com{href}" if href.startswith("/") else href

    score = thing.select_one("div.score.unvoted")
    if score:
        post["score"] = score.get_text(strip=True)

    author = thing.select_one("a.author")
    if author:
        post["author"] = author.get_text(strip=True)

    comments = thing.select_one("a.comments")
    if comments:
        post["comments"] = comments.get_text(strip=True)

    time_el = thing.select_one("time[datetime]")
    if time_el:
        post["date"] = time_el.get("datetime", "")[:10]

    posts.append(post)

print(json.dumps(posts[:5], indent=2))

CSS selectors may change if Reddit updates their page structure. Inspect the current page if any fields return null.
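Note that every extracted value is a raw display string: "43 comments", "1,024", or a placeholder when a subreddit hides fresh scores. A small normalizer turns these into integers; treating non-numeric text (such as a hidden-score marker) as missing is our assumption about old Reddit's markup:

```python
import re

def parse_count(text):
    """Extract the first integer from a scraped display string like
    '43 comments' or '1,024'. Returns None for empty or non-numeric
    values, e.g. old Reddit's hidden-score placeholder.
    """
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None

parse_count("43 comments")  # 43
parse_count("1,024")        # 1024
parse_count("•")            # None
```

Run this over `score` and `comments` before storing results so downstream sorting and aggregation work on numbers, not strings.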
Sample output:
[
  {
    "title": "What's the most mass-produced mass of mass production?",
    "score": "142",
    "author": "webdev_enthusiast",
    "comments": "43 comments",
    "date": "2026-04-11",
    "url": "https://old.reddit.com/r/webdev/comments/abc123/..."
  },
  {
    "title": "Finally launched my side project after 2 years",
    "score": "89",
    "author": "shipping_dev",
    "comments": "27 comments",
    "date": "2026-04-10",
    "url": "https://old.reddit.com/r/webdev/comments/def456/..."
  },
  ...
]

Scrape page 2 and beyond
Reddit paginates with the ?count=25&after= query parameters. The after value is the fullname (like t3_abc123) of the last post on the current page. You can find it in the "next" link at the bottom of each page.
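Rather than copying the token by hand, you can pull it out of the previous page's HTML. A sketch; the span.next-button selector is an assumption about old Reddit's current markup and may change:

```python
from urllib.parse import urlparse, parse_qs
from bs4 import BeautifulSoup

def next_after_token(html):
    """Extract the `after` fullname from old Reddit's "next" link so the
    following page can be requested. Returns None on the last page.
    """
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("span.next-button a")
    if not link:
        return None
    query = parse_qs(urlparse(link.get("href", "")).query)
    return query.get("after", [None])[0]
```

Feed each rendered page into this helper and loop until it returns None to walk an entire listing.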
from browser7 import Browser7
from bs4 import BeautifulSoup

client = Browser7(
    api_key="b7_your_api_key",
    base_url="https://ca-api.browser7.com/v1"
)

# Page 2: Reddit uses ?count=25&after= with a post ID
result = client.render(
    "https://old.reddit.com/r/webdev/?count=25&after=t3_abc123",
    country_code="US",
)

soup = BeautifulSoup(result.html, "html.parser")
for thing in soup.select("div.thing"):
    title = thing.select_one("a.title")
    score = thing.select_one("div.score.unvoted")
    if title and score:
        print(f"{score.get_text(strip=True)} - {title.get_text(strip=True)}")

Take a screenshot of subreddit posts
Capture Reddit subreddit pages as images for social media monitoring, community analysis, or tracking trending discussions over time.
import base64
from browser7 import Browser7

client = Browser7(
    api_key="b7_your_api_key",
    base_url="https://ca-api.browser7.com/v1"
)

result = client.render(
    "https://old.reddit.com/r/webdev/",
    country_code="US",
    block_images=False,
    include_screenshot=True,
    screenshot_full_page=True,
    screenshot_format="png"
)

# Save the screenshot
with open("reddit-webdev.png", "wb") as f:
    f.write(base64.b64decode(result.screenshot))

print("Screenshot saved")

What this costs
Every Reddit page render costs $0.01 - the same as any other website. Residential proxies, JavaScript rendering, CAPTCHA solving, and screenshots are all included. There are no per-domain surcharges, no credit multipliers, and no bandwidth fees.
10,000 Reddit pages cost $100. You know this before you start, not after.
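Because pricing is flat, budgeting is a single multiplication; a trivial sketch using the $0.01-per-render figure quoted above:

```python
PRICE_PER_RENDER_USD = 0.01  # flat per-page price quoted above

def estimated_cost(pages):
    """Flat pricing: pages x $0.01; no surcharges or multipliers to model."""
    return pages * PRICE_PER_RENDER_USD

print(f"${estimated_cost(10_000):,.2f}")  # $100.00
```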