How to Scrape Reddit in 2026
Reddit uses rate limiting, anti-bot detection, and JavaScript rendering on its modern interface. This guide uses old.reddit.com, which provides a cleaner HTML structure that is easier to parse. With Browser7, you get fully rendered subreddit pages in a single API call.
What makes Reddit hard to scrape
Rate limiting and IP blocking
Reddit enforces strict rate limits on both its API and web interface. Datacenter IPs are quickly blocked, and even residential proxies can be throttled if request patterns appear automated.
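If you do roll your own requests instead of using a managed API, plan for 429 responses rather than treating them as fatal. A minimal retry-with-exponential-backoff sketch; the `fetch` callable here is a placeholder for your own HTTP client, not part of any SDK:

```python
import time

def fetch_with_backoff(fetch, url, max_retries=5, base_delay=2.0):
    """Retry a request with exponential backoff on rate-limit responses.

    `fetch` is any callable returning an object with a `.status_code`
    attribute (e.g. a thin wrapper around requests.get) -- a placeholder,
    not a specific library's API.
    """
    for attempt in range(max_retries):
        response = fetch(url)
        if response.status_code != 429:  # not rate limited
            return response
        # Wait base_delay, 2x, 4x, ... before the next attempt
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Still rate limited after {max_retries} retries")
```

Backoff alone won't defeat IP-level blocking, but it keeps a polite scraper from escalating a temporary throttle into a ban.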
JavaScript-heavy modern interface
The new Reddit interface (new.reddit.com) is a React single-page application that requires full JavaScript execution to render. Old Reddit provides server-rendered HTML, but still has anti-bot protections.
Login walls and interstitials
Reddit increasingly shows login prompts, app download banners, and cookie consent dialogs that can interfere with automated scraping. These overlays can hide the actual content you need.
Scrape Reddit subreddit posts
Browser7 handles proxy rotation, browser fingerprinting, CAPTCHA solving, and JavaScript rendering automatically. This example scrapes the r/webdev subreddit using old.reddit.com for cleaner HTML structure.
from browser7 import Browser7

client = Browser7(
    api_key="b7_your_api_key",
    base_url="https://ca-api.browser7.com/v1"
)

result = client.render(
    "https://old.reddit.com/r/webdev/",
    country_code="US",
)

print(result.html)

That is the complete code. No proxy configuration, no browser setup, no CAPTCHA handling logic. The response contains the fully rendered HTML with post titles, scores, authors, and comment counts.
Data you can extract
The rendered HTML from old.reddit.com contains structured data that is straightforward to parse. Common data points to extract:
Post details
- Post title and full URL
- Link destination (for link posts)
- Self-text content
- Post flair and tags
- Media attachments
Engagement
- Upvote score
- Comment count
- Upvote ratio (when available)
- Award count
- Crosspost information
Author info
- Author username
- Author flair
- Post timestamp
- Distinguished status (mod, admin)
- Stickied/pinned status
Subreddit metadata
- Subscriber count
- Active users online
- Subreddit description
- Sorting mode (hot, new, top)
- Pagination tokens
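Taken together, these fields map naturally onto a small record type, which keeps downstream code honest about what may be missing. A sketch; the field names are our own, not an official Reddit schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RedditPost:
    # Post details
    title: str
    url: str
    selftext: Optional[str] = None
    flair: Optional[str] = None
    # Engagement
    score: Optional[int] = None
    comment_count: Optional[int] = None
    # Author info
    author: Optional[str] = None
    posted_at: Optional[str] = None  # ISO date, e.g. "2026-04-11"
    stickied: bool = False

# Fields that old.reddit.com sometimes hides simply stay None
post = RedditPost(
    title="Finally launched my side project after 2 years",
    url="https://old.reddit.com/r/webdev/comments/def456/",
    score=89,
    author="shipping_dev",
)
```

Making everything beyond title and URL optional reflects reality: hidden scores, deleted authors, and link posts without self-text are all common.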
Complete example: render and parse subreddit posts
Here is a complete example that renders a Reddit subreddit page and extracts structured data from the HTML. The Python example uses BeautifulSoup, Node.js uses Cheerio, and PHP uses DOMDocument - the standard HTML parsing approach for each language.
from browser7 import Browser7
from bs4 import BeautifulSoup
import json

client = Browser7(
    api_key="b7_your_api_key",
    base_url="https://ca-api.browser7.com/v1"
)

result = client.render(
    "https://old.reddit.com/r/webdev/",
    country_code="US",
)

soup = BeautifulSoup(result.html, "html.parser")
posts = []

for thing in soup.select("div.thing"):
    post = {
        "title": None,
        "score": None,
        "author": None,
        "comments": None,
        "date": None,
        "url": None,
    }

    title_el = thing.select_one("a.title")
    if title_el:
        post["title"] = title_el.get_text(strip=True)
        href = title_el.get("href", "")
        post["url"] = f"https://old.reddit.com{href}" if href.startswith("/") else href

    score = thing.select_one("div.score.unvoted")
    if score:
        post["score"] = score.get_text(strip=True)

    author = thing.select_one("a.author")
    if author:
        post["author"] = author.get_text(strip=True)

    comments = thing.select_one("a.comments")
    if comments:
        post["comments"] = comments.get_text(strip=True)

    time_el = thing.select_one("time[datetime]")
    if time_el:
        post["date"] = time_el.get("datetime", "")[:10]

    posts.append(post)

print(json.dumps(posts[:5], indent=2))

CSS selectors may change if Reddit updates their page structure. Inspect the current page if any fields return null.
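Note that every extracted value is a raw display string: "43 comments", "1,024", or a placeholder when a subreddit hides fresh scores. A small normalizer turns these into integers; treating non-numeric text (such as a hidden-score marker) as missing is our assumption about old Reddit's markup:

```python
import re

def parse_count(text):
    """Extract the first integer from a scraped display string like
    '43 comments' or '1,024'. Returns None for empty or non-numeric
    values, e.g. old Reddit's hidden-score placeholder.
    """
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None

parse_count("43 comments")  # 43
parse_count("1,024")        # 1024
parse_count("•")            # None
```

Run this over `score` and `comments` before storing results so downstream sorting and aggregation work on numbers, not strings.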
Sample output:
[
  {
    "title": "What's the most mass-produced mass of mass production?",
    "score": "142",
    "author": "webdev_enthusiast",
    "comments": "43 comments",
    "date": "2026-04-11",
    "url": "https://old.reddit.com/r/webdev/comments/abc123/..."
  },
  {
    "title": "Finally launched my side project after 2 years",
    "score": "89",
    "author": "shipping_dev",
    "comments": "27 comments",
    "date": "2026-04-10",
    "url": "https://old.reddit.com/r/webdev/comments/def456/..."
  },
  ...
]

Scrape page 2 and beyond
Reddit paginates with the ?count=25&after= query parameters. The after value is the fullname (like t3_abc123) of the last post on the current page. You can find it in the "next" link at the bottom of each page.
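Rather than copying the token by hand, you can pull it out of the previous page's HTML. A sketch; the span.next-button selector is an assumption about old Reddit's current markup and may change:

```python
from urllib.parse import urlparse, parse_qs
from bs4 import BeautifulSoup

def next_after_token(html):
    """Extract the `after` fullname from old Reddit's "next" link so the
    following page can be requested. Returns None on the last page.
    """
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("span.next-button a")
    if not link:
        return None
    query = parse_qs(urlparse(link.get("href", "")).query)
    return query.get("after", [None])[0]
```

Feed each rendered page into this helper and loop until it returns None to walk an entire listing.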
from browser7 import Browser7
from bs4 import BeautifulSoup

client = Browser7(
    api_key="b7_your_api_key",
    base_url="https://ca-api.browser7.com/v1"
)

# Page 2: Reddit uses ?count=25&after= with a post ID
result = client.render(
    "https://old.reddit.com/r/webdev/?count=25&after=t3_abc123",
    country_code="US",
)

soup = BeautifulSoup(result.html, "html.parser")
for thing in soup.select("div.thing"):
    title = thing.select_one("a.title")
    score = thing.select_one("div.score.unvoted")
    if title and score:
        print(f"{score.get_text(strip=True)} - {title.get_text(strip=True)}")

Take a screenshot of subreddit posts
Capture Reddit subreddit pages as images for social media monitoring, community analysis, or tracking trending discussions over time.
import base64
from browser7 import Browser7

client = Browser7(
    api_key="b7_your_api_key",
    base_url="https://ca-api.browser7.com/v1"
)

result = client.render(
    "https://old.reddit.com/r/webdev/",
    country_code="US",
    block_images=False,
    include_screenshot=True,
    screenshot_full_page=True,
    screenshot_format="png"
)

# Save the screenshot
with open("reddit-webdev.png", "wb") as f:
    f.write(base64.b64decode(result.screenshot))

print("Screenshot saved")

What this costs
Every Reddit page render costs $0.01 - the same as any other website. Residential proxies, JavaScript rendering, CAPTCHA solving, and screenshots are all included. There are no per-domain surcharges, no credit multipliers, and no bandwidth fees.
10,000 Reddit pages cost $100. You know this before you start, not after.
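Because pricing is flat, budgeting is a single multiplication; a trivial sketch using the $0.01-per-render figure quoted above:

```python
PRICE_PER_RENDER_USD = 0.01  # flat per-page price quoted above

def estimated_cost(pages):
    """Flat pricing: pages x $0.01; no surcharges or multipliers to model."""
    return pages * PRICE_PER_RENDER_USD

print(f"${estimated_cost(10_000):,.2f}")  # $100.00
```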