How to Scrape Dynamic Content and SPAs
You render a page with a headless browser, but the data you need is not there. The page skeleton loads, then the actual content appears a moment later via AJAX calls. Single-page applications, lazy-loaded content, and client-side data fetching all create the same problem: the HTML you get is not the HTML you see. Here is how to get the content you actually need.
Why dynamic content is tricky to scrape
Content loads in stages
Many modern websites do not render all at once. The page shell loads first, then JavaScript fetches data from APIs and injects it into the page. A job board might show the search form immediately but load the actual job listings 500 milliseconds later. If you grab the HTML too early, you get the empty container without the data.
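A quick way to tell whether you grabbed the HTML too early is to count what sits inside the container that should hold the data. The sketch below uses only Python's standard-library `html.parser`. The `results` id and the sample markup are hypothetical. It assumes reasonably well-formed HTML: HTML allows some closing tags to be omitted, for example on `<p>` and `<li>`, and omitted closing tags will throw off the depth tracking.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class ChildCounter(HTMLParser):
    """Counts the direct child elements of the element with a given id."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0      # 0 = outside the container, 1 = directly inside it
        self.children = 0

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            # Not inside the container yet: start tracking when we reach it.
            if dict(attrs).get("id") == self.container_id:
                self.depth = 1
            return
        if self.depth == 1:
            self.children += 1
        if tag not in VOID_ELEMENTS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_ELEMENTS:
            self.depth -= 1


def count_children(html, container_id):
    parser = ChildCounter(container_id)
    parser.feed(html)
    parser.close()
    return parser.children
```

If a page should list items but the count comes back as zero, the data probably had not loaded when the HTML was captured.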
No universal "done" signal
With a static HTML page, the content is ready when the page loads. With a dynamic page, there is no reliable way to know when "all content has finished loading." Different data loads at different times, some content only loads on scroll, and the page might continue making API calls indefinitely. You need to define what "done" means for your specific use case.
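One way to define "done" is to say the page is finished once the number of items you care about stops changing. The helper below is a minimal sketch of that rule. It uses an assumption you should tune: three identical, non-zero counts in a row count as stable. Each sample is the number of items you counted in one snapshot of the page. How you take those snapshots depends on your tooling.

```python
def is_stable(samples, repeats=3):
    """Return True once the last `repeats` item counts are identical and non-zero."""
    if len(samples) < repeats:
        return False
    tail = samples[-repeats:]
    # A run of zeros means nothing has loaded yet, which is not "done".
    return tail[0] > 0 and all(count == tail[0] for count in tail)
```

This rule suits pages that load a fixed batch of items. On infinite-scroll pages, the count only grows when something scrolls. A flat count there means no new batch was requested, not that everything has loaded.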
Data comes from internal APIs, not the HTML
Some SPAs fetch data from internal APIs and render it entirely client-side. The HTML contains a JavaScript bundle and empty container divs. The actual data lives in JSON responses from API endpoints that the JavaScript calls after the page loads. Sometimes the cleanest approach is to call those APIs directly rather than parsing the rendered HTML.
Wait for specific elements to appear
The most common approach: tell Browser7 to wait until a specific element appears in the page before returning the HTML. Use the wait_for_selector parameter with a CSS selector that matches the content you need.
```python
from browser7 import Browser7

client = Browser7(api_key="b7_your_api_key")

# Wait for job cards to appear before returning HTML
result = client.render(
    "https://www.indeed.com/jobs?q=software+engineer&l=Remote",
    wait_for_selector="#mosaic-provider-jobcards",
    country_code="US",
)

print(result.html)
```

Browser7 renders the page, lets its JavaScript run, and checks for the specified element. Once the element appears in the DOM, Browser7 returns the HTML as it stands at that moment, including the dynamically loaded content you waited for.
To find the right selector, open the page in your browser, wait for the content you need to appear, then inspect it to find a reliable CSS selector. IDs and data attributes like data-testid are usually the most stable choices.
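If you have saved a copy of the rendered HTML, you can also list the candidates automatically. This sketch collects every `id`, plus a few common test-attribute names, and turns them into CSS selectors. The attribute list (`data-testid`, `data-test`, `data-qa`) is an assumption, so adjust it to whatever the site uses. The sketch does not escape ids that need CSS escaping, such as ones that start with a digit.

```python
from html.parser import HTMLParser

# Assumed attribute conventions; adjust these for the site you are scraping.
STABLE_ATTRS = ("data-testid", "data-test", "data-qa")


class SelectorCollector(HTMLParser):
    """Records a CSS selector for every id and stable data attribute it sees."""

    def __init__(self):
        super().__init__()
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "id":
                self.selectors.append(f"#{value}")
            elif name in STABLE_ATTRS:
                self.selectors.append(f'[{name}="{value}"]')


def candidate_selectors(html):
    collector = SelectorCollector()
    collector.feed(html)
    collector.close()
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(collector.selectors))
```

Before relying on a candidate, check it against the element you inspected in the browser.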
Fetch internal API data directly
Some SPAs load data from internal APIs that return clean JSON, which is much easier to work with than parsed HTML. Browser7's fetch_urls parameter lets you make requests to these APIs using the same session cookies and headers that the browser established when rendering the page.
```python
from browser7 import Browser7

client = Browser7(api_key="b7_your_api_key")

# Render the page, then make an API call using the established session
result = client.render(
    "https://www.example.com/products",
    country_code="US",
    fetch_urls=["https://www.example.com/api/products?page=2"],
)

# result.html has the rendered page
# result.fetch_results has the API response data
print(result.html)
for fetch in result.fetch_results:
    print(fetch.url, len(fetch.body))
```

This is particularly useful when a site's internal API returns structured JSON that is cleaner and more reliable to work with than the rendered HTML. The fetch requests reuse the session the browser created, so they can access the same data the page's JavaScript would see.
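The example above only prints each body's length. To work with the data, you will usually decode the JSON and skip responses that came back as something else, such as an HTML error page. The original example only shows that each entry exposes `url` and `body`. The `FetchResult` dataclass below is a hypothetical stand-in so the sketch runs on its own, and it assumes `body` holds the raw response text.

```python
import json
from dataclasses import dataclass


@dataclass
class FetchResult:
    """Hypothetical stand-in for one entry of result.fetch_results."""
    url: str
    body: str


def parse_json_bodies(fetch_results):
    """Map each URL to its decoded JSON body, skipping bodies that are not valid JSON."""
    parsed = {}
    for fetch in fetch_results:
        try:
            parsed[fetch.url] = json.loads(fetch.body)
        except ValueError:
            # For example an HTML error page or a login page instead of JSON.
            continue
    return parsed
```

With real results you would pass `result.fetch_results` directly, assuming its entries expose `url` and `body` as in the example above.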
Tips for scraping dynamic pages
Use stable selectors
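Prefer IDs, data attributes (data-testid, data-product-card), and semantic selectors over class names. Class names generated by tools like CSS Modules are often hashed and can change between builds. Utility classes from frameworks like Tailwind describe styling rather than content, so they change whenever the design does. IDs and data attributes are typically more stable.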
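To audit class names quickly, a few regular expressions can flag the hashed names that build tools tend to produce. The patterns below are heuristics based on a few common naming shapes: CSS Modules setups that emit names like `Button_primary__3xK9a`, Emotion's `css-<hash>`, and styled-components' `sc-<hash>`. They are not an exhaustive or authoritative list. They deliberately do not flag readable utility classes such as Tailwind's `mt-4`, which you still have to judge by hand.

```python
import re

# Heuristic patterns for build-generated class names; not exhaustive.
GENERATED_PATTERNS = [
    # CSS Modules style, e.g. Button_primary__3xK9a (hash must contain a digit,
    # so BEM names like card__title are not flagged)
    re.compile(r"__(?=[A-Za-z_-]*\d)[A-Za-z0-9_-]{5,}$"),
    # Emotion style, e.g. css-1q2w3e4
    re.compile(r"^css-(?=[a-z]*\d)[a-z0-9]{5,}"),
    # styled-components style, e.g. sc-bdVaJa
    re.compile(r"^sc-[A-Za-z0-9]{5,}$"),
]


def looks_generated(class_name):
    return any(pattern.search(class_name) for pattern in GENERATED_PATTERNS)


def stable_classes(class_attr):
    """Keep the classes from a class="..." value that do not look build-generated."""
    return [name for name in class_attr.split() if not looks_generated(name)]
```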
Check for API endpoints first
Open your browser's Network tab while loading the target page. Look for XHR/Fetch requests that return JSON. If the data you need is available as a clean API response, use fetch_urls to get it directly instead of parsing HTML.
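Most browsers let you export the requests captured in the Network tab as a HAR file (HTTP Archive format), which is plain JSON. This sketch reads such an export and lists the successful requests whose response MIME type is JSON. It relies only on standard HAR fields: `log.entries`, `request.method`, `request.url`, `response.status`, and `response.content.mimeType`.

```python
import json


def json_endpoints(har):
    """List (method, url) for successful responses whose MIME type is JSON."""
    endpoints = []
    for entry in har["log"]["entries"]:
        response = entry["response"]
        mime = response.get("content", {}).get("mimeType", "")
        # mimeType may carry parameters, e.g. "application/json; charset=utf-8".
        media_type = mime.split(";")[0].strip().lower()
        is_json = media_type == "application/json" or media_type.endswith("+json")
        if 200 <= response.get("status", 0) < 300 and is_json:
            endpoints.append((entry["request"]["method"], entry["request"]["url"]))
    return endpoints


def load_har(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `json_endpoints(load_har("products.har"))` gives you a short list of URLs to try in `fetch_urls`. The file name here is hypothetical.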
Wait for content, not for time
Waiting for a specific element is more reliable than waiting a fixed number of seconds. Content might load in 200ms on a good day and 3 seconds on a slow day. A selector-based wait handles both cases. A fixed delay either wastes time or misses the content.
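The same principle applies when you poll a page yourself: check a condition repeatedly against a deadline instead of sleeping for a fixed time. This is a generic, minimal sketch of condition-based waiting. It is not how Browser7 implements `wait_for_selector`, which runs on Browser7's side. The default timeout and interval are arbitrary.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

The loop returns as soon as the condition holds. A fast page costs a single check, and a slow page still succeeds as long as it finishes before the deadline.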
What this costs
Dynamic content rendering, wait conditions, and fetch requests are all included in the standard $0.01 per page price. There is no extra charge for waiting longer, no surcharge for SPA rendering, and no additional cost for fetch requests made within the same render.
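Because waits and `fetch_urls` requests do not change the price, a budget estimate only needs the number of rendered pages. This minimal sketch uses `Decimal` to avoid floating-point rounding on currency. The constant simply restates the $0.01 figure above.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.01")  # USD per rendered page, as quoted above


def estimate_cost(rendered_pages):
    """Only renders are counted; waits and fetch_urls requests inside a render add nothing."""
    return rendered_pages * PRICE_PER_PAGE
```

For example, 10,000 renders come to $100, whether each one waits 200 ms or 3 seconds and whether or not it includes fetch requests.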
See it in practice
These guides scrape dynamically loaded content:
- How to Scrape Indeed - job cards loaded dynamically via JavaScript
- How to Scrape YouTube - full SPA with dynamic video listings
- How to Scrape Twitter/X - React SPA with client-side data loading