Master JavaScript Web Scraping with Node.js & Residential Proxies

A scalable JavaScript web scraper must fetch static HTML, execute client-side code, and dodge IP blocks, rate limits, and geo-restrictions. This guide delivers three hands-on methods—Axios + Cheerio for server-rendered pages, Puppeteer/Playwright for dynamic sites, and GoProxy’s API for managed scraping—each integrated with residential proxies for seamless IP rotation, geo-targeting, and session control. You’ll find clear setup steps, code examples, tables comparing tools, best practices, and an FAQ list to address real-world scraping scenarios.

What You’ll Build

This tutorial covers:

  • Static Scraping with Axios + Cheerio to extract data from server-rendered HTML.
  • Dynamic Scraping with Puppeteer & Playwright to automate browsers and capture client‑side content.
  • Managed Scraping via GoProxy API for enterprise-scale projects with minimal code.

By the end, you’ll have working scripts that fetch product listings, handle pagination, perform infinite scroll, and submit jobs to GoProxy’s service—all through residential proxies to avoid detection and maximize reliability.

Prerequisites & Setup

Make sure you have:

  • Node.js v18+ installed (includes native fetch) 
  • npm or Yarn for package management
  • A GoProxy account with proxy credentials (GOPROXY_USER, GOPROXY_PASS, GOPROXY_HOST, GOPROXY_PORT, GOPROXY_API_KEY)
  • A basic understanding of JavaScript/Node.js and a code editor (e.g., VS Code)

Project initialization:

```bash
mkdir js-scraper && cd js-scraper
npm init -y
npm install axios cheerio puppeteer playwright dotenv
```

Create a .env file:

```ini
GOPROXY_USER=your_user
GOPROXY_PASS=your_pass
GOPROXY_HOST=proxy.goproxy.com
GOPROXY_PORT=8000
GOPROXY_API_KEY=your_api_key
```

Load with require('dotenv').config() in your scripts.

Core Concepts Recap

1. Node.js Event Loop

Node.js runs JavaScript on a single thread via an event loop that offloads I/O to the system kernel, allowing non‑blocking operations and efficient concurrency.
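
A quick illustration: the synchronous log below always prints first, because the file read is handed off to the system and its callback runs on a later event-loop turn.

```js
const fs = require('fs');

// The read is offloaded to the kernel; its callback fires on a later event-loop turn.
fs.readFile(__filename, () => console.log('2: I/O callback fires after sync code'));
console.log('1: synchronous code keeps running, nothing blocks');
```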

2. Async/Await & Promises

Use async/await to pause execution until a promise resolves—crucial for sequential scraping tasks. Forgetting await can lead to unfulfilled network calls and empty data.
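
A minimal sketch of the pitfall (fetchHtml here is a stand-in for any promise-returning call):

```js
const axios = require('axios');

const fetchHtml = url => axios.get(url).then(res => res.data);

// Wrong: without await you get a pending Promise, not the HTML.
const pending = fetchHtml('https://example.com');
console.log(pending); // Promise { <pending> }

// Right: await inside an async context yields the actual data.
(async () => {
  const html = await fetchHtml('https://example.com');
  console.log(html.length);
})();
```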

3. Residential Proxies

Residential proxies route requests through real-user IPs, reducing blocks and enabling geo-targeting. Choose rotating sessions (new IP per request) or sticky sessions (same IP for multi-step flows).
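
Many residential providers control session behavior through the proxy credentials; the username suffix below is a hypothetical format for illustration only — check your GoProxy dashboard for the exact syntax.

```js
// Hypothetical illustration — the real GoProxy session syntax may differ.
const rotating = {
  username: process.env.GOPROXY_USER,                     // new IP per request
  password: process.env.GOPROXY_PASS
};
const sticky = {
  username: `${process.env.GOPROXY_USER}-session-a1b2c3`, // keep one IP across requests
  password: process.env.GOPROXY_PASS
};
```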

Choosing the Right Tools

HTTP Client Comparison

Efficient HTTP clients are necessary for static scraping. Here’s how three popular options compare:

| Feature | Fetch API | Axios | SuperAgent |
| --- | --- | --- | --- |
| Built-in | Yes (Node v18+) | No | No |
| JSON auto-parse | No | Yes | No |
| Interceptors | No | Yes | No |
| Cancellation support | Experimental | Yes | Yes |
| Proxy integration | Environment vars | Built-in | Plugin |
| Ease of use | Moderate | High | Moderate |

  • Fetch API: Global in Node 18+, simple but lacks advanced hooks.
  • Axios: Offers request/response interceptors, automatic JSON transforms, and cancellation tokens—ideal for proxy setups.
  • SuperAgent: Stream‑based, chaining API, best for large payloads.

Editor’s Recommendation: Start with Axios for a balance of power and simplicity.
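
For example, a minimal sketch of a shared Axios instance that routes everything through the proxy and uses a request interceptor to attach a browser-like User-Agent (the header string is just an example value):

```js
require('dotenv').config();
const axios = require('axios');

// One shared instance keeps proxy settings and headers in a single place.
const client = axios.create({
  proxy: {
    host: process.env.GOPROXY_HOST,
    port: +process.env.GOPROXY_PORT,
    auth: { username: process.env.GOPROXY_USER, password: process.env.GOPROXY_PASS }
  },
  timeout: 10000
});

// Interceptor: every request leaves with a realistic User-Agent.
client.interceptors.request.use(config => {
  config.headers['User-Agent'] =
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36';
  return config;
});
```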

DOM Parsing: Cheerio vs. jsdom

After fetching HTML, choose the right parser:

  • Cheerio: Fast, jQuery-like API for server-side HTML traversal. Doesn’t execute JavaScript. Use for simple extraction tasks.
  • jsdom: Full DOM emulation with CSSOM and HTML parsing. Slower and heavier—use when you need true browser APIs (e.g., document.createElement).

Example (Cheerio):

```js
const cheerio = require('cheerio');

const $ = cheerio.load(html);
const titles = $('h2.title').map((i, el) => $(el).text()).get();
```
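
For comparison, the same extraction with jsdom (a sketch; requires npm install jsdom, which isn't in the setup list above):

```js
const { JSDOM } = require('jsdom');

// Full DOM emulation: you get real querySelectorAll, createElement, etc.
const dom = new JSDOM(html);
const titles = [...dom.window.document.querySelectorAll('h2.title')]
  .map(el => el.textContent.trim());
```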

Headless Browser Options

Use browser automation when your page relies on client-side JavaScript.

| Feature | Puppeteer | Playwright |
| --- | --- | --- |
| Browser support | Chromium, Firefox | Chromium, Firefox, WebKit |
| Auto-wait | No (manual) | Yes |
| Parallel contexts | Limited | Multiple |
| Test runner | No | @playwright/test |
| Ease of setup | High | Moderate |

  • Puppeteer: Controls browsers via DevTools Protocol; supports screenshots, PDFs.
  • Playwright: Adds auto‑waiting and multi‑browser support out of the box.

Editor’s Recommendation: Use Playwright for cross‑browser needs and Puppeteer for quick Chrome‑only automation.

Method 1: Static Scraping with Axios & Cheerio

Use when: Pages serve data in initial HTML without requiring JavaScript.

1. Install & Boilerplate

```bash
npm install axios cheerio dotenv
```

Create scrape-cheerio.js:

```js
require('dotenv').config();
const axios = require('axios');
const cheerio = require('cheerio');

// Route every request through the residential proxy.
const proxy = {
  host: process.env.GOPROXY_HOST,
  port: +process.env.GOPROXY_PORT,
  auth: { username: process.env.GOPROXY_USER, password: process.env.GOPROXY_PASS }
};

async function fetchPage(url) {
  const { data } = await axios.get(url, { proxy, timeout: 10000 });
  return data;
}

// Extract title/price pairs from each .item element.
function parseItems(html) {
  const $ = cheerio.load(html);
  return $('.item').map((i, el) => ({
    title: $(el).find('.title').text().trim(),
    price: $(el).find('.price').text().trim()
  })).get();
}

(async () => {
  try {
    const html = await fetchPage('https://example.com/products');
    console.log(parseItems(html));
  } catch (e) {
    console.error('Error:', e.message);
  }
})();
```

  • Cheerio implements a subset of jQuery for fast DOM parsing (no JS execution).
  • Axios offers interceptors, JSON transforms, and built‑in proxy support.

2. Pagination Handling
```js
async function scrapeAll(url) {
  let next = url, results = [];
  while (next) {
    const html = await fetchPage(next);
    results.push(...parseItems(html));
    const $ = cheerio.load(html);
    const href = $('.next-page').attr('href');
    // Resolve relative "next" links against the current page URL.
    next = href ? new URL(href, next).href : null;
  }
  return results;
}
```

Common Pitfalls

  • Timeouts: Increase the timeout or verify proxy connectivity; a retry wrapper helps with transient failures (see the sketch below).
  • Empty arrays: Check your CSS selectors against the site’s HTML.
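
A minimal retry sketch around the fetchPage helper from above (three attempts, surfacing the last error):

```js
// Retry transient failures (timeouts, resets) a few times before giving up.
async function fetchWithRetry(url, retries = 3) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fetchPage(url);
    } catch (e) {
      if (attempt === retries) throw e;
      console.warn(`Attempt ${attempt} failed (${e.message}), retrying...`);
    }
  }
}
```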

Method 2: Dynamic Scraping with Puppeteer & Playwright

Use when: Pages render content via JavaScript (SPAs, infinite scroll).

Why Headless Browsers?

They execute JavaScript just as a real user’s browser does, enabling data capture from client-side-rendered pages.

Puppeteer Example

```js
require('dotenv').config();
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [`--proxy-server=${process.env.GOPROXY_HOST}:${process.env.GOPROXY_PORT}`]
  });
  const page = await browser.newPage();
  await page.authenticate({
    username: process.env.GOPROXY_USER,
    password: process.env.GOPROXY_PASS
  });

  await page.goto('https://example.com/dynamic', { waitUntil: 'networkidle2' });

  // Scroll until the page height stops growing (infinite scroll).
  let prevHeight;
  do {
    prevHeight = await page.evaluate(() => document.body.scrollHeight);
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // page.waitForTimeout was removed in recent Puppeteer versions; a plain setTimeout works everywhere.
    await new Promise(r => setTimeout(r, 1000));
  } while ((await page.evaluate(() => document.body.scrollHeight)) > prevHeight);

  const data = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.item')).map(el => ({
      title: el.querySelector('.title')?.innerText,
      price: el.querySelector('.price')?.innerText
    }))
  );

  console.log(data);
  await browser.close();
})();
```

  • networkidle2 waits for no more than 2 network connections for ≥500 ms.
  • Use page.screenshot() to debug selector issues.

Playwright Example

```js
require('dotenv').config();
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({
    proxy: {
      server: `${process.env.GOPROXY_HOST}:${process.env.GOPROXY_PORT}`,
      username: process.env.GOPROXY_USER,
      password: process.env.GOPROXY_PASS
    }
  });
  const page = await browser.newPage();
  await page.goto('https://example.com/dynamic', { waitUntil: 'networkidle' });
  const items = await page.$$eval('.item', els =>
    // Optional chaining avoids a throw when an item is missing its title node.
    els.map(el => ({ title: el.querySelector('.title')?.innerText }))
  );
  console.log(items);
  await browser.close();
})();
```

  • Playwright’s auto-waits reduce the need for manual timeouts.
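
One caveat: page.$$eval grabs whatever matches at that instant and does not auto-wait, so for late-loading content add an explicit wait first, e.g.:

```js
// Wait until at least one .item is attached before scraping.
await page.waitForSelector('.item');
const items = await page.$$eval('.item', els =>
  els.map(el => ({ title: el.querySelector('.title')?.innerText }))
);
```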

Common Pitfalls

  • Proxy errors: Verify the credentials in your .env file.
  • Missing data: Use page.screenshot() to debug rendering (see below).
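
For example, dump a full-page screenshot right before the extraction step (the call works in both Puppeteer and Playwright):

```js
// Inspect debug.png to see what the headless browser actually rendered.
await page.screenshot({ path: 'debug.png', fullPage: true });
```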

Method 3: Managed Scraping with GoProxy API

Use when: Minimal code and infrastructure—ideal for large-scale or multi-site jobs.

Code Example

```js
require('dotenv').config();
const axios = require('axios');

async function submitJob(url, selectors) {
  const res = await axios.post('https://api.goproxy.com/scraping/jobs',
    { url, selectors },
    { headers: { 'Authorization': `Bearer ${process.env.GOPROXY_API_KEY}` } }
  );
  return res.data.jobId;
}

async function fetchResults(jobId) {
  const res = await axios.get(`https://api.goproxy.com/scraping/jobs/${jobId}`,
    { headers: { 'Authorization': `Bearer ${process.env.GOPROXY_API_KEY}` } }
  );
  return res.data;
}

(async () => {
  const selectors = [
    { name: 'title', path: '.item .title' },
    { name: 'price', path: '.item .price' }
  ];
  const jobId = await submitJob('https://example.com/products', selectors);
  console.log('Job ID:', jobId);

  // Poll every 2 seconds until the job reports completion.
  let result;
  do {
    await new Promise(r => setTimeout(r, 2000));
    result = await fetchResults(jobId);
  } while (result.status !== 'completed');

  console.log(result.data);
})();
```

GoProxy handles proxy rotation, retries, and structured JSON output out of the box.

Best Practices & Troubleshooting

1. Proxy Strategies

  • Rotating sessions: New IP per request for breadth-first crawls.
  • Sticky sessions: Same IP for multi-step interactions (login flows).
  • Geo-targeting: Use GoProxy’s dashboard to select countries or cities; verify via https://ipinfo.io/json (a sketch follows this list).
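
A quick sketch for the geo check, reusing the Axios proxy object from Method 1:

```js
// Request ipinfo.io through the proxy and print the exit IP's location.
async function checkProxyGeo() {
  const { data } = await axios.get('https://ipinfo.io/json', { proxy, timeout: 10000 });
  console.log(`Exit IP ${data.ip} — ${data.city}, ${data.country}`);
}
```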

2. Rate Limits & Backoff

Use exponential backoff (1s → 2s → 4s) on HTTP 429/503 errors. Insert random delays (2–5 s) between actions to mimic human behavior.
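
A sketch of both ideas, again reusing the Axios proxy object from Method 1:

```js
const sleep = ms => new Promise(r => setTimeout(r, ms));

// Random 2–5 s pause between actions to mimic human pacing.
const humanPause = () => sleep(2000 + Math.random() * 3000);

// Retry HTTP 429/503 with exponential backoff: 1 s → 2 s → 4 s.
async function getWithBackoff(url, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await axios.get(url, { proxy, timeout: 10000 });
    } catch (e) {
      const status = e.response && e.response.status;
      if (attempt >= maxRetries || (status !== 429 && status !== 503)) throw e;
      await sleep(1000 * 2 ** attempt);
    }
  }
}
```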

3. CAPTCHA & Bot Defenses

Switch between headless and headful modes. Integrate a CAPTCHA solver for high‑security sites.

4. Logging & Monitoring

Log proxy ID, latency, status codes. Use GoProxy webhooks for error alerts and automated fallback.
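
On an Axios instance (like the client sketched earlier), interceptors are a convenient place to record latency and status codes:

```js
// Stamp each request with a start time; log timing and status on response.
client.interceptors.request.use(config => {
  config.metadata = { start: Date.now() };
  return config;
});
client.interceptors.response.use(
  res => {
    console.log(`${res.status} ${res.config.url} (${Date.now() - res.config.metadata.start} ms)`);
    return res;
  },
  err => {
    console.error(`FAIL ${err.config && err.config.url}: ${err.message}`);
    return Promise.reject(err);
  }
);
```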

Method Comparison at a Glance

| Method | Dynamic JS | Setup Complexity | Speed | Best Use Case |
| --- | --- | --- | --- | --- |
| Axios + Cheerio | No | Low | Fast | Static pages, bulk data extraction |
| Puppeteer | Yes | Medium | Moderate | Interactive SPAs, infinite scroll |
| Playwright | Yes | Medium | Moderate | Cross-browser scenarios |
| GoProxy API | Yes | Very Low | High | Enterprise-scale, low-dev overhead |

Further Reading

  • Node.js Event Loop
  • Axios Interceptors
  • Cheerio Guide
  • Puppeteer API
  • Playwright Docs
  • GoProxy Scraping Proxies
  • GoProxy Web Scraping Service

FAQs

1. Which method is best for beginners?

Start with Axios + Cheerio for minimal setup and fast static‑HTML scraping.

2. How many proxies do I need?

A pool of 5–10 rotating residential IPs typically handles hundreds of pages per hour.

3. Can proxies eliminate all CAPTCHA challenges?

Rotation reduces CAPTCHA triggers but doesn’t guarantee avoidance. Add a CAPTCHA‑solving service for full coverage.

4. What’s the difference between rotating and sticky sessions?

Rotating: New IP each request—ideal for breadth‑first data collection.

Sticky: One IP per session—necessary for login flows and checkout processes.

5. How do I verify proxy geo‑location?

Call https://ipinfo.io/json through each proxy; inspect the country and city fields.

Final Thoughts

Web scraping is an invaluable skill for gathering data from the internet, but it’s not without its hurdles, especially when tackling dynamic, JavaScript-heavy websites or navigating anti-scraping protections. In this blog, we’ve walked through three powerful methods for web scraping with Node.js. Each approach has its strengths, and the best choice depends on your project’s needs. 

But no matter the method, one thing remains constant: the need for reliable, undetectable proxies to bypass blocks, manage geo-restrictions, and keep your scraping running smoothly.

With over 90 million rotating IPs sourced from real residential devices, GoProxy delivers the anonymity and flexibility you need to scrape successfully. Whether you’re a beginner testing the waters or a pro scaling up your operations, our residential proxies integrate seamlessly into your Node.js workflows, offering both rotating and sticky sessions to suit your needs.

We’d love to invite you to experience GoProxy’s residential proxies and web scraping service. See firsthand how easy it is to set up, how reliable our IPs are, and how they can simplify even the toughest scraping challenges. Sign up today for a free trial and 24/7 technical support!
