Web Scraping with Node.js: A Step-by-Step Proxy Guide
Apr 25, 2025
Learn how to scrape dynamic JavaScript websites using Node.js, Puppeteer, Playwright, and GoProxy’s residential proxies. Step-by-step tutorials included.
A scalable JavaScript web scraper must fetch static HTML, execute client-side code, and dodge IP blocks, rate limits, and geo-restrictions. This guide delivers three hands-on methods—Axios + Cheerio for server-rendered pages, Puppeteer/Playwright for dynamic sites, and GoProxy’s API for managed scraping—each integrated with residential proxies for seamless IP rotation, geo-targeting, and session control. You’ll find clear setup steps, code examples, tables comparing tools, best practices, and an FAQ list to address real-world scraping scenarios.
This tutorial covers:
- Node.js fundamentals for scraping (the event loop, async/await, residential proxies)
- Static scraping with Axios + Cheerio
- Dynamic scraping with Puppeteer and Playwright
- Managed scraping through GoProxy’s scraping API
- Proxy best practices, a method comparison, and an FAQ
By the end, you’ll have working scripts that fetch product listings, handle pagination, perform infinite scroll, and submit jobs to GoProxy’s service—all through residential proxies to avoid detection and maximize reliability.
Make sure you have:
- Node.js v18+ and npm installed
- GoProxy residential proxy credentials (host, port, username, password) and an API key
- Basic familiarity with JavaScript and the command line
Project initialization:
```bash
mkdir js-scraper && cd js-scraper
npm init -y
npm install axios cheerio puppeteer playwright dotenv
```
Create a .env file:
```ini
GOPROXY_USER=your_user
GOPROXY_PASS=your_pass
GOPROXY_HOST=proxy.goproxy.com
GOPROXY_PORT=8000
GOPROXY_API_KEY=your_api_key
```
Load with require('dotenv').config() in your scripts.
Node.js runs JavaScript on a single thread via an event loop that offloads I/O to the system kernel, allowing non‑blocking operations and efficient concurrency.
Use async/await to pause execution until a promise resolves, which is crucial for sequential scraping tasks. Forgetting await leaves you holding a pending promise instead of data.
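Here is a minimal sketch of the pitfall; fetchPage and the URL are placeholders:

```js
const axios = require('axios');

async function fetchPage(url) {
  // await pauses here until the request resolves
  const { data } = await axios.get(url);
  return data;
}

(async () => {
  const html = await fetchPage('https://example.com'); // the HTML string
  // const oops = fetchPage('https://example.com');    // missing await: a pending Promise, not HTML
  console.log(html.length);
})();
```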
Residential proxies route requests through real-user IPs, reducing blocks and enabling geo-targeting. Choose rotating sessions (new IP per request) or sticky sessions (same IP for multi-step flows).
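Many residential providers switch between the two modes via a session tag in the proxy username. The exact GoProxy syntax lives in your dashboard; the sketch below assumes a hypothetical `-session-<id>` suffix purely for illustration:

```js
require('dotenv').config();

// Rotating: the plain username gets a fresh IP on each request
const rotatingProxy = {
  host: process.env.GOPROXY_HOST,
  port: +process.env.GOPROXY_PORT,
  auth: { username: process.env.GOPROXY_USER, password: process.env.GOPROXY_PASS }
};

// Sticky: a session tag pins one IP for the whole flow.
// NOTE: the "-session-checkout1" suffix is a common provider convention,
// not confirmed GoProxy syntax -- check the dashboard docs.
const stickyProxy = {
  ...rotatingProxy,
  auth: {
    username: `${process.env.GOPROXY_USER}-session-checkout1`,
    password: process.env.GOPROXY_PASS
  }
};
```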
Efficient HTTP clients are necessary for static scraping. Here’s how three popular options compare:
| Feature | Fetch API | Axios | SuperAgent |
| --- | --- | --- | --- |
| Built-in | Yes (Node v18+) | No | No |
| JSON auto-parse | No | Yes | No |
| Interceptors | No | Yes | No |
| Cancellation support | Experimental | Yes | Yes |
| Proxy integration | Environment vars | Built-in | Plugin |
| Ease of use | Moderate | High | Moderate |
Editor’s Recommendation: Start with Axios for a balance of power and simplicity.
After fetching HTML, choose the right parser. For most server-rendered pages, Cheerio is a solid default: it loads markup into a fast, jQuery-like API without spinning up a browser.

Example (Cheerio):

```js
const cheerio = require('cheerio');

// html is the string returned by your HTTP client
const $ = cheerio.load(html);
const titles = $('h2.title').map((i, el) => $(el).text()).get();
```
Use browser automation when your page relies on client-side JavaScript.
| Feature | Puppeteer | Playwright |
| --- | --- | --- |
| Browser support | Chromium, Firefox | Chromium, Firefox, WebKit |
| Auto-wait | No (manual) | Yes |
| Parallel contexts | Limited | Multiple |
| Test runner | No | @playwright/test |
| Ease of setup | High | Moderate |
Editor’s Recommendation: Use Playwright for cross‑browser needs and Puppeteer for quick Chrome‑only automation.
Use when: Pages serve data in initial HTML without requiring JavaScript.
```bash
npm install axios cheerio dotenv
```
Create scrape-cheerio.js:
```js
require('dotenv').config();
const axios = require('axios');
const cheerio = require('cheerio');

// Route every request through the GoProxy residential endpoint
const proxy = {
  host: process.env.GOPROXY_HOST,
  port: +process.env.GOPROXY_PORT,
  auth: { username: process.env.GOPROXY_USER, password: process.env.GOPROXY_PASS }
};

async function fetchPage(url) {
  const { data } = await axios.get(url, { proxy, timeout: 10000 });
  return data;
}

// Extract title/price pairs from each .item element
function parseItems(html) {
  const $ = cheerio.load(html);
  return $('.item').map((i, el) => ({
    title: $(el).find('.title').text().trim(),
    price: $(el).find('.price').text().trim()
  })).get();
}

(async () => {
  try {
    const html = await fetchPage('https://example.com/products');
    console.log(parseItems(html));
  } catch (e) {
    console.error('Error:', e.message);
  }
})();
```
To handle pagination, keep following the next-page link until none remains:

```js
async function scrapeAll(url) {
  let next = url, results = [];
  while (next) {
    const html = await fetchPage(next);
    results.push(...parseItems(html));
    // Follow the next-page link; stop when there isn't one
    const $ = cheerio.load(html);
    next = $('.next-page').attr('href') || null;
  }
  return results;
}
```
Timeouts: Increase the timeout or verify proxy connectivity.
Empty arrays: Check your CSS selectors against the site’s HTML.
Use when: Pages render content via JavaScript (SPAs, infinite scroll).
Headless browsers execute JavaScript exactly as a real user’s browser would, so you can capture data from client-side-rendered pages.
Example (Puppeteer, with infinite scroll):

```js
require('dotenv').config();
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [`--proxy-server=${process.env.GOPROXY_HOST}:${process.env.GOPROXY_PORT}`]
  });
  const page = await browser.newPage();

  // Residential proxy credentials
  await page.authenticate({
    username: process.env.GOPROXY_USER,
    password: process.env.GOPROXY_PASS
  });

  await page.goto('https://example.com/dynamic', { waitUntil: 'networkidle2' });

  // Scroll until the page height stops growing (infinite scroll)
  let prevHeight;
  do {
    prevHeight = await page.evaluate(() => document.body.scrollHeight);
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    // Plain setTimeout sleep; page.waitForTimeout was removed in recent Puppeteer
    await new Promise(r => setTimeout(r, 1000));
  } while ((await page.evaluate(() => document.body.scrollHeight)) > prevHeight);

  const data = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.item')).map(el => ({
      title: el.querySelector('.title')?.innerText,
      price: el.querySelector('.price')?.innerText
    }))
  );
  console.log(data);
  await browser.close();
})();
```
Example (Playwright):

```js
require('dotenv').config();
const { chromium } = require('playwright');

(async () => {
  // Playwright takes proxy credentials directly in launch options
  const browser = await chromium.launch({
    proxy: {
      server: `${process.env.GOPROXY_HOST}:${process.env.GOPROXY_PORT}`,
      username: process.env.GOPROXY_USER,
      password: process.env.GOPROXY_PASS
    }
  });
  const page = await browser.newPage();
  await page.goto('https://example.com/dynamic', { waitUntil: 'networkidle' });

  const items = await page.$$eval('.item', els =>
    els.map(el => ({ title: el.querySelector('.title')?.innerText }))
  );
  console.log(items);
  await browser.close();
})();
```
Proxy Errors: Verify .env credentials.
Missing Data: Use page.screenshot() to debug rendering.
Use when: you want minimal code and infrastructure; ideal for large-scale or multi-site jobs.
```js
require('dotenv').config();
const axios = require('axios');

const headers = { 'Authorization': `Bearer ${process.env.GOPROXY_API_KEY}` };

// Submit a scraping job and return its ID
async function submitJob(url, selectors) {
  const res = await axios.post('https://api.goproxy.com/scraping/jobs',
    { url, selectors },
    { headers }
  );
  return res.data.jobId;
}

async function fetchResults(jobId) {
  const res = await axios.get(`https://api.goproxy.com/scraping/jobs/${jobId}`, { headers });
  return res.data;
}

(async () => {
  const selectors = [
    { name: 'title', path: '.item .title' },
    { name: 'price', path: '.item .price' }
  ];
  const jobId = await submitJob('https://example.com/products', selectors);
  console.log('Job ID:', jobId);

  // Poll every 2 s until the job completes
  let result;
  do {
    await new Promise(r => setTimeout(r, 2000));
    result = await fetchResults(jobId);
  } while (result.status !== 'completed');

  console.log(result.data);
})();
```
GoProxy handles proxy rotation, retries, and structured JSON output out of the box.
Rotating sessions: New IP per request for breadth-first crawls.
Sticky sessions: Same IP for multi-step interactions (login flows).
Geo-targeting: Use GoProxy’s dashboard to select countries or cities; verify via https://ipinfo.io/json.
Use exponential backoff (1s → 2s → 4s) on HTTP 429/503 errors. Insert random delays (2–5 s) between actions to mimic human behavior.
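A minimal retry helper along these lines (fetchWithBackoff and humanPause are illustrative names, not a GoProxy API):

```js
const axios = require('axios');

const sleep = ms => new Promise(r => setTimeout(r, ms));

// Retry on 429/503 with exponential backoff: 1s -> 2s -> 4s
async function fetchWithBackoff(url, options = {}, retries = 3) {
  let delay = 1000;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await axios.get(url, options);
    } catch (e) {
      const status = e.response?.status;
      if ((status === 429 || status === 503) && attempt < retries) {
        await sleep(delay);
        delay *= 2;
      } else {
        throw e;
      }
    }
  }
}

// Random 2-5 s pause between actions to mimic human pacing
const humanPause = () => sleep(2000 + Math.random() * 3000);
```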
Switch between headless and headful modes. Integrate a CAPTCHA solver for high‑security sites.
Log proxy ID, latency, status codes. Use GoProxy webhooks for error alerts and automated fallback.
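Webhook setup happens in GoProxy’s dashboard, but request-level logging is easy to sketch locally with Axios interceptors (the metadata field is our own addition, not part of Axios):

```js
const axios = require('axios');

const client = axios.create();

// Stamp each request so latency can be computed on response
client.interceptors.request.use(config => {
  config.metadata = { start: Date.now() };
  return config;
});

client.interceptors.response.use(
  res => {
    const ms = Date.now() - res.config.metadata.start;
    console.log(`${res.status} ${res.config.url} ${ms}ms`);
    return res;
  },
  err => {
    if (err.config?.metadata) {
      const ms = Date.now() - err.config.metadata.start;
      console.error(`${err.response?.status ?? 'ERR'} ${err.config.url} ${ms}ms`);
    }
    return Promise.reject(err);
  }
);
```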
| Method | Dynamic JS | Setup Complexity | Speed | Best Use Case |
| --- | --- | --- | --- | --- |
| Axios + Cheerio | No | Low | Fast | Static pages, bulk data extraction |
| Puppeteer | Yes | Medium | Moderate | Interactive SPAs, infinite scroll |
| Playwright | Yes | Medium | Moderate | Cross-browser scenarios |
| GoProxy API | Yes | Very Low | High | Enterprise-scale, low-dev overhead |
**Which method should I start with?**
Start with Axios + Cheerio for minimal setup and fast static‑HTML scraping.

**How many proxies do I need?**
A pool of 5–10 rotating residential IPs typically handles hundreds of pages per hour.

**Does IP rotation stop CAPTCHAs?**
Rotation reduces CAPTCHA triggers but doesn’t guarantee avoidance. Add a CAPTCHA‑solving service for full coverage.

**Rotating or sticky sessions?**
Rotating: New IP each request, ideal for breadth‑first data collection.
Sticky: One IP per session, necessary for login flows and checkout processes.

**How do I verify geo-targeting?**
Call https://ipinfo.io/json through each proxy and inspect the country and city fields.
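A quick check script, reusing the proxy object from earlier:

```js
// Request ipinfo.io through the proxy and print the exit IP's location
require('dotenv').config();
const axios = require('axios');

const proxy = {
  host: process.env.GOPROXY_HOST,
  port: +process.env.GOPROXY_PORT,
  auth: { username: process.env.GOPROXY_USER, password: process.env.GOPROXY_PASS }
};

(async () => {
  const { data } = await axios.get('https://ipinfo.io/json', { proxy, timeout: 10000 });
  console.log(data.ip, data.country, data.city);
})();
```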
Web scraping is an invaluable skill for gathering data from the internet, but it’s not without its hurdles, especially when tackling dynamic, JavaScript-heavy websites or navigating anti-scraping protections. In this blog, we’ve walked through three powerful methods for web scraping with Node.js. Each approach has its strengths, and the best choice depends on your project’s needs.
But no matter the method, one thing remains constant: the need for reliable, undetectable proxies to bypass blocks, manage geo-restrictions, and keep your scraping running smoothly.
With over 90 million rotating IPs sourced from real residential devices, GoProxy delivers the anonymity and flexibility you need to scrape successfully. Whether you’re a beginner testing the waters or a pro scaling up your operations, our residential proxies integrate seamlessly into your Node.js workflows, offering both rotating and sticky sessions to suit your needs.
We’d love to invite you to experience GoProxy’s residential proxies and web scraping service. See firsthand how easy it is to set up, how reliable our IPs are, and how they can simplify even the toughest scraping challenges. Sign up today for a free trial and 24/7 technical support!