Learn web scraping with Cheerio in Node.js: complete step-by-step covering setup, static & dynamic scraping, pagination, auth, proxies, retries, testing, and scaling.
Web scraping lets you automatically collect data from websites without manual copying. When you need speed, near-zero memory overhead, and familiar jQuery-style syntax, Cheerio is still one of the fastest and most reliable tools for web scraping in Node.js in 2026.

This guide takes you from zero to a working scraper, then shows exactly how to handle modern dynamic sites, debug failures, clean data, and scale responsibly.
Why choose Cheerio
Key limitation: Cheerio only sees the raw HTML the server returns. Client-side JavaScript content will be missing.
Solution: Render first with Puppeteer/Playwright, then hand the HTML to Cheerio.
Quick decision checklist before coding
1. Press Ctrl+U (view source). If your target data is already in the HTML → Use Cheerio only.
2. Open DevTools → Network tab → filter XHR/Fetch. If you see clean JSON endpoints → Call the API directly (best option).
3. No API + content injected by JS → Render + Cheerio.
One-line test:
curl -s "https://quotes.toscrape.com" | grep -E 'class="quote"'
Diagnostic summary
Use Cheerio for server-rendered HTML. For heavy SPAs, prefer JSON APIs or render-then-parse.
mkdir cheerio-scraper && cd cheerio-scraper
npm init -y
npm install axios cheerio
# Optional modern alternatives
npm install undici puppeteer
Fetch tip (lower overhead than Axios):
const { request } = require('undici');

async function fetchHtml(url) {
  const { body } = await request(url, { headers: { 'user-agent': '...' } });
  return await body.text();
}
cheerio.load(html)
$(selector)
.text().trim() / .attr() / .find() / .each()
.html() — inspect raw output
Cleaning helpers:
const cleanText = s => String(s || '').replace(/\s+/g, ' ').trim();
const cleanPrice = s => parseFloat(String(s || '').replace(/[^\d.]/g, '')) || null;
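The two helpers in action (repeated here so the snippet runs standalone; the sample strings are made up):

```javascript
// Collapse whitespace runs and trim; coerce non-strings safely
const cleanText = s => String(s || '').replace(/\s+/g, ' ').trim();
// Strip currency symbols/letters, parse the number, fall back to null
const cleanPrice = s => parseFloat(String(s || '').replace(/[^\d.]/g, '')) || null;

console.log(cleanText('  Hello\n   world '));   // "Hello world"
console.log(cleanPrice('£51.77'));              // 51.77
console.log(cleanPrice('N/A'));                 // null
```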
1. Open page in Chrome → right-click → Inspect.
2. Copy selector → test with $(selector).html().
Selector toolkit
Attribute: $('div[data-id="product"]')
Has/contains: $('.card:has(.price)')
Fallback order: data-* → id → unique class
Pro tip: Always run $(selector).html() or $.html() when selectors break — it instantly shows whether content is server-rendered or missing.
// scraper-static.js
const axios = require('axios');
const cheerio = require('cheerio');
const fs = require('fs/promises');

async function fetchHtml(url) {
  const res = await axios.get(url, {
    headers: { 'User-Agent': 'Mozilla/5.0 (compatible; my-scraper/1.0)' },
    timeout: 15000
  });
  return res.data;
}

function parseQuotes($) {
  const quotes = [];
  $('.quote').each((_, el) => {
    const text = $(el).find('.text').text().trim();
    const author = $(el).find('.author').text().trim();
    const tags = $(el).find('.tags a').map((i, tag) => $(tag).text().trim()).get();
    quotes.push({ text, author, tags });
  });
  return quotes;
}

async function main() {
  const html = await fetchHtml('https://quotes.toscrape.com/');
  const $ = cheerio.load(html);
  const quotes = parseQuotes($);
  await fs.writeFile('quotes.json', JSON.stringify(quotes, null, 2));
  console.log(`Saved ${quotes.length} quotes`);
}

// Export the helpers so tests and other scripts can reuse them
module.exports = { fetchHtml, parseQuotes };

if (require.main === module) main().catch(console.error);
async function scrapePaged(startUrl) {
  let url = startUrl;
  const results = [];
  while (url) {
    const html = await fetchHtml(url);
    const $ = cheerio.load(html);
    results.push(...parseQuotes($));
    const next = $('li.next a').attr('href'); // the "next" class sits on the <li>, not the <a>
    url = next ? new URL(next, url).toString() : null;
    await waitRandom(600, 1400);
  }
  return results;
}

function waitRandom(min, max) {
  return new Promise(r => setTimeout(r, Math.random() * (max - min) + min));
}
Inspect DevTools → Network → XHR while scrolling. Replicate JSON endpoints — faster and more stable.
Always prefer a JSON API found in DevTools.
If none is available, use a headless renderer and hand the resulting HTML to Cheerio:
// dynamic.js
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');
const { parseQuotes } = require('./scraper-static'); // reuse the same parser

async function scrapeDynamic(url) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' }); // networkidle2 works well for most SPAs but test and tune if needed
  const html = await page.content();
  await browser.close();
  const $ = cheerio.load(html);
  return parseQuotes($);
}
Pro tips:
Cache rendered HTML snapshots (S3/Redis) to avoid repeated expensive rendering.
networkidle2 waits until there are at most 2 network connections for ~500 ms.
Two common approaches:
1. Headless login → cookie bridge (recommended): Login once in Puppeteer, export cookies, then reuse them in Axios/undici headers. Avoids re-rendering the login step every run.
2. Replicate login POST: Mimic the exact POST request with CSRF tokens (copy the sequence from DevTools).
Note: CAPTCHA, 2FA, and aggressive anti-bot systems may require manual intervention or specialized services. Always follow legal/ToS boundaries.
async function retry(fn, retries = 3, baseDelay = 500) {
  for (let i = 0; i < retries; i++) {
    try { return await fn(); }
    catch (err) {
      const status = err.response?.status;
      // Don't retry client errors — except 429 (rate limit), which is transient
      if (status && status >= 400 && status < 500 && status !== 429) throw err;
      if (i === retries - 1) throw err;
      await new Promise(r => setTimeout(r, baseDelay * Math.pow(2, i)));
    }
  }
}
Use this whenever you see intermittent 429s, 5xx errors, or network timeouts. Persistent 403s usually mean you need better headers or proxies, not more retries.
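A self-contained demo of the backoff behavior (the helper is repeated here so the snippet runs standalone): a flaky function fails twice, then succeeds on the third attempt.

```javascript
// Exponential backoff: non-retryable 4xx errors (except 429) fail fast
async function retry(fn, retries = 3, baseDelay = 500) {
  for (let i = 0; i < retries; i++) {
    try { return await fn(); }
    catch (err) {
      const status = err.response?.status;
      if (status && status >= 400 && status < 500 && status !== 429) throw err;
      if (i === retries - 1) throw err;
      await new Promise(r => setTimeout(r, baseDelay * Math.pow(2, i)));
    }
  }
}

(async () => {
  let attempts = 0;
  const flaky = async () => {
    attempts++;
    if (attempts < 3) throw new Error('transient network error');
    return 'ok';
  };
  console.log(await retry(flaky, 3, 10)); // "ok" after two failed attempts
})();
```

In a real scraper the call shape is simply `retry(() => fetchHtml(url))`.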
Advanced detection (TLS JA3, header order, fingerprinting) can still flag simple rotations. For high-volume work, consider a managed residential proxy service to handle rotation, geo-targeting, and fingerprint mitigation out of the box. For built-in proxy support without external services, consider Crawlee’s CheerioCrawler.
Use p-limit(5) to cap concurrent requests
Add randomized delays (500–1500 ms)
Golden rule: ≤ 1 request per second per domain unless you have explicit permission
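What p-limit(5) does can be sketched dependency-free (in production, just `npm install p-limit` — note that p-limit v4+ is ESM-only, so use `import` or pin `p-limit@3` with `require`):

```javascript
// Minimal concurrency limiter: at most `max` async tasks run at once.
function createLimiter(max) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    task().then(resolve, reject).finally(() => { active--; next(); });
  };
  return task => new Promise((resolve, reject) => {
    queue.push({ task, resolve, reject });
    next();
  });
}

const limit = createLimiter(5);
// Usage with the earlier helpers: urls.map(url => limit(() => fetchHtml(url)))
```

Combine the limiter with the randomized waitRandom delays so bursts never exceed polite request rates.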
Save raw HTML fixtures and test your parser:
// parse.test.js (Jest)
const fs = require('fs');
const cheerio = require('cheerio');
const { parseQuotes } = require('./scraper-static');

test('parse quotes page', () => {
  const html = fs.readFileSync('__fixtures__/quotes.html', 'utf8');
  const $ = cheerio.load(html);
  const data = parseQuotes($);
  expect(Array.isArray(data)).toBe(true);
  expect(data.length).toBeGreaterThan(0);
  expect(data[0]).toHaveProperty('text');
});
Add hourly/daily smoke tests on your main branch to detect schema changes early (e.g., result count drops to zero).
Recommendation: When you outgrow manual orchestration, switch to Crawlee (CheerioCrawler) — it handles concurrency, retries, proxies, and queueing out of the box.
1. Empty results — Usually caused by client-side rendering. Fix: check DevTools network for APIs or render then parse.
2. 403/429 — Use realistic headers, add delays, and rotate residential proxies; each measure on its own raises success rates, and together they usually resolve rate-limit blocks.
3. Selectors break after a site update — Run $(selector).html() to inspect what's actually returned. Rebuild selectors using stable attributes.
4. Slow renders/costs — Cache rendered HTML snapshots and reduce headless browser runs.
Always check robots.txt and the site’s Terms of Service
Never overload servers
Store data in structured JSON first
Version your scrapers — websites change
Q: Can Cheerio execute JavaScript?
A: No — Cheerio parses static HTML strings. For JavaScript-rendered pages, render the DOM with a headless browser, then pass HTML to Cheerio.
Q: Is calling an API better than scraping HTML?
A: Yes — when available, APIs are faster and more stable. Prefer them when possible.
Q: How do I handle pagination?
A: Use next links for simple pagination; replicate XHR endpoints for infinite scroll.
Q: Is scraping legal?
A: It depends — check robots.txt, Terms of Service, and local laws. Avoid scraping personal or sensitive data without permission.
Start with the tiny static example above. If DevTools shows JSON endpoints — call them directly. If not, render with Puppeteer/Playwright, cache the snapshot, and let Cheerio handle the extraction. For production, combine retries, proxy rotation, concurrency limits, fixtures, and scheduled smoke tests — or use a crawler framework like Crawlee to avoid reinventing the orchestration layer.