Complete Selenium web scraping guide with proxy setup, waits, anti-detection, scaling, and production best practices.
Selenium stands out in web scraping, especially for dynamic, JavaScript-heavy websites. This guide gives step-by-step instructions for beginners through experts: environment setup, a runnable starter scraper, proxy integration, anti-detection tactics, retries and checkpointing, scaling patterns, and production hardening.
Beginners: Follow Quick Start then the modular starter to get your first scraper running. Focus on basics like setup and simple extraction.
Intermediate / Pros: Jump to sections on proxies, rotation, stealth, scaling, and ops for advanced features like handling CAPTCHAs or deploying to the cloud.
Use Selenium when the target site requires real browser behavior (heavy JavaScript, user interaction, forms, infinite scroll, or content loaded after events). Unlike simpler libraries like BeautifulSoup or Requests, Selenium handles dynamic content where pages load data via AJAX or require clicks/forms.
When content is static or provided via an API, prefer requests + BeautifulSoup for speed and simplicity.
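For comparison, here is a minimal static-page sketch with requests + BeautifulSoup (same demo site and selector as the Selenium examples below; assumes pip install requests beautifulsoup4):
# static_fetch.py - minimal requests + BeautifulSoup alternative for static pages
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://books.toscrape.com", timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
# Same selector as the Selenium examples later in this guide: book title links
for a in soup.select("article.product_pod h3 a"):
    print(a["title"])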
Concept | Description | Tips
--- | --- | ---
WebDriver | Programmatic controller for a real browser. | Use webdriver-manager for auto-syncing versions.
Locators | By.ID, By.CSS_SELECTOR, By.XPATH. | Prefer stable CSS or well-targeted XPath; test in browser dev tools (F12).
Waits | Implicit: global, can cause subtle bugs. Explicit: WebDriverWait + expected_conditions — use this. | Explicit waits prevent flakiness on slow loads.
Headless | Faster, less resource-heavy — sometimes more detectable. | In 2025, combine with stealth libraries like SeleniumBase for better evasion.
Proxy | Routes browser traffic; used for IP rotation, geo-targeting, and evasion. | Residential proxies are key for tough sites.
Resource Blocking | Blocking images/fonts/CSS speeds runs but may break JS-heavy pages. | Test per site; start with images only.
Respect robots.txt and site Terms (robots.txt is advisory but informative).
Do not scrape personal or protected data illegally—comply with 2025 laws like updated GDPR/CCPA by anonymizing data and obtaining consent where required.
Use secrets manager or CI secret variables for credentials — never commit .env.
Set up alerts on spikes in errors or abnormal behavior.
Add rate-limiting and exponential backoff to avoid accidentally overwhelming targets.
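A minimal backoff-plus-jitter sketch along those lines (polite_get is an illustrative name, not a library call; the starter scraper later uses tenacity for the same idea):
# backoff_sketch.py - jittered delay + exponential backoff (illustrative)
import random
import time

def polite_get(driver, url, max_attempts=4, base_delay=1.0):
    for attempt in range(1, max_attempts + 1):
        try:
            time.sleep(random.uniform(0.5, 1.5))  # jitter between requests
            driver.get(url)
            return
        except Exception:
            if attempt == max_attempts:
                raise
            # exponential backoff: 1s, 2s, 4s, ... capped at 30s
            time.sleep(min(base_delay * 2 ** (attempt - 1), 30))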
Let's start with the basics. We'll use Python for the examples (the most popular language for scraping).
1. Install Chrome or Chromium and confirm it runs.
2. Install Python 3.8+ (3.11 recommended). Verify: python --version
3. Create project folder and virtual environment:
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
4. Create requirements.txt (pinned versions in Appendix; a sketch follows this list) and install:
pip install -r requirements.txt
5. Create .env from .env.example in the project root and edit credentials if using proxies.
6. Run quick_start.py (next step) to verify environment.
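A minimal requirements.txt sketch listing the packages this guide actually uses (version pins are illustrative; use the Appendix for exact pins):
# requirements.txt (illustrative - pin exact versions per the Appendix)
selenium
selenium-wire
webdriver-manager
python-dotenv
tenacity
requests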
Cross-platform env var notes
macOS / Linux:
export GOPROXY_USER="user"
export GOPROXY_PASS="pass"
Windows (PowerShell):
$env:GOPROXY_USER="user"
$env:GOPROXY_PASS="pass"
or edit .env and rely on python-dotenv for local development.
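For reference, a .env.example sketch using the variable names the starter scraper below reads (values are placeholders):
# .env.example - copy to .env and fill in real values
USE_PROXY=false
GOPROXY_USER=user
GOPROXY_PASS=pass
GOPROXY_HOST=proxy.goproxy.com
GOPROXY_PORT=8000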
Save this as quick_start.py and run it. This proves Python + Selenium are installed and working.
# quick_start.py
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By

def quick_start():
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    options.add_argument("--window-size=1280,800")
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    try:
        driver.get("https://books.toscrape.com")
        books = driver.find_elements(By.CSS_SELECTOR, "article.product_pod h3 a")
        for b in books:
            print(b.get_attribute("title"))
    finally:
        driver.quit()

if __name__ == "__main__":
    quick_start()
Run:
python quick_start.py
Expected output: a list of book titles printed to console. If you see SessionNotCreatedException, update Chrome or let webdriver-manager handle it (it will download a compatible driver).
This is a single-file starter you can copy, customize, and run. It demonstrates: optional authenticated proxy support via selenium-wire, explicit waits, pagination, tenacity retries, CSV checkpointing, and cookie save/load utilities.
Save as scraper.py.
# scraper.py
import os
import csv
import time
import random
import logging
import json

from dotenv import load_dotenv
from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type
from seleniumwire import webdriver  # pip install selenium-wire
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options as ChromeOptions
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException

# --- Load environment
load_dotenv()
USE_PROXY = os.getenv("USE_PROXY", "false").lower() == "true"
GP_USER = os.getenv("GOPROXY_USER")
GP_PASS = os.getenv("GOPROXY_PASS")
GP_HOST = os.getenv("GOPROXY_HOST", "proxy.goproxy.com")
GP_PORT = os.getenv("GOPROXY_PORT", "8000")

# --- Logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("scraper")

# --- Proxy health check (quick)
def test_proxy_httpbin(host, port, user=None, pw=None, timeout=8):
    import requests
    proxy = f"http://{user}:{pw}@{host}:{port}" if user else f"http://{host}:{port}"
    proxies = {"http": proxy, "https": proxy}
    try:
        r = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=timeout)
        return True, r.json()
    except Exception as e:
        return False, str(e)

# --- Driver factory (selenium-wire) with debug toggle and resource blocking
def make_driver(proxy=None, block_images=True, debug=False):
    options = ChromeOptions()
    if not debug:
        options.add_argument("--headless=new")
    options.add_argument("--window-size=1280,800")
    options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36")
    if block_images:
        prefs = {"profile.managed_default_content_settings.images": 2,
                 "profile.managed_default_content_settings.fonts": 2}
        options.add_experimental_option("prefs", prefs)
    seleniumwire_opts = None
    if proxy:
        auth = f"{proxy['user']}:{proxy['pass']}@{proxy['host']}:{proxy['port']}"
        seleniumwire_opts = {
            "proxy": {
                "http": f"http://{auth}",
                "https": f"https://{auth}",
                "no_proxy": "localhost,127.0.0.1"
            }
        }
        logger.info("Using proxy: %s:%s", proxy['host'], proxy['port'])
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()),
                              options=options, seleniumwire_options=seleniumwire_opts)

    # Basic interceptor: abort obvious static assets (disabled in debug mode)
    def interceptor(request):
        if not debug and request.path.endswith(('.png', '.jpg', '.jpeg', '.gif', '.woff2', '.woff')):
            request.abort()

    driver.request_interceptor = interceptor
    return driver

# --- Robust pagination + extraction with tenacity retries
@retry(reraise=True, stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=8),
       retry=retry_if_exception_type((TimeoutException, StaleElementReferenceException)))
def extract_titles(driver, start_url):
    driver.get(start_url)
    wait = WebDriverWait(driver, 15)
    rows = []
    while True:
        # wait for product elements
        items = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "article.product_pod h3 a")))
        # extract
        for it in items:
            rows.append({"title": it.get_attribute("title")})
        # attempt to find a next button; if absent, finish
        try:
            next_btn = driver.find_element(By.CSS_SELECTOR, "li.next a")
        except Exception:
            break
        # record first item to detect page change
        first_title = items[0].get_attribute("title") if items else None
        next_btn.click()
        # wait for either URL change or first item change; fallback to a small wait
        try:
            wait.until(lambda d: d.execute_script("return document.readyState") == "complete")
            wait.until(lambda d: d.find_element(By.CSS_SELECTOR, "article.product_pod h3 a").get_attribute("title") != first_title)
        except Exception:
            time.sleep(random.uniform(1, 2))
    return rows

# --- Cookie utilities
def save_cookies(driver, path="cookies.json"):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f)

def load_cookies(driver, path="cookies.json"):
    with open(path, "r", encoding="utf-8") as f:
        cookies = json.load(f)
    for c in cookies:
        try:
            driver.add_cookie(c)
        except Exception:
            pass

# --- CSV append helper (checkpointing)
def append_rows_csv(path, rows, fieldnames):
    exists = os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if not exists:
            writer.writeheader()
        writer.writerows(rows)

# --- Proxy picker
def pick_proxy_from_env():
    if not GP_USER or not GP_PASS:
        return None
    return {"host": GP_HOST, "port": GP_PORT, "user": GP_USER, "pass": GP_PASS}

def main():
    url = "https://books.toscrape.com"
    proxy = pick_proxy_from_env() if USE_PROXY else None
    # Optional: quick proxy health test
    if proxy:
        ok, info = test_proxy_httpbin(proxy['host'], proxy['port'], proxy['user'], proxy['pass'])
        logger.info("Proxy test ok=%s info=%s", ok, info)
        if not ok:
            logger.warning("Proxy health check failed. Proceeding anyway may cause driver errors.")
    # Toggle debug=True to see browser and disable interceptor for easier debugging
    debug_mode = False
    driver = make_driver(proxy=proxy, block_images=True, debug=debug_mode)
    start = time.time()
    try:
        rows = extract_titles(driver, url)
        append_rows_csv("output.csv", rows, fieldnames=["title"])
        logger.info("Saved %d rows in %.2fs", len(rows), time.time() - start)
    except Exception:
        logger.exception("Failed to scrape: %s", url)
    finally:
        driver.quit()

if __name__ == "__main__":
    main()
1. cp .env.example .env and edit .env if using proxies.
2. Activate venv and pip install -r requirements.txt.
3. python scraper.py — writes output.csv incrementally as pages are scraped.
4. If something fails, set debug_mode = True near the call to make_driver(...) to see the browser and disable asset blocking.
Expected output.csv sample
title
"A Light in the Attic"
"Tipping the Velvet"
...
Why use proxies? Avoid IP blocks, geo-target content, scale across many IPs.
Proxy Types & Choices
Residential: harder to detect, higher cost and latency. Use for anti-bot sensitive targets (marketplaces, ticketing).
Datacenter: cheap and fast; easier to detect. Use for news, public listings.
Mobile: highest evasion, highest cost. Rarely necessary; reserve for high-stakes social media targets.
Sticky vs Rotating
Sticky sessions: Same IP for an entire logical session (e.g., logins/carts).
Per-session rotation: Assign a fresh proxy per worker/task (recommended for most scrapers).
Choose rotation frequency based on sensitivity: start with per-session, and for high-sensitivity targets experiment with N requests per IP (N=1..10) and monitor blocks.
Setup Steps (GoProxy)
1. Sign up and choose a rotating proxy plan as needed, then get your credentials from the dashboard.
2. Store credentials in .env or a secrets store; never hardcode.
3. Health check: Use the test_proxy_httpbin helper above to verify connectivity before creating a driver.
4. For pools, maintain a small proxy pool: randomize selection, log failures, and remove bad nodes (see the sketch after these tips).
Tips: For geo-targeting (e.g., regional prices), specify country in GoProxy dashboard.
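A small proxy-pool sketch along those lines (ProxyPool and the failure threshold are illustrative, not a GoProxy API; pair pick() with make_driver(proxy=...) from scraper.py):
# proxy_pool.py - randomized pool with failure tracking (illustrative)
import random

class ProxyPool:
    def __init__(self, proxies, max_failures=3):
        # proxies: list of dicts like {"host": ..., "port": ..., "user": ..., "pass": ...}
        self.proxies = list(proxies)
        self.failures = {id(p): 0 for p in self.proxies}
        self.max_failures = max_failures

    def pick(self):
        if not self.proxies:
            raise RuntimeError("proxy pool exhausted")
        return random.choice(self.proxies)

    def report_failure(self, proxy):
        self.failures[id(proxy)] += 1
        if self.failures[id(proxy)] >= self.max_failures:
            self.proxies.remove(proxy)  # drop the bad node from rotation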
Immediate check using curl
# quick check (run in a shell)
curl -x http://USER:[email protected]:8000 https://httpbin.org/ip
Implementation Note: selenium-wire accepts http://user:pass@host:port for proxy auth, making it one of the cleanest ways to use authenticated proxies with Selenium in Python.
These are pragmatic, ordered by simplicity → complexity. In 2025, consider integrating SeleniumBase for advanced stealth: pip install seleniumbase; from seleniumbase import Driver; driver = Driver(uc=True).
1. User-Agent rotation: Change UA per session. Example:
options.add_argument("user-agent=Your User Agent string")
2. Language & timezone: --lang=en-US; or override the timezone via CDP if needed (see the sketch after this list).
3. Block heavy assets only after testing. Block images/fonts only if visible content still loads. Use the interceptor in scraper.py.
4. Human-like interactions: Add random small sleeps and scrolls before clicks:
import random, time
time.sleep(random.uniform(0.5, 1.8))
driver.execute_script("window.scrollBy(0, arguments[0]);", random.randint(100, 400))
5. Cookie reuse: Save cookies after a successful login and load them for subsequent sessions.
6. Avoid honeypots: Ignore hidden elements (display:none or zero size); see visible_links in the sketch after this list.
7. Captcha handling: If lawful and permitted, reduce rate, use residential proxies, route to human-in-the-loop or compliant solver services.
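Minimal sketches for points 2 and 6. Emulation.setTimezoneOverride is a Chrome DevTools Protocol command (the timezone ID here is just an example), and the honeypot filter relies on Selenium's is_displayed() plus element size:
# stealth_helpers.py - timezone override via CDP and a honeypot filter (illustrative)
from selenium.webdriver.common.by import By

def set_timezone(driver, tz="America/New_York"):
    # CDP command; works on Chrome/Chromium drivers
    driver.execute_cdp_cmd("Emulation.setTimezoneOverride", {"timezoneId": tz})

def visible_links(driver, css):
    # Skip hidden or zero-size elements (likely honeypots)
    return [el for el in driver.find_elements(By.CSS_SELECTOR, css)
            if el.is_displayed() and el.size["width"] > 0 and el.size["height"] > 0]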
Decision Rule for Blocking
Block images first; if the page still renders the data you need, consider fonts too. Never block scripts or XHR on JS-dependent pages.
Use tenacity for transient network/DOM issues. Example already in extract_titles().
Write partial results to output.csv after each page/batch using append_rows_csv to avoid data loss.
Log these fields per attempt: timestamp, url, worker_id, proxy_id, attempt, status, error_message, duration.
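A minimal sketch that writes those fields as JSON lines (the field set comes from the list above; the JSON-lines format and log_attempt name are assumptions):
# log_attempt.py - JSON-lines attempt log (illustrative)
import json
import time

def log_attempt(path, url, worker_id, proxy_id, attempt, status, error_message="", duration=0.0):
    record = {"timestamp": time.time(), "url": url, "worker_id": worker_id,
              "proxy_id": proxy_id, "attempt": attempt, "status": status,
              "error_message": error_message, "duration": duration}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")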
TimeoutException: increase wait or validate selector.
StaleElementReferenceException: re-find element or retry.
Proxy fail: remove proxy from pool and retry job.
Small (single machine)
Sequential or small multi-threaded runs; rotate proxy per run.
Medium (workers)
Use a job queue (Celery, RQ) where each worker picks a job, spins up its own driver (with its own proxy), scrapes, checkpoints results, and quits the browser.
Large (Grid / container farm)
Selenium Grid or cloud-managed browser farm. Use ephemeral tasks, central logs/metrics, and k8s for orchestration.
Practical tips
Prefer ephemeral browser processes per job (spin up → scrape → quit) to avoid memory leaks; a minimal sketch follows these tips.
Centralize logs (ELK/Fluentbit) and metrics (Prometheus/Grafana).
Monitor: success rate, proxy health, CPU/memory, CAPTCHA frequency.
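An ephemeral-browser sketch using a context manager (assumes scraper.py is importable; the commented usage shows the per-job pattern):
# ephemeral_job.py - one browser per job: spin up, scrape, quit (illustrative)
from contextlib import contextmanager

from scraper import make_driver  # factory from scraper.py above

@contextmanager
def ephemeral_driver(**kwargs):
    driver = make_driver(**kwargs)
    try:
        yield driver
    finally:
        driver.quit()  # always release the browser, even on errors

# usage inside a worker:
# with ephemeral_driver(proxy=pool.pick()) as driver:
#     rows = extract_titles(driver, job_url)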
Minimal Grid notes
Use official/maintained Selenium images; replace example tags with current stable versions.
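Workers connect to the hub via Remote instead of a local Chrome; a minimal sketch (the hub URL assumes a default local Grid on port 4444):
# grid_connect.py - point a worker at a Selenium Grid hub (illustrative)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Remote(command_executor="http://localhost:4444", options=options)
try:
    driver.get("https://books.toscrape.com")
    print(driver.title)
finally:
    driver.quit()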
1. Selector stability test: Run the scraper for 10–50 pages and measure failures.
2. Proxy health test: Sanity-check proxies and measure latency; remove slow ones.
3. Headless vs headed: Compare results — if headless is detected, use headed nodes or stealth options.
4. Rate limit test: Slowly ramp request rate to find safe throttle.
5. Resource test: Measure CPU/memory per browser instance; set worker concurrency accordingly.
6. CAPTCHA frequency: Log CAPTCHA encounters and reduce rate.
SessionNotCreatedException → Update Chrome/browser or use webdriver-manager for auto-sync.
TimeoutException → Increase WebDriverWait timeout (e.g., to 20s) or verify selector in dev tools.
StaleElementReferenceException → Re-find the element after page changes or wrap in retry.
Proxy auth fails → Check .env credentials; test with curl: curl -x http://user:pass@host:port https://httpbin.org/ip.
Frequent CAPTCHAs → Slow down with random delays, switch to residential proxies via GoProxy, or add more human-like behaviors like random scrolls.
Debug tips
If something fails unexpectedly: set debug=True in make_driver(...), disable interceptor, and run headful (not headless) to visually inspect the page and selectors.
Selenium helps you scrape modern JS-heavy sites — but with complexity: cost, detection risk, and operational burden. This guide gives you a linear, runnable path from local proof-of-concept to hardened scraper: verify environment, run Quick Start, run Starter Scraper with optional GoProxy, add anti-detection measures, and scale. Keep everything modular — one function per page or job makes parallelism and debugging easier. Test thoroughly and instrument logs/metrics before you scale.