
How to Effectively Scrape Booking.com: 2025 Step-by-Step Guide with Proxies

Post Time: 2025-10-15 Update Time: 2025-10-15

Scraping Booking.com data can give individuals and businesses valuable insights into hotel prices, availability, reviews, and more for analysis and decision-making. As one of the largest travel websites, Booking.com implements anti-scraping measures to protect its data, including dynamic content loading, CAPTCHAs, and IP address blocking, so reliable scraping requires some technique. We'll walk you through how to scrape data efficiently and ethically, using proxy management to overcome these obstacles.

Important Note on Legality & Ethics! Scraping public data from Booking.com is generally permissible for personal, non-commercial use if it respects their Terms of Service and robots.txt file (check regularly for updates). Always scrape responsibly: avoid overloading servers, and do not collect personal or sensitive information. For commercial purposes, please consult a lawyer.

Why Scrape Booking.com?


Booking.com offers publicly available data for various use cases, such as:

Price Monitoring & Comparison: Track fluctuations in hotel pricing to optimize competitive pricing in your travel or hotel app.

Market Research: Gather data on hotel availability in specific regions for business intelligence and decision-making.

Review Aggregation: Collect user feedback to improve recommendation systems and analyze customer sentiment.

Competitor Analysis: Compare listings across different cities or countries without manually browsing through hundreds of pages.

Common data fields include:

  • Hotel names, addresses, locations (latitude/longitude)
  • Room availability and pricing
  • User reviews and ratings
  • Hotel amenities (Wi-Fi, parking, pool, etc.)
  • Photos and descriptions

Challenges & Solutions

Booking.com implements several strategies to prevent scraping. Prepare for the following common challenges:

  • Dynamic Content (JS): Pages load via JavaScript, requiring rendering tools. Solution: use Playwright/Selenium or the GraphQL endpoints.
  • CAPTCHA & Bot Detection: Human verification blocks automation. Solution: rotate residential proxies and integrate solvers if needed.
  • IP Blocking: Bans triggered by excessive requests. Solution: use rotating proxies with rate limiting.
  • Geo-Restrictions: Content varies by location. Solution: use geo-targeted proxies for regional access.

Address these with rotating proxies (e.g., from services like GoProxy) and human-like behavior.

Step 1. Setting Up Your Environment

Before you begin scraping, start with the basics to ensure a smooth setup.

1. Download Python

Install Python 3.12+ from python.org.

2. Create a Virtual Environment

Open your terminal and run:

python -m venv scraper_env

Then activate it:

  • Windows: scraper_env\Scripts\activate
  • macOS/Linux: source scraper_env/bin/activate

3. Install Libraries

pip install requests beautifulsoup4 httpx selenium

# optional: playwright

pip install playwright

playwright install

Checkpoint: python --version returns 3.12+ and pip show requests works.

Step 2. Test a Simple Request

Before proceeding to more complex tasks, start without proxies to confirm you can fetch a page and see the expected HTML.

Example Code:

import requests

from bs4 import BeautifulSoup

 

url = "https://www.booking.com/searchresults.html?ss=Paris"

headers = {

  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "

                "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",

  "Accept-Language": "en-US,en;q=0.9"

}

r = requests.get(url, headers=headers, timeout=15)

print(r.status_code)

soup = BeautifulSoup(r.text, "html.parser")

print(soup.title.string if soup.title else "No title — possibly blocked")

  • If 200 and a sensible title appear → continue.
  • If 403 or CAPTCHA shows → proceed to Step 3.

Step 3. Set Up Rotating Proxies (GoProxy)

To avoid being blocked, rotate IPs so each request looks like a distinct human visitor. Geo-targeted rotating residential proxies improve anonymity and give you access to local prices.

Tip: Do this if you were blocked in Step 2, or before scaling to many requests.

1. Sign up

Create a GoProxy account and get your credentials (a 7-day residential free trial lets you test before scaling or purchasing).

2. Integrate into your code

Simple requests example:

proxy = "http://username:[email protected]:port"

proxies = {"http": proxy, "https": proxy}

r = requests.get(url, headers=headers, proxies=proxies, timeout=15)

Async rotation(for scale):

import asyncio
from itertools import cycle

import httpx

proxies = ["http://u:p@ip1:port", "http://u:p@ip2:port"]  # from GoProxy

# httpx sets the proxy per client, not per request, so create one client per proxy
# (older httpx versions use proxies= instead of proxy=; add http2=True if you install httpx[http2])
clients = [httpx.AsyncClient(proxy=p, timeout=20) for p in proxies]
client_cycle = cycle(clients)

async def fetch(url):
    client = next(client_cycle)  # rotating clients rotates proxies
    return await client.get(url, headers={"User-Agent": "..."})

async def main(urls):
    try:
        return await asyncio.gather(*(fetch(u) for u in urls))
    finally:
        await asyncio.gather(*(c.aclose() for c in clients))

  • Beginner: Start with 5-10 proxies, rotating every 5-10 requests (see the sketch after this list).
  • Pro: Use 100+ proxies, per-request rotation, user-agent rotation, session cookie reuse per IP.
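Here is a minimal sketch of the beginner approach, rotating to the next proxy every few requests and varying the User-Agent. The proxy URLs are placeholders, and REQUESTS_PER_PROXY and the example User-Agent strings are assumptions to adjust for your own pool:

import random
from itertools import cycle

import requests

PROXIES = cycle([
    "http://username:password@proxy1:port",  # placeholders -- use your GoProxy credentials
    "http://username:password@proxy2:port",
])
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]
REQUESTS_PER_PROXY = 5  # rotate every 5-10 requests

def fetch_all(urls):
    proxy = next(PROXIES)
    results = []
    for i, url in enumerate(urls):
        if i and i % REQUESTS_PER_PROXY == 0:
            proxy = next(PROXIES)  # switch to the next proxy in the pool
        headers = {"User-Agent": random.choice(USER_AGENTS), "Accept-Language": "en-US,en;q=0.9"}
        resp = requests.get(url, headers=headers, proxies={"http": proxy, "https": proxy}, timeout=15)
        results.append(resp)
    return results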

Checkpoint: Repeat Step 2 with proxies. If still blocked, try a different proxy or add more headers/session realism.

Step 4. Inspect Booking.com’s Structure

Use browser dev tools (F12) → Network → XHR/Fetch. Prefer GraphQL for stability. 

  • Search Results: https://www.booking.com/searchresults.html?ss=Paris&checkin=2025-11-01&checkout=2025-11-05&offset=0
  • Details: e.g., https://www.booking.com/hotel/fr/tour-eiffel.html
  • Reviews: https://www.booking.com/reviewlist.html?pagename=hotel/fr/tour-eiffel&type=total&sort=f_recent_desc&rows=25&offset=0
  • Endpoints: POST to /dml/graphql (e.g., for AvailabilityCalendar).

Use the sitemaps referenced in robots.txt for URL discovery.
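Sitemap discovery can be scripted. The sketch below pulls Sitemap: lines from robots.txt and lists the URLs in the first sitemap file; the exact layout (nested sitemap indexes, gzip-compressed files) varies, so treat it as a starting point.

import requests
import xml.etree.ElementTree as ET

headers = {"User-Agent": "Mozilla/5.0 ..."}

# Collect sitemap URLs advertised in robots.txt
robots = requests.get("https://www.booking.com/robots.txt", headers=headers, timeout=15).text
sitemaps = [line.split(":", 1)[1].strip()
            for line in robots.splitlines()
            if line.lower().startswith("sitemap:")]

# Fetch one sitemap (often an index pointing to further .xml files;
# .gz files need gzip.decompress() before parsing)
xml_data = requests.get(sitemaps[0], headers=headers, timeout=15).content
root = ET.fromstring(xml_data)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
print([loc.text for loc in root.findall(".//sm:loc", ns)][:10])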

Checkpoint: Identify one GraphQL request with JSON data.

Step 5. Hardening Against Anti-Scraping

Build on the proxy setup from Step 3 with these more robust defenses:

Headers & session realism

Always set User-Agent, Accept-Language, Referer, Accept-Encoding. Reuse cookies for a short session per proxy to mimic a user session.

headers = {

  "User-Agent": "Mozilla/5.0 ...",

  "Accept-Language": "en-US,en;q=0.9",

  "Referer": "https://www.booking.com"

}
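To reuse cookies per proxy, bind a requests.Session to each proxy so cookies set by the site persist across that proxy's requests. A minimal sketch (the proxy URL is a placeholder):

import requests

def make_session(proxy_url, headers):
    # One session per proxy: the same IP and cookie jar are reused across requests
    s = requests.Session()
    s.headers.update(headers)
    s.proxies = {"http": proxy_url, "https": proxy_url}
    return s

session = make_session("http://username:password@proxy1:port", headers)
r1 = session.get("https://www.booking.com/searchresults.html?ss=Paris", timeout=15)
r2 = session.get("https://www.booking.com/hotel/fr/tour-eiffel.html", timeout=15)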

Rate limiting & concurrency

Start with 0.2–1 requests/second per IP; these are safe starting heuristics. Add random jitter, e.g. sleep(random.uniform(1, 3)), or use a token-bucket rate limiter.
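The jitter option is a one-liner between requests; a token-bucket limiter smooths bursts more predictably. A minimal sketch, with the rate and capacity values as assumptions to tune:

import random
import time

class TokenBucket:
    # Allows roughly `rate` requests per second, with short bursts up to `capacity`
    def __init__(self, rate=0.5, capacity=3):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def wait(self):
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep(0.1)

bucket = TokenBucket(rate=0.5)        # ~0.5 requests/second per IP
for url in urls:                      # urls: your list of target pages
    bucket.wait()
    time.sleep(random.uniform(0, 1))  # extra jitter on top of the bucket
    # resp = safe_get(url, headers, proxies)  # safe_get is defined below in this step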

CAPTCHA handling

Prefer proxies that reduce CAPTCHA frequency (GoProxy can help). If CAPTCHA appears, options: rotate proxy, pause and retry with backoff, or integrate a solver.
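There is no single reliable CAPTCHA marker, but a rough content check can trigger a proxy switch before retrying. A hypothetical sketch; the looks_blocked heuristics and proxy URLs are assumptions, not Booking.com specifics:

import random
import time
from itertools import cycle

import requests

proxy_pool = cycle([
    "http://username:password@proxy1:port",  # placeholders from your GoProxy pool
    "http://username:password@proxy2:port",
])

def looks_blocked(resp):
    # Heuristic only: tune these markers against responses you actually observe
    return resp.status_code in (403, 429) or "captcha" in resp.text.lower()

def get_with_captcha_fallback(url, headers, max_attempts=3):
    for _ in range(max_attempts):
        proxy = next(proxy_pool)
        resp = requests.get(url, headers=headers, proxies={"http": proxy, "https": proxy}, timeout=15)
        if not looks_blocked(resp):
            return resp
        time.sleep(random.uniform(5, 15))  # back off before switching IPs again
    return resp  # still blocked: consider a solver at this point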

Retries & exponential backoff

Use a safe_get helper to handle transient 429/503/403 patterns:

import time, random, requests

 

def safe_get(url, headers, proxies, max_retries=5):

    backoff = 1

    for attempt in range(max_retries):

        resp = requests.get(url, headers=headers, proxies=proxies, timeout=15)

        if resp.status_code == 200:

            return resp

        if resp.status_code in (403, 429, 503):

            time.sleep(backoff + random.uniform(0, backoff))

            backoff *= 2

            continue

        resp.raise_for_status()

raise Exception("Max retries exceeded").

Checkpoint: Run 10 test requests using your rate limit and proxies; success rate should be high (aim >95%).

Step 6. Scraping & Parsing Data


Beginner Path: Requests + BeautifulSoup.

Pro Path: Playwright for JS, GraphQL for efficiency.

Beginner path: Search Results

resp = requests.get("https://www.booking.com/searchresults.html?ss=Paris", headers=headers, proxies=proxies)

soup = BeautifulSoup(resp.text, "html.parser")

hotels = soup.select('[data-testid="property-card"]')

for h in hotels[:5]:

    name = h.select_one('[data-testid="title"]').get_text(strip=True) if h.select_one('[data-testid="title"]') else None

    price = h.select_one('[data-testid="price-and-discounted-price"]').get_text(strip=True) if h.select_one('[data-testid="price-and-discounted-price"]') else None

    link = h.select_one('a[href]')['href'] if h.select_one('a[href]') else None

    print(name, price, link)

Pagination example

base = "https://www.booking.com/searchresults.html"

for offset in range(0, 100, 25):

    resp = requests.get(base, params={"ss":"Paris","offset":offset}, headers=headers, proxies=proxies)

    # parse as above

Checkpoint: Extract 5 hotels and follow one detail page successfully.

Beginner path: hotel details (detail pages)

detail_url = "https://www.booking.com/hotel/us/example.html"

r = safe_get(detail_url, headers, proxies)

soup = BeautifulSoup(r.text, "html.parser")

 

title = soup.select_one('#hp_hotel_name').get_text(strip=True) if soup.select_one('#hp_hotel_name') else None

address = soup.select_one('.hp_address_subtitle').get_text(strip=True) if soup.select_one('.hp_address_subtitle') else None

amenities = [li.get_text(strip=True) for li in soup.select('[data-capla-component*=FacilitiesBlock] li')]

latlng = soup.select_one('[data-atlas-latlng]')['data-atlas-latlng'] if soup.select_one('[data-atlas-latlng]') else None

Checkpoint: Parse title, address, and at least one amenity for 2 sample hotels.

Pro Path: prices & availability(GraphQL)

Extract CSRF:

import re

import json

 

# After fetching the hotel detail page, e.g. r = safe_get(detail_url, headers, proxies)
csrf_match = re.search(r"b_csrf_token: '([^']+)'", r.text)
csrf = csrf_match.group(1) if csrf_match else ""

 

payload = {

    "operationName": "AvailabilityCalendar",

    "variables": {

        "hotelId": "example",  # From URL

        "checkIn": "2025-11-01",

        "checkOut": "2025-11-05"

        # Add more from dev tools

    }

}

headers.update({"X-CSRF-Token": csrf})

api_resp = requests.post("https://www.booking.com/dml/graphql", json=payload, headers=headers, proxies=proxies)

data = api_resp.json()

# Parse: data['data']['availability']['avgPriceFormatted']

Checkpoint: Fetch and parse price for one hotel.

Fetching reviews

hotel_id = "hotel/us/example"

rev_url = f"https://www.booking.com/reviewlist.html?pagename=hotel/fr/tour-eiffel&type=total&sort=f_recent_desc&rows=25&offset=0"

soup = BeautifulSoup(safe_get(rev_url, headers, proxies).text, "html.parser")

reviews = soup.select('.c-review-block')

for rev in reviews:

    score = rev.select_one('.bui-review-score__badge').text.strip() if rev.select_one('.bui-review-score__badge') else "N/A"

text = rev.select_one('.c-review__body').text.strip() if rev.select_one('.c-review__body') else "N/A"

  • Paginate with offset.
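Reusing hotel_id, headers, proxies, and safe_get from above, here is a sketch of offset-based pagination that stops when a page returns no review blocks (the 200-review cap is an arbitrary example):

all_reviews = []
for offset in range(0, 200, 25):  # rows=25 per page
    page_url = (f"https://www.booking.com/reviewlist.html?pagename={hotel_id}"
                f"&type=total&sort=f_recent_desc&rows=25&offset={offset}")
    page = BeautifulSoup(safe_get(page_url, headers, proxies).text, "html.parser")
    blocks = page.select('.c-review-block')
    if not blocks:
        break  # no more reviews
    all_reviews.extend(blocks)
print(f"Collected {len(all_reviews)} review blocks")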

Pro path: async + playwright for dynamic

For JS-heavy pages:

from playwright.async_api import async_playwright

 

async def scrape_dynamic(url):

    async with async_playwright() as p:

        browser = await p.chromium.launch(headless=True)

        page = await browser.new_page()

        await page.goto(url)

        content = await page.inner_html('body')

        await browser.close()

    return content  # Parse with BeautifulSoup
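scrape_dynamic handles a single page; the checkpoint below asks for concurrent fetches. A sketch that bounds concurrency with a semaphore and counts failures (MAX_CONCURRENCY is an assumption, keep it low to stay polite):

import asyncio

MAX_CONCURRENCY = 5

async def scrape_many(urls):
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def bounded(url):
        async with sem:
            try:
                return await scrape_dynamic(url)
            except Exception as exc:  # collect errors for the checkpoint
                return exc

    results = await asyncio.gather(*(bounded(u) for u in urls))
    errors = sum(1 for r in results if isinstance(r, Exception))
    print(f"{errors}/{len(urls)} pages failed")
    return results

# asyncio.run(scrape_many(list_of_urls))

Launching a fresh browser per page keeps the sketch simple; for larger runs, reuse one browser and open multiple pages or contexts instead.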

Checkpoint: Fetch 10 pages concurrently with <5% errors.

Step 7. Save, Clean & Process Data

import csv

 

# hotels should be a list of dicts, e.g. [{"name": ..., "price": ..., "reviews": ...}]

with open('hotels.csv', 'w', newline='', encoding='utf-8') as f:

    writer = csv.DictWriter(f, fieldnames=['name', 'price', 'reviews'])

    writer.writeheader()

    writer.writerows(hotels)

Clean: Handle missing values (N/A) with simple if-else checks. Pro: Use pandas for analysis and missing-data handling.
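A brief pandas sketch of the pro cleaning step, assuming hotels.csv has the name, price, and reviews columns written above; the price-stripping regex is an assumption, so adjust it to the format you actually scrape:

import pandas as pd

df = pd.read_csv("hotels.csv")
df = df.dropna(subset=["name"])  # drop rows with no hotel name

# Strip currency symbols/commas and convert prices to numbers (invalid values become NaN)
df["price_num"] = pd.to_numeric(df["price"].str.replace(r"[^\d.]", "", regex=True), errors="coerce")

print(df["price_num"].describe())
df.to_csv("hotels_clean.csv", index=False)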

Checkpoint: Create CSV with 50 rows for one city.

Best Practices, Maintenance & Scaling

Monitor: Track success rate, latency, 403s; use Prometheus for pros.

Canary Tests: Hourly selector validation (see the sketch after this list).

Change Management: Store raw responses; update selectors weekly.

Defaults: 0.2 req/sec per IP; scale after stable runs.

Split Jobs: By date/city.

Tools: Scrapy/Celery for queues; ScrapeGraphAI for low-code alternatives.  
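A simple canary: fetch one known page on a schedule and fail loudly if key selectors stop matching, which usually means the markup changed. A sketch using the Step 6 selectors; run it from cron or any scheduler:

import sys

import requests
from bs4 import BeautifulSoup

CANARY_URL = "https://www.booking.com/searchresults.html?ss=Paris"
SELECTORS = ['[data-testid="property-card"]', '[data-testid="title"]',
             '[data-testid="price-and-discounted-price"]']

def canary(headers, proxies):
    resp = requests.get(CANARY_URL, headers=headers, proxies=proxies, timeout=15)
    soup = BeautifulSoup(resp.text, "html.parser")
    missing = [sel for sel in SELECTORS if not soup.select(sel)]
    if resp.status_code != 200 or missing:
        print(f"CANARY FAILED: status={resp.status_code}, missing selectors={missing}")
        sys.exit(1)
    print("Canary OK")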

Final Thoughts

This guide equips you for ethical Booking.com scraping in 2025, with proxies and GraphQL for efficiency. Test incrementally, adapt selectors as the live site changes, and prioritize responsibility. For advanced, JavaScript-heavy scenarios, lean on Playwright.

Looking for reliable rotating residential proxies? Try GoProxy's free trial to test scraping Booking.com. Sign up and get started today!
