Beginner’s Guide to Scraping Images from a Website

Post Time: 2026-04-14 Update Time: 2026-04-14

Images are one of the most common things people want to collect from a website. Sometimes the job is small, like downloading a few pictures from one page. Sometimes it is larger, like saving every product photo from a catalog, archiving a gallery, or building a dataset for research. This guide starts with the easiest approach and then moves into more advanced methods, covering how to:

download images from a single page

get full-size images instead of thumbnails

handle lazy-loaded or JavaScript-rendered images

scrape images from an entire website

avoid duplicates, broken links, and blocked requests

What Is Image Scraping?

Image scraping (also called image extraction or bulk image downloading) is the process of automatically finding and downloading image files (<img> tags, background images, SVGs, etc.) from one or more web pages.

A website may store image URLs in several places, including:

  • standard <img src="..."> tags
  • lazy-loaded attributes like data-src
  • responsive image attributes like srcset
  • CSS background images
  • JavaScript-rendered galleries or infinite-scroll pages

That is why inspecting the HTML is sometimes enough, and sometimes not.

Common use cases include:

  • building image datasets for machine learning or AI
  • creating product image libraries for e-commerce
  • archiving photo galleries or visual collections
  • researching visual trends in fashion, design, or media
  • backing up your own content or favorite galleries

Which Image Scraping Method Should You Use?

Not every scraping job needs the same tool.

If the page is simple and static, HTML parsing is usually enough. If the page loads content with JavaScript or only shows images after scrolling, browser automation is a better fit. If you need images across many pages, you also need crawling.

A simple rule:

One static page → no-code tool or Python with BeautifulSoup

Dynamic page with JavaScript → Playwright or another browser automation tool

Entire website → crawler + scraper

This decision comes first because it saves beginners from using the wrong method.

Before You Start: Legal & Ethical Considerations

Scraping publicly visible images is not the same as owning or redistributing them. Before you download anything, check the website’s rules and use common sense.

A practical checklist:

Read the site’s Terms of Service if the images matter commercially  

Check robots.txt when you are unsure about crawling behavior  

Avoid scraping content behind a login unless you have permission  

Do not overload the server with rapid requests  

Treat copyrighted images carefully, especially if you plan to republish, sell, or repost them

For personal research, testing, or internal use, scraping is often straightforward. For commercial use, it is smarter to confirm your rights first.

The Easiest Way: No-Code Image Downloading

If you only need a quick result, a no-code tool may be enough.

This works best when:

The page is simple  

You only need a few images  

You do not want to write code  

You want to preview images before downloading

Top tool picks:

Fatkun Batch Download Image or Cat-Catch (Chrome) — Great for galleries.  

DownThemAll! (Firefox) — Excellent for bulk links and images.

extract.pics — Still one of the best virtual-browser tools.

Bulk Image Downloader (Windows/Mac) — Point-and-click, filters by resolution.

Typical workflow:

1. Open the page in your browser.

2. Use an image-downloading extension or desktop tool.

3. Filter by file type or image size.

4. Download the images you want.

This approach is fast, but it can struggle with JavaScript-heavy pages, lazy-loaded images, hidden image URLs, or full-size files that are different from visible thumbnails.

If no-code fails, move to Python.

Scrape Images from a Static Page with Python BeautifulSoup

For a simple page, Python is the most flexible option. The usual tools are:

requests for fetching the page

BeautifulSoup for parsing HTML

pathlib for saving files

Install the tools

pip install requests beautifulsoup4

Step 1. Inspect the page

Before writing code, open the page in your browser and inspect an image element. Look for:

src

data-src

data-original

data-lazy-src

srcset

This matters because the visible image may be only a thumbnail, not the actual full-size file.

Step 2. Fetch the page HTML

Use a browser-like User-Agent header so the server is more likely to return the expected page.

Step 3. Extract image URLs

Do not check only src. Many sites place image URLs in lazy-loading attributes or responsive attributes.

Step 4. Convert relative URLs to full URLs

Websites often use paths like /images/photo.jpg. Those must be converted into full URLs before downloading.
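Python’s urljoin from the standard library handles this conversion; the URLs below are only illustrative:

```python
from urllib.parse import urljoin

page_url = "https://example.com/gallery/page1.html"

# Absolute path: resolved against the site root
print(urljoin(page_url, "/images/photo.jpg"))        # https://example.com/images/photo.jpg
# Relative path: resolved against the page's directory
print(urljoin(page_url, "thumbs/photo_small.jpg"))   # https://example.com/gallery/thumbs/photo_small.jpg
# Protocol-relative URL: inherits the page's scheme
print(urljoin(page_url, "//cdn.example.com/photo.jpg"))  # https://cdn.example.com/photo.jpg
```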

Step 5. Download the images

Save the files into a folder, skip duplicates, and use consistent naming.

Starter script:

from pathlib import Path
from urllib.parse import urljoin, urlparse
import hashlib

import requests
from bs4 import BeautifulSoup


def make_filename(image_url: str) -> str:
    parsed = urlparse(image_url)
    original_name = Path(parsed.path).name
    if original_name:
        return original_name
    # Fallback name when the URL does not contain a filename
    return hashlib.sha1(image_url.encode("utf-8")).hexdigest() + ".jpg"


def extract_image_urls(page_url: str) -> list[str]:
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(page_url, headers=headers, timeout=20)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    image_urls = set()

    for img in soup.find_all("img"):
        for attr in ("src", "data-src", "data-original", "data-lazy-src"):
            value = img.get(attr)
            if value:
                image_urls.add(urljoin(page_url, value))

        srcset = img.get("srcset")
        if srcset:
            candidates = [item.strip().split(" ")[0]
                          for item in srcset.split(",") if item.strip()]
            if candidates:
                # Usually the last candidate is the largest image
                image_urls.add(urljoin(page_url, candidates[-1]))

    return sorted(image_urls)


def download_image(image_url: str, folder: Path) -> None:
    filename = make_filename(image_url)
    file_path = folder / filename
    if file_path.exists():
        return  # Skip files we already downloaded

    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(image_url, headers=headers, stream=True, timeout=30)
    response.raise_for_status()

    with open(file_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)


def scrape_images(page_url: str, output_folder: str = "downloaded_images") -> None:
    folder = Path(output_folder)
    folder.mkdir(parents=True, exist_ok=True)

    image_urls = extract_image_urls(page_url)
    print(f"Found {len(image_urls)} image URLs")

    for image_url in image_urls:
        try:
            download_image(image_url, folder)
            print(f"Saved: {image_url}")
        except Exception as e:
            print(f"Failed: {image_url} -> {e}")


if __name__ == "__main__":
    scrape_images("https://example.com")

How to Get Full-Size Images Instead of Thumbnails (Most Common Frustration)

A page may show a small preview image while the actual image file is much larger. To find the full-size version, check:

srcset

data-src and similar attributes

the image opened in a new tab

the browser network panel

the page’s click actions or gallery viewer

Often, the full-size file is one of these:

the largest entry in srcset

a separate image URL loaded after clicking

a file from a JavaScript endpoint

the original upload, not the thumbnail shown in the layout

If your scraper only finds thumbnails, you are probably reading the display image, not the source image.
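Instead of trusting the order of srcset entries, you can parse their width descriptors and keep the largest. A minimal sketch (largest_from_srcset is a helper name of my choosing):

```python
def largest_from_srcset(srcset: str) -> str:
    """Return the URL with the largest width descriptor (e.g. '1600w')."""
    best_url, best_width = "", -1
    for item in srcset.split(","):
        parts = item.strip().split()
        if not parts:
            continue
        url = parts[0]
        width = 0
        # Width descriptors look like "480w"; density descriptors like "2x"
        if len(parts) > 1 and parts[1].endswith("w"):
            try:
                width = int(parts[1][:-1])
            except ValueError:
                width = 0
        if width > best_width:
            best_url, best_width = url, width
    return best_url


srcset = "photo-480.jpg 480w, photo-960.jpg 960w, photo-1600.jpg 1600w"
print(largest_from_srcset(srcset))  # photo-1600.jpg
```

When no width descriptors are present, the helper falls back to the first candidate.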

Scrape Lazy-Loaded/JavaScript Images with Python Playwright

Some pages look empty in raw HTML because the images are loaded later by JavaScript. This is common on modern websites, especially pages with:

infinite scroll

product grids

gallery popups

React, Vue, or similar front ends

Signs the page is dynamic

images appear only after scrolling

view source does not show the final image URL

your script finds no images, but the browser clearly shows them

In that case, use browser automation.

Install Playwright

pip install playwright requests

playwright install chromium

Basic workflow

1. Open the page in a browser automation tool.

2. Wait for the page to load.

3. Scroll to trigger lazy loading.

4. Read the rendered page.

5. Extract the final image URLs.

6. Download the files.

Here is a simple Playwright example:

from urllib.parse import urljoin
import os
import time

import requests
from playwright.sync_api import sync_playwright


def scrape_dynamic_images(page_url: str, folder: str = "dynamic_images") -> None:
    os.makedirs(folder, exist_ok=True)
    count = 0

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(page_url, wait_until="networkidle")

        # Trigger lazy loading
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(2)

        for img in page.query_selector_all("img"):
            src = img.get_attribute("src") or img.get_attribute("data-src")
            if not src or src.startswith("data:"):
                continue

            full_url = urljoin(page_url, src)
            try:
                response = requests.get(full_url, headers={"User-Agent": "Mozilla/5.0"}, timeout=20)
                response.raise_for_status()

                # Starter simplification: every file is saved as .jpg
                file_path = os.path.join(folder, f"image_{count}.jpg")
                with open(file_path, "wb") as f:
                    f.write(response.content)

                count += 1
                print(f"Saved: {full_url}")
            except Exception as e:
                print(f"Failed: {full_url} -> {e}")

        browser.close()

    print(f"Done: {count} images downloaded")


if __name__ == "__main__":
    scrape_dynamic_images("https://example.com/dynamic-gallery")

This is a starter example. In real projects, you may need to scroll multiple times, wait longer, or click buttons to reveal more images.
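A common pattern for infinite-scroll pages is to scroll repeatedly until the document height stops growing. A sketch, assuming page is a Playwright Page (anything with the same evaluate method works):

```python
import time


def scroll_until_stable(page, pause: float = 1.0, max_rounds: int = 20) -> int:
    """Scroll to the bottom until document height stops growing.

    Returns the number of scroll rounds performed; max_rounds is a safety cap.
    """
    last_height = page.evaluate("document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazy-loaded images time to appear
        new_height = page.evaluate("document.body.scrollHeight")
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds
```

In the Playwright script above, you would call scroll_until_stable(page, pause=2.0) in place of the single scroll.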

Scrape Images from an Entire Website

If your goal is to collect all images from a site, you need to crawl first. That is a different job from downloading images from one page.

A practical workflow:

1. Start with the homepage or sitemap.

2. Collect internal links.

3. Keep track of pages you have already visited.

4. Visit each page one by one.

5. Extract image URLs from each page.

6. Save the source page URL with the image URL.

7. Deduplicate the results.

For larger sites, keep your crawler organized:

use a queue of pages to visit

store visited URLs in a set

skip login pages, search pages, and filter pages

write output to CSV or JSON for cleanup later

This is slower than scraping one page, but it is the right approach when you need broad coverage.
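The workflow above can be sketched as a minimal breadth-first crawler, reusing requests and BeautifulSoup from earlier. The helper names and the max_pages safety cap are my own choices, not a fixed API:

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def is_internal(start_url: str, candidate_url: str) -> bool:
    """True when candidate_url is on the same host as start_url."""
    return urlparse(candidate_url).netloc == urlparse(start_url).netloc


def crawl_image_urls(start_url: str, max_pages: int = 50) -> dict[str, list[str]]:
    """Map each visited page URL to the image URLs found on it."""
    queue = deque([start_url])        # pages still to visit
    visited: set[str] = set()         # pages already fetched
    results: dict[str, list[str]] = {}

    while queue and len(visited) < max_pages:
        page_url = queue.popleft()
        if page_url in visited:
            continue
        visited.add(page_url)

        try:
            response = requests.get(page_url, headers={"User-Agent": "Mozilla/5.0"}, timeout=20)
            response.raise_for_status()
        except requests.RequestException:
            continue  # broken links are normal; keep crawling

        soup = BeautifulSoup(response.text, "html.parser")
        results[page_url] = [
            urljoin(page_url, img[attr])
            for img in soup.find_all("img")
            for attr in ("src", "data-src")
            if img.get(attr)
        ]

        # Queue internal links we have not seen yet
        for a in soup.find_all("a", href=True):
            link = urljoin(page_url, a["href"]).split("#")[0]
            if is_internal(start_url, link) and link not in visited:
                queue.append(link)

    return results
```

The returned mapping keeps the source page URL next to each image URL, which makes deduplication and later cleanup easier.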

Common Problems & How to Fix Them

1. 404 errors or broken links

Use urljoin() to build full URLs correctly. Some sites also move or remove files, so occasional broken links are normal.

2. Only thumbnails are downloaded

Check srcset, data-src, and the browser network panel. The visible image may not be the original file.

3. The site blocks you after many requests

Slow down. Add delays between requests and avoid aggressive crawling. Consider rotating residential proxies, which supply a fresh IP per request, session, or time interval.

4. The dynamic page returns no images

Use browser automation instead of only parsing HTML.

5. Duplicate filenames

Use a filename strategy based on the URL, or add a hash to the file name.
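One hash-based sketch: keep the URL’s basename for readability, but append a short hash of the full URL so two different /a/photo.jpg and /b/photo.jpg files never collide (unique_filename is a helper name of my choosing):

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def unique_filename(image_url: str) -> str:
    """Combine the URL's basename with a short hash of the full URL."""
    name = Path(urlparse(image_url).path).name or "image.jpg"
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension in the URL
        stem, ext = name, "jpg"
    digest = hashlib.sha1(image_url.encode("utf-8")).hexdigest()[:8]
    return f"{stem}_{digest}.{ext}"
```

Because the hash is derived from the URL, re-running the scraper produces the same names, so existing files can still be skipped.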

6. Image formats are mixed

Do not assume everything is JPG. Keep the real file extension when possible. Websites may use PNG, WebP, SVG, or GIF.
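When the URL itself has no useful extension, the response’s Content-Type header usually names the real format. A sketch that maps it to an extension, with an explicit table for common image types and the standard mimetypes module as fallback:

```python
import mimetypes

# Explicit mapping for common image types; mimetypes covers the long tail
EXTENSION_BY_TYPE = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/webp": ".webp",
    "image/gif": ".gif",
    "image/svg+xml": ".svg",
}


def extension_for(content_type: str) -> str:
    """Map a Content-Type header value to a file extension."""
    ctype = content_type.split(";")[0].strip().lower()
    if ctype in EXTENSION_BY_TYPE:
        return EXTENSION_BY_TYPE[ctype]
    return mimetypes.guess_extension(ctype) or ".bin"
```

Typical use: ext = extension_for(response.headers.get("Content-Type", "")).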

Best Practices for Efficient & Responsible Scraping

A good scraper should do more than just download files. It should:

Handle relative and absolute URLs  

Check src, data-src, and srcset  

Skip duplicates  

Save files in organized folders  

Log everything  

Include error handling  

Respect delays between requests  

Start with one page before scaling up

If you are scraping for a dataset, it also helps to save metadata (source page URL, alt text, timestamp) in a CSV. That makes your data much easier to reuse later.
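A small sketch of that metadata log, using the standard csv module (append_metadata and the column names are my own choices):

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def append_metadata(csv_path: str, page_url: str, image_url: str, alt_text: str) -> None:
    """Append one row of image metadata, writing a header on first use."""
    path = Path(csv_path)
    is_new = not path.exists()
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["page_url", "image_url", "alt_text", "scraped_at"])
        writer.writerow([page_url, image_url, alt_text,
                         datetime.now(timezone.utc).isoformat()])
```

Call it once per downloaded image, right after the file is saved.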

FAQs

1. Is it hard to scrape images from a website?

Not always. Static pages are simple. Dynamic sites and whole-site scraping are harder because they need more logic.

2. Why do I only get thumbnails?

The website may use responsive images, lazy loading, or a separate full-size file. Check srcset, data-src, and the browser network panel.

3. Can I scrape images from an entire website?

Yes, but you need a crawler, not just a downloader. You must discover URLs first and then scrape each page.

4. What is the easiest way for beginners?

Start with a no-code tool for a single page. If that is not enough, move to Python for more control.

5. What if the website uses JavaScript?

Use browser automation so the page loads fully before you extract image URLs.

6. How do I avoid getting blocked while scraping?

Add short delays and use realistic User-Agent headers. For larger jobs or entire-site crawling, rotating residential proxies are an effective helper. GoProxy’s unlimited-traffic rotating residential proxies let you scrape at scale without worrying about exceeding a traffic limit or budget. Get a free test today!

Final Thoughts

Scraping images from a website is easy only when the page is simple, but modern sites often require handling lazy loading, thumbnails vs full-size, dynamic rendering, and multiple pages.

Start with one static page → understand the HTML structure → download images with BeautifulSoup → move to Playwright for dynamic sites → add crawling when you need an entire website.

This keeps the process manageable and gives you a scraper you can actually use.
