Images are one of the most common things people want to collect from a website. Sometimes the job is small, like downloading a few pictures from one page. Sometimes it is larger, like saving every product photo from a catalog, archiving a gallery, or building a dataset for research. This guide starts with the easiest approach first and then moves into more advanced methods, covering:
- download images from a single page
- get full-size images instead of thumbnails
- handle lazy-loaded or JavaScript-rendered images
- scrape images from an entire website
- avoid duplicates, broken links, and blocked requests
What Is Image Scraping?
Image scraping (also called image extraction or bulk image downloading) is the process of automatically finding and downloading image files (<img> tags, background images, SVGs, etc.) from one or more web pages.
A website may store image URLs in several places, including:
- standard <img src="..."> tags
- lazy-loaded attributes like data-src
- responsive image attributes like srcset
- CSS background images
- JavaScript-rendered galleries or infinite-scroll pages
That is why inspecting the HTML is sometimes enough, and sometimes not.
Common use cases include:
- building image datasets for machine learning or AI
- creating product image libraries for e-commerce
- archiving photo galleries or visual collections
- researching visual trends in fashion, design, or media
- backing up your own content or favorite galleries
Which Image Scraping Method Should You Use?
Not every scraping job needs the same tool.
If the page is simple and static, HTML parsing is usually enough. If the page loads content with JavaScript or only shows images after scrolling, browser automation is a better fit. If you need images across many pages, you also need crawling.

A simple rule:
- One static page → no-code tool or Python with BeautifulSoup
- Dynamic page with JavaScript → Playwright or another browser automation tool
- Entire website → crawler + scraper
This decision comes first because it saves beginners from using the wrong method.
Before You Start: Legal & Ethical Considerations
Scraping publicly visible images is not the same as owning or redistributing them. Before you download anything, check the website’s rules and use common sense.
A practical checklist:
- Read the site's Terms of Service if the images matter commercially
- Check robots.txt when you are unsure about crawling behavior
- Avoid scraping content behind a login unless you have permission
- Do not overload the server with rapid requests
- Treat copyrighted images carefully, especially if you plan to republish, sell, or repost them
For personal research, testing, or internal use, scraping is often straightforward. For commercial use, it is smarter to confirm your rights first.
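If you want to automate the robots.txt check, Python's standard library can parse the rules for you. A minimal sketch (the user agent name MyImageScraper and the example rules are placeholders; in real use you would fetch the site's actual /robots.txt first):

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules allow the user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules: everything under /private/ is off-limits to all bots
rules = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(rules, "MyImageScraper", "https://example.com/images/a.jpg"))   # True
print(is_allowed(rules, "MyImageScraper", "https://example.com/private/a.jpg"))  # False
```

For a live site, you can instead call `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` to download and parse the real file.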
The Easiest Way: No-Code Image Downloading
If you only need a quick result, a no-code tool may be enough.
This works best when:
- The page is simple
- You only need a few images
- You do not want to write code
- You want to preview images before downloading
Top tool picks:
Fatkun Batch Download Image or Cat-Catch (Chrome) — Great for galleries.
DownThemAll! (Firefox) — Excellent for bulk links and images.
extract.pics — Still one of the best virtual-browser tools.
Bulk Image Downloader (Windows/Mac) — Point-and-click, filters by resolution.
Typical workflow:
1. Open the page in your browser.
2. Use an image-downloading extension or desktop tool.
3. Filter by file type or image size.
4. Download the images you want.
This approach is fast, but it can struggle with JavaScript-heavy pages, lazy-loaded images, hidden image URLs, or full-size files that are different from visible thumbnails.
If no-code fails, move to Python.
Scrape Images from a Static Page with Python BeautifulSoup
For a simple page, Python is the most flexible option. The usual tools are:
- requests for fetching the page
- BeautifulSoup for parsing HTML
- pathlib for saving files
Install the tools
pip install requests beautifulsoup4
Step 1. Inspect the page
Before writing code, open the page in your browser and inspect an image element. Look for:
- src
- data-src
- data-original
- data-lazy-src
- srcset
This matters because the visible image may be only a thumbnail, not the actual full-size file.
Step 2. Fetch the page HTML
Use a browser-like User-Agent header so the server is more likely to return the expected page.
Step 3. Extract image URLs
Do not check only src. Many sites place image URLs in lazy-loading attributes or responsive attributes.
Step 4. Convert relative URLs to full URLs
Websites often use paths like /images/photo.jpg. Those must be converted into full URLs before downloading.
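The standard library's urljoin() handles this conversion for all three common cases:

```python
from urllib.parse import urljoin

base = "https://example.com/gallery/page.html"

# Root-relative path: resolved against the site root
print(urljoin(base, "/images/photo.jpg"))    # https://example.com/images/photo.jpg

# Relative path: resolved against the current page's folder
print(urljoin(base, "thumbs/photo.jpg"))     # https://example.com/gallery/thumbs/photo.jpg

# Already-absolute URLs pass through unchanged
print(urljoin(base, "https://cdn.example.com/p.jpg"))  # https://cdn.example.com/p.jpg
```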
Step 5. Download the images
Save the files into a folder, skip duplicates, and use consistent naming.
Starter script:
from pathlib import Path
from urllib.parse import urljoin, urlparse
import hashlib

import requests
from bs4 import BeautifulSoup


def make_filename(image_url: str) -> str:
    parsed = urlparse(image_url)
    original_name = Path(parsed.path).name
    if original_name:
        return original_name
    # Fallback name when the URL does not contain a filename
    return hashlib.sha1(image_url.encode("utf-8")).hexdigest() + ".jpg"


def extract_image_urls(page_url: str) -> list[str]:
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(page_url, headers=headers, timeout=20)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    image_urls = set()
    for img in soup.find_all("img"):
        for attr in ("src", "data-src", "data-original", "data-lazy-src"):
            value = img.get(attr)
            if value:
                image_urls.add(urljoin(page_url, value))
        srcset = img.get("srcset")
        if srcset:
            candidates = []
            for item in srcset.split(","):
                item = item.strip()
                if not item:
                    continue
                url_part = item.split(" ")[0]
                candidates.append(url_part)
            if candidates:
                # Usually the last candidate is the largest image
                image_urls.add(urljoin(page_url, candidates[-1]))
    return sorted(image_urls)


def download_image(image_url: str, folder: Path) -> None:
    filename = make_filename(image_url)
    file_path = folder / filename
    if file_path.exists():
        return  # Skip duplicates before making a request
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(image_url, headers=headers, stream=True, timeout=30)
    response.raise_for_status()
    with open(file_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)


def scrape_images(page_url: str, output_folder: str = "downloaded_images") -> None:
    folder = Path(output_folder)
    folder.mkdir(parents=True, exist_ok=True)
    image_urls = extract_image_urls(page_url)
    print(f"Found {len(image_urls)} image URLs")
    for image_url in image_urls:
        try:
            download_image(image_url, folder)
            print(f"Saved: {image_url}")
        except Exception as e:
            print(f"Failed: {image_url} -> {e}")


if __name__ == "__main__":
    scrape_images("https://example.com")
How to Get Full-Size Images Instead of Thumbnails (Most Common Frustration)
A page may show a small preview image while the actual image file is much larger. To find the full-size version, check:
- srcset
- data-src and similar attributes
- the image opened in a new tab
- the browser network panel
- the page's click actions or gallery viewer
Often, the full-size file is one of these:
- the largest entry in srcset
- a separate image URL loaded after clicking
- a file from a JavaScript endpoint
- the original upload, not the thumbnail shown in the layout
If your scraper only finds thumbnails, you are probably reading the display image, not the source image.
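One way to pick the largest candidate reliably is to parse the width descriptors in srcset rather than assuming the last entry is the biggest. A small helper sketch:

```python
def largest_srcset_url(srcset):
    """Return the URL with the largest width descriptor in a srcset string."""
    best_url, best_width = None, -1
    for item in srcset.split(","):
        item = item.strip()
        if not item:
            continue
        parts = item.split()
        url = parts[0]
        width = 0
        # Width descriptors look like "640w"; density descriptors ("2x") are ignored
        if len(parts) > 1 and parts[1].endswith("w"):
            try:
                width = int(parts[1][:-1])
            except ValueError:
                width = 0
        if width > best_width:
            best_width, best_url = width, url
    return best_url


print(largest_srcset_url("a.jpg 320w, b.jpg 1280w, c.jpg 640w"))  # b.jpg
```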
Scrape Lazy-Loaded/JavaScript Images with Python Playwright
Some pages look empty in raw HTML because the images are loaded later by JavaScript. This is common on modern websites, especially pages with:
- infinite scroll
- product grids
- gallery popups
- React, Vue, or similar front ends
Signs the page is dynamic:
- images appear only after scrolling
- view source does not show the final image URL
- your script finds no images, but the browser clearly shows them
In that case, use browser automation.
Install Playwright
pip install playwright requests
playwright install chromium
Basic workflow
1. Open the page in a browser automation tool.
2. Wait for the page to load.
3. Scroll to trigger lazy loading.
4. Read the rendered page.
5. Extract the final image URLs.
6. Download the files.
Here is a simple Playwright example:
from pathlib import Path
from urllib.parse import urljoin
import time

import requests
from playwright.sync_api import sync_playwright


def scrape_dynamic_images(page_url: str, folder: str = "dynamic_images") -> None:
    folder_path = Path(folder)
    folder_path.mkdir(parents=True, exist_ok=True)
    count = 0
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(page_url, wait_until="networkidle")
        # Trigger lazy loading
        page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(2)
        images = page.query_selector_all("img")
        for img in images:
            src = img.get_attribute("src") or img.get_attribute("data-src")
            if not src or src.startswith("data:"):
                continue
            full_url = urljoin(page_url, src)
            try:
                response = requests.get(full_url, timeout=20)
                response.raise_for_status()
                filename = f"image_{count}.jpg"
                with open(folder_path / filename, "wb") as f:
                    f.write(response.content)
                count += 1
                print(f"Saved: {filename}")
            except Exception as e:
                print(f"Failed: {full_url} -> {e}")
        browser.close()
    print(f"Done: {count} images downloaded")


if __name__ == "__main__":
    scrape_dynamic_images("https://example.com/dynamic-gallery")
This is a starter example. In real projects, you may need to scroll multiple times, wait longer, or click buttons to reveal more images.
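A common pattern for repeated scrolling is to keep going until the page height stops growing. Sketched here as a framework-agnostic helper so the logic is easy to test; the Playwright wiring shown in the comment is one plausible way to use it:

```python
def scroll_until_stable(get_height, scroll_to_bottom, wait, max_rounds=10):
    """Scroll repeatedly until the page height stops changing."""
    last_height = -1
    for _ in range(max_rounds):
        scroll_to_bottom()
        wait()
        height = get_height()
        if height == last_height:
            break  # No new content appeared; lazy loading is done
        last_height = height
    return last_height


# With a Playwright page object, the wiring might look like:
# scroll_until_stable(
#     get_height=lambda: page.evaluate("document.body.scrollHeight"),
#     scroll_to_bottom=lambda: page.evaluate("window.scrollTo(0, document.body.scrollHeight)"),
#     wait=lambda: page.wait_for_timeout(1500),
# )
```

The max_rounds cap keeps the loop from running forever on true infinite-scroll pages.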
Scrape Images from an Entire Website
If your goal is to collect all images from a site, you need to crawl first. That is a different job from downloading images from one page.
A practical workflow:
1. Start with the homepage or sitemap.
2. Collect internal links.
3. Keep track of pages you have already visited.
4. Visit each page one by one.
5. Extract image URLs from each page.
6. Save the source page URL with the image URL.
7. Deduplicate the results.
For larger sites, keep your crawler organized:
- use a queue of pages to visit
- store visited URLs in a set
- skip login pages, search pages, and filter pages
- write output to CSV or JSON for cleanup later
This is slower than scraping one page, but it is the right approach when you need broad coverage.
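The workflow above can be sketched as a small breadth-first crawler. This version takes a fetch_html function as a parameter so the crawling logic stays testable; in practice you would pass something like `lambda u: requests.get(u, timeout=20).text`, and example.com is a placeholder:

```python
from collections import deque
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup


def crawl_site_images(start_url, fetch_html, max_pages=50):
    """Breadth-first crawl of one domain, collecting (page, image) pairs."""
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = set()
    results = []
    while queue and len(visited) < max_pages:
        page_url = queue.popleft()
        if page_url in visited:
            continue
        visited.add(page_url)
        try:
            html = fetch_html(page_url)
        except Exception:
            continue  # Broken or missing page; skip it
        soup = BeautifulSoup(html, "html.parser")
        for img in soup.find_all("img"):
            src = img.get("src") or img.get("data-src")
            if src:
                results.append({"page": page_url, "image": urljoin(page_url, src)})
        for link in soup.find_all("a", href=True):
            url = urljoin(page_url, link["href"])
            # Stay on the same domain and avoid revisiting pages
            if urlparse(url).netloc == domain and url not in visited:
                queue.append(url)
    # Deduplicate by image URL, keeping the first page each image was seen on
    seen, deduped = set(), []
    for row in results:
        if row["image"] not in seen:
            seen.add(row["image"])
            deduped.append(row)
    return deduped
```

Keeping the source page URL alongside each image URL makes later cleanup and attribution much easier.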
Common Problems & How to Fix Them
1. 404 errors or broken links
Use urljoin() to build full URLs correctly. Some sites also move or remove files, so occasional broken links are normal.
2. Only thumbnails are downloaded
Check srcset, data-src, and the browser network panel. The visible image may not be the original file.
3. The site blocks you after many requests
Slow down. Add delays between requests and avoid aggressive crawling. Consider rotating residential proxies, which automatically give you a fresh IP per request, session, or time interval.
4. The dynamic page returns no images
Use browser automation instead of only parsing HTML.
5. Duplicate filenames
Use a filename strategy based on the URL, or add a hash to the file name.
6. Image formats are mixed
Do not assume everything is JPG. Keep the real file extension when possible. Websites may use PNG, WebP, SVG, or GIF.
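A helper sketch that keeps the real extension, falling back to the response's Content-Type header when the URL has none (the mapping below covers the common image types; anything unrecognized gets a .bin fallback):

```python
from pathlib import Path
from urllib.parse import urlparse

CONTENT_TYPE_EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/webp": ".webp",
    "image/gif": ".gif",
    "image/svg+xml": ".svg",
}
KNOWN_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".svg"}


def pick_extension(image_url, content_type=None):
    """Choose a file extension from the URL path, else the Content-Type header."""
    suffix = Path(urlparse(image_url).path).suffix.lower()
    if suffix in KNOWN_SUFFIXES:
        return suffix
    if content_type:
        # Strip parameters like "; charset=..." before looking up
        return CONTENT_TYPE_EXTENSIONS.get(content_type.split(";")[0].strip(), ".bin")
    return ".bin"


print(pick_extension("https://example.com/a.webp"))                 # .webp
print(pick_extension("https://example.com/img?id=5", "image/png"))  # .png
```

In practice you would pass `response.headers.get("Content-Type")` as the second argument.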
Best Practices for Efficient & Responsible Scraping
A good scraper should do more than just download files. It should:
- Handle relative and absolute URLs
- Check src, data-src, and srcset
- Skip duplicates
- Save files in organized folders
- Log everything
- Include error handling
- Respect delays between requests
- Start with one page before scaling up
If you are scraping for a dataset, it also helps to save metadata (source page URL, alt text, timestamp) in a CSV. That makes your data much easier to reuse later.
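A minimal sketch of writing that metadata with the standard csv module (the field names and file name are just suggestions):

```python
import csv


def save_image_metadata(rows, path):
    """Write image metadata rows (dicts) to a CSV file with a header."""
    fieldnames = ["page_url", "image_url", "alt", "timestamp"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


save_image_metadata(
    [{"page_url": "https://example.com/", "image_url": "https://example.com/a.jpg",
      "alt": "a photo", "timestamp": "2024-01-01T00:00:00"}],
    "image_metadata.csv",
)
```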
FAQs
1. Is it hard to scrape images from a website?
Not always. Static pages are simple. Dynamic sites and whole-site scraping are harder because they need more logic.
2. Why do I only get thumbnails?
The website may use responsive images, lazy loading, or a separate full-size file. Check srcset, data-src, and the browser network panel.
3. Can I scrape images from an entire website?
Yes, but you need a crawler, not just a downloader. You must discover URLs first and then scrape each page.
4. What is the easiest way for beginners?
Start with a no-code tool for a single page. If that is not enough, move to Python for more control.
5. What if the website uses JavaScript?
Use browser automation so the page loads fully before you extract image URLs.
6. How do I avoid getting blocked while scraping?
Add short delays and use realistic User-Agent headers. For larger jobs or entire-site crawling, rotating residential proxies are an effective helper. GoProxy's unlimited-traffic rotating residential proxies help you scrape at scale without worrying about exceeding traffic limits or budget. Get a trial today!
Final Thoughts
Scraping images from a website is easy only when the page is simple, but modern sites often require handling lazy loading, thumbnails vs full-size, dynamic rendering, and multiple pages.
Start with one static page → understand the HTML structure → download images with BeautifulSoup → move to Playwright for dynamic sites → add crawling when you need an entire website.
This keeps the process manageable and gives you a scraper you can actually use.