Beginner’s Guide to Caching in Python
Apr 14, 2026
Learn Python caching with functools.lru_cache, cache, and cached_property, and practical strategies to speed up Python code efficiently.
Caching in Python is one of the simplest ways to speed up code that repeats expensive work. Instead of recalculating the same result every time, your program stores the result once and reuses it when the same input appears again. Python’s built-in functools tools make this especially practical with lru_cache, cache, and cached_property. This beginner-friendly guide explains what caching in Python is, why it helps, when to use it, and exactly how to implement it effectively.
Caching is the process of storing a computed result so it can be reused instead of recalculated.
If the same input appears again, the program returns the stored value immediately. This approach shines when a function is expensive (CPU-heavy, I/O-bound, or network-bound) and gets called repeatedly with identical arguments.

A closely related concept is memoization — caching function results based on their input values. Python provides two main decorators for this: lru_cache (with a size limit) and cache (unbounded, available since Python 3.9).
- First call → do the work.
- Second call with the same input → reuse the result.
The value comes from avoiding repeated CPU work, repeated network requests, repeated file reads, or repeated transformations that always produce the same output for the same input.
- Recursive algorithms (Fibonacci, dynamic programming)
- Repeated API or database queries (pair with IP rotation to avoid rate limits and IP blocks)
- Reading large files or CSVs multiple times
- Data preprocessing in machine learning
- Computationally intensive calculations
- Cache hit: Value is already stored → instant return.
- Cache miss: Value is not stored → compute it and save it.
- Memoization: Caching based on function arguments.
- Invalidation: Removing or refreshing cached data when the underlying data changes.
Caching in Python typically uses a dictionary for lightning-fast lookups:
1. A function is called with specific arguments.
2. Python checks the cache using those arguments as the key.
3. Cache hit → return the stored value immediately.
4. Cache miss → compute the result, store it, and return it.
Important detail: Cache keys must be hashable. Lists and dictionaries are mutable and cannot be used directly as cache keys. Use immutable values such as integers, strings, or tuples instead. If your function accepts mutable inputs, convert them into a stable, hashable representation first.
Pro tip: lru_cache and cache automatically handle **kwargs and treat different keyword argument orders as separate cache entries — something a manual decorator usually fails to do.
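One common pattern for mutable inputs is a thin wrapper that converts a list into a tuple before delegating to a cached core function. A minimal sketch (the names `total` and `total_from_list` are illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def total(numbers):
    # The cached core only ever sees an immutable, hashable tuple.
    return sum(numbers)

def total_from_list(numbers):
    # Convert the mutable list into a tuple so it can serve as a cache key.
    return total(tuple(numbers))

print(total_from_list([1, 2, 3]))  # computes → 6
print(total_from_list([1, 2, 3]))  # cache hit → 6
```

The same idea works for dictionaries: convert them to a sorted tuple of key–value pairs before caching.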
Caching is worth using when:
- The result is deterministic (same inputs → same output).
- The function is expensive and the same inputs appear many times.
Skip caching when:
- Freshness matters more than speed.
- The function has side effects (e.g., writing to a file or database).
- The output must be different every time (e.g., time(), random()).
Simple beginner rule:
Cache only when the function is expensive, the result is repeatable, and the same input is likely to be used again. If the data changes often, you need a clear invalidation strategy.
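A quick illustration of why non-deterministic functions should not be cached: caching silently freezes their output. This is a deliberate anti-example, not a recommended pattern:

```python
import random
from functools import cache

@cache
def roll():
    # Caching a random function means the first result is returned forever.
    return random.randint(1, 6)

print(roll(), roll(), roll())  # same number three times
```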
| Need | Best Option | Notes |
| --- | --- | --- |
| Learn the basic idea | Manual dictionary cache | Great for understanding |
| Cache function results with size limit | functools.lru_cache | Beginner default |
| Cache function results without size limit | functools.cache (Python 3.9+) | Equivalent to lru_cache(maxsize=None); smaller and faster than a bounded lru_cache |
| Cache an expensive instance attribute | functools.cached_property | No-argument methods only |
| Keep cached data across restarts | Persistent storage (shelve, etc.) | Survives program restarts |
A good beginner default: start with lru_cache. It offers simplicity, safety, and memory control in most cases.
When a cache reaches its limit, it must decide which entries to remove.
| Strategy | Description | Best For |
| --- | --- | --- |
| LRU (Least Recently Used) | Removes least recently accessed item | General-purpose (default) |
| LFU (Least Frequently Used) | Removes least-used items | Uneven access patterns |
| FIFO | Removes oldest items first | Queue-like workloads |
| TTL (Time-To-Live) | Expires items after a set time | Time-sensitive data |
Beginner tip: Start with LRU — it fits almost every use case.
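The TTL strategy can be approximated on top of lru_cache by folding a coarse "time bucket" into the cache key, so old entries simply stop being hit once the bucket advances. A minimal sketch, assuming a 60-second expiry (the names here are illustrative, not a standard-library API):

```python
import time
from functools import lru_cache

TTL_SECONDS = 60

@lru_cache(maxsize=128)
def _fetch(key, time_bucket):
    # time_bucket is only part of the cache key; it is not used in the body.
    return f"fresh value for {key}"

def fetch_with_ttl(key):
    # Calls within the same 60-second bucket share a cache entry;
    # once the bucket rolls over, the old entry is never hit again.
    return _fetch(key, int(time.time() // TTL_SECONDS))

print(fetch_with_ttl("user:42"))  # computed
print(fetch_with_ttl("user:42"))  # cached until the bucket rolls over
```

For production time-based expiry, a purpose-built tool such as cachetools.TTLCache (covered below) is the cleaner choice, since expired entries here are evicted only by LRU pressure.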
A manual cache is useful because it shows the basic idea clearly. It is not usually the final solution, but it is a good way to understand what caching does.
```python
from functools import wraps

def memoize(func):
    cache = {}
    @wraps(func)
    def wrapper(*args):
        if args in cache:
            return cache[args]
        result = func(*args)
        cache[args] = result
        return result
    return wrapper

@memoize
def add(a, b):
    return a + b

print(add(2, 3))  # computes
print(add(2, 3))  # returns cached value
```
Limitation: this version only handles positional arguments and offers no inspection or invalidation. Python’s built-in decorators also handle keyword arguments and provide tools such as cache_info() and cache_clear().
For most real code, lru_cache is the tool beginners (and professionals) should reach for first.
```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(35))  # 9227465
print(fibonacci.cache_info())  # CacheInfo(hits=33, misses=36, maxsize=128, currsize=36)
```
Key features:
- Avoids repeated work
- Limits memory with maxsize
- Perfect for recursive functions
- Thread-safe
- Provides cache_info() for easy tuning
See the speed difference instantly:
```python
import time

start = time.perf_counter()
print(fibonacci(35))
print("Time taken:", time.perf_counter() - start)
```
Clear the Cache
```python
fibonacci.cache_clear()
```
Extra option: Use @lru_cache(typed=True) if your function should treat 5 and 5.0 as different inputs.
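A small sketch of typed=True in action, showing that the int and float forms of the same number occupy separate cache entries:

```python
from functools import lru_cache

@lru_cache(maxsize=128, typed=True)
def double(n):
    return n * 2

double(5)    # miss: stored under int 5
double(5.0)  # miss: stored separately under float 5.0
print(double.cache_info())  # misses=2, currsize=2
```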
Use this when you want the simplest possible memoization and know the number of unique inputs will stay manageable.
```python
from functools import cache

@cache
def square(n):
    return n * n

print(square(12))
print(square(12))
```
Perfect when the expensive work belongs to an object property and should be computed only once per instance.
```python
from functools import cached_property

class DataLoader:
    def __init__(self, filename):
        self.filename = filename

    @cached_property
    def data(self):
        print("Loading data once...")
        return "Large dataset"

loader = DataLoader("data.csv")
print(loader.data)  # computes
print(loader.data)  # returns cached value
```
Note: Works only on methods with no arguments. Since Python 3.12, the internal lock has been removed.
In-memory caches disappear when the process ends. For long-running workflows or repeated scripts, use persistent storage.
```python
import shelve

def get_data(key):
    with shelve.open("cache.db") as db:
        if key in db:
            return db[key]
        result = f"Expensive result for {key}"
        db[key] = result
        return result
```
Never assume caching helps — always measure. Use cache_info() (hits vs. misses) together with simple timing:
```python
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def slow_square(n):
    time.sleep(0.1)
    return n * n

start = time.perf_counter()
slow_square(10)
slow_square(10)
print("Time taken:", time.perf_counter() - start)
print(slow_square.cache_info())
```
A useful cache should show more hits over time, fewer repeated expensive calls, and a visible reduction in runtime.
Follow these to use caching effectively:
- Profile first, optimize second.
- Prefer lru_cache for most function caching.
- Use a reasonable maxsize.
- Cache only expensive, deterministic work.
- Make sure arguments are hashable.
- Clear the cache when the data changes (cache_clear() or del obj.attr).
- Avoid caching side-effect-heavy functions.
- Document cache behavior for future maintenance.
- Use typed=True when type matters (e.g., int vs. float).
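Invalidation in practice is one line for each tool. A short sketch (the names `lookup` and `Report` are illustrative):

```python
from functools import lru_cache, cached_property

@lru_cache(maxsize=128)
def lookup(key):
    return f"value for {key}"

lookup("a")
lookup.cache_clear()        # drop all cached function results
print(lookup.cache_info())  # hits=0, misses=0, currsize=0 again

class Report:
    @cached_property
    def summary(self):
        return "expensive summary"

r = Report()
_ = r.summary    # computed and stored on the instance
del r.summary    # invalidate: the next access recomputes
```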
1. Caching everything
Not every function benefits from caching. Cheap functions may become slower because the cache itself adds overhead.
2. Ignoring stale data
A cached value can become wrong if the source data changes and the cache is not refreshed.
3. Using mutable objects as keys
Lists and dictionaries cannot be used directly as cache keys for lru_cache because the cache relies on hashable arguments.
4. Using unlimited caches carelessly
An unbounded cache can grow without limit and consume too much memory. Python’s docs explicitly note that cache is unbounded and lru_cache(maxsize=None) disables eviction.
5. Forgetting to clear caches
If the underlying data changes, the cache may need to be cleared manually with cache_clear() or by deleting a cached_property attribute.
Why must cache arguments be hashable?
Python stores cache entries using a dictionary-like lookup, and dictionary keys must be hashable.
Does caching always improve performance?
No. For small or cheap computations, caching overhead can outweigh the benefit.
How do I know whether caching is working?
Use cache_info() and compare benchmark results before and after caching. timeit is the standard tool for measuring small code snippets.
Which caching tool should beginners use first?
Start with functools.lru_cache. It gives a strong mix of simplicity, safety, and memory control.
Is cached_property the same as lru_cache?
No. cached_property stores a value on an instance attribute, while lru_cache memoizes function calls. Python’s FAQ describes them as the two principal tools for caching method calls.
As applications scale, you may need more advanced features:
| Need | Stdlib Tool | Next Popular Step |
| --- | --- | --- |
| TTL / time expiry | Manual + timestamps | cachetools.TTLCache |
| Disk / restart-safe | shelve | joblib or diskcache |
| Distributed / web apps | — | Redis + redis-py |
| ML / large data | — | joblib.Memory |
| High-volume web scraping / APIs | — | Rotating proxies + caching |
Python caching is one of the highest-ROI optimizations you can make — and it’s surprisingly beginner-friendly.
Start with the slowest pure function in your codebase, add @lru_cache (or the right tool), run your benchmark, and watch the speedup. One decorator can turn minutes into milliseconds.
Implement it today, measure the results, and enjoy faster code with almost zero effort.