
How to Scrape All Tweets from a User's X Account in 2025

Post Time: 2025-11-25 Update Time: 2025-11-25

Want to download every tweet from an X (Twitter) account — text, timestamps, replies, and media — and actually finish the job without stopping at UI caps? This guide:

  • Puts the Official X API first (recommended when feasible),
  • Provides practical fallback methods (no-code, CLI, Python scripts, managed providers),
  • Gives code examples, anti-blocking tactics, and troubleshooting tips.

Note: Numeric values in this guide (guest token expiry, doc_id rotation, rate-limit observations, proxy cost ranges) are community-observed as of Nov 25, 2025. Platform internals change frequently — treat these as reference, not guarantees.

Why Scrape All Tweets from a User's Account?


Common motivations include:

Personal archiving: back up your own or a loved one’s posts.

Research & analysis: longitudinal studies or sentiment tracking.

Marketing & monitoring: competitor/influencer analysis including media.

Journalism: reconstruct timelines and evidence.

Note: UI caps (commonly observed near ~3,200 tweets) and platform anti-scraping measures mean you must plan for pagination, proxies, or the Official API.

What “All” means in practice

X serves timelines via GraphQL/XHR; some UI endpoints can cap visible tweets (~3k). Workarounds include using search endpoints (from:username), date-slicing, or accessing full-archive via the Official API (paid tiers) or managed providers.

Community-observed behaviors (Nov 25, 2025): guest tokens may expire ~2–4 hours; doc_ids can rotate every ~1–3 weeks; rate-limit behavior varies by endpoint — expect to implement defensive backoff and proxies for large jobs.

Legal & Ethical Considerations Before You Start

The Official API is preferred: it stays within the TOS as long as you follow the API rules and quotas.

Public ≠ permitted. Courts have sometimes sided with scraping of public data (see hiQ Labs v. LinkedIn), but scraping still breaches X's terms of service, risking account suspension and IP bans.

Prefer scraping your own account or data you have permission to archive. Avoid PII collection and only keep what your use case requires. Follow GDPR/CCPA retention and subject-access expectations.

If the data will be used in litigation, research with high-stakes decisions, or commercial products, consult legal counsel.

Methods Overview & Quick Pick

Method | Tools | Best for
Official X API (Method 0) | Official X API (v2) | Preferred for reliability, legal clarity, and full-archive access on paid tiers
No-code (Method 1) | Lobstr, Octoparse, Apify actors | Non-technical quick archives and scheduled runs
Command-line (Method 2) | gallery-dl, wfdownloader, JDownloader | Fast media + metadata dumps
Python scripts (Method 3) | Twscrape, Twikit, custom scripts | Reproducible scripts, date-slicing, checkpointing
Managed/Production (Method 4) | ScrapFly, Apify | Production pipelines with low maintenance

Method 0. Official X API (Recommended When Feasible)

When to choose: you need reliability, legal clarity and/or full-archive access (paid tiers).

Why: the Official API is maintained by X, avoids the TOS gray area associated with scraping, and offers enterprise/full-archive features on paid plans.

Steps

1. Sign up at the X Developer Portal and create an app/project.

2. Choose a tier (Free / Basic / Pro / Enterprise) based on quota and price.

3. Obtain credentials (OAuth2 / Bearer token).

4. Use endpoints: user lookup (username → user_id), timelines/user tweets, search/full-archive (if included), and page via cursor tokens.

Simple curl example

# 1) Get user ID
curl -H "Authorization: Bearer YOUR_BEARER_TOKEN" \
  "https://api.x.com/2/users/by/username/USERNAME"

# 2) Fetch tweets (paginated)
curl -H "Authorization: Bearer YOUR_BEARER_TOKEN" \
  "https://api.x.com/2/users/USER_ID/tweets?max_results=100"

# 3) Loop using meta.next_token until none is returned
Note: If you need media, request media expansions (attachments.media_keys) and call media lookup endpoints per the docs. Official API usage is the safest production choice if the budget and quota allow.
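
If you script this instead of using curl, the pagination loop described above looks roughly like the sketch below. It is a minimal example using the requests library; the bearer token, user ID, and output filename are placeholders, and the media expansion parameter follows the note above.

# Minimal pagination sketch (requests library; placeholders for token and ID)
import requests, json, time

BEARER = "YOUR_BEARER_TOKEN"
USER_ID = "USER_ID"
url = f"https://api.x.com/2/users/{USER_ID}/tweets"
headers = {"Authorization": f"Bearer {BEARER}"}
params = {
    "max_results": 100,
    "tweet.fields": "created_at",
    "expansions": "attachments.media_keys",  # include media keys per the note above
}

all_tweets = []
while True:
    resp = requests.get(url, headers=headers, params=params)
    resp.raise_for_status()
    payload = resp.json()
    all_tweets.extend(payload.get("data", []))
    next_token = payload.get("meta", {}).get("next_token")
    if not next_token:
        break                     # no more pages
    params["pagination_token"] = next_token
    time.sleep(1)                 # stay well under the tier's rate limits

with open("tweets.json", "w", encoding="utf-8") as f:
    json.dump(all_tweets, f, ensure_ascii=False, indent=2)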

Method 1. No-code (Beginner; Fastest to Start)

Best for: Non-technical users who want quick exports and scheduling.

Tools: Lobstr, Octoparse, Apify actors.

Steps (Lobstr example)

1. Create account and install any connector/Chrome helper if required.

2. Create a “Twitter/X User Tweets” task / actor. Add username(s) or upload CSV.

3. Leave Max Results blank for “all” or enter a cap; set concurrency low (1–3) as you test.

4. Choose output (CSV / Google Sheets / S3); run and monitor.

Micro-tips

Start concurrency at 1, then increase to 2–3 while watching error rates.

For very large archives, split by date ranges (e.g., 2010–2014, 2015–2019) to avoid caps.

Export to Google Sheets or S3 for downstream analysis and backups.

Troubleshooting

Job stops early: reduce concurrency and split job by date.

Many blank fields: enable “login/cookies” option in the tool or try the tool’s XHR/JSON mode.

Method 2. Command-line Tools (Low-code; Media-focused)

Best for: Terminal users who need fast media + metadata dumps.

Tool: gallery-dl

Install & run

pip install --user gallery-dl

# profile + metadata + media
gallery-dl "https://x.com/username" --write-metadata \
  -o "directory=./output/username/{id}" --filename-template "{id}"

# search-based (surpasses timeline caps)
gallery-dl "https://x.com/search?q=from:username" --write-metadata \
  -o "directory=./output/username/search/{id}" --filename-template "{id}"

Micro-tips

Use the search URL (search?q=from:username) to get tweets beyond UI timeline caps.

Always set --write-metadata so each media file gets associated JSON with tweet id/date/text.

Add retry logic or re-run failed IDs; for big accounts, run in batches and checkpoint outputs (see the batching sketch below).
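
One way to batch a big account is to drive gallery-dl from a short Python wrapper that walks yearly date slices via the search URL shown above. This is a rough sketch, not an official workflow; the username, year range, and output directory are placeholders.

# Rough batching sketch: one gallery-dl run per yearly slice of from:username
import subprocess
from urllib.parse import quote

username = "username"              # placeholder
for year in range(2010, 2026):     # adjust to the account's active years
    query = f"from:{username} since:{year}-01-01 until:{year}-12-31"
    url = f"https://x.com/search?q={quote(query)}"
    # check=False: keep going if one slice fails; re-run failed years later
    subprocess.run(
        ["gallery-dl", url, "--write-metadata", "-d", f"./output/{username}/{year}"],
        check=False,
    )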

Post-process example (JSON → CSV)

import json, glob, csv

rows = []
for f in glob.glob('output/username/**/*.json', recursive=True):
    with open(f, encoding='utf-8') as fh:
        meta = json.load(fh)
    rows.append({'id': meta.get('id'), 'date': meta.get('date'),
                 'text': meta.get('text'), 'media': ';'.join(meta.get('media', []))})

if rows:  # avoid IndexError when no metadata files were found
    with open('username_tweets.csv', 'w', newline='', encoding='utf-8') as out:
        writer = csv.DictWriter(out, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)

Troubleshooting

Gets only ~3k tweets: use search?q=from:username or split date ranges.

Missing media: run gallery-dl "https://x.com/username/media" or check metadata fields.

403/429: throttle, add short delays, or use residential proxies.

Method 3. Python Scripts (Intermediate)

Best for: Developers needing reproducibility and integration.

Steps

1. Use a maintained library (Twscrape, Twikit) or a stable SDK.

2. Save cookies, implement pagination and checkpointing, rotate proxies.

3. Slice large jobs via since: / until: queries and merge results.

Minimal example

# pseudocode — check library docs for exact API
from twscrape import API, gather
import asyncio
import pandas as pd

async def main():
    api = API()
    await api.pool.add_account('user', 'pass', 'email', 'emailpass')
    await api.pool.login_all()
    tweets = await gather(api.user_tweets('target_id', limit=20000))
    df = pd.DataFrame([{'id': t.id, 'date': t.date, 'text': t.text} for t in tweets])
    df.to_csv('target_tweets.csv', index=False)

asyncio.run(main())

Micro-tips

Checkpoint every 100–500 tweets: write intermediate JSON/CSV so crashes resume easily (see the sketch after these tips).

Use since: / until: date filters to split large jobs into manageable slices.

Save and reuse cookies to reduce guest-token churn and login frequency.
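
The checkpointing tip above can be as simple as an append-only JSON-lines file. The sketch below assumes tweet objects with id/date/text fields, as in the minimal example; the filename is a placeholder.

# Checkpointing sketch: append each batch to a JSON-lines file so a crash can resume
import json, os

CHECKPOINT = "target_tweets.jsonl"   # placeholder path

def seen_ids():
    # IDs already on disk; skip these when resuming a job
    if not os.path.exists(CHECKPOINT):
        return set()
    with open(CHECKPOINT, encoding="utf-8") as f:
        return {json.loads(line)["id"] for line in f if line.strip()}

def save_batch(tweets):
    # Append a batch of tweet objects (id/date/text assumed, as in the example above)
    with open(CHECKPOINT, "a", encoding="utf-8") as f:
        for t in tweets:
            f.write(json.dumps({"id": t.id, "date": str(t.date), "text": t.text},
                               ensure_ascii=False) + "\n")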

Troubleshooting

Doc_id or guest token errors: consider managed SDKs or API to avoid constant maintenance.

High error rates: add exponential backoff: 1s → 2s → 4s → 8s, etc.
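
A minimal backoff helper might look like the sketch below; wrap whatever fetch call your library exposes, and narrow the exception to its rate-limit error type.

# Exponential backoff with jitter: roughly 1s, 2s, 4s, 8s... plus a random component
import random, time

def with_backoff(fetch, max_retries=5):
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:  # narrow this to your library's rate-limit / HTTP 429 error
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
    raise RuntimeError("still failing after retries")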

Method 4. Managed & Scalable (Advanced; Production)

Best for: Teams or production pipelines needing reliability.

Why: Managed SDKs handle token refresh, doc_id rotation, JS rendering and integrate proxies — reducing hands-on maintenance.

ScrapFly example

from scrapfly import ScrapflyClient, ScrapeConfig

client = ScrapflyClient(key='YOUR_KEY')
res = client.scrape(ScrapeConfig(url='https://x.com/username', asp=True, render_js=True))
# parse res.content and follow pagination

Micro-tips

Use asp=True / render_js=True (or equivalent) so the SDK auto-refreshes tokens/doc_ids.

Provision residential/mobile proxies for reliability; monitor IP reputation.

Store raw JSON snapshots for auditing and repro.

Troubleshooting

Unexpected breaks after X update: managed SDKs usually patch quickly — check changelog.

Best Practices for Success

Pagination: store next_token/cursor and loop until none.

Date-slicing: query from:username since:YYYY-MM-DD until:YYYY-MM-DD and run slices sequentially (see the sketch after this list).

Lists: enumerate members, use OR queries (from:user1 OR from:user2) when supported.

Media mapping: save media under media/{username}/{tweet_id}/ and include local_media_paths in the CSV.

Prefer residential/mobile proxies (more robust than datacenter IPs). Tip: For reliable residential proxies, consider a reputable proxy service, like GoProxy, for long-running X data collection.

Reuse session cookies; rotate user agents; add Accept-Language headers.

Use exponential backoff with jitter on 429/403 errors.

Monitor error rates and set alerts for spikes.
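
For the date-slicing practice above, a small generator keeps slices consistent across runs. This is a sketch; the 90-day slice length, username, and date range are placeholders to adjust to the account's posting volume.

# Generate from:username since:/until: search queries in fixed-size slices (90 days here)
from datetime import date, timedelta

def date_slices(username, start, end, days=90):
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=days), end)
        yield f"from:{username} since:{cur:%Y-%m-%d} until:{nxt:%Y-%m-%d}"
        cur = nxt

for q in date_slices("username", date(2020, 1, 1), date(2021, 1, 1)):
    print(q)  # feed each query to the search-based method of your choice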

FAQs

Q: Why do some tools stop at ~3,200 tweets?

A: UI/unofficial API caps — use search endpoints (from:username), date ranges or managed SDKs to get beyond that.

Q: Can I scrape without proxies?

A: For very small, infrequent jobs possibly; for medium/large jobs you’ll almost certainly need residential/mobile proxies to avoid blocks.

Q: Is it legal to scrape tweets for research?

A: Public-data scraping can be legally defensible in many jurisdictions, but it may still violate platform TOS. Anonymize and consult counsel for sensitive uses.

Final Thoughts

Scraping all tweets is achievable but requires matching the right method to your skill, scale, and risk tolerance.

Need rotating proxies for Twitter (X) scraping? Sign up here and get your 500M free trial during this Black Friday sale, and experience smooth data collection.
